Quantcast
Channel: Yet Another Math Programming Consultant
Viewing all articles
Browse latest Browse all 809

Coloring the US county map

$
0
0

A famous result in graph theory is that we can color a planar graph with just four colors [1]. The MIP model for coloring a graph is really simple. Let's try this with a relatively large data set: the map of US counties. We'll see that there are some hurdles on the way.


Building block: MIP model for coloring a graph


The problem is as follows: we want to assign a color to each node such that nodes connected by an arc don't have the same color. The objective is to minimize the number of different colors. For the MIP model we use an assignment variable:  \[\color{darkred}x_{n,c} = \begin{cases}1 & \text{if node $n$ gets color $c$} \\  0 & \text{otherwise}\end{cases}\] To keep track of the colors we have: \[\color{darkred}u_c = \begin{cases}1 & \text{color $c$ is used for some nodes} \\ 0 & \text{otherwise}\end{cases}\]


The MIP model can look like:

MIP Model
\[\begin{align} \min & \sum_c \color{darkred}u_c \\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}u_c \le \color{darkred}u_{c-1} && \forall c\gt 1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \end{align} \]


The last constraint is optional. It is used to get nicer-looking solutions and to reduce symmetry.

With a random graph with 20 nodes and 57 arcs, our minimum number of colors is 4. Of course, other graphs may need more or fewer colors.


This was a very small data set and the MIP model solved very fast (it used 0 nodes to prove optimality; although when solved in the root node, I guess more honest accounting would say this needed 1 node). Next, up to something bigger.

Building block: Map of US counties


To produce a map for all US counties, I used the GeoJSON file from [2]. After extracting the id fields from the JSON file we have a collection of counties encoded by their FIPS code [3]. These 6-digit codes have two parts: a 2-digit state id and a 4-digit county id. We have 3221 counties in this file. This will become our set of nodes.

The data for the adjacency matrix comes from [4]. I had some issues with this file:

  • This file had more counties than I have in my map data. All counties not in the map file are ignored. These were all related to faraway territories: Virgin Islands, American Samoa, Guam, Northern Mariana Islands. 
  • The matrix in [4] is larger than strictly needed. It says all counties are adjacent to itself. Also, I made the matrix strictly upper triangular. I.e. two counties \(i\) and \(j\) can only be adjacent if \(i\lt j\). This makes the data smaller without losing any information.
  • There is a problem with county 27111. It has different names in the file.


     
  • County 27165 has an empty name in some of the records.
  • Data problems are fact of life! Even for data from reputable sources. Always check incoming data for errors, and add data checks to the model.
After fixing the problems with the adjacency file, we can try to color the map. The underlying graph has 3,221 nodes and 9,478 undirected arcs. The generated model is not small: 

MODEL STATISTICS

BLOCKS OF EQUATIONS           3     SINGLE EQUATIONS       60,102
BLOCKS OF VARIABLES           3     SINGLE VARIABLES       19,333
NON ZERO ELEMENTS       189,973     DISCRETE VARIABLES     19,332

The size depends quite a bit on the number of colors in set \(c\). The solution is however not what we would expect:

**** SOLVER STATUS     1 Normal Completion
**** MODEL STATUS      1 Optimal
**** OBJECTIVE VALUE                5.0000

Why don't we get an objective value of 4?  We need to do some debugging. Let's see if we can identify an offending node. I.e., when we remove that node (county), the remaining map can be properly colored with 4 colors. Our debugging tool becomes the following model:


MIP Debugging Model
\[\begin{align} \min\> & \color{darkred}z=\sum_c \color{darkred}u_c \\ &\color{darkred}z=4 \\ & \sum_c \color{darkred}x_{n,c}=\color{darkred}v_n &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \sum_n \color{darkred}v_n = {\bf{card}}(n)-1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \\ & \color{darkred}v_n \in \{0,1\} \end{align} \]


The new variable \(\color{darkred}v_n\) indicates if node \(n\) is used. The node with \(\color{darkred}v_n=0\) in the solution is suspect. When we run this, Glades County in Florida is selected to be ignored. When we fix Glades county to be included, neighboring Okeechobee County is removed. So let's have a look at this part of Florida:




We see that Glades, Okeechobee, Martin, Palm Beach, and Hendry form a bit of a pie chart with one single point where they meet. Such a pie chart with a single meeting point can create problems:



There is no way we can color this with 4 colors. If we look at a description of the 4-color theorem, we actually see such a single point is removed from consideration:

The four-color theorem states that any map in a plane can be colored using four-colors in such a way that regions sharing a common boundary (other than a single point) do not share the same color. [5]

After removing arcs Glades-Martin and Glades-Palm Beach, our model finds a 4-coloring scheme for the US county map.

The map





Conclusions


Applying the 4-color theorem to an actual largish data set provides some good lessons:
  • Data can never be trusted, even if it comes from the Census bureau.
  • Models can help in debugging by changing the question. Here we asked: is there a single node (county) we can remove to make the map four-colorable? This model led us directly to the underlying problem.  
  • Touching a neighboring county by a single point does not count in the 4-color theorem. It looks like these cases can happen in the real world.


References



Appendix: GAMS Model


$ontext

 
Standard Graph Coloring Model

 
Color map with US counties

$offtext


*---------------------------------------------------------------
* data
*---------------------------------------------------------------

sets
   n     
'nodes: counties (fips code)'
   a(n,n)
'arcs: adjacency -- upper-triangular part only'
   c     
'colors'/color1*color5/

   county
'county names'
   map(n,county<)
;

$include data.inc

* fix data wrt Glades
a(
'12043','12085') = no;
a(
'12043','12099') = no;


scalar numCounties 'number of counties in map';
numCounties =
card(n);
display numCounties;

alias(n,i,j);

*---------------------------------------------------------------
* data checks
*---------------------------------------------------------------

abort$(card(map)<>numCounties) "check map";
abort$sum(a(i,j)$(ord(i)>=ord(j)),1) "a(i,j) is not strictly upper-triangular";

*---------------------------------------------------------------
* model
*---------------------------------------------------------------

binaryvariables
   x(n,c)
'assign color to node'
   u(c)  
'color is used'
;

variable
   numColors
'number of colors needed'
;

equations
   objective           
'minimize number of colors used'
   assign(n)           
'every node should have exactly one color'
   notSameColor(i,j,c) 
'adjacent vertices cannot have the same color'
   order(c)            
'ordering constraint (optional)'
;

objective..  numColors =e=
sum(c, u(c));

assign(n).. 
sum(c, x(n,c)) =e= 1;

notSameColor(a(i,j),c).. x(i,c)+x(j,c) =l= u(c);

order(c-1).. u(c) =l= u(c-1);


model color /all/;
option optcr=0, threads=16;
solve color minimizing numColors using mip;


display x.l,u.l,numColors.l;

*---------------------------------------------------------------
* export results
*---------------------------------------------------------------

parameter mapcolors(n);
mapcolors(n) =
sum(c, ord(c)*round(x.l(n,c)));

file f /mapcolors.csv/;
* csv format
f.pc=5;
put f,"id","color"/;
loop(n,
  
put n.tl:0,mapcolors(n):0:0/;
);




Viewing all articles
Browse latest Browse all 809

Trending Articles