A famous result in graph theory is that we can color a planar graph with just four colors [1]. The MIP model for coloring a graph is really simple. Let's try this with a relatively large data set: the map of US counties. We'll see that there are some hurdles on the way.
Building block: MIP model for coloring a graph
The problem is as follows: we want to assign a color to each node such that nodes connected by an arc don't have the same color. The objective is to minimize the number of different colors. For the MIP model we use an assignment variable: \[\color{darkred}x_{n,c} = \begin{cases}1 & \text{if node $n$ gets color $c$} \\ 0 & \text{otherwise}\end{cases}\] To keep track of the colors we have: \[\color{darkred}u_c = \begin{cases}1 & \text{color $c$ is used for some nodes} \\ 0 & \text{otherwise}\end{cases}\]
| MIP Model |
|---|
| \[\begin{align} \min & \sum_c \color{darkred}u_c \\ & \sum_c \color{darkred}x_{n,c}=1 &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \color{darkred}u_c \le \color{darkred}u_{c-1} && \forall c\gt 1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \end{align} \] |
The last constraint is optional. It is used to get nicer-looking solutions and to reduce symmetry.
With a random graph with 20 nodes and 57 arcs, our minimum number of colors is 4. Of course, other graphs may need more or fewer colors.
Building block: Map of US counties
To produce a map for all US counties, I used the GeoJSON file from [2]. After extracting the id fields from the JSON file we have a collection of counties encoded by their FIPS code [3]. These 6-digit codes have two parts: a 2-digit state id and a 4-digit county id. We have 3221 counties in this file. This will become our set of nodes.
The data for the adjacency matrix comes from [4]. I had some issues with this file:
- This file had more counties than I have in my map data. All counties not in the map file are ignored. These were all related to faraway territories: Virgin Islands, American Samoa, Guam, Northern Mariana Islands.
- The matrix in [4] is larger than strictly needed. It says all counties are adjacent to itself. Also, I made the matrix strictly upper triangular. I.e. two counties \(i\) and \(j\) can only be adjacent if \(i\lt j\). This makes the data smaller without losing any information.
- There is a problem with county 27111. It has different names in the file.
- County 27165 has an empty name in some of the records.
- Data problems are fact of life! Even for data from reputable sources. Always check incoming data for errors, and add data checks to the model.
MODEL STATISTICSBLOCKS OF EQUATIONS 3 SINGLE EQUATIONS 60,102BLOCKS OF VARIABLES 3 SINGLE VARIABLES 19,333NON ZERO ELEMENTS 189,973 DISCRETE VARIABLES 19,332
The size depends quite a bit on the number of colors in set \(c\). The solution is however not what we would expect:
**** SOLVER STATUS 1 Normal Completion
**** MODEL STATUS 1 Optimal
**** OBJECTIVE VALUE 5.0000
Why don't we get an objective value of 4? We need to do some debugging. Let's see if we can identify an offending node. I.e., when we remove that node (county), the remaining map can be properly colored with 4 colors. Our debugging tool becomes the following model:
| MIP Debugging Model |
|---|
| \[\begin{align} \min\> & \color{darkred}z=\sum_c \color{darkred}u_c \\ &\color{darkred}z=4 \\ & \sum_c \color{darkred}x_{n,c}=\color{darkred}v_n &&\forall n \\ & \color{darkred}x_{i,c}+\color{darkred}x_{j,c} \le \color{darkred}u_c && \forall \color{darkblue}{\mathit{arc}}_{i,j}, c \\ & \sum_n \color{darkred}v_n = {\bf{card}}(n)-1 \\ & \color{darkred}x_{n,c} \in \{0,1\} \\ & \color{darkred}u_c \in \{0,1\} \\ & \color{darkred}v_n \in \{0,1\} \end{align} \] |
The new variable \(\color{darkred}v_n\) indicates if node \(n\) is used. The node with \(\color{darkred}v_n=0\) in the solution is suspect. When we run this, Glades County in Florida is selected to be ignored. When we fix Glades county to be included, neighboring Okeechobee County is removed. So let's have a look at this part of Florida:
We see that Glades, Okeechobee, Martin, Palm Beach, and Hendry form a bit of a pie chart with one single point where they meet. Such a pie chart with a single meeting point can create problems:
There is no way we can color this with 4 colors. If we look at a description of the 4-color theorem, we actually see such a single point is removed from consideration:
The four-color theorem states that any map in a plane can be colored using four-colors in such a way that regions sharing a common boundary (other than a single point) do not share the same color. [5]
After removing arcs Glades-Martin and Glades-Palm Beach, our model finds a 4-coloring scheme for the US county map.
The map
Conclusions
- Data can never be trusted, even if it comes from the Census bureau.
- Models can help in debugging by changing the question. Here we asked: is there a single node (county) we can remove to make the map four-colorable? This model led us directly to the underlying problem.
- Touching a neighboring county by a single point does not count in the 4-color theorem. It looks like these cases can happen in the real world.
References
- Four-color theorem, https://en.wikipedia.org/wiki/Four_color_theorem
- https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json
- FIPS county codes, https://en.wikipedia.org/wiki/FIPS_county_code
- https://www2.census.gov/geo/docs/reference/county_adjacency.txt
- Four-color theorem, https://mathworld.wolfram.com/Four-ColorTheorem.html
Appendix: GAMS Model
$ontext |




