Minimum Spanning Trees in Math Programming Models

Algorithms for the Minimum Spanning Tree (MST) problem are readily available. However, sometimes we want to solve this problem inside a Mathematical Programming model. Usually, this is for two reasons:

We have some side constraints
The MST is part of a larger model

These reasons are essentially the same (a matter of degree). Embedding an MST inside a model is not totally trivial.
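It helps to have a classical MST algorithm at hand to check the math programming models against. Below is a minimal sketch of Prim's algorithm for a dense, symmetric distance matrix such as the city distances used below. The function name `prim_mst` and the 4-node toy matrix are illustrative assumptions, not part of the original models.

```python
import math


def prim_mst(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns (total_cost, edges), where edges are (parent, child) pairs,
    i.e. the tree oriented away from node 0.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n       # tree node providing that connection
    best[0] = 0.0
    total = 0.0
    edges = []
    for _ in range(n):
        # pick the cheapest node not yet in the tree
        k = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[k] = True
        total += best[k]
        if parent[k] >= 0:
            edges.append((parent[k], k))
        # update the connection costs of the remaining nodes
        for v in range(n):
            if not in_tree[v] and dist[k][v] < best[v]:
                best[v] = dist[k][v]
                parent[v] = k
    return total, edges


dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 6],
        [3, 5, 6, 0]]
total, edges = prim_mst(dist)
```

On this toy matrix the tree uses the arcs 0→1, 1→2 and 0→3 with total cost 6. The O(n²) version suits complete graphs; for sparse graphs a heap-based variant is the usual choice.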

Data

In the models below, I use the data from [1]. This data set has distances for 42 US cities.

----    298 SET cities  

Manchester, N.H.    ,    Montpelier, Vt.     ,    Detroit, Mich.      ,    Cleveland, Ohio     ,    Charleston, W.Va.   
Louisville, Ky.     ,    Indianapolis, Ind.  ,    Chicago, Ill.       ,    Milwaukee, Wis.     ,    Minneapolis, Minn.  
Pierre, S.D.        ,    Bismarck, N.D.      ,    Helena, Mont.       ,    Seattle, Wash.      ,    Portland, Ore.      
Boise, Idaho        ,    Salt Lake City, Utah,    Carson City, Nevada ,    Los Angeles, Calif. ,    Phoenix, Ariz.      
Santa Fe, N.M.      ,    Denver, Colo.       ,    Cheyenne, Wyo.      ,    Omaha, Neb.         ,    Des Moines, Iowa    
Kansas City, Mo.    ,    Topeka, Kans.       ,    Oklahoma City, Okla.,    Dallas, Tex.        ,    Little Rock, Ark.   
Memphis, Tenn.      ,    Jackson, Miss.      ,    New Orleans, La.    ,    Birmingham, Ala.    ,    Atlanta, Ga.        
Jacksonville, Fla.  ,    Columbia, S.C.      ,    Raleigh, N.C.       ,    Richmond, Va.       ,    Washington, D.C.    
Boston, Mass.       ,    Portland, Me.

An optimal spanning tree can look like:

Note: this plot and the geocoding of the cities were done using R and the ggmap package [2]. It took only a few lines of code!

Note 2: in all models below, I assume we have directed arcs. This makes the modeling easier [5].
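With directed arcs, a spanning tree can be oriented away from the source node s, so that every other node has exactly one incoming arc (the structure the MTZ formulation below uses explicitly). A minimal sketch, assuming nodes are numbered 0..n-1 and the undirected tree is given as an edge list; `orient_tree` is a hypothetical helper.

```python
from collections import deque


def orient_tree(n, edges, s=0):
    """Orient an undirected spanning tree away from the source s.

    Returns a list of directed arcs (i, j). Every node except s
    ends up with exactly one incoming arc.
    """
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    arcs = []
    seen = {s}
    queue = deque([s])
    while queue:                      # breadth-first search from s
        i = queue.popleft()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                arcs.append((i, j))   # arc points away from the source
                queue.append(j)
    if len(seen) != n:
        raise ValueError("edges do not span all nodes")
    return arcs


arcs = orient_tree(4, [(0, 1), (2, 1), (3, 0)], s=0)
```

Here the undirected edge {2,1} becomes the arc 1→2, and node 0 (the source) is the only node without an incoming arc.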

Flow model

An intuitive model is as follows:

Choose a source node $s$ (any node will do)
All other nodes $t$ need to be reached from the source node (they are terminal nodes)
So we form a path $s \rightarrow t$ for each terminal node $t$
Keep track of the arcs that are used in these paths and minimize the sum of the lengths of these arcs.
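For a fixed candidate tree, the steps above amount to tracing each path s→t and paying once for every arc that appears in at least one path. A minimal sketch, assuming the tree is given by parent pointers (`parent[s]` is `None`) and arc costs in a dict; both representations and the helper names are hypothetical. The model itself chooses the tree; this only evaluates one.

```python
def path_from_source(parent, t):
    """Arcs on the unique tree path s -> t; parent[s] is None."""
    arcs = []
    while parent[t] is not None:
        arcs.append((parent[t], t))
        t = parent[t]
    return arcs[::-1]          # ordered from s to t


def tree_cost_from_paths(parent, cost, s):
    """Union of the arcs on all paths s -> t and the total cost of that union."""
    used = set()
    for t in range(len(parent)):
        if t != s:
            used.update(path_from_source(parent, t))
    # a shared arc is paid for once, even if several paths use it
    return used, sum(cost[a] for a in used)


dist = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 6], [3, 5, 6, 0]]
cost = {(i, j): dist[i][j] for i in range(4) for j in range(4) if i != j}
used, total = tree_cost_from_paths([None, 0, 1, 0], cost, 0)
```

The arc 0→1 lies on both paths 0→1 and 0→2 but is counted once, which is exactly the role of the tree variable x versus the path variables y.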

A MIP formulation can look like:

Flow formulation
\[\begin{align}&\color{darkblue}A_{i,j} = \text{true if arc $i \rightarrow j$ exists}\\ & \color{darkblue}b_{i,t} = \begin{cases} +1 & \text{if $i=s$}\\ -1 & \text{if $i=t$}\\ 0 & \text{otherwise} \end{cases}\\ &\color{darkblue}c_{i,j} = \text{cost of arc $i \rightarrow j$} \end{align}\]
\[\begin{align}&\color{darkred}x_{i,j} = \begin{cases}1 & \text{if arc $i \rightarrow j$ is part of the tree}\\ 0 & \text{otherwise} \end{cases}\\ & \color{darkred}y_{i,j,t} = \begin{cases} 1 & \text{if arc $i \rightarrow j$ is part of the path $s \rightarrow t$}\\ 0 & \text{otherwise} \end{cases} \end{align}\]
\[\begin{align}\min\>&\color{darkred}z = \sum_{i,j|\color{darkblue}A(i,j)} \color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\& \sum_{j|\color{darkblue}A(i,j)}\color{darkred}y_{i,j,t}=\sum_{j|\color{darkblue}A(j,i)}\color{darkred}y_{j,i,t} + \color{darkblue}b_{i,t} && \forall i,t\ne s \\ & \color{darkred}x_{i,j} \ge \color{darkred}y_{i,j,t} && \forall i,j|\color{darkblue}A_{i,j},t\ne s \\ & \color{darkred}x_{i,j},\color{darkred}y_{i,j,t} \in [0,1]\end{align}\]
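To see that a spanning tree is feasible for this formulation, we can build x and y from a rooted tree and test every constraint. A sketch assuming a complete directed graph without self-loops and a tree given by hypothetical parent pointers; `check_flow_model` is an illustrative name, and this checks feasibility of one solution rather than solving the model.

```python
def check_flow_model(n, parent, s):
    """Build x and y from a rooted spanning tree and test the flow formulation.

    parent[v] is the predecessor of v on the path from s (None for v == s).
    Returns (all_constraints_hold, x).
    """
    arcs = [(i, j) for i in range(n) for j in range(n) if i != j]
    terminals = [t for t in range(n) if t != s]
    # y[i, j, t] = 1 if arc i->j lies on the path s -> t (missing keys mean 0)
    y = {}
    for t in terminals:
        v = t
        while parent[v] is not None:
            y[parent[v], v, t] = 1
            v = parent[v]
    # x[i, j] = 1 for the tree arcs
    x = {(i, j): int(parent[j] == i) for (i, j) in arcs}
    # flow balance: outflow = inflow + b[i, t]
    for t in terminals:
        for i in range(n):
            b = 1 if i == s else (-1 if i == t else 0)
            out_flow = sum(y.get((i, j, t), 0) for j in range(n) if j != i)
            in_flow = sum(y.get((j, i, t), 0) for j in range(n) if j != i)
            if out_flow != in_flow + b:
                return False, x
    # linking: x[i, j] >= y[i, j, t]
    linked = all(x[i, j] >= y.get((i, j, t), 0) for (i, j) in arcs for t in terminals)
    return linked, x


ok, x = check_flow_model(4, [None, 0, 1, 0], 0)
```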

Some notes:

The GAMS code can be found in [1].
The binary variables $\color{darkred}x_{i,j}$ and $\color{darkred}y_{i,j,t}$ can be relaxed to be continuous between zero and one. This makes the model a pure LP.
In GAMS we can solve this as an RMIP. This relaxes the integer variables but keeps the bounds of the binary variables.
The relaxation only works with Simplex methods, or with barrier methods followed by a crossover. Without crossover, interior point methods may produce fractional solutions.
The solution being a tree is a side effect of minimizing the total cost.
The model is formulated to exploit sparsity in the underlying graph. Our data set is not sparse, so that does not help in our case.
This LP is very large: with just $n=42$ cities, we arrive at $n^2(n-1)$ = 72,324 variables and as many constraints. This count is for a complete graph.
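The count in the last note can be verified as follows, assuming a complete directed graph without self-loops ($n(n-1)$ arcs) and terminals $t$ ranging over the $n-1$ nodes other than $s$:

\[\begin{align}\text{variables: } & \underbrace{n(n-1)}_{\color{darkred}x_{i,j}} + \underbrace{n(n-1)(n-1)}_{\color{darkred}y_{i,j,t}} = n(n-1)\cdot n = n^2(n-1)\\ \text{constraints: } & \underbrace{n(n-1)}_{\text{flow balance } (i,t)} + \underbrace{n(n-1)(n-1)}_{\color{darkred}x_{i,j}\ge\color{darkred}y_{i,j,t}} = n^2(n-1)\\ n=42:\quad & 42^2\cdot 41 = 1{,}764\cdot 41 = 72{,}324\end{align}\]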

Instead of having an extra index for each of the paths, we can combine them into a single flow: the source node gets an exogenous inflow of $n-1$ and each terminal node an exogenous outflow of 1. This model can look like:

Flow formulation II
\[\begin{align}&\color{darkblue}A_{i,j} = \text{true if arc $i \rightarrow j$ exists}\\ & \color{darkblue}b_{i} = \begin{cases} \color{darkblue}n-1 & \text{if $i=s$}\\ -1 & \text{otherwise} \end{cases}\\ &\color{darkblue}c_{i,j} = \text{cost of arc $i \rightarrow j$} \\ & \color{darkblue}M =\color{darkblue}n-1 \end{align}\]
\[\begin{align}&\color{darkred}x_{i,j} = \begin{cases}1 & \text{if arc $i \rightarrow j$ is part of the tree}\\ 0 & \text{otherwise} \end{cases}\\ & \color{darkred}f_{i,j} = \text{flow from node $i \rightarrow j$}\end{align}\]
\[\begin{align}\min\>&\color{darkred}z = \sum_{i,j|\color{darkblue}A(i,j)} \color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\& \sum_{j|\color{darkblue}A(i,j)}\color{darkred}f_{i,j}=\sum_{j|\color{darkblue}A(j,i)}\color{darkred}f_{j,i} + \color{darkblue}b_{i} && \forall i \\ & \color{darkblue}M\cdot \color{darkred}x_{i,j} \ge \color{darkred}f_{i,j} && \forall i,j|\color{darkblue}A_{i,j} \\ & \color{darkred}x_{i,j}\in \{0,1\}\\ & \color{darkred}f_{i,j} \in \{0,1,2,\dots,\color{darkblue}M\}\end{align}\]
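For a fixed tree, the flow pattern of this formulation is easy to write down: each non-source node consumes one unit shipped along its tree path, so the flow on tree arc i→j equals the number of nodes in the subtree below j. That never exceeds the total supply $n-1$, which is why $M=n-1$ is enough. A sketch using the same hypothetical parent-pointer representation; `subtree_flows` and `check_flow_model_2` are illustrative names.

```python
def subtree_flows(n, parent, s):
    """Aggregated flow f[i, j] on each tree arc.

    One unit is shipped from s to every other node along its tree path, so the
    flow on arc parent[j] -> j equals the number of nodes in the subtree of j.
    """
    f = {}
    for t in range(n):
        if t == s:
            continue
        v = t
        while parent[v] is not None:
            arc = (parent[v], v)
            f[arc] = f.get(arc, 0) + 1
            v = parent[v]
    return f


def check_flow_model_2(n, parent, s):
    """Check outflow = inflow + b[i] and M * x >= f for the tree arcs."""
    f = subtree_flows(n, parent, s)
    M = n - 1
    for i in range(n):
        b = n - 1 if i == s else -1
        out_flow = sum(q for (a, _), q in f.items() if a == i)
        in_flow = sum(q for (_, c), q in f.items() if c == i)
        if out_flow != in_flow + b:
            return False
    # x = 1 exactly on the arcs that carry flow, so M * x >= f reduces to f <= M
    return all(q <= M for q in f.values())


chain = subtree_flows(4, [None, 0, 1, 2], 0)
```

For the chain 0→1→2→3 the first arc carries 3 = M units, so the bound $M=n-1$ is tight.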

This formulation is much more compact: we have 1,764 constraints and 3,444 variables. However, this is no longer an LP: we need to solve it as a MIP. In my case, using Cplex, the smaller MIP solved faster than the large LP. Both solved in less than 10 seconds, so performance is not likely to be an issue for a problem of this size.
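Under the same complete-graph assumption ($n(n-1)$ arcs, no self-loops), these figures follow directly; the model grows as $O(n^2)$ instead of $O(n^3)$:

\[\begin{align}\text{constraints: } & \underbrace{n}_{\text{flow balance}} + \underbrace{n(n-1)}_{\color{darkblue}M\color{darkred}x_{i,j}\ge\color{darkred}f_{i,j}} = n^2 = 42^2 = 1{,}764\\ \text{variables: } & \underbrace{n(n-1)}_{\color{darkred}x_{i,j}} + \underbrace{n(n-1)}_{\color{darkred}f_{i,j}} = 2n(n-1) = 2\cdot 42\cdot 41 = 3{,}444\end{align}\]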

Power set formulations

There are a number of formulations that use a powerset: all subsets of the nodes. Here is a simple one [3,4]:

Powerset formulation
\[\begin{align}\min\>&\color{darkred}z = \sum_{i,j|\color{darkblue}A(i,j)} \color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\& \sum_{i,j|\color{darkblue}A(i,j)}\color{darkred}x_{i,j}= \color{darkblue}n-1 \\ & \sum_{i,j\in S|\color{darkblue}A(i,j)}\color{darkred}x_{i,j} \le |S|-1 && \forall S\subset V \text{ nodes}\\ & \color{darkred}x_{i,j}\in [0,1]\end{align}\]

The second constraint is over all subsets $S$ of the nodes, except the empty set and the set of all nodes. The number of such subsets is $2^n-2$. In our case that is \[2^{42}-2 = 4{,}398{,}046{,}511{,}102\approx 4.4 \times 10^{12}\] This is a bit more than we can handle. This means that we can drop from consideration any formulation that has "for all subsets of the nodes" in it.
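For tiny graphs the subset constraints can be checked by brute force, which also shows why they are needed: an arc set with exactly $n-1$ arcs can still contain a cycle. A sketch assuming nodes 0..n-1 and a list of selected arcs; `violated_subsets` is a hypothetical name. The loops visit almost all $2^n-2$ subsets, so this is hopeless for $n=42$.

```python
from itertools import combinations


def violated_subsets(n, arcs):
    """Subsets S with more than |S| - 1 selected arcs inside S.

    Brute-force enumeration of all subsets with 2 <= |S| <= n - 1. Singletons
    cannot be violated without self-loops, and the full node set is covered by
    the constraint sum(x) = n - 1.
    """
    bad = []
    for k in range(2, n):
        for S in combinations(range(n), k):
            members = set(S)
            inside = sum(1 for (i, j) in arcs if i in members and j in members)
            if inside > k - 1:
                bad.append(S)
    return bad


tree = violated_subsets(4, [(0, 1), (1, 2), (0, 3)])
cycle = violated_subsets(4, [(0, 1), (1, 2), (2, 0)])
```

The second arc set has $n-1=3$ arcs, so it passes the cardinality constraint, but the cycle 0→1→2→0 leaves node 3 disconnected; only the subset constraint for $S=\{0,1,2\}$ catches it.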

MTZ formulations

We can borrow the MTZ (Miller-Tucker-Zemlin [6]) subtour-elimination constraints from TSP models.

MTZ formulation
\[\begin{align}\min\>&\color{darkred}z = \sum_{i,j|\color{darkblue}A(i,j)} \color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\& \sum_{j|\color{darkblue}A(s,j)}\color{darkred}x_{s,j} \ge 1 && \text{at least 1 out from source $s$}\\ & \sum_{j|\color{darkblue}A(j,t)}\color{darkred}x_{j,t} = 1 && \forall t\ne s, \text{one incoming arc}\\ &\color{darkred}u_i-\color{darkred}u_j + (\color{darkblue}n-1)\cdot \color{darkred}x_{i,j}\le \color{darkblue}n-2 &&\forall i \ne j, i\ne s, j\ne s\\ &\color{darkred}u_s = 0 && \text{source node $s$}\\ & \color{darkred}x_{i,j}\in \{0,1\}\\& \color{darkred}u_t \in [1,\color{darkblue}n] && \forall t \ne s \end{align}\]


As with the TSP, we can apply the lifting technique from [7] and use \[\color{darkred}u_i-\color{darkred}u_j + (\color{darkblue}n-1)\cdot \color{darkred}x_{i,j} + (\color{darkblue}n-3)\cdot \color{darkred}x_{j,i} \le \color{darkblue}n-2\]
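A tree oriented away from $s$ satisfies these constraints when $u_t$ is the depth of node $t$ (the number of arcs on the path $s\rightarrow t$). A tree arc $i\rightarrow j$ gives $u_j=u_i+1$, so the left-hand side equals exactly $n-2$. A reverse tree arc ($x_{j,i}=1$) gives $u_i=u_j+1$, so the lifted left-hand side is $1+(n-3)=n-2$. For all other pairs $u_i-u_j\le n-2$ because depths lie in $[1,n-1]$. The sketch below checks this for the plain and the lifted version. The depth choice for $u$, applying the lifted constraint over the same index set as the plain one, and the function name are assumptions for illustration. It only shows that a tree is feasible; demonstrating that MTZ cuts off subtours would need a solver.

```python
def mtz_feasible(n, parent, s, lifted=False):
    """Check the MTZ model for a rooted spanning tree with u = depth of each node.

    parent[v] is the predecessor of v (None for the source s).
    """
    # u_s = 0 and u_t = number of arcs on the path s -> t, so 1 <= u_t <= n - 1
    u = []
    for v in range(n):
        w, d = v, 0
        while parent[w] is not None:
            w = parent[w]
            d += 1
        u.append(d)
    x = {(i, j): int(parent[j] == i) for i in range(n) for j in range(n) if i != j}
    # at least one arc out of s, exactly one arc into every other node
    if sum(x[s, j] for j in range(n) if j != s) < 1:
        return False
    if any(sum(x[j, t] for j in range(n) if j != t) != 1 for t in range(n) if t != s):
        return False
    # (lifted) MTZ constraints for i != j, both different from s
    for i in range(n):
        for j in range(n):
            if i == j or i == s or j == s:
                continue
            lhs = u[i] - u[j] + (n - 1) * x[i, j]
            if lifted:
                lhs += (n - 3) * x[j, i]   # lifting term from [7]
            if lhs > n - 2:
                return False
    return True


ok = mtz_feasible(5, [None, 0, 0, 1, 1], 0, lifted=True)
```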

These models tend not to perform as well as some of the other models.

References

[1] How to formulate the Minimum Spanning Tree problem as a MIP, https://yetanothermathprogrammingconsultant.blogspot.com/2009/05/how-to-formulate-minimum-spanning-tree.html
[2] David Kahle, Hadley Wickham, ggmap: Spatial Visualization with ggplot2, https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
[3] Thomas L. Magnanti, Laurence A. Wolsey, Optimal Trees, in M.O. Ball, T.L. Magnanti, C.L. Monma, G.L. Nemhauser (eds.), Network Models, Handbooks in Operations Research and Management Science, vol. 7, Chapter 9, pp. 503-616, North-Holland, 1995.
[4] Justin C. Williams, A linear-size zero-one programming model for the minimum spanning tree problem in planar graphs, Networks, 39(1), 2002.
[5] Directed vs undirected networks in math programming models, https://yetanothermathprogrammingconsultant.blogspot.com/2020/12/directed-vs-undirected-networks-in-math.html
[6] C. Miller, A. Tucker, R. Zemlin, Integer programming formulation of traveling salesman problems, J. ACM 7, pp. 326-329, 1960.
[7] M. Desrochers, G. Laporte, Improvements and extensions to the Miller-Tucker-Zemlin subtour elimination constraints, Operations Research Letters, 10, pp. 27-36, 1991.
