Scipy.optimize.linprog [1] recently added a sparse interior point solver [2]. In theory, this should allow us to solve somewhat larger problems. However, the input format is matrix-based, which makes it difficult to express LP models without a lot of tedious programming. Of course, if the LP model is highly structured, things are a bit easier. In [3] the question came up whether we can solve some reasonably sized transportation problems with this solver. As transportation problems translate into large but easy LPs (very sparse, network structure), this is a good example to try out.
An LP model for the transportation problem can look like:
| Transportation Model |
|---|
| \[ \begin{align} \min \> & \sum_{i,j} \color{darkblue}c_{i,j} \color{darkred} x_{i,j} \\ & \sum_j \color{darkred} x_{i,j} \le \color{darkblue}s_i &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \ge \color{darkblue}d_j &&\forall j\\ & \color{darkred}x_{i,j}\ge 0\end{align} \] |
Here \(i\) indexes the supply nodes and \(j\) the demand nodes. The problem is feasible if total demand does not exceed total supply (i.e. \(\sum_i s_i \ge \sum_j d_j\)).
Even if the transportation problem is dense (that is, each supply node can serve all demand nodes, or in other words each link \( i \rightarrow j\) exists), the LP matrix is sparse. Each variable \(x_{i,j}\) appears in exactly one supply constraint and one demand constraint, so there are 2 nonzeros per column.
| LP Matrix |
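As a small illustration of this structure, take 2 supply nodes and 3 demand nodes, and order the columns as \(x_{1,1}, x_{1,2}, x_{1,3}, x_{2,1}, x_{2,2}, x_{2,3}\) (this ordering is my own choice for the example). The first two rows are the supply constraints and the last three rows are the demand constraints:

\[
A = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\]

Every column contains exactly two ones: one in the row of its supply node and one in the row of its demand node. With \(M\) supply nodes and \(N\) demand nodes the matrix has \(M+N\) rows, \(MN\) columns and \(2MN\) nonzeros.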
The documentation mentions that we can pass the LP matrix as a sparse matrix. Here are some estimates of the difference in memory usage:
| | 100x100 | 500x500 | 1000x1000 |
|---|---|---|---|
| Source Nodes | 100 | 500 | 1,000 |
| Destination Nodes | 100 | 500 | 1,000 |
| LP Variables | 10,000 | 250,000 | 1,000,000 |
| LP Constraints | 200 | 1,000 | 2,000 |
| LP Nonzero Elements | 20,000 | 500,000 | 2,000,000 |
| Dense Memory Usage (MB) | 15 | 1,907 | 15,258 |
| Sparse Memory Usage (MB) | 0.3 | 7.6 | 30.5 |
For the \(1000\times 1000\) case, we see that a sparse storage scheme is about 500 times as memory-efficient (30.5 MB instead of 15,258 MB).
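The numbers in the table can be approximated with a few lines of arithmetic. This is only a sketch under my own assumptions: 8 bytes per matrix element in the dense case, and 16 bytes per nonzero in the sparse case (an 8-byte value plus two 4-byte indices), with 1 MB taken as \(1024^2\) bytes. The function name `lp_memory_estimate` is hypothetical.

```python
def lp_memory_estimate(m, n, bytes_per_value=8, bytes_per_index=4):
    """Estimate storage for the transportation LP matrix (m sources, n destinations)."""
    rows = m + n      # one constraint per supply node and per demand node
    cols = m * n      # one variable per link i -> j
    nnz = 2 * cols    # each column has exactly two nonzeros
    dense_bytes = rows * cols * bytes_per_value
    # Assumed sparse layout: a value plus a row index and a column index per nonzero.
    sparse_bytes = nnz * (bytes_per_value + 2 * bytes_per_index)
    mb = 1024 ** 2
    return {
        "rows": rows,
        "cols": cols,
        "nnz": nnz,
        "dense_mb": dense_bytes / mb,
        "sparse_mb": sparse_bytes / mb,
    }


for size in (100, 500, 1000):
    est = lp_memory_estimate(size, size)
    print(f"{size}x{size}: dense {est['dense_mb']:,.0f} MB, sparse {est['sparse_mb']:.1f} MB")
```

Under these assumptions the estimates agree with the table up to rounding, and the dense-to-sparse ratio for square problems of size \(n\) is simply \(n/2\).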
Solving a 1000x1000 transportation problem: Implementation
- The package scipy.sparse [4] is used to form a sparse matrix.
- Scipy.optimize.linprog does not accept \(\ge\) constraints, so we multiply the demand constraints by \(-1\). Our model becomes: \[\begin{align} \min &\sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} \le s_i &&\forall i \\ & \sum_i -x_{i,j} \le -d_j &&\forall j\\ & x_{i,j}\ge 0\end{align}\]
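The two points above can be sketched as follows on a tiny instance. This is not the code used for the run reported here: the helper `form_lp_data` and the data are made up for illustration, and the variables are assumed to be flattened row by row (column \(k = iN + j\)). The run in this post used the interior point solver with `options={'sparse':True, 'disp':True}`. As far as I know, recent SciPy releases no longer ship that legacy method, so the sketch assumes `method="highs"` instead, which also accepts sparse matrices.

```python
import numpy as np
import scipy.optimize as opt
import scipy.sparse as sp


def form_lp_data(s, d):
    """Build sparse A_ub and b_ub for the <= form of the transportation model.

    Hypothetical helper: x[i, j] is stored at column k = i * N + j.
    """
    M, N = len(s), len(d)
    # Supply rows: row i has +1 in the N columns that belong to source i.
    supply = sp.kron(sp.eye(M), np.ones((1, N)), format="csr")
    # Demand rows: row j has -1 in the M columns that belong to destination j.
    demand = -sp.kron(np.ones((1, M)), sp.eye(N), format="csr")
    A = sp.vstack([supply, demand], format="csr")
    rhs = np.concatenate([np.asarray(s, dtype=float), -np.asarray(d, dtype=float)])
    return A, rhs


# Tiny made-up instance: 2 sources, 3 destinations, total supply >= total demand.
s = [20.0, 30.0]
d = [10.0, 15.0, 20.0]
c = np.array([[2.0, 3.0, 1.0],
              [5.0, 4.0, 8.0]])

A, rhs = form_lp_data(s, d)
# Default bounds in linprog are x >= 0, which matches the model.
res = opt.linprog(c=c.ravel(), A_ub=A, b_ub=rhs, method="highs")
print(res.status, res.fun)
print(res.x.reshape(c.shape))
```

The Kronecker products avoid any Python loop over the \(MN\) columns, which matters once \(M\) and \(N\) reach 1000. Only the \(2MN\) nonzeros are ever stored.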
When I run this, I see:
| Primal Feasibility | Dual Feasibility | Duality Gap | Step | Path Parameter | Objective |
|---|---|---|---|---|---|
| 1.0 | 1.0 | 1.0 | - | 1.0 | 4999334.387281 |
| 0.01096610265509 | 0.01096610265504 | 0.01096610265504 | 1.0 | 0.01096610265523 | 3423127.924532 |
| 0.007470719084731 | 0.007470719084695 | 0.007470719084695 | 0.3369198212982 | 0.007470719084826 | 1045138.710249 |
| 0.007375696439705 | 0.007375696439669 | 0.007375696439669 | 0.01405171378191 | 0.007375696439798 | 946062.4541516 |
| 0.006900523710037 | 0.006900523710004 | 0.006900523710004 | 0.07151611989327 | 0.006900523710125 | 631457.8940984 |
| 0.003392688227185 | 0.003392688227169 | 0.003392688227169 | 0.5542765654086 | 0.003392688227229 | 106030.5627759 |
| 0.002716216726218 | 0.002716216726205 | 0.002716216726205 | 0.2210823772546 | 0.002716216726252 | 77660.93708537 |
| 0.00151605426328 | 0.001516054263272 | 0.001516054263272 | 0.4706161702772 | 0.001516054263299 | 39012.6976106 |
| 0.001238382883199 | 0.001238382883193 | 0.001238382883193 | 0.2007381529847 | 0.001238382883215 | 31262.77924434 |
| 0.0006888763719364 | 0.000688876371933 | 0.0006888763719331 | 0.4711955496918 | 0.0006888763719452 | 16884.5788155 |
| 0.0004045311601541 | 0.0004045311601521 | 0.0004045311601522 | 0.4504577243574 | 0.0004045311601593 | 9812.570668161 |
| 0.0003278435563858 | 0.0003278435563842 | 0.0003278435563842 | 0.2062071599936 | 0.00032784355639 | 7943.50442653 |
| 0.0001938174872602 | 0.0001938174872593 | 0.0001938174872593 | 0.4304958950603 | 0.0001938174872627 | 4718.01892459 |
| 0.0001272127336263 | 0.0001272127336257 | 0.0001272127336257 | 0.371775562858 | 0.000127212733628 | 3126.320160308 |
| 7.325610966318e-05 | 7.325610966282e-05 | 7.325610966283e-05 | 0.4526986333113 | 7.325610966411e-05 | 1837.061691682 |
| 6.047737643405e-05 | 6.047737643373e-05 | 6.047737643375e-05 | 0.1896942778068 | 6.047737643482e-05 | 1530.292617672 |
| 3.301112106729e-05 | 3.301112106712e-05 | 3.301112106713e-05 | 0.4758440911431 | 3.301112106771e-05 | 870.6399411648 |
| 2.231615463384e-05 | 2.231615463375e-05 | 2.231615463374e-05 | 0.3562669388094 | 2.231615463413e-05 | 613.0954966036 |
| 1.300693055479e-05 | 1.300693055474e-05 | 1.300693055474e-05 | 0.4437694284722 | 1.300693055496e-05 | 388.3160007487 |
| 7.533045251385e-06 | 7.533045251368e-06 | 7.533045251357e-06 | 0.4485635094836 | 7.533045251489e-06 | 255.9636413848 |
| 3.799832196644e-06 | 3.799832196622e-06 | 3.799832196633e-06 | 0.5264643380152 | 3.7998321967e-06 | 165.5742065953 |
| 2.01284588862e-06 | 2.012845888624e-06 | 2.012845888615e-06 | 0.5006416028336 | 2.01284588865e-06 | 122.2520897954 |
| 1.143491145379e-06 | 1.143491145387e-06 | 1.143491145377e-06 | 0.4712206751612 | 1.143491145397e-06 | 101.1678772704 |
| 5.277850584407e-07 | 5.277850584393e-07 | 5.277850584402e-07 | 0.5711139027487 | 5.277850584494e-07 | 86.20125171613 |
| 3.125695105059e-07 | 3.125695105195e-07 | 3.125695105058e-07 | 0.4315945026363 | 3.125695105113e-07 | 80.96090171621 |
| 1.118500099738e-07 | 1.118500099884e-07 | 1.118500099743e-07 | 0.6743118743189 | 1.118500099763e-07 | 76.06812425522 |
| 4.412565084911e-08 | 4.412565086951e-08 | 4.412565085053e-08 | 0.6297257004579 | 4.412565085131e-08 | 74.41374033755 |
| 6.833044779903e-09 | 6.833044770544e-09 | 6.833044776856e-09 | 0.8682333453577 | 6.833044776965e-09 | 73.50145019804 |
| 3.3755500974e-10 | 3.375549807043e-10 | 3.375549865145e-10 | 0.9528386773998 | 3.375549866004e-10 | 73.34206256371 |
| 1.066148223577e-13 | 1.065916625724e-13 | 1.066069704785e-13 | 0.9998776987355 | 1.066069928771e-13 | 73.3337765897 |
| 7.763476236577e-18 | 3.543282811637e-17 | 5.469419174887e-18 | 0.9999500035089 | 5.330350298476e-18 | 73.3337739491 |
Optimization terminated successfully.
Current function value: 73.333774
Iterations: 30
Filename: transport.py

| Line # | Mem usage | Increment | Line Contents |
|---|---|---|---|
| 59 | 70.6 MiB | 70.6 MiB | `@profile` |
| 60 | | | `def run():` |
| 61 | | | `# dimensions` |
| 62 | 70.6 MiB | 0.0 MiB | `M = 1000 # sources` |
| 63 | 70.6 MiB | 0.0 MiB | `N = 1000 # destinations` |
| 64 | 78.3 MiB | 7.7 MiB | `data = GenerateData(M,N)` |
| 65 | 108.9 MiB | 30.5 MiB | `lpdata = FormLPData(data)` |
| 66 | 122.6 MiB | 13.7 MiB | `res = opt.linprog(c=np.reshape(data['c'],M*N),A_ub=lpdata['A'],b_ub=lpdata['rhs'],options={'sparse':True, 'disp':True})` |
This shows that we can actually solve a \(1000 \times 1000\) transportation problem (leading to an LP with a million variables) using standard Python tools.
References
1. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html
2. https://docs.scipy.org/doc/scipy/reference/optimize.linprog-interior-point.html
3. Maximum number of decision variables in scipy linear programming module in Python, https://stackoverflow.com/questions/57579147/maximum-number-of-decision-variables-in-scipy-linear-programming-module-in-pytho
4. https://docs.scipy.org/doc/scipy/reference/sparse.html