
Scheduling teachers

From [1]:

I am trying to find a solution to the following problem: 
  • There are 6 teachers
  • Each teacher can only work for 8 hours a day
  • Each teacher must have a 30 minute break ideally near the middle of his shift
  • The following table shows the number of teachers needed in the room at certain times:
  • 7am - 8am: 1 teacher
  • 8am - 9am: 2 teachers
  • 10am - 11am: 5 teachers
  • 11am - 5pm: 6 teachers
  • 5pm - 6pm: 2 teachers 
What is a good way to solve this (ideally in python and Google OR-Tools) ? 
Thank you

Initial analysis

From the demand data we see that we need all 6 teachers from 11am to 5pm, working without a lunch break. That is 6 hours, so there is no way to allow a lunch break near the middle of the shift.


We have more problems. The picture of the demand data indicates we have no demand between 9am and 10am. That does not look right.

I believe that looking critically at your data is essential for successful optimization applications. You can learn a lot from just a bit of staring.

Alternative problem 1

We can ask a few different questions. If we only allow shifts of the form: 4 hours work, 0.5 hour lunch, 4 hours work (i.e. lunch perfectly in the middle of the shift), how many teachers do we need to meet the demand? To model this, we can assume time periods of half an hour. The number of possible shifts is small:


With an enumeration of the shifts, we can model this as a covering problem:

Covering Model
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_{s|\color{darkblue}{\mathit cover}(s,t)} \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]

Here \(\mathit{cover}(s,t)=\text{True}\) if shift \(s\) covers time period \(t\). If we interpret \(\mathit cover(s,t)\) as a binary (data) matrix \[\mathit cover_{s,t} = \begin{cases} 1 & \text{if shift $s$ covers time period $t$}\\ 0 & \text{otherwise}\end{cases}\] we can also write:

Covering Model (alternative interpretation)
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_s \color{darkblue}{\mathit cover}_{s,t} \cdot \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]

Our new assumptions are:

  • we only allow shifts with a lunch break in the middle
  • we add to our demand data: 4 teachers needed between 9am and 10am
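Since the question asked for Python and Google OR-Tools, here is a minimal sketch of this covering model in CP-SAT. The half-hour discretization and the fixed 4+0.5+4 shift shape follow the description above; the per-shift upper bound of 10 teachers is an arbitrary assumption.

# Half-hour periods 0..21 represent 7am-6pm. Demand includes the
# assumed 4 teachers for 9am-10am.
from ortools.sat.python import cp_model

demand = [1]*2 + [2]*2 + [4]*2 + [5]*2 + [6]*12 + [2]*2

# A shift starting in period s: work s..s+7, lunch s+8, work s+9..s+16.
n_shifts = 6  # start times 0..5 keep the whole shift inside the day

def covers(s, t):
    return s <= t <= s + 7 or s + 9 <= t <= s + 16

model = cp_model.CpModel()
x = [model.NewIntVar(0, 10, f"x{s}") for s in range(n_shifts)]  # 10 is an arbitrary cap
for t, dem in enumerate(demand):
    model.Add(sum(x[s] for s in range(n_shifts) if covers(s, t)) >= dem)
model.Minimize(sum(x))

solver = cp_model.CpSolver()
solver.Solve(model)
print(int(solver.ObjectiveValue()), [solver.Value(v) for v in x])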
After solving this easy MIP model we see:

---- 55 VARIABLE x.L  number of shifts needed

shift1 2.000, shift4 2.000, shift5 2.000, shift6 2.000


---- 55 VARIABLE z.L  =  8.000  total number of shifts

I.e. we need 8 teachers to handle this workload.


The picture shows we are overshooting demand in quite a few periods. Note: this picture illustrates the left-hand side (orange bars) and right-hand side (blue line) of the demand equation in the Covering Model.

Alternative problem 2

Let's make things a bit more complicated. We now allow the following rules for a single shift:


  • The work period before the lunch break is between 3 and 5 hours (or between 6 and 10 time periods).
  • There is a lunch break of 0.5 or 1 hour.
  • After lunch there is another work period of between 3 and 5 hours. The total number of working hours is 8.
When we enumerate the shifts, we see:



We now have 55 different shifts to consider.

The results look like:


---- 72 VARIABLE x.L  number of shifts needed

shift1 1.000, shift22 1.000, shift26 1.000, shift39 1.000, shift49 1.000, shift52 1.000
shift55 1.000


---- 72 VARIABLE z.L  =  7.000  total number of shifts


We see the number of teachers needed is 7. We are closer to the demand curve:



We see that if we add more flexibility we can do a bit better. Achieving 6 teachers is almost impossible. We would need to introduce shifts like: work 2 hours, lunch, work 6 hours. The teachers union would object.

Note that there are quite a few other methods to solve models like this that do not require enumerating all possible shifts [2,3]. For larger problems it may not be feasible to employ the shift enumeration scheme we used here.

References



R/Python + C++

In some recent projects, I was working on using algorithms implemented in C++ from R and Python. Basically the idea is: Python and R are great languages for scripting, but they are slow as molasses. So, it may make sense to develop the time-consuming algorithms in C++ while driving the algorithm from R or Python.

R and C++: Rcpp


The standard way to build interfaces between R and C++ code is to use Rcpp [1,2].




It is possible to interface R directly with low-level C code, but this requires a lot of code and knowledge of R internals. Rcpp automates a lot of this. E.g. Rcpp will take care of translating an R vector into a C++ vector.

Rcpp supports everything from small fragments of C++ code passed as an R string, to a more coarse-grained file-based approach [3]. For Windows, you need to download the GNU compilers [4].

If you are new to both Rcpp and building your own R packages [5], things may be a bit overwhelming.

Rstudio can help a lot. It supports a lot of very useful tasks:

  • Syntax coloring for C++ code. 
  • Building projects.
  • Git version control.
  • Documentation tools (rmarkdown and bookdown). My documentation is also a good test: it executes almost all of the code when building the document.

Editing C++ code in RStudio


Basically I never have to leave RStudio.

I have added an alternative driver file for my C++ code so I can debug it in Visual Studio. I used it only a few times: most of the time I just used RStudio.


Python and C++: pybind11


pybind11 [6] is in many respects similar to Rcpp, although it requires a little bit more programming to bridge the gap between Python and C++.



In the beginning of the above YouTube video [7], the presenter compares pybind11 with some of the alternatives:

  • SWIG: the author of SWIG says: don't use it
  • ctypes: calls C functions but not C++
  • CFFI: calls C functions
  • Boost.Python: supports older C++ standards, but not much maintained
  • pybind11: modern 

As with Rcpp, calling the compiler is done by running a build or setup script. For Rcpp I used the GNU compilers, while pybind11/pip install supports the Visual Studio C++ compiler. This also means that if you have little experience with pybind11 and with creating packages, the learning curve may be steep.


References


  1. http://www.rcpp.org
  2. Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp, Springer, 2013
  3. Rewriting R code in C++, chapter in Hadley Wickham, Advanced R, https://adv-r.hadley.nz/rcpp.html
  4. https://cran.r-project.org/bin/windows/Rtools/
  5. Hadley Wickham, R packages, O'Reilly, 2015
  6. https://pybind11.readthedocs.io/en/master/
  7. Robert Smallshire, Integrate Python and C++ with pybind11, https://www.youtube.com/watch?v=YReJ3pSnNDo

Python-MIP

This is another modeling tool for Python.

There are quite a few modeling tools available for Python: Pyomo, PuLP, and most commercial LP/MIP solvers come with some Python modeling layer.

This is what caught my eye when reading about Python-MIP:


  • The name is rather unimaginative.
  • Looks like the authors are from Brazil.
  • Supported solvers are CBC and Gurobi.
  • Python-MIP is compatible with the just-in-time compiler PyPy, which can lead to substantial performance improvements. 
  • It is claimed that with the PyPy JIT, Python-MIP can be 25 times as fast as the Gurobi modeling tool.
  • There are some interesting facilities supported by Python-MIP:
    • Cuts can be provided using a call-back mechanism
    • Support for MIPSTART (initial integer solution)
    • Solution pool


Question


In [3] an interesting question came up. The value of an integer variable is often slightly non-integer, e.g. something like 0.0000011625. This is the result of the integer feasibility tolerance that a solver applies. In the discussion [3] the remark is made:


I believe there is more to this. Rounding integer solutions can lead to larger infeasibilities. With some aggressive presolve/scaling these infeasibilities can sometimes be large after postsolve/unscaling. Also: some equations may have long summations of binary variables. This would accumulate a lot of rounding errors. And then there are these big-M constraints....

It also means that using the steps:

solve
fix solution (or fix integers) to optimal solution
solve

may lead to "feasible" for the first solve, but "infeasible" for the second solve. E.g. when we want duals for the fixed LP we use this "fix integers" step.

Safer would be to tighten the integer feasibility tolerance. Cplex even allows epint=0 (epint is Cplex's integer feasibility tolerance). Of course tightening the integer feasibility tolerance will likely lead to longer solution times.
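To make this concrete, here is a tiny Python-MIP sketch (a made-up model, just to show the postprocessing issue):

# A made-up knapsack-style model to illustrate (data is not from the post).
from mip import Model, xsum, maximize, BINARY

m = Model()  # uses CBC by default
x = [m.add_var(var_type=BINARY) for i in range(5)]
m.objective = maximize(xsum((i + 1) * x[i] for i in range(5)))
m += xsum(x[i] for i in range(5)) <= 3
m.optimize()

# Values come back within the solver's integer feasibility tolerance,
# e.g. 0.9999998 instead of 1. Truncating with int() would turn
# 0.9999998 into 0; rounding is the safer postprocessing step.
print([v.x for v in x])
print([round(v.x) for v in x])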

Indeed, modeling systems and solvers currently handle this by offloading the problem to the user. The solver is probably the right place to deal with this, but I am not sure developers are eager to work on it. Taking responsibility by simple-minded rounding may be asking for more problems than it solves.

On the other hand, these slightly fractional values certainly cause confusion. Especially for beginners in MIP modeling.

So the question remains: is rounding integer variables a good idea?

References





Octeract

I don't know what or who this is.

This seems to be a parallel deterministic solver for non-convex MINLPs.  Some things I noticed:


  • !np.easy : cute (but nonsense of course: some problems just remain difficult).
  • "The first massively parallel Deterministic Global Optimization solver for general non-convex MINLPs."
  • Symbolic manipulation: like some other global solvers they need the symbolic form of the problem so they can reason about this. I.e. no black-box problems.
  • Support for AMPL and Pyomo
  •  "Octeract Engine has implementations of algorithms that guarantee convergence to a global optimum in finite time. The correctness of the majority of the calculations are ensured through a combination of interval arithmetic and infinite precision arithmetic."
  • It looks like the benchmarks [3] compare the solver against itself (so it is always a winner).
  • I don't see any names on the web site. The About Company section is unusually vague.

Some of the competing solvers are Baron, Couenne, and Antigone.

References


  1. https://octeract.com/ 
  2. Manual: https://octeract.com/wp-content/uploads/2019/08/user_manual.pdf
  3. Benchmarks: https://octeract.com/benchmarks/

Demo problem with constraint on standard deviation

In [1] a hypothetical demo problem is shown. I don't think it is a real problem, but rather contrived as an example. Nevertheless, there are things to say about it.

The problem is:
Original Problem
\[\begin{align}\min\>&\sum_i \color{darkred}x_i\\ & \mathbf{sd}(\color{darkred}x) \lt \color{darkblue}\alpha\\ & \color{darkred}x_i \in \{0,1\}\end{align}\]


Notes:

  • Here sd is the standard deviation
  • We assume \(x\) has \(n\) components.
  • Of course, \(\lt\) is problematic in optimization. So the equation should become a \(\le\) constraint.
  • The standard formula for the standard deviation is: \[ \sqrt{\frac{\sum_i (x_i-\bar{x})^2}{n-1}}\] where \(\bar{x}\) is the average of \(x\).
  • This is an easy problem. Just choose \(x_i=0\).
  • When we use  \(\max \sum_i x_i\) things are equally simple. In that case choose \(x_i = 1\).
  • There is symmetry: \(\mathbf{sd}(x) = \mathbf{sd}(1-x)\).
  • A more interesting problem is to have \(\mathbf{sd}(x)\ge\alpha\).


Updated problem


A slightly different and somewhat reformulated problem is:


MIQCP problem
\[\begin{align}\min\>&\bar{\color{darkred}x} \\ & \bar{\color{darkred}x}= \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n}\\ & \frac{\sum_i (\color{darkred}x_i - \bar{\color{darkred}x})^2}{\color{darkblue}n-1} \ge \color{darkblue}\alpha^2 \\ & \color{darkred}x_i \in \{0,1\}\end{align}\]

First, we replaced \(\lt\) by \(\ge\) to make the problem more interesting. Furthermore I got rid of the square root. This removes a possible problem with non-differentiability at zero. The remaining problem is a non-convex quadratically constrained problem (MIQCP = Mixed Integer Quadratically Constrained Problem). The non-convexity implies we want a global solver.

This model solves easily with solvers like Baron or Couenne.


Integer variable


When we look at the problem a bit more, we see we are not really interested in which \(x_i\)'s are zero or one. Rather, we only need to worry about how many. Let \(k = \sum_i x_i\). Obviously \(\bar{x}=k/n\). But more interestingly: \[\mathbf{sd}(x) = \sqrt{\frac{k (1-\bar{x})^2+(n-k)(0-\bar{x})^2}{n-1}}\] The integer variable \(k\) is restricted to \(k=0,1,\dots,n\).

Thus we can write:

MINLP problem
\[\begin{align}\min\>&\color{darkred}k \\  & \frac{\color{darkred}k (1-\color{darkred}k/\color{darkblue}n)^2+(\color{darkblue}n-\color{darkred}k) (\color{darkred}k/\color{darkblue}n)^2}{\color{darkblue}n-1} \ge \color{darkblue}\alpha^2 \\ & \color{darkred}k = 0,1,\dots,\color{darkblue}n \end{align}\]

The constraint can be simplified into \[\frac{k-k^2/n}{n-1}\ge \alpha^2\] This is now so simple we can do this by enumerating \(k=0,\dots,n\), check the constraint, and pick the best.
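A minimal sketch of this enumeration, with assumed values for \(n\) and \(\alpha\):

# Enumerate k = 0..n and pick the smallest k satisfying the constraint.
n, alpha = 20, 0.4   # assumed example values

def feasible(k):
    # (k - k^2/n)/(n-1) >= alpha^2
    return (k - k * k / n) / (n - 1) >= alpha ** 2

best = next((k for k in range(n + 1) if feasible(k)), None)
print(best)  # smallest feasible k (None if alpha is too large)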



Because of the form of the standard deviation curve (note the symmetry), we can specialize the enumeration loop and restrict the loop to \(k=1,\dots,\lfloor n/2 \rfloor\). Pick the first \(k\) that does not violate the constraint (and when found exit the loop). For very large \(n\) we can use something like a bisection to speed things up even further.

So this example optimization problem does not really need to use optimization at all.

References



  1. Constrained optimisation with function in the constraint and binary variable, https://stackoverflow.com/questions/57850149/constrained-optimisation-with-function-in-the-constraint-and-binary-variable
  2. Another problem that minimizes the standard deviation, https://yetanothermathprogrammingconsultant.blogspot.com/2017/09/minimizing-standard-deviation.html



Duplicate constraints in Pyomo model


Introduction


Pyomo [1] is a popular Python-based modeling tool. In [2] a question is posed about a situation where a certain constraint takes more than 8 hours to generate. As we shall see, the reason is that extra indices are used.

A simple example


The constraint \[y_i = \sum_j x_{i,j} \>\>\>\forall i,j\] is really malformed. The extra \(\forall j\) is problematic. What does this mean? One could say, this is wrong. We can also interpret this differently. Assume the inner \(j\) is scoped (i.e. local). Then we could read this as: repeat the constraint \(y_i = \sum_j x_{i,j}\), \(n\) times. Here \(n=|J|\) is the cardinality of set \(J\).

The GAMS fragment corresponding to this example, shows GAMS will object to this construct:

  11  equation e(i,j);
  12  e(i,j)..  y(i) =e= sum(j, x(i,j));
****                          $125
**** 125  Set is under control already
  13  

**** 1 ERROR(S)   0 WARNING(S)


The Pyomo equivalent can look like:

def eqRule(m,i,j):
    return m.Y[i] == sum(m.X[i,j] for j in m.J);
model.Eq = Constraint(model.I,model.J,rule=eqRule)

This fragment is a bit more difficult to read, largely due to syntactic clutter. But in any case: Python and Pyomo accept this constraint as written. To see what is generated, we can use

model.Eq.pprint()

This will show something like:

Eq : Size=6, Index=Eq_index, Active=True
    Key          : Lower : Body                                     : Upper : Active
    ('i1', 'j1') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i1', 'j2') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i1', 'j3') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i2', 'j1') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True
    ('i2', 'j2') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True
    ('i2', 'j3') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True

We see for each \(i\) we have three duplicates. The way to fix this is to remove the function argument \(j\) from eqRule:

def eqRule(m,i):
    return m.Y[i] == sum(m.X[i,j] for j in m.J);
model.Eq = Constraint(model.I,rule=eqRule)

After this, model.Eq.pprint() produces

Eq : Size=2, Index=I, Active=True
    Key : Lower : Body                                     : Upper : Active
     i1 :   0.0 : Y[i1] - (X[i1,j3] + X[i1,j2] + X[i1,j1]) :   0.0 :   True
     i2 :   0.0 : Y[i2] - (X[i2,j3] + X[i2,j2] + X[i2,j1]) :   0.0 :   True

This looks much better.
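For reference, a self-contained version of the toy model above (with assumed two- and three-element sets) that can be run without a solver, as we only build and print the constraint:

# Minimal runnable version of the toy model (assumed 2x3 sets).
from pyomo.environ import ConcreteModel, Constraint, Set, Var

model = ConcreteModel()
model.I = Set(initialize=["i1", "i2"])
model.J = Set(initialize=["j1", "j2", "j3"])
model.X = Var(model.I, model.J)
model.Y = Var(model.I)

def eqRule(m, i):
    return m.Y[i] == sum(m.X[i, j] for j in m.J)
model.Eq = Constraint(model.I, rule=eqRule)

model.Eq.pprint()   # two rows, one per element of I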

The original problem


The constraint in the original question was:

def period_capacity_dept(m, e, j, t, dp):
    return sum(a[e, j, dp, t]*m.y[e,j,t] for (e,j) in model.EJ)<= K[dp,t] + m.R[t,dp]
model.period_capacity_dept = Constraint(E, J, T, DP, rule=period_capacity_dept)

Using the knowledge of the previous paragraph we know this should really be:

def period_capacity_dept(m, t, dp):
    return sum(a[e, j, dp, t]*m.y[e,j,t] for (e,j) in model.EJ)<= K[dp,t] + m.R[t,dp]
model.period_capacity_dept = Constraint(T, DP, rule=period_capacity_dept)

Pyomo mixes mathematical notation with programming. I think that is one of the reasons this bug is more difficult to see. In normal programming, adding an argument to a function has an obvious meaning. However in this case, adding e,j means in effect: \(\forall e,j\). If \(e\) and \(j\) belong to large sets, we can easily create a large number of duplicates.

References


Running a MIP solver on Raspberry Pi


Raspberry Pi



The Raspberry Pi [1] is a small single-board computer. It comes with an ARM-based CPU (64 bit, quad core). You can buy it for $35 (no case included). The 4GB RAM version retails for $55. Raspberry Pi runs some form of Linux. It is mainly used for educational purposes.


SCIP


SCIP [2] is a solver for MIP (and related) models. It is only easily available to academics, under a somewhat non-standard license, so it cannot really be called open source. As a result, I don't see it used much outside academic circles.

SCIP on Raspberry Pi


In [3] SCIP is used on the Raspberry Pi with 4GB of RAM. They call it an example of "Edge Computing": bring the algorithm to where it is needed [4] (as opposed to moving the data to, say, a server).


On average SCIP is 3 to 5 times slower on a (standard or overclocked) Raspberry Pi than on a MacBook Pro laptop.

Of course the small amount of RAM means we can only solve relatively small problems. (These days what we call a small MIP problem is actually not so small).

References


  1. https://www.raspberrypi.org/
  2. https://scip.zib.de/
  3. http://www.pokutta.com/blog/random/2019/09/29/scipberry.html
  4. https://en.wikipedia.org/wiki/Edge_computing

Scipy linear programming: a large but easy LP

Scipy.optimize.linprog [1] recently added a sparse interior point solver [2]. In theory we should be able to solve some larger problems with this solver. However, the input format is matrix based. This makes it difficult to express LP models without much tedious programming. Of course, if the LP model is very structured, things are a bit easier. In [3] the question came up whether we can solve some reasonably sized transportation problems with this solver. As transportation problems translate into large but easy LPs (very sparse, network structure), this is a good example to try out.

An LP model for the transportation problem can look like:

Transportation Model
\[ \begin{align} \min \> & \sum_{i,j} \color{darkblue}c_{i,j} \color{darkred} x_{i,j} \\ & \sum_j \color{darkred} x_{i,j} \le \color{darkblue}s_i &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \ge \color{darkblue}d_j &&\forall j\\ & \color{darkred}x_{i,j}\ge 0\end{align} \]

Here \(i\) indicates the supply nodes and \(j\) the demand nodes. The problem is feasible if total demand does not exceed total supply (i.e. \(\sum_i s_i \ge \sum_j d_j\)).

Even if the transportation problem is dense (that is each supply node can serve all demand nodes or in other words each link \( i \rightarrow j\) exists), the LP matrix is sparse. There are 2 nonzeros per column.

LP Matrix


The documentation mentions we can pass on the LP matrix as a sparse matrix. Here are some estimates of the difference in memory usage:

                            100x100    500x500    1000x1000
Source Nodes                    100        500        1,000
Destination Nodes               100        500        1,000
LP Variables                 10,000    250,000    1,000,000
LP Constraints                  200      1,000        2,000
LP Nonzero Elements          20,000    500,000    2,000,000
Dense Memory Usage (MB)          15      1,907       15,258
Sparse Memory Usage (MB)        0.3        7.6         30.5

For the \(1000\times 1000\) case we see that a sparse storage scheme will be about 500 times as efficient.

Solving a 1000x1000 transportation problem: Implementation


  • The package scipy.sparse [4] is used to form a sparse matrix (a sketch of this setup follows below).  
  • Scipy.optimize.linprog does not allow for \(\ge\) constraints. So our model becomes: \[\begin{align} \min &\sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j  x_{i,j} \le s_i &&\forall i \\ & \sum_i  -x_{i,j} \le  -d_j &&\forall j\\ & x_{i,j}\ge 0\end{align}\]
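The transport.py used below is not reproduced in the post; a rough sketch of how this setup could look (my own reconstruction with random data, not the original code):

import numpy as np
import scipy.optimize as opt
import scipy.sparse as sp

M, N = 1000, 1000                        # sources, destinations
rng = np.random.default_rng(123)
c = rng.uniform(1.0, 10.0, (M, N))       # unit transport cost
d = rng.uniform(0.0, 1.0, N)             # demand
s = rng.uniform(0.0, 1.0, M)             # supply
s *= 1.1 * d.sum() / s.sum()             # make total supply exceed total demand

# Column k = i*N + j (variable x[i,j]) has exactly two nonzeros:
# +1 in supply row i, and -1 in demand row M+j (the flipped >= constraint).
cols = np.arange(M * N)
i_idx, j_idx = cols // N, cols % N
A = sp.coo_matrix(
    (np.concatenate([np.ones(M * N), -np.ones(M * N)]),
     (np.concatenate([i_idx, M + j_idx]), np.concatenate([cols, cols]))),
    shape=(M + N, M * N)).tocsr()
b = np.concatenate([s, -d])

res = opt.linprog(c.reshape(M * N), A_ub=A, b_ub=b,
                  method="interior-point",
                  options={"sparse": True, "disp": True})
print(res.status, res.fun)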

When I run this, I see:


Primal Feasibility  Dual Feasibility    Duality Gap         Step             Path Parameter      Objective
1.0                 1.0                 1.0                 -                1.0                 4999334.387281
0.01096610265509    0.01096610265504    0.01096610265504    1.0              0.01096610265523    3423127.924532
0.007470719084731   0.007470719084695   0.007470719084695   0.3369198212982  0.007470719084826   1045138.710249
0.007375696439705   0.007375696439669   0.007375696439669   0.01405171378191 0.007375696439798   946062.4541516
0.006900523710037   0.006900523710004   0.006900523710004   0.07151611989327 0.006900523710125   631457.8940984
0.003392688227185   0.003392688227169   0.003392688227169   0.5542765654086  0.003392688227229   106030.5627759
0.002716216726218   0.002716216726205   0.002716216726205   0.2210823772546  0.002716216726252   77660.93708537
0.00151605426328    0.001516054263272   0.001516054263272   0.4706161702772  0.001516054263299   39012.6976106
0.001238382883199   0.001238382883193   0.001238382883193   0.2007381529847  0.001238382883215   31262.77924434
0.0006888763719364  0.000688876371933   0.0006888763719331  0.4711955496918  0.0006888763719452  16884.5788155
0.0004045311601541  0.0004045311601521  0.0004045311601522  0.4504577243574  0.0004045311601593  9812.570668161
0.0003278435563858  0.0003278435563842  0.0003278435563842  0.2062071599936  0.00032784355639    7943.50442653
0.0001938174872602  0.0001938174872593  0.0001938174872593  0.4304958950603  0.0001938174872627  4718.01892459
0.0001272127336263  0.0001272127336257  0.0001272127336257  0.371775562858   0.000127212733628   3126.320160308
7.325610966318e-05  7.325610966282e-05  7.325610966283e-05  0.4526986333113  7.325610966411e-05  1837.061691682
6.047737643405e-05  6.047737643373e-05  6.047737643375e-05  0.1896942778068  6.047737643482e-05  1530.292617672
3.301112106729e-05  3.301112106712e-05  3.301112106713e-05  0.4758440911431  3.301112106771e-05  870.6399411648
2.231615463384e-05  2.231615463375e-05  2.231615463374e-05  0.3562669388094  2.231615463413e-05  613.0954966036
1.300693055479e-05  1.300693055474e-05  1.300693055474e-05  0.4437694284722  1.300693055496e-05  388.3160007487
7.533045251385e-06  7.533045251368e-06  7.533045251357e-06  0.4485635094836  7.533045251489e-06  255.9636413848
3.799832196644e-06  3.799832196622e-06  3.799832196633e-06  0.5264643380152  3.7998321967e-06    165.5742065953
2.01284588862e-06   2.012845888624e-06  2.012845888615e-06  0.5006416028336  2.01284588865e-06   122.2520897954
1.143491145379e-06  1.143491145387e-06  1.143491145377e-06  0.4712206751612  1.143491145397e-06  101.1678772704
5.277850584407e-07  5.277850584393e-07  5.277850584402e-07  0.5711139027487  5.277850584494e-07  86.20125171613
3.125695105059e-07  3.125695105195e-07  3.125695105058e-07  0.4315945026363  3.125695105113e-07  80.96090171621
1.118500099738e-07  1.118500099884e-07  1.118500099743e-07  0.6743118743189  1.118500099763e-07  76.06812425522
4.412565084911e-08  4.412565086951e-08  4.412565085053e-08  0.6297257004579  4.412565085131e-08  74.41374033755
6.833044779903e-09  6.833044770544e-09  6.833044776856e-09  0.8682333453577  6.833044776965e-09  73.50145019804
3.3755500974e-10    3.375549807043e-10  3.375549865145e-10  0.9528386773998  3.375549866004e-10  73.34206256371
1.066148223577e-13  1.065916625724e-13  1.066069704785e-13  0.9998776987355  1.066069928771e-13  73.3337765897
7.763476236577e-18  3.543282811637e-17  5.469419174887e-18  0.9999500035089  5.330350298476e-18  73.3337739491
Optimization terminated successfully.
Current function value: 73.333774
Iterations: 30
Filename: transport.py

Line #    Mem usage    Increment   Line Contents
================================================
    59     70.6 MiB     70.6 MiB   @profile
    60                             def run():
    61                                 # dimensions
    62     70.6 MiB      0.0 MiB       M = 1000  # sources
    63     70.6 MiB      0.0 MiB       N = 1000  # destinations
    64     78.3 MiB      7.7 MiB       data = GenerateData(M,N)
    65    108.9 MiB     30.5 MiB       lpdata = FormLPData(data)
    66    122.6 MiB     13.7 MiB       res = opt.linprog(c=np.reshape(data['c'],M*N), A_ub=lpdata['A'], b_ub=lpdata['rhs'], options={'sparse':True, 'disp':True})


This proves we can actually solve a \(1000 \times 1000\) transportation problem (leading to an LP with a million variables) using standard Python tools.

References


  1. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html
  2. https://docs.scipy.org/doc/scipy/reference/optimize.linprog-interior-point.html
  3. Maximum number of decision variables in scipy linear programming module in Python, https://stackoverflow.com/questions/57579147/maximum-number-of-decision-variables-in-scipy-linear-programming-module-in-pytho
  4. https://docs.scipy.org/doc/scipy/reference/sparse.html

The gas station problem: where to pump gas and how much


Problem


The problem (from [1]) is to determine where to pump gasoline (and how much) during a trip, where prices between gas stations fluctuate.


We consider some different objectives:

  • minimize cost
  • minimize number of stops
  • minimize number of stops followed by minimize cost

Data


I invented some data:


----34 SET i  locations

start , Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7
Station8 , Station9 , Station10, Station11, Station12, Station13, Station14, Station15
Station16, Station17, Station18, Station19, Station20, finish


----34 SET g gas stations

Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7 , Station8
Station9 , Station10, Station11, Station12, Station13, Station14, Station15, Station16
Station17, Station18, Station19, Station20


---- 34 PARAMETER efficiency  =    18.000  [miles/gallon]
     PARAMETER capacity    =    50.000  tank-capacity [gallons]
     PARAMETER initgas     =    25.000  initial amount of gasoline in tank [gallons]
     PARAMETER finalgas    =    10.000  minimum final amount of gas in tank [gallons]
     PARAMETER triplen     =  2000.000  length of trip [miles]

---- 34 PARAMETER price  [$/gallon]

Station1 3.002, Station2 3.630, Station3 3.616, Station4 3.126, Station5 3.167, Station6 3.603
Station7 2.067, Station8 3.281, Station9 3.748, Station10 3.783, Station11 3.065, Station12 2.349
Station13 3.135, Station14 2.928, Station15 3.527, Station16 3.585, Station17 3.305, Station18 3.460
Station19 3.320, Station20 2.375


---- 34 PARAMETER distance  [miles]

          Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8   Station9

incr        88.074    174.227      3.250    157.671      1.847    140.247     79.275    166.355    117.030
cumul       88.074    262.301    265.550    423.221    425.068    565.315    644.590    810.945    927.975

      +  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17  Station18

incr       136.787     16.517     23.499    167.570    127.418     31.520     91.885     91.363     74.604
cumul     1064.762   1081.279   1104.778   1272.348   1399.767   1431.286   1523.172   1614.535   1689.139

      +  Station19  Station20     finish

incr       148.788     18.876    143.198
cumul     1837.926   1856.802   2000.000


Prices and distances were produced using a random number generator.

Note that I added the requirement that a little bit of gas should be left in the tank when arriving at the finish. That requirement was not in the original problem [1]. We can drop it by just setting the parameter \(\mathit{finalgas}=0\).

We also have some derived data: the amount of gas we use for each leg of the trip. This is just the length of the leg divided by the efficiency of the car:


---- 34 PARAMETER use  gas usage from previous location [gallons]

Station1 4.893, Station2 9.679, Station3 0.181, Station4 8.760, Station5 0.103, Station6 7.791
Station7 4.404, Station8 9.242, Station9 6.502, Station10 7.599, Station11 0.918, Station12 1.305
Station13 9.309, Station14 7.079, Station15 1.751, Station16 5.105, Station17 5.076, Station18 4.145
Station19 8.266, Station20 1.049, finish 7.955



Problem 1: minimize cost


The first problem is to minimize fuel cost. I have modeled this by observing three stages at each way point:

  1. First is the amount of gas in the tank when arriving at point \(i\). This amount should be non-negative: we cannot drive when the tank is empty. This variable is denoted by \(f_{\mathit{before},i}\ge 0\).
  2. The amount we pump is the second stage. This amount is bounded by \([0,\mathrm{capacity}]\). This variable is denoted by \(f_{\mathit{pumped},g}\).
  3. The amount in the tank after pumping. This amount cannot exceed the capacity of the tank. This is \(f_{\mathit{after},i} \in [0,\mathrm{capacity}]\). 

This problem is a little bit like modeling inventory: keep track of what is going out and what is added. The LP model can look like:

Min Cost Model
\[\begin{align} \min \> & \color{darkred}{\mathit{cost}}\\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g} \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \end{align}\]

Note that the set \(g\) is a subset of set \(i\): \(g\) indicates the locations with gas stations between \(\mathit{start}\) and \(\mathit{finish}\). Also note that we cannot just substitute out the variable \(f_{\mathit{before},i}\): we need to make sure this quantity is non-negative. Similarly, we cannot substitute out the variable \(f_{\mathit{after},i}\): this must obey the tank capacity bound.
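The model itself was implemented in GAMS. As an illustration, here is a rough Python translation of the min cost LP using PuLP (my choice here, not the original implementation), with the data from the listings above:

import pulp

# Gallons burned on each leg (previous point -> this point), from the
# "use" listing above; the last entry is the leg into the finish.
use = [4.893, 9.679, 0.181, 8.760, 0.103, 7.791, 4.404, 9.242, 6.502,
       7.599, 0.918, 1.305, 9.309, 7.079, 1.751, 5.105, 5.076, 4.145,
       8.266, 1.049, 7.955]
# $/gallon at Station1..Station20, from the "price" listing above.
price = [3.002, 3.630, 3.616, 3.126, 3.167, 3.603, 2.067, 3.281, 3.748,
         3.783, 3.065, 2.349, 3.135, 2.928, 3.527, 3.585, 3.305, 3.460,
         3.320, 2.375]
capacity, initgas, finalgas = 50.0, 25.0, 10.0
n = len(price)  # 20 gas stations; point 0 is the start

m = pulp.LpProblem("mincost", pulp.LpMinimize)
before = pulp.LpVariable.dicts("before", range(1, n + 1), 0, capacity)
after = pulp.LpVariable.dicts("after", range(0, n + 1), 0, capacity)
pumped = pulp.LpVariable.dicts("pumped", range(1, n + 1), 0, capacity)

m += pulp.lpSum(price[g - 1] * pumped[g] for g in range(1, n + 1))  # fuel cost
m += after[0] == initgas
for g in range(1, n + 1):
    m += before[g] == after[g - 1] - use[g - 1]   # fuel on arrival
    m += after[g] == before[g] + pumped[g]        # fuel after pumping
m += after[n] - use[n] >= finalgas                # reserve at the finish

m.solve()
print(pulp.value(m.objective))  # about 219 for this data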

The results look like:


---- 66 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     21.238     21.058     12.298     12.196      4.404                40.758
pumped                 10.811                                                                50.000
after       25.000     30.918     21.238     21.058     12.298     12.196      4.404     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     40.691     33.612     31.861     26.756     21.680
pumped                                       25.566
after       34.256     26.657     25.740     50.000     40.691     33.612     31.861     26.756     21.680

     +   Station18  Station19  Station20     finish

before      17.535      9.270      8.221     10.000
pumped                             9.735
after       17.535      9.270     17.955


---- 66 VARIABLE cost.L  =  218.998

We see we pump the most at station 7. Looking at the prices this makes sense: gasoline is cheapest at that gas station.

The number of stops where we pump gas is 4, and the total gas bill is $219.


Problem 2: minimize number of stops


In the previous section we solved the minimize cost problem. This gave us 4 stops to refuel with total fuel cost of $219. Now, let's try to minimize the number of times we visit a gas station. Counting in general needs binary variables, and this is no exception. The model can look like:


Min Number of Stops Model
\[\begin{align} \min \> & \color{darkred}{\mathit{numstops}}\\ & \color{darkred}{\mathit{numstops}} = \sum_g \color{darkred} \delta_g \\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g}  \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred} f_{\mathit{pumped},g} \le \color{darkred} \delta_g  \cdot \color{darkblue}{\mathit{capacity}} && \forall g \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \\ & \color{darkred} \delta_g \in \{0,1\} \end{align}\]

Because we have binary variables, this is now a MIP model. The constraint \(f_{\mathit{pumped},g} \le  \delta_g \cdot \mathit{capacity}\)  implements the implication: \[\delta_g=0 \Rightarrow f_{\mathit{pumped},g}=0\]When we solve this we see:



---- 71 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     10.428     10.247      1.488      1.385                41.954     32.712
pumped                                                               6.406     46.358
after       25.000     20.107     10.428     10.247      1.488      7.791     46.358     41.954     32.712

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      26.211     18.611     17.694     16.388      7.079                41.595     36.490     31.415
pumped                                                              43.346
after       26.211     18.611     17.694     16.388      7.079     43.346     41.595     36.490     31.415

     +   Station18  Station19  Station20     finish

before      27.270     19.004     17.955     10.000
after       27.270     19.004     17.955


---- 71 VARIABLE delta.L

Station5 1.000, Station6 1.000, Station14 1.000


---- 71 VARIABLE cost.L      =  314.254
     VARIABLE numstops.L  =    3.000


So instead of 4 stops, now we only need 3 stops. We ignored the cost in this model. This causes the fuel cost to skyrocket to $314 (from $219 in the min cost model).

I kept the cost constraint in the problem for two reasons. First, it functions as an accounting constraint. Such a constraint is just for informational purposes (it is not meant to change or restrict the solution). A second reason is that we use the cost variable in a second solve in order to minimize cost while keeping the number of stops optimal. This is explained in the next section.

Problem 3: minimize number of stops followed by minimizing cost


After solving the min number of stops problem (previous section), we can fix the number of stops variable \(\mathit{numstops}\) to the optimal value and resolve minimizing the cost. This is essentially a lexicographic approach to solving the multi-objective problem min numstops, min cost. If we do this we get as solution:


---- 76 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     21.238     21.058     12.298     12.196      4.404                40.758
pumped                 10.811                                                                50.000
after       25.000     30.918     21.238     21.058     12.298     12.196      4.404     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     15.125      8.046     41.595     36.490     31.415
pumped                                                             35.301
after       34.256     26.657     25.740     24.434     15.125     43.346     41.595     36.490     31.415

     +   Station18  Station19  Station20     finish

before      27.270     19.004     17.955     10.000
after       27.270     19.004     17.955


---- 76 VARIABLE delta.L

Station1 1.000, Station7 1.000, Station14 1.000


---- 76 VARIABLE cost.L      =  239.188
     VARIABLE numstops.L  =    3.000


Now we have a solution with 3 stops and a fuel cost of $239. This is my proposal for a solution strategy for the problem stated in [1].

An alternative would be to create a single optimization problem with a weighted sum objective: \[\min \> \mathit{numstops} + w \cdot  \mathit{cost}\] with \(w\) a small constant to make sure that \(\mathit{numstops}\) is the most important variable. As the value of \(w\) requires some thought, it may be better to use the lexicographic approach.
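In terms of the PuLP sketch from the min cost section, the two solves could look like this (again just a sketch; delta and the cost accounting variable are the new elements):

# Stage 1: minimize the number of stops.
delta = pulp.LpVariable.dicts("delta", range(1, n + 1), cat="Binary")
cost = pulp.LpVariable("cost")  # accounting variable for the fuel bill
m += cost == pulp.lpSum(price[g - 1] * pumped[g] for g in range(1, n + 1))
for g in range(1, n + 1):
    m += pumped[g] <= capacity * delta[g]   # delta[g]=0 => pumped[g]=0
m.setObjective(pulp.lpSum(delta.values()))
m.solve()

# Stage 2: fix the number of stops and minimize cost.
m += pulp.lpSum(delta.values()) <= round(pulp.value(m.objective))
m.setObjective(cost)
m.solve()
print(pulp.value(cost), sum(round(d.value()) for d in delta.values()))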


Filling up the gas tank


Suppose that when pumping gas we always fill up the tank completely. This alternative is not too difficult to handle. We need to add the implication: \[\delta_g=1 \Rightarrow f_{\mathit{pumped},g}=\mathit{capacity}-f_{\mathit{before},g}\] This can be handled using the inequality: \[f_{\mathit{pumped},g} \ge \delta_g \cdot \mathit{capacity}-f_{\mathit{before},g}\]
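In the PuLP sketch this adds one more set of constraints:

# delta[g]=1 => the tank is filled: pumped[g] >= capacity - before[g]
# (with the objective set back to cost for the min cost variant).
for g in range(1, n + 1):
    m += pumped[g] >= capacity * delta[g] - before[g]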


If we add this constraint and solve the min cost model we see:


---- 84 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     40.321     40.140     31.381     31.278     23.487     19.082     40.758
pumped                 29.893                                                            30.918
after       25.000     50.000     40.321     40.140     31.381     31.278     23.487     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     40.691     33.612     48.249     43.144     38.068
pumped                                       25.566                16.388
after       34.256     26.657     25.740     50.000     40.691     50.000     48.249     43.144     38.068

     +   Station18  Station19  Station20     finish

before      33.924     25.658     24.609     16.654
after       33.924     25.658     24.609


---- 84 VARIABLE delta.L

Station1 1.000, Station7 1.000, Station12 1.000, Station14 1.000


---- 84 VARIABLE cost.L      =  261.709
     VARIABLE numstops.L  =    4.000


In this case we have a little bit more gasoline left in the tank at the finish than strictly needed. Notice how each time we pump gas, we end up with a full tank. This fill-up strategy is surprisingly expensive.

Conclusion


Here we see the advantages of using an optimization model compared to a tailored algorithm. We can adapt the optimization model to different situations. From the basic min cost model, we can quickly react to new questions.

References


  1. Gas Station Problem - cheapest and least amount of stations, https://stackoverflow.com/questions/58289424/gas-station-problem-cheapest-and-least-amount-of-stations
  2. Samir Khuller, Azarakhsh Malekian, Julián Mestre, To Fill or not to Fill: The Gas Station Problem, ACM Transactions on Algorithms, Volume 7, Issue 3, July 2011.

Sometimes a commercial solver is really better...

Solving a model with an integer valued objective, with 500 binary variables, gave some interesting results.

  • Cplex. Optimal solution \(z=60\) found in about 8 seconds. Using 4 threads on an old laptop.
  • CBC. Hit timelimit of 2 hours. Objective = 62 (non-optimal). Also using 4 threads on the same machine. 

The strange thing is that with CBC the best possible bound is not changing at all. Not by a millimeter. See the repeated "best possible 55.416026" numbers in the CBC log below.

CBC is a very good solver, but sometimes I see things like this.


Cplex Log



Tried aggregator 2 times.
MIP Presolve eliminated 4 rows and 5 columns.
MIP Presolve modified 3 coefficients.
Aggregator did 500 substitutions.
Reduced MIP has 999 rows, 1499 columns, and 2497 nonzeros.
Reduced MIP has 500 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.02 sec. (7.38 ticks)
Found incumbent of value 500.000000 after 0.03 sec. (8.85 ticks)
Probing time = 0.00 sec. (0.20 ticks)
Tried aggregator 1 time.
Reduced MIP has 999 rows, 1499 columns, and 2497 nonzeros.
Reduced MIP has 500 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.00 sec. (5.26 ticks)
Probing time = 0.00 sec. (0.19 ticks)
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 4 threads.
Parallel mode: deterministic, using up to 3 threads for concurrent optimization.
Tried aggregator 1 time.
LP Presolve eliminated 999 rows and 1499 columns.
All rows and columns eliminated.
Presolve time = 0.02 sec. (0.71 ticks)
Initializing dual steep norms . . .
Root relaxation solution time = 0.02 sec. (1.15 ticks)

Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap

* 0+ 0 500.0000 0.0000 100.00%
Found incumbent of value 500.000000 after 0.06 sec. (16.78 ticks)
0 0 55.2664 490 500.0000 55.2664 0 88.95%
* 0+ 0 75.0000 55.2664 26.31%
Found incumbent of value 75.000000 after 0.08 sec. (21.35 ticks)
0 0 55.2825 353 75.0000 Cuts: 349 222 26.29%
0 0 55.4160 311 75.0000 Cuts: 349 455 26.11%
0 0 55.4160 275 75.0000 Cuts: 349 775 26.11%
* 0+ 0 72.0000 55.4160 23.03%
Found incumbent of value 72.000000 after 0.59 sec. (116.10 ticks)
0 0 55.4160 182 72.0000 Cuts: 349 1032 23.03%
0 0 55.4160 141 72.0000 Cuts: 349 1208 23.03%
0 0 55.4160 121 72.0000 Cuts: 192 1336 23.03%
0 0 55.4160 105 72.0000 Cuts: 128 1410 23.03%
0 0 55.4160 98 72.0000 Cuts: 85 1472 23.03%
0 0 55.4160 61 72.0000 Cuts: 74 1494 23.03%
0 0 55.4160 67 72.0000 Cuts: 160 1571 23.03%
0 2 55.4160 56 72.0000 55.4160 1571 23.03%
Elapsed time = 1.33 sec. (273.54 ticks, tree = 0.01 MB, solutions = 3)
1239 854 55.4160 59 72.0000 55.4160 3845 23.03%
Cuts: 38
* 1471+ 1230 71.0000 55.4160 21.95%
Cuts: 16
Found incumbent of value 71.000000 after 3.39 sec. (827.52 ticks)
* 1474+ 1230 70.0000 55.4160 20.83%
Found incumbent of value 70.000000 after 3.39 sec. (828.63 ticks)
* 1481+ 1230 69.0000 55.4160 19.69%
Found incumbent of value 69.000000 after 3.41 sec. (831.79 ticks)
* 1490+ 1230 68.0000 55.4160 18.51%
Found incumbent of value 68.000000 after 3.44 sec. (834.87 ticks)
* 1731+ 1033 67.0000 58.7491 12.31%
Found incumbent of value 67.000000 after 6.63 sec. (1824.17 ticks)
* 1731+ 688 65.0000 59.0933 9.09%
Found incumbent of value 65.000000 after 7.25 sec. (2079.31 ticks)
* 1731+ 458 62.0000 60.0000 3.23%
Found incumbent of value 62.000000 after 7.47 sec. (2162.68 ticks)
* 1731+ 305 61.0000 60.0000 1.64%
Found incumbent of value 61.000000 after 8.20 sec. (2495.45 ticks)
* 1731+ 0 60.0000 60.0000 0.00%
Found incumbent of value 60.000000 after 8.58 sec. (2603.40 ticks)
* 1731 0 integral 0 60.0000 60.0000 10141 0.00%
Found incumbent of value 60.000000 after 8.61 sec. (2604.50 ticks)

Cover cuts applied: 150
Implied bound cuts applied: 14
Flow cuts applied: 37
Mixed integer rounding cuts applied: 249
Gomory fractional cuts applied: 26

Root node processing (before b&c):
Real time = 1.31 sec. (273.18 ticks)
Parallel b&c, 4 threads:
Real time = 7.30 sec. (2331.56 ticks)
Sync time (average) = 0.25 sec.
Wait time (average) = 0.01 sec.
------------
Total (root+branch&cut) =
8.61 sec. (2604.75 ticks)
MIP status(101):
integer optimal solution


CBC Log



Calling CBC main solution routine...
Integer solution of 74 found by feasibility pump after 0 iterations and 0 nodes (2.78 seconds)
Integer solution of 72 found by RINS after 0 iterations and 0 nodes (2.96 seconds)
128 added rows had average density of 31.601563
At root node, 128 cuts changed objective from 55.26645 to 55.416026 in 10 passes
Cut generator 0 (Probing) - 367 row cuts average 2.1 elements, 0 column cuts (12 active) in 0.022 seconds - new frequency is 1
Cut generator 1 (Gomory) - 598 row cuts average 26.1 elements, 0 column cuts (0 active) in 0.090 seconds - new frequency is 1
Cut generator 2 (Knapsack) - 14 row cuts average 11.9 elements, 0 column cuts (0 active) in 0.037 seconds - new frequency is -100
Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.003 seconds - new frequency is -100
Cut generator 4 (MixedIntegerRounding2) - 368 row cuts average 11.2 elements, 0 column cuts (0 active) in 0.023 seconds - new frequency is 1
Cut generator 5 (FlowCover) - 394 row cuts average 2.8 elements, 0 column cuts (0 active) in 0.041 seconds - new frequency is 1
Cut generator 6 (TwoMirCuts) - 598 row cuts average 35.5 elements, 0 column cuts (0 active) in 0.072 seconds - new frequency is -100
After 0 nodes, 1 on tree, 72 best solution,
best possible 55.416026 (3.94 seconds)
Integer solution of 70 found by heuristic after 4138 iterations and 57 nodes (14.24 seconds)
Integer solution of 69 found by heuristic after 12712 iterations and 287 nodes (23.32 seconds)
Integer solution of 66 found by heuristic after 18149 iterations and 487 nodes (25.94 seconds)
After 1005 nodes, 555 on tree, 66 best solution, best possible 55.416026 (31.73 seconds)
Integer solution of 65 found by heuristic after 32591 iterations and 1015 nodes (33.93 seconds)
Integer solution of 64 found by heuristic after 43252 iterations and 1405 nodes (39.79 seconds)
Integer solution of 63 found by heuristic after 52409 iterations and 1805 nodes (44.62 seconds)
After 2017 nodes, 1104 on tree, 63 best solution, best possible 55.416026 (45.77 seconds)
After 3045 nodes, 1653 on tree, 63 best solution, best possible 55.416026 (52.29 seconds)
After 4099 nodes, 2212 on tree, 63 best solution, best possible 55.416026 (55.02 seconds)
After 5163 nodes, 2769 on tree, 63 best solution, best possible 55.416026 (57.08 seconds)

. . .

After 131221 nodes, 37229 on tree, 63 best solution, best possible 55.416026 (610.94 seconds)
After 132282 nodes, 37230 on tree, 63 best solution, best possible 55.416026 (613.18 seconds)
After 133298 nodes, 37231 on tree, 63 best solution, best possible 55.416026 (615.50 seconds)
After 134320 nodes, 37318 on tree, 63 best solution, best possible 55.416026 (621.40 seconds)
Integer solution of 62 found by heuristic after 2839018 iterations and 134502 nodes (621.99 seconds)
After 135334 nodes, 37422 on tree, 62 best solution, best possible 55.416026 (627.07 seconds)
After 136352 nodes, 37407 on tree, 62 best solution, best possible 55.416026 (631.83 seconds)
After 137391 nodes, 37395 on tree, 62 best solution, best possible 55.416026 (637.74 seconds)
After 138400 nodes, 37385 on tree, 62 best solution, best possible 55.416026 (642.25 seconds)

. . .

After 1566320 nodes, 37408 on tree, 62 best solution, best possible 55.416026 (7177.35 seconds)
After 1567325 nodes, 37412 on tree, 62 best solution, best possible 55.416026 (7182.33 seconds)
After 1568336 nodes, 37421 on tree, 62 best solution, best possible 55.416026 (7187.07 seconds)
After 1569358 nodes, 37402 on tree, 62 best solution, best possible 55.416026 (7192.93 seconds)
After 1570410 nodes, 37400 on tree, 62 best solution, best possible 55.416026 (7199.07 seconds)
Thread 0 used 30010 times, waiting to start 2696, 0 locks, 0 locked, 0 waiting for locks
Thread 1 used 30010 times, waiting to start 2480, 0 locks, 0 locked, 0 waiting for locks
Thread 2 used 30010 times, waiting to start 2083, 0 locks, 0 locked, 0 waiting for locks
Thread 3 used 30010 times, waiting to start 1770, 0 locks, 0 locked, 0 waiting for locks
Main thread 6737 waiting for threads, 0 locks, 0 locked, 0 waiting for locks
Exiting on maximum time
Partial search -
best objective 62 (best possible 55.416026), took 19832056 iterations and 1570713 nodes (7202.49 seconds)
Strong branching done 3540964 times (6897 iterations), fathomed 137838 nodes and fixed 779663 variables
Maximum depth 198, 2057372 variables fixed on reduced cost

Time limit reached. Have feasible solution.
MIP solution: 6.200000e+01 (1570713 nodes, 7202.62 CPU seconds, 7202.62 wall clock seconds)

Rolling horizon approach for scheduling model

I need to run a (large) scheduling model with a planning horizon of several months: October 2019 through March 2020. This leads to a very large MIP model. One way to split this up into smaller problems is to solve it one month at a time.

Simple approach


This is easier said than done. We don't have a nice break in the schedule at the end of the month. Assignments are bleeding into the next month:

Schedule (partial view)


One way to deal with this problem is to shift the window in a bit more complicated way:

Tailored rolling horizon algorithm

We basically solve the problem as a MIP two months at a time, but shift the window by only one month. The binary variables further in the future are relaxed to continuous variables; that part becomes like an LP. I.e. the green parts are easy. Past binary variables are fixed. Of course, the MIP presolver will remove all fixed variables from the model, so the orange parts are also easy.
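Schematically the loop looks like this; solve_window and fix_solution are hypothetical placeholders for building and solving the windowed MIP:

# months of the planning horizon
months = ["2019-10", "2019-11", "2019-12", "2020-01", "2020-02", "2020-03"]

def solve_window(binary_month, relaxed_month, fixed):
    """Hypothetical: build and solve the two-month window MIP with
    binaries in binary_month, the LP relaxation in relaxed_month, and
    everything before binary_month fixed to `fixed`."""
    return {binary_month: f"assignments for {binary_month}"}

def fix_solution(sol, month):
    """Hypothetical: extract the part of the solution to lock in."""
    return {month: sol[month]}

fixed = {}
for k, month in enumerate(months):
    nxt = months[k + 1] if k + 1 < len(months) else None
    sol = solve_window(month, nxt, fixed)
    fixed.update(fix_solution(sol, month))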

This approach only really works if we don't have global constraints over all months. The danger is that we push the bad stuff into the future. That can even lead to infeasible sub-problems at the end.  Luckily this model has no constraints that span all six months.

If we want we can solve the big model at the end using the solution we built up in parts as a starting point (using the mipstart option). If the algorithm is working as expected, this last big MIP model should not find solutions that are much better. This is indeed the case for my model. (Note that not all sub-problems are solved to optimality -- sometimes a small gap remains).


CVXPY matrix style modeling limits

CVXPY [1] is a popular modeling tool for convex models. It rigorously checks that the model is convex, which is very convenient: many convex solvers are thoroughly confused when passed a non-convex model. This is a bit different from, say, passing a non-convex model to a local NLP solver. In that case the solver will accept it and try to find local solutions.

In addition, CVXPY provides many high-level functions (e.g. for different norms) and a very compact matrix-based modeling paradigm. Although matrix notation can be very powerful, there are limits. When overdoing things, the notation becomes less intuitive.

Example: Transportation model


In this section I'll discuss some modeling issues when implementing a simple transportation model in CVXPY, and compare this to a standard GAMS implementation.

As an example consider the standard transportation model.

Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\ & \sum_i\color{darkred}x_{i,j} \ge \color{darkblue} d_j && \forall j\\ &\sum_j\color{darkred}x_{i,j}\le \color{darkblue} s_i && \forall i \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]\[\begin{align}\min\>\>&\mathbf{tr}(\color{darkblue}C^T \color{darkred}X) \\ &\color{darkred}X^T \color{darkblue}e \ge \color{darkblue} d \\ &\color{darkred}X \color{darkblue}e \le \color{darkblue}s\\ & \color{darkred}X\ge 0\end{align}\]

Here \(e\) indicates a column vector of ones of appropriate size. The model in matrix notation is identical to the equation-based model on the left. The matrix-based model is more compact, but arguably a bit more difficult to read for most readers (me included).

One important way to help matrix models to be more readable, is to add a summation function. As a matrix model does not have indices, we need other ways to indicate what to sum over. This is expressed as the (optional) axis argument. This means the model above can be expressed in CVXPY as


import numpy as np
import cvxpy as cp

#----- data -------
capacity = np.array([350, 600])
demand = np.array([325, 300, 275])
distance = np.array([[2.5, 1.7, 1.8],
                     [2.5, 1.8, 1.4]])
freight = 90
cost = freight*distance/1000

#------ set up LP data --------
C = cost
d = demand
s = capacity

#---- matrix formulation ----

ei = np.ones(s.shape)
ej = np.ones(d.shape)

X = cp.Variable(C.shape, "X")
prob = cp.Problem(
    cp.Minimize(cp.trace(C.T @ X)),
    [X.T @ ei >= d,
     X @ ej <= s,
     X >= 0])
prob.solve(verbose=True)
print("status:", prob.status)
print("objective:", prob.value)
print("levels:", X.value)

#---- summations ----

prob2 = cp.Problem(
    cp.Minimize(cp.sum(cp.multiply(C, X))),
    [cp.sum(X, axis=0) >= d,
     cp.sum(X, axis=1) <= s,
     X >= 0])
prob2.solve(verbose=True)
print("status:", prob2.status)
print("objective:", prob2.value)
print("levels:", X.value)


The matrix model follows the mathematical model closely. The second model with the summations requires some explanation. In Python * is for elementwise multiplication and @ for matrix multiplication. In CVXPY however, @, * and matmul are for matrix multiplication while multiply is for elementwise multiplication. A sum without axis argument sums over all elements, while axis=0 sums over the first index, and axis=1 sums over the second index.

The data for this model is from [2]. The lack of explicit indexing has another consequence: all data is determined by its position. I.e. we need to remember that position 0 means a canning plant in Seattle or a demand market in New York.

I am not so sure about the readability of these two CVXPY models. I think I still prefer the indexed model.

CVXPY uses OSQP [3] as default LP and QP solver. Here is the log:


-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 6, constraints m = 11
          nnz(P) + nnz(A) = 18
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1  -2.4113e+00   3.32e+02   7.31e+00   1.00e-01   5.47e-05s
 200   1.5368e+02   2.77e-03   2.44e-07   1.23e-03   4.30e-02s

status:               solved
solution polish:      unsuccessful
number of iterations: 200
optimal objective:    153.6751
run time:             4.31e-02s
optimal rho estimate: 2.60e-03

status: optimal
objective: 153.67514323621972
levels: [[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

The levels indicate OSQP is pretty sloppy here: we see some negative values of the order -1e-3. The real optimal objective is 153.675000 (so even though the solution is slightly infeasible, this did not help in achieving a better objective). Sometimes OSQP does better: if the solution polishing step works. This polishing is a bit like a poor man's crossover: it tries to guess the active constraints. In this case polishing did not work.

The number of iterations is large. This is normal: this solver uses a first order algorithm. They typically require a lot of iterations.


For completeness, the original GAMS model looks like:


Set
i 'canning plants' / seattle, san-diego /
j 'markets' / new-york, chicago, topeka /;

Parameter
a(i) 'capacity of plant i in cases'
/ seattle 350
san-diego 600 /

b(j) 'demand at market j in cases'
/ new-york 325
chicago 300
topeka 275 /;

Table d(i,j) 'distance in thousands of miles'
              new-york  chicago  topeka
   seattle         2.5      1.7     1.8
   san-diego       2.5      1.8     1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f*d(i,j)/1000;

Variable
x(i,j) 'shipment quantities in cases'
z 'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
cost 'define objective function'
supply(i) 'observe supply limit at plant i'
demand(j) 'satisfy demand at market j';

cost.. z =e= sum((i,j), c(i,j)*x(i,j));

supply(i).. sum(j, x(i,j)) =l= a(i);

demand(j).. sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;


The main differences with the CVXPY model are:

  • GAMS indexes by names (set elements), CVXPY uses positions in a matrix or vector
  • GAMS is equation based while CVXPY uses matrices
  • For this model, the CVXPY representation is very compact, even terse, but depending on the formulation it requires familiarity with matrix notation. 
  • Arguably, the most important feature of a modeling tool is readability. I have a preference for the GAMS notation here: it is closer to the original mathematical model, in a notation I am used to.
  • When printing the results, GAMS is a bit more intuitive:  


GAMS:

---- 66 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle        50.000     300.000
san-diego     275.000                 275.000


Python:

[[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

Example: non-convex binary quadratic optimization


Consider the binary quadratic programming problem (in indexed format and in matrix notation):

Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkred}x_i \color{darkblue}q_{i,j}\color{darkred}x_j\\ & \color{darkred}x_i \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \color{darkred}x^T\color{darkblue}Q\color{darkred}x \\ &\color{darkred}x \in \{0,1\} \end{align}\]


We don't assume the \(Q\) matrix is positive (semi-)definite or even symmetric. This makes the problem non-convex. Interestingly, when we feed this problem to solvers like Cplex or Gurobi, they have no problem finding the optimal solution. The reason is that they apply a trick to make this problem linear.

We can linearize the binary product \(y_{i,j} = x_i x_j\) by \[\begin{align} & y_{i,j} \le x_i \\& y_{i,j} \le x_j \\& y_{i,j} \ge x_i + x_j -1 \\ & x_i, y_{i,j} \in \{0,1\}\end{align}\] If we want we can relax \(y\) to be continuous between 0 and 1.

After applying this linearization, we have:

Linearized Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkblue}q_{i,j}\color{darkred}y_{i,j}\\ & \color{darkred}y_{i,j} \le \color{darkred}x_i\\ & \color{darkred}y_{i,j} \le \color{darkred}x_j\\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i+ \color{darkred}x_j - 1\\ & \color{darkred}x_i,\color{darkred}y_{i,j} \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \mathbf{tr}(\color{darkblue}Q^T\color{darkred}Y) \\ & \color{darkred}Y \le \color{darkred}x \cdot\color{darkblue}e^T \\ & \color{darkred}Y \le \color{darkblue}e \cdot\color{darkred}x^T \\& \color{darkred}Y \ge \color{darkred}x \cdot\color{darkblue}e^T + \color{darkblue}e \cdot\color{darkred}x^T - \color{darkblue}e \cdot \color{darkblue}e^T\\ &\color{darkred}x, \color{darkred}Y \in \{0,1\} \end{align}\]

The matrix form of the objective is similar to the one we saw in the section on the transportation problem. The constraints are a little bit more complicated due to the outer products.


Although solvers like Cplex and Gurobi can solve the quadratic formulation directly, CVXPY will complain with the message:


cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
x * [[-6.56505736e+00  6.86533416e+00  1.00750712e+00 -3.97724192e+00
  -4.15575766e+00 -5.51894266e+00 -3.00338992e+00  7.12540694e+00
  -8.65772554e+00  4.21338000e-03]
 [ 9.96235254e+00  1.57466756e+00  9.82266078e+00  5.24500934e+00
  -7.38615034e+00  2.79437518e+00 -6.80964272e+00 -4.99838934e+00
   3.37857218e+00 -1.29287238e+00]
 [-2.80599468e+00 -2.97117264e+00 -7.37016820e+00 -6.99796424e+00
   1.78227300e+00  6.61785624e+00 -5.38368524e+00  3.31468920e+00
   5.51715212e+00 -3.92683046e+00]
 [-7.79015418e+00  4.76973200e-02 -6.79654476e+00  7.44924622e+00
  -4.69770910e+00 -4.28371356e+00  1.87911844e+00  4.45438142e+00
   2.56497354e+00 -7.24042700e-01]
 [-1.73386012e+00 -7.64609286e+00 -3.71575466e+00 -9.06896972e+00
  -3.22899456e+00 -6.35800814e+00  2.91454254e+00  1.21491094e+00
   5.39923440e+00 -4.04388272e+00]
 [ 3.22212522e+00  5.11643348e+00  2.54894998e+00 -4.32271604e+00
  -8.27150752e+00 -7.94970662e+00  2.82502302e+00  9.06189960e-01
  -9.36950296e+00  5.84721284e+00]
 [-8.54466004e+00 -6.48677902e+00  5.12652260e-01  5.00415338e+00
  -6.43752572e+00 -9.31718028e+00  1.70262346e+00  2.42459968e+00
  -2.21276200e+00 -2.82571694e+00]
 [-5.13930766e+00 -5.07156922e+00 -7.38994394e+00  8.66899440e+00
  -2.40124188e+00  5.66800922e+00 -3.99931484e+00 -7.49033556e+00
   4.97748210e+00 -8.61535074e+00]
 [-5.95968886e+00 -9.89868284e+00 -4.60773896e+00 -2.97050000e-03
  -6.97428262e+00 -6.51661090e+00 -3.38724532e+00 -3.66187892e+00
  -3.55826090e+00  9.27953282e+00]
 [ 9.87204410e+00 -2.60193890e+00 -2.54222866e+00  5.43956660e+00
  -2.06631716e+00  8.26192650e+00 -7.60844540e+00  4.70957778e+00
  -8.89163050e+00  1.52599610e+00]] * x


A linearized formulation can look like:


import numpy as np
import cvxpy as cp


# -------- data ---------


Q = np.array([
[-6.56505736, 6.86533416, 1.00750712, -3.97724192, -4.15575766, -5.51894266, -3.00338992, 7.12540694, -8.65772554, 0.00421338],
[ 9.96235254, 1.57466756, 9.82266078, 5.24500934, -7.38615034, 2.79437518, -6.80964272, -4.99838934, 3.37857218, -1.29287238],
[-2.80599468, -2.97117264, -7.3701682 , -6.99796424, 1.782273 , 6.61785624, -5.38368524, 3.3146892 , 5.51715212, -3.92683046],
[-7.79015418, 0.04769732, -6.79654476, 7.44924622, -4.6977091 , -4.28371356, 1.87911844, 4.45438142, 2.56497354, -0.7240427 ],
[-1.73386012, -7.64609286, -3.71575466, -9.06896972, -3.22899456, -6.35800814, 2.91454254, 1.21491094, 5.3992344 , -4.04388272],
[ 3.22212522, 5.11643348, 2.54894998, -4.32271604, -8.27150752, -7.94970662, 2.82502302, 0.90618996, -9.36950296, 5.84721284],
[-8.54466004, -6.48677902, 0.51265226, 5.00415338, -6.43752572, -9.31718028, 1.70262346, 2.42459968, -2.212762 , -2.82571694],
[-5.13930766, -5.07156922, -7.38994394, 8.6689944 , -2.40124188, 5.66800922, -3.99931484, -7.49033556, 4.9774821 , -8.61535074],
[-5.95968886, -9.89868284, -4.60773896, -0.0029705 , -6.97428262, -6.5166109 , -3.38724532, -3.66187892, -3.5582609 , 9.27953282],
[ 9.8720441 , -2.6019389 , -2.54222866, 5.4395666 , -2.06631716, 8.2619265 , -7.6084454 , 4.70957778, -8.8916305 , 1.5259961 ]])


n = Q.shape[0]


# ---- linearized model, matrix format -----

x = cp.Variable((n,1),"x",boolean=True)
Y = cp.Variable((n,n),"Y")
e = np.ones((n,1))


prob = cp.Problem(cp.Minimize(cp.trace(Q.T@Y)),
                  [Y <= x@e.T,
                   Y <= e@x.T,
                   Y >= x@e.T + e@x.T - e@e.T,
                   Y >= 0,
                   Y <= 1])
prob.solve(solver=cp.GLPK_MI,verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",x.value)
prob.solve(solver=cp.GLPK_MI,verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",x.value)


Notes:

  • The objective can also be written as cp.Minimize(cp.sum(cp.multiply(Q,Y))) (elementwise multiplication followed by a summation)
  • I relaxed the Y variables to be continuous between 0 and 1
  • We use GLPK as the MIP solver
  • CVXPY comes with an integer solver called ECOS_BB. This solver seems to choke on this problem.


The original GAMS model for this problem was as follows:


set i /i1*i10/;
alias(i,j);

parameter q(i,j);
q(i,j) = uniform(-10,10);

binary variable x(i);
variable z;

equation obj;

obj.. z =e= sum((i,j), x(i)*q(i,j)*x(j));

model m /obj/;
option miqcp=cplex,optcr=0;
solve m minimizing z using miqcp;
display z.l,x.l;


Cplex will automatically linearize this model.

CVXPY Sparse Variables


CVXPY has some severe limitations on what variables can look like.

First, we cannot use three (or more) dimensional variables. So something like x[i,j,k] is not supported. A declaration like:

X = cp.Variable((n,n,n),"X")

gives:

     ValueError: Expressions of dimension greater than 2 are not supported.

This is a rather severe restriction: many practical models have variables exceeding 2 dimensions. Of course, matrix notation becomes impractical for symbols with more than 2 dimensions, which is probably the reason why CVXPY only wants to handle scalars, vectors and matrices.
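A possible workaround (my own sketch, not a CVXPY feature) is to flatten two of the indices into one, so a 3-dimensional variable x[i,j,k] is emulated by a 2-dimensional one:

import cvxpy as cp

n = 5
X = cp.Variable((n, n*n), "X")   # stand-in for a 3-dimensional x[i,j,k]

def x3(i, j, k):
    # scalar expression playing the role of x[i,j,k]
    return X[i, j*n + k]

# example usage: a constraint on x[1,2,3]
prob = cp.Problem(cp.Minimize(0), [x3(1, 2, 3) == 1])

Of course, this does nothing about the dense storage issue discussed below: all \(n^3\) variables are still passed to the solver.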

Furthermore sparse variables are not supported either. Everything is fully allocated. As an example consider the toy model:



n = 100
X = cp.Variable((n,n),"X")
prob = cp.Problem(
    cp.Minimize(0),
    [X[0,0] == 1])


Solver Log:
problem:  variables n = 10000, constraints m = 1


My hope was that I could just declare a large variable matrix and that CVXPY would only export the used variables to the solver. Instead of the solver seeing a model with one variable, it receives a model with 10,000 variables.


Example: a Sparse Network Model


A linear programming formulation for a max-flow network problem can look like:

Max-flow Sparse Network Model
\[\begin{align}\max\>\>&\color{darkred}f\\ & \sum_{j|\color{darkblue}{\mathit{arc}}(j,i)}\color{darkred}x_{j,i} = \sum_{j|\color{darkblue}{\mathit{arc}}(i,j)}\color{darkred}x_{i,j} + \color{darkred}f\cdot \color{darkblue}b_i && \forall i \\ & 0 \le \color{darkred}x_{i,j} \le \color{darkblue}{\mathit{capacity}_i} && \forall i,j|\color{darkblue}{\mathit{arc}}(i,j) \end{align}\]

Here \(arc(i,j)\) indicates whether link \(i \rightarrow j\) exists. The sparse data vector \(b_i\) is defined by \[b_i = \begin{cases} -1 & \text{if node $i$ is the source node}\\ +1 & \text{if node $i$ is the sink node}\\ 0 & \text{otherwise} \end{cases}\]

This mathematical model translates directly into a GAMS model:


$ontext

max flow network example

Data from example in
Mitsuo Gen, Runwei Cheng, Lin Lin
Network Models and Optimization: Multiobjective Genetic Algorithm Approach
Springer, 2008

Erwin Kalvelagen, Amsterdam Optimization, May 2008

$offtext


sets
i 'nodes' /node1*node11/
source(i) /node1/
sink(i) /node11/
;

alias(i,j);

parameter capacity(i,j) /
node1.node2 60
node1.node3 60
node1.node4 60
node2.node3 30
node2.node5 40
node2.node6 30
node3.node4 30
node3.node6 50
node3.node7 30
node4.node7 40
node5.node8 60
node6.node5 20
node6.node8 30
node6.node9 40
node6.node10 30
node7.node6 20
node7.node10 40
node8.node9 30
node8.node11 60
node9.node10 30
node9.node11 50
node10.node11 50
/;



set arcs(i,j);
arcs(i,j)$capacity(i,j) = yes;
display arcs;

parameter rhs(i);
rhs(source) = -1;
rhs(sink) = 1;

variables
x(i,j) 'flow along arcs'
f 'total flow'
;

positive variables x;
x.up(i,j) = capacity(i,j);

equations
flowbal(i) 'flow balance'
;

flowbal(i).. sum(arcs(j,i), x(j,i)) - sum(arcs(i,j), x(i,j)) =e= f*rhs(i);

model m /flowbal/;

solve m maximizing f using lp;


The GAMS model exploits that GAMS stores data sparsely. The variables x(i,j) are only allocated when they are used inside the equations, and this usage is restricted to cases where arcs(i,j) exist. I.e. the number of variables x(i,j) is 22 instead of \(11 \times 11 = 121\).

As we discussed in the previous section, CVXPY does not support sparse variables like GAMS. So instead of variables x(i,j) we'll use x[k] where k indicates the arc number. CVXPY supports sparse data matrices through scipy.sparse. In the code below we set up a sparse matrix A with entries as follows:

  • A[i,k] = -1 if arc \(k\) represents an outgoing link \(i \rightarrow j\)
  • A[i,k] = +1 if arc \(k\) represents an incoming link \(j \rightarrow i\)
With this we can formulate the model:



import numpy as np
import scipy.sparse as sparse
import cvxpy as cp


# ------ data --------
data = {
'nodes':['A','B','C','D','E','F','G','H','I','J','K'],
'from':['A','A','A','B','B','B','C','C','C','D','E',
'F','F','F','F','G','G','H','H','I','I','J'],
'to': ['B','C','D','C','E','F','D','F','G','G','H',
'E','H','I','J','F','J','I','K','J','K','K'],
'capacity': [60,60,60,30,40,30,30,50,30,40,60,20,30,40,30,20,40,30,60,30,50,50],
'source' : 'A',
'sink' : 'K'
}

numnodes = len(data['nodes'])
numarcs = len(data['capacity'])

print("Number of nodes: {}".format(numnodes))
print("Number of arcs: {}".format(numarcs))

# ------ lp data --------

# map node name to index
map = dict(zip(data['nodes'],range(numnodes)))

# coefficients
irow = np.zeros(2*numarcs,int)
jcol = np.zeros(2*numarcs,int)
val = np.zeros(2*numarcs)

# arc k: i->j has coefficient -1 in row i, column k
#                         and +1 in row j, column k
for k in range(numarcs):
    i = map[data['from'][k]]
    j = map[data['to'][k]]
    kk = 2*k
    irow[kk] = i
    jcol[kk] = k
    val[kk] = -1
    kk = 2*k+1
    irow[kk] = j
    jcol[kk] = k
    val[kk] = 1

A = sparse.csc_matrix((val,(irow,jcol)))

b = np.zeros(numnodes)
b[map[data['source']]] = -1
b[map[data['sink']]] = 1

cap = data['capacity']

# ------ lp model --------

x = cp.Variable(numarcs,"x")
f = cp.Variable(1,"f")

prob = cp.Problem(cp.Maximize(f),
[A@x == f*b, x >= 0, x <= cap])
prob.solve(verbose=True)
print(prob)


With this experiment, we have confirmed that sparse data matrices work just fine with CVXPY. However, the model is no longer that straightforward: it is more complicated than the corresponding GAMS model.

See [4] for an alternative approach.

Example: Matrix Balancing


In this example we want to estimate the inner part of a matrix subject to row- and column-total constraints. This problem is frequently encountered in economic modeling. An additional constraint is that we want to maintain the sparsity pattern of the matrix. Basically the model is:

Matrix Balancing Model
\[\begin{align}\min\>\>&\mathbf{dist}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i\color{darkred}a_{i,j} = \color{darkblue} v_j && \forall j\\ &\sum_j\color{darkred}a_{i,j} = \color{darkblue} u_i && \forall i \\ & \color{darkblue}a^0_{i,j}=0 \Rightarrow\color{darkred}a_{i,j} = 0 \end{align}\]

There are different possibilities for the distance function. E.g. it can be a quadratic function \[\mathbf{dist}(A,A^0) = \sum_{i,j} (a_{i,j}-a^0_{i,j})^2\] or in this case an entropy function \[\mathbf{dist}(A,A^0) =\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right)\]
Often the implication is enforced by just ignoring or skipping all elements \(a_{i,j}\) where \(a^0_{i,j}=0\). This leads again to a sparse representation of the variables. In GAMS this is quite easy.


$ontext

Example from

Using PROC IML to do Matrix Balancing
Carol Alderman, University of Kansas
Institute for Public Policy and Business Research
MidWest SAS Users Group MWSUG 1992

$offtext


sets
p 'products' /pA*pI/
s 'salesmen' /s1*s10/
;

table A0(*,*) 'estimated matrix, known totals'

            s1    s2    s3    s4    s5    s6    s7    s8    s9   s10  rowTotal
pA         230   375   375   100         685   215          50            2029
pB         330   405   419   175    90   504   515         240   105     2798
pC         268   225   242          30   790   301    44   100           1998
pD         595   380   638   275    30   685   605    88   100   160     3566
pE         340   360   440   200    30   755   475    44   150           2794
pF         132   190   200               432   130                        1071
pG         309   330   350   125         612   474          50    50     2305
pH         365   400   330   150    50   575   600    44   150   110     2747
pI         210   250   308   125         720   256         100    50     2015

colTotal  2772  2910  3300  1150   240  5760  3526   220   950   495
;

alias (p,i);
alias (s,j);

variables
A(i,j) 'new values'
z 'objective (minimized)'
;

equations
objective
rowsum(i)
colsum(j)
;

objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j)));
rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal');
colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j);

A.L(i,j) = A0(i,j);
A.lo(i,j)$A0(i,j) = 0.0001;

model m /all/;
solve m minimizing z using nlp;

display A.L,z.l;

When solved with the general purpose NLP solver CONOPT, we get the following results:


----     58 VARIABLE A.L  new values

            s1       s2       s3       s4       s5       s6       s7       s8       s9
pA     229.652  374.869  375.099  100.028           686.238  212.542            50.572
pB     330.587  406.192  420.492  175.626   94.300  506.575  510.790           243.545
pC     267.313  224.685  241.809            31.297  790.595  297.246   44.017  101.037
pD     595.262  380.610  639.416  275.614   31.391  687.580  599.252   88.299  101.341
pE     339.659  360.058  440.341  200.158   31.346  756.750  469.809   44.086  151.793
pF     130.330  187.815  197.821                    427.953  127.080
pG     309.478  330.895  351.165  125.418           614.984  470.016            50.727
pH     360.594  395.631  326.596  148.455   51.665  569.947  586.867   43.598  150.111
pI     209.123  249.246  307.260  124.701           719.378  252.398           100.874

   +       s10
pB     109.894
pD     167.233
pG      52.318
pH     113.535
pI      52.019


----     58 VARIABLE z.L                   =      -15.769  objective (minimized)


CVXPY does not allow skipping elements like GAMS does. Well, unless we build the whole model in scalar mode: element by element. That is not very attractive, so let's try another way. It will be a bit of a struggle. Let's define a (data) matrix D as \[d_{i,j} = \begin{cases} 1 & \text{if $a^0_{i,j} \ne 0$}\\ 0 & \text{if $a^0_{i,j} = 0$}\end{cases}\] This matrix can be used to skip linear terms that are not needed. The objective is another story. CVXPY has the function entr() which is defined by \(-x\log(x)\). We expand the objective as: \[\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right) = \sum_{i,j} - \mathbf{entr}(a_{i,j})  - a_{i,j}\log(a^0_{i,j})\] Finally we insert \(d\) to ignore the zeros: \[ \min \sum_{i,j} - d_{i,j} \mathbf{entr}(a_{i,j}) - d_{i,j} a_{i,j}\log(a^0_{i,j}+1-d_{i,j})\] This is quite some gymnastics to shoehorn our model into an acceptable CVXPY format.



import numpy as np
import cvxpy as cp

# -------- data ----------
A0 = [[ 230 , 375 , 375 , 100 , 0 , 685 , 215 , 0 , 50 , 0 ],
[ 330 , 405 , 419 , 175 , 90 , 504 , 515 , 0 , 240 , 105 ],
[ 268 , 225 , 242 , 0 , 30 , 790 , 301 , 44 , 100 , 0 ],
[ 595 , 380 , 638 , 275 , 30 , 685 , 605 , 88 , 100 , 160 ],
[ 340 , 360 , 440 , 200 , 30 , 755 , 475 , 44 , 150 , 0 ],
[ 132 , 190 , 200 , 0 , 0 , 432 , 130 , 0 , 0 , 0 ],
[ 309 , 330 , 350 , 125 , 0 , 612 , 474 , 0 , 50 , 50 ],
[ 365 , 400 , 330 , 150 , 50 , 575 , 600 , 44 , 150 , 110 ],
[ 210 , 250 , 308 , 125 , 0 , 720 , 256 , 0 , 100 , 50 ]]

u = [2029,2798,1998,3566,2794,1071,2305,2747,2015]
v = [2772,2910,3300,1150,240,5760,3526,220,950,495]

m = len(u)
n = len(v)

# -------- model ----------

D = np.sign(A0)

Dloga0 = D * np.log(A0+np.ones_like(D)-D)

A = cp.Variable((m,n),"A")

obj = cp.Minimize(cp.sum(cp.multiply(D,-cp.entr(A)) - cp.multiply(A,Dloga0)))
cons = [cp.sum(cp.multiply(D,A),axis=1) == u,
        cp.sum(cp.multiply(D,A),axis=0) == v,
        A >= 0]
prob = cp.Problem(obj,cons)
prob.solve(solver=cp.SCS,verbose=True,max_iters=200000)


Solving this tiny model turns out to be very difficult. It uses exponential cones, and those are not implemented very robustly in the solvers. With SCS and a lot of iterations, we finally see:



101400| 7.74e-05  9.08e-05  7.16e-04 -1.56e+01 -1.56e+01  3.30e-11  1.14e+02
101500| 7.72e-05  9.10e-05  5.74e-04 -1.54e+01 -1.54e+01  3.30e-11  1.14e+02
101600| 7.69e-05  9.13e-05  5.30e-04 -1.51e+01 -1.51e+01  7.77e-11  1.14e+02
101700| 7.66e-05  9.17e-05  4.02e-04 -1.49e+01 -1.49e+01  5.64e-11  1.14e+02
101800| 7.62e-05  9.22e-05  3.96e-04 -1.46e+01 -1.46e+01  3.30e-11  1.14e+02
101900| 7.59e-05  9.28e-05  2.10e-04 -1.44e+01 -1.44e+01  3.30e-11  1.14e+02
101980| 7.56e-05  9.33e-05  3.35e-05 -1.42e+01 -1.42e+01  1.17e-11  1.14e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.14e+02s
Lin-sys: nnz in L factor: 1089, avg solve time: 1.87e-05s
Cones: avg projection time: 1.06e-03s
Acceleration: avg step time: 2.62e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.8795e-06, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.3311e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 7.5554e-05
dual res: |A'y + c|_2 / (1 + |c|_2) = 9.3272e-05
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 3.3548e-05
----------------------------------------------------------------------------
c'x = -14.2093, -b'y = -14.2103
============================================================================

The objective is in the neighborhood of what CONOPT found for the GAMS model (CONOPT required just a few iterations).

This model is not exactly a showcase for CVXPY.

Conclusion


CVXPY can be very convenient for certain classes of models: convex and easy to write in matrix notation. For other models it is not very suited. We saw some examples where things become really hairy when using CVXPY, while the underlying mathematical model is actually quite simple. CVXPY is really a special purpose modeling tool, and you may want to consider other tools when the model does not really fit CVXPY's matrix philosophy.

References


  1. https://www.cvxpy.org/
  2. https://www.gams.com/products/simple-example/
  3. https://osqp.org/
  4. https://www.cvxpy.org/examples/applications/OOCO.html

Interesting constraint

In [1] the following problem is proposed:

I'm trying to solve a knapsack-style optimization problem with additional complexity.
Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the items must be orange, one must be blue, one must be yellow, and one must be red. This is straightforward. However, I want to add a constraint that the selected yellow, red, and orange items can only have one shape in common with the selected blue item.

The example data looks like:


 item   color     shape     value
 A      blue      circle    0.454
 B      yellow    square    0.570
 C      red       triangle  0.789
 D      red       circle    0.718
 E      red       square    0.828
 F      orange    square    0.709
 G      blue      circle    0.696
 H      orange    square    0.285
 I      orange    square    0.698
 J      orange    triangle  0.861
 K      blue      triangle  0.658
 L      yellow    circle    0.819
 M      blue      square    0.352
 N      orange    circle    0.883
 O      yellow    triangle  0.755


Let's see if we can model this. First we slice and dice the data a bit to make the modeling a bit easier. Here is some derived data:


----     58 SET i  item

A, B, C, D, E, F, G, H, I, J, K, L, M, N, O


----     58 SET c  color

blue  , yellow, red   , orange


----     58 SET s  shape

circle  , square  , triangle


----     58 SET ICS(i,c,s)

                circle      square    triangle

A.blue             YES
B.yellow                       YES
C.red                                      YES
D.red              YES
E.red                          YES
F.orange                       YES
G.blue             YES
H.orange                       YES
I.orange                       YES
J.orange                                   YES
K.blue                                     YES
L.yellow           YES
M.blue                         YES
N.orange           YES
O.yellow                                   YES


----     58 SET IC(i,c)

          blue      yellow         red      orange

A          YES
B                      YES
C                                  YES
D                                  YES
E                                  YES
F                                              YES
G          YES
H                                              YES
I                                              YES
J                                              YES
K          YES
L                      YES
M          YES
N                                              YES
O                      YES


----     58 SET CS(c,s)

            circle      square    triangle

blue           YES         YES         YES
yellow         YES         YES         YES
red            YES         YES         YES
orange         YES         YES         YES


----     58 PARAMETER value  (i): value of item

A 0.454, B 0.570, C 0.789, D 0.718, E 0.828, F 0.709, G 0.696, H 0.285, I 0.698, J 0.861
K 0.658, L 0.819, M 0.352, N 0.883, O 0.755


----     58 SET YRO  (c): excludes blue

yellow, red   , orange


Note that the set CS(c,s) is complete. However, I will assume that there is a possibility that this set has some missing entries. In other words, I will not assume that all combinations of colors and shapes exist in the data.

Let's introduce the following zero-one variables:\[\begin{align} & x_i = \begin{cases} 1 & \text{if item $i$ is selected}\\ 0 & \text{otherwise}\end{cases} \\ & y_{c,s} = \begin{cases} 1 & \text{if items with color $c$ and shape $s$ are selected}\\ 0 & \text{otherwise} \end{cases}\end{align}\]


My high-level model is:

High-level Model
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} = \max_{i|\color{darkblue}{\mathit{ICS}}(i,c,s)} \color{darkred}x_i && \forall c,s|\color{darkblue}{\mathit{CS}}(c,s)\\  & \color{darkred}y_{\color{darkblue}{\mathit{blue}},s} = 1 \Rightarrow \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint implements the definition of the \(y_{c,s}\) variables: if any of the selected items has color/shape combination \((c,s)\), then \(y_{c,s}=1\) (and otherwise it stays zero). The implication constraint says: if a blue item of shape \(s\) is selected, then at most one other color may have shape \(s\). The model is a bit complicated because I wanted to be precise. No hand-waving. This helps when implementing it.

This is not yet a MIP model, but translation of the above model into a normal MIP is not too difficult.


Mixed Integer Programming Model
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} \ge \color{darkred}x_i && \forall i,c,s|\color{darkblue}{\mathit{ICS}}(i,c,s)\\  & \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 + \color{darkblue}M(1-\color{darkred}y_{\color{darkblue}{\mathit{blue}},s}) && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint has been replaced by a single inequality. This works here because we are only interested in \(y\)'s that are forced to be one; those are the ones that matter in the blue shape constraint. The blue shape implication itself is rewritten as a big-M inequality. A good value for \(M\) is not very difficult to establish: the number of colors, minus blue, minus 1 (here \(M=2\)).

For comparison, let's first run the model without the complex "blue" shape constraints (the last two constraints). That gives:


----     90 VARIABLE z.L                   =        4.087  obj


----     90 VARIABLE x.L  select item

E 1.000, G 1.000, J 1.000, L 1.000, N 1.000


----     90 SET selected

                circle      square    triangle

E.red                          YES
G.blue             YES
J.orange                                   YES
L.yellow           YES
N.orange           YES

We see that the blue shape (circle) is also selected as orange and yellow.

With the additional constraints, we see:


----     95 VARIABLE z.L                   =        4.049  obj


----     95 VARIABLE x.L  select item

E 1.000, J 1.000, K 1.000, L 1.000, N 1.000


----     95 VARIABLE y.L  color/shape combos in solution (bound)

            circle      square    triangle

blue                                 1.000
yellow       1.000
red                      1.000
orange       1.000                   1.000


----     95 SET selected

                circle      square    triangle

E.red                          YES
J.orange                                   YES
K.blue                                     YES
L.yellow           YES
N.orange           YES

Now we see the blue shape is a triangle. We only have one other selected triangle, of color orange.

The complete GAMS model looks like:

$ontext

  I'm trying to solve a knapsack-style optimization problem with additional complexity.

  Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the
  items must be orange, one must be blue, one must be yellow, and one must be red.
  This is straightforward. However, I want to add a constraint that the selected yellow,
  red, and orange items can only have one shape in common with the selected blue item.

$offtext


set
   i 'item' /A*O/
   c 'color' /blue,yellow,red,orange/
   s 'shape' /circle,square,triangle/
;

parameters
   data(i,c,s) 'value' /
      A . blue   .  circle     0.454
      B . yellow .  square     0.570
      C . red    .  triangle   0.789
      D . red    .  circle     0.718
      E . red    .  square     0.828
      F . orange .  square     0.709
      G . blue   .  circle     0.696
      H . orange .  square     0.285
      I . orange .  square     0.698
      J . orange .  triangle   0.861
      K . blue   .  triangle   0.658
      L . yellow .  circle     0.819
      M . blue   .  square     0.352
      N . orange .  circle     0.883
      O . yellow .  triangle   0.755
      /
   NumItems 'number of items to select' /5/
   NumColor(c) 'required number of each color' /
      orange 2
      red    1
      blue   1
      yellow 1
      /
;

sets
   YRO(c)     '(c): excludes blue' /yellow,red,orange/
   ICS(i,c,s) "(i,c,s)"
   IC(i,c)    "(i,c)"
   CS(c,s)    "(c,s)"
;
parameter value(i) "(i): value of item";
ICS(i,c,s) = data(i,c,s);
IC(i,c) = sum(ICS(i,c,s),1);
CS(c,s) = sum(ICS(i,c,s),1);
value(i) = sum((c,s),data(i,c,s));
display i,c,s,ICS,IC,CS,value,YRO;

binary variable x(i) 'select item';
variable z 'obj';
binary variable y(c,s) 'color/shape combos in solution (bound)';

equations
   obj             'objective'
   count           'count number of selected items'
   countcolor(c)   'count selected items for each color'
   shapecol(i,c,s) 'bound on y(c,s)'
   impl(s)         'rewritten implication'
;
obj.. z =e= sum(i, value(i)*x(i));
count.. sum(i, x(i)) =e= numitems;
countcolor(c).. sum(IC(i,c), x(i)) =e= numcolor(c);
shapecol(ICS(i,c,s)).. y(c,s) =g= x(i);

scalar M 'big-M: number of colors minus blue minus 1';
M = card(c)-2;

impl(s).. sum(yro,y(yro,s)) =l= 1 + M*(1-y("blue",s));

set selected(i,c,s);

option optcr=0;
model m1 /obj,count,countcolor/;
solve m1 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,selected;

model m2 /all/;
solve m2 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,y.l,selected;





A second exercise would be to write this model in Python using PuLP.
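A sketch of how that could look (my own transcription of the MIP model; the item data is from the table above):

import pulp

items = {  # item: (color, shape, value)
    'A':('blue','circle',0.454),   'B':('yellow','square',0.570),
    'C':('red','triangle',0.789),  'D':('red','circle',0.718),
    'E':('red','square',0.828),    'F':('orange','square',0.709),
    'G':('blue','circle',0.696),   'H':('orange','square',0.285),
    'I':('orange','square',0.698), 'J':('orange','triangle',0.861),
    'K':('blue','triangle',0.658), 'L':('yellow','circle',0.819),
    'M':('blue','square',0.352),   'N':('orange','circle',0.883),
    'O':('yellow','triangle',0.755)}
numcolor = {'orange':2, 'red':1, 'blue':1, 'yellow':1}
shapes = ['circle','square','triangle']
yro = ['yellow','red','orange']
M = len(numcolor) - 2      # number of colors minus blue minus 1

x = pulp.LpVariable.dicts('x', list(items), cat='Binary')
y = pulp.LpVariable.dicts('y', [(c,s) for c in numcolor for s in shapes], cat='Binary')

prob = pulp.LpProblem('items', pulp.LpMaximize)
prob += pulp.lpSum(items[i][2]*x[i] for i in items)          # objective
prob += pulp.lpSum(x[i] for i in items) == 5                 # count
for c in numcolor:                                           # countcolor
    prob += pulp.lpSum(x[i] for i in items if items[i][0] == c) == numcolor[c]
for i,(c,s,v) in items.items():                              # shapecol
    prob += y[(c,s)] >= x[i]
for s in shapes:                                             # impl
    prob += pulp.lpSum(y[(c,s)] for c in yro) <= 1 + M*(1 - y[('blue',s)])

prob.solve()
print([i for i in items if x[i].value() > 0.5])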

References


MIP solver stopping criteria

For larger MIP models we often don't wait for a proven optimal solution. This just takes too long: we would spend a lot of time proving optimality without much return in terms of better solutions. There are a number of stopping criteria that are typically available:

  1. Time Limit : stop after \(x\) seconds (or hours)
  2. Relative Gap: stop if gap between best possible bound and best found integer solution becomes less than \(x\%\).  Different solvers use different definitions (especially regarding the denominator).
  3. Absolute Gap: similar to relative gap, but can be used when the relative gap cannot be computed (division by zero or small number).
  4. Node Limit: stop on number of explored branch & bound nodes.
  5. Iteration Limit: stop on number of Simplex iterations. This number can be huge.

I have ordered these stopping criteria by how useful they are (to me). A time limit is by far the most important: just tell the solver how long we are willing to wait. Stopping on an achieved gap is also useful. I don't remember ever using a node or iteration limit.

If you specify several limits, typically a solver will stop as soon as it hits any one of the specified limits. In other words: multiple stopping criteria are combined in an "or" fashion.

When stopping on a time limit, it is still important to inspect the final gap. A small gap gives us a guaranteed quality assurance about the solution.




For large models the tail is often very long, and we probably see hardly any movement: no new integer solutions are found and the best bound is moving very slowly (and moving less over time). So I really want to stop if there is not much hope for a better solution.

I would suggest another possible stopping criterion:

stop if the time since the last new (and improving) integer solution exceeds a time limit

If the time since that last new integer solution is large, we can infer that the probability of finding a better solution is small. We can also interpret this as resetting the clock after each new integer solution. I don't think any solver has this. Of course, for some solvers, we can implement this ourselves using some callback function.
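For instance, with Gurobi's Python API a rough sketch could look like this (the 600 second threshold and the model.mps file are placeholders of mine):

import gurobipy as gp
from gurobipy import GRB

IMPROVE_LIMIT = 600   # seconds allowed since the last improving solution

def no_improvement_cb(model, where):
    if where == GRB.Callback.MIPSOL:
        # new incumbent found: reset the clock
        model._last_incumbent = model.cbGet(GRB.Callback.RUNTIME)
    elif where == GRB.Callback.MIP:
        elapsed = model.cbGet(GRB.Callback.RUNTIME)
        if elapsed - model._last_incumbent > IMPROVE_LIMIT:
            model.terminate()

model = gp.read("model.mps")   # placeholder model file
model._last_incumbent = 0.0
model.optimize(no_improvement_cb)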

SDP Model: imputing a covariance matrix

Missing values are a well-known problem in statistics. The simplest approach is just to delete all data cases that have missing values. Another approach is to repair things by filling in reasonable values. This is called imputation. Imputation strategies can be very sophisticated (and complex).

Statistical tools often have direct support for representing missing values. E.g. the R language has NA (not available). GAMS also has NA. Python has no explicit support for missing values. By convention, the special floating point value NaN (Not a Number) is used to indicate missing values for floating point numbers. It is noted that the numpy library has some facilities to deal with missing data, but it is not really like R's NA [2].

In [1] a semi-definite programming (SDP) model is proposed to deal with a covariance matrix with some missing values by imputation. The constraint to be added is that the covariance matrix should remain positive semi-definite (PSD). A covariance matrix should in theory be PSD, but in practice it can happen that it is not. The resulting model is stated as:

Impute missing values in Covariance Matrix (from [1])
\[\begin{align} \text{minimize}\>& 0\\ \text{subject to}\>&\color{darkred}{\Sigma}_{i,j} = \widetilde{\color{darkblue}{\Sigma}}_{i,j} && (i,j)\notin \color{darkblue}M\\ & \color{darkred}{\Sigma} \succeq 0 \end{align} \]

Here \(\widetilde{\Sigma}\) is the covariance matrix with missing data in locations \((i,j)\in M\). The  variable \(\Sigma\) is the new covariance matrix with missing data filled in such that \(\Sigma\) is positive semi-definite. This last condition is denoted by \(\Sigma \succeq 0\). In this model there is no objective, as indicated by minimizing zero.

CVXPY implementation


There is no code provided for this model in [1]. So let me give it a try. CVXPY does not have good support for things like \(\forall (i,j) \notin M\).  I can see two approaches:

  • Expand the constraint into scalar form. Essentially, a DIY approach.
  • Use a binary data matrix \(M_{i,j} \in \{0,1\}\) indicating the missing values and write \[(e\cdot e^T-M) \circ \Sigma = \widetilde{\Sigma}_0\] where \(\circ\) is elementwise multiplication (a.k.a. Hadamard product), \(e\) is a column vector of ones of appropriate size, and \(\widetilde{\Sigma}_0\) is \(\widetilde{\Sigma}\) but with NaN's replaced by zeros.

In addition, let's add a regularizing objective: minimize sum of squares of \(\Sigma_{i,j}\).

The Python code for these two models is:


import numpy as np
import pandas as pd
import cvxpy as cp

#------------ data ----------------

cov = np.array([
[ 0.300457, -0.158889, 0.080241, -0.143750, 0.072844, -0.032968, 0.077836, 0.049272],
[-0.158889, 0.399624, np.nan, 0.109056, 0.082858, -0.045462, -0.124045, -0.132096],
[ 0.080241, np.nan, np.nan, -0.031902, -0.081455, 0.098212, 0.243131, 0.120404],
[-0.143750, 0.109056, -0.031902, 0.386109, -0.058051, 0.060246, 0.082420, 0.125786],
[ 0.072844, 0.082858, -0.081455, -0.058051, np.nan, np.nan, -0.119530, -0.054881],
[-0.032968, -0.045462, 0.098212, 0.060246, np.nan, 0.400641, 0.051103, 0.007308],
[ 0.077836, -0.124045, 0.243131, 0.082420, -0.119530, 0.051103, 0.543407, 0.121709],
[ 0.049272, -0.132096, 0.120404, 0.125786, -0.054881, 0.007308, 0.121709, 0.481395]
])
print("Covariance data with NaNs")
print(pd.DataFrame(cov))

M = 1*np.isnan(cov)
print("M (indicator for missing values)")
print(pd.DataFrame(M))

dim = np.shape(cov)
n = dim[0]

#----------- model 1 -----------------

Sigma = cp.Variable(dim, symmetric=True)

prob = cp.Problem(
cp.Minimize(cp.sum_squares(Sigma)),
[ Sigma[i,j] == cov[i,j] for i in range(n) for j in range(n) if M[i,j]==0 ] +
[ Sigma >> 0 ]
)
prob.solve(solver=cp.SCS,verbose=True)

print("Status:",prob.status)
print("Objective:",prob.value)
print(pd.DataFrame(Sigma.value))

#----------- model 2 -----------------

e = np.ones((n,1))
cov0 = np.nan_to_num(cov,copy=True)

prob2 = cp.Problem(
# cp.Minimize(cp.trace(Sigma.T@Sigma)), <--- not recognized as convex
cp.Minimize(cp.norm(Sigma,"fro")**2),
[ cp.multiply(e@e.T - M,Sigma) == cov0,
Sigma >> 0 ]
)
prob2.solve(solver=cp.SCS,verbose=True)

print("Status:",prob2.status)
print("Objective:",prob2.value)
print(pd.DataFrame(Sigma.value))


Notes:

  • Model 1 has a (long) list of scalar constraints. The objective is \[\min\>\sum_{i,j} \Sigma_{i,j}^2\] Sorry for the possible confusion between the symbols for summation and covariance.
  • CVXPY uses the notation Sigma >> 0 to indicate \(\Sigma \succeq 0\) (i.e. \(\Sigma\) should be positive semi-definite).
  • We added the condition that \(\Sigma\) should be symmetric in the variable statement. This seems to be needed. Without this, the solver may return a non-symmetric matrix. I suspect that in that case, the matrix \(0.5(\Sigma+\Sigma^T)\) rather than \(\Sigma\) itself is required to be positive definite.
  • Model 2 is an attempt to use matrix notation. The objective can be stated as \[\min\>\mathbf{tr}(\Sigma^T\Sigma)\] but that is not recognized as being convex. As alternative I used the Frobenius norm: \[||A||_F =\sqrt{ \sum_{i,j} a_{i,j}^2}\]
  • The function np.nan_to_num converts NaN values to zeros.
  • The function cp.multiply performs elementwise multiplication (as opposed to matrix multiplication).
  • I don't think we can easily pass only the upper triangular part of the covariance matrix to the solver. For large problems this would save some effort (CPU time and memory).
  • In a traditional optimization model we would have just \(|M|\) decision variables (corresponding to the missing values). Here, in the scalar model, we have \(n^2\) variables and \(n^2-|M|\) constraints.


The results are:


Covariance data with NaNs
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624       NaN  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241       NaN       NaN -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051       NaN       NaN -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246       NaN  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395
M (indicator for missing values)
   0  1  2  3  4  5  6  7
0  0  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0
2  0  1  1  0  0  0  0  0
3  0  0  0  0  0  0  0  0
4  0  0  0  0  1  1  0  0
5  0  0  0  0  1  0  0  0
6  0  0  0  0  0  0  0  0
7  0  0  0  0  0  0  0  0
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 160
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 37, constraints m = 160
Cones: primal zero / dual free vars: 58
soc vars: 66, soc blks: 1
sd vars: 36, sd blks: 1
Setup time: 9.69e-03s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
    0| 4.05e+19  7.57e+19  1.00e+00 -3.12e+19  1.92e+20  1.21e+20  1.53e-02
   40| 2.74e-10  1.01e-09  4.51e-11  1.71e+00  1.71e+00  1.96e-17  1.88e-02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.89e-02s
Lin-sys: nnz in L factor: 357, avg solve time: 1.54e-06s
Cones: avg projection time: 1.52e-04s
Acceleration: avg step time: 1.66e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.7898e-17, dist(y, K*) = 1.5753e-09, s'y/|s||y| = 3.7338e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.7439e-10
dual res: |A'y + c|_2 / (1 + |c|_2) = 1.0103e-09
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5078e-11
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544257213233
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 162
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 168
Cones: primal zero / dual free vars: 64
soc vars: 68, soc blks: 2
sd vars: 36, sd blks: 1
Setup time: 1.02e-02s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
    0| 3.67e+19  5.42e+19  1.00e+00 -2.45e+19  1.28e+20  1.04e+20  9.84e-03
   40| 5.85e-10  1.47e-09  7.31e-10  1.71e+00  1.71e+00  8.09e-17  1.29e-02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.31e-02s
Lin-sys: nnz in L factor: 368, avg solve time: 2.56e-06s
Cones: avg projection time: 3.03e-05s
Acceleration: avg step time: 2.45e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.4409e-16, dist(y, K*) = 1.5216e-09, s'y/|s||y| = 4.2866e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 5.8496e-10
dual res: |A'y + c|_2 / (1 + |c|_2) = 1.4729e-09
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 7.3074e-10
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544261472336
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395

As a sanity check we can confirm that the eigenvalues of the solution matrix are non-negative:


w,v = np.linalg.eig(Sigma.value)
print(w)

[9.46355900e-01 6.34465779e-01 2.35993549e-10 5.30366506e-02
 1.69999646e-01 2.29670882e-01 4.36623248e-01 3.75907704e-01]


Practice


I don't think this is a practical way of dealing with missing values. First of all, missing values in the original data will propagate into the covariance matrix: a single NA in the data leads to lots of NAs in the covariance matrix.


----     28 PARAMETER cov  effect of a single NA in the data

            j1        j2        j3        j4        j5        j6        j7        j8
j1    0.300457 -0.158889        NA -0.143750  0.072844 -0.032968  0.077836  0.049272
j2   -0.158889  0.399624        NA  0.109056  0.082858 -0.045462 -0.124045 -0.132096
j3          NA        NA        NA        NA        NA        NA        NA        NA
j4   -0.143750  0.109056        NA  0.386109 -0.058051  0.060246  0.082420  0.125786
j5    0.072844  0.082858        NA -0.058051  0.354627 -0.129507 -0.119530 -0.054881
j6   -0.032968 -0.045462        NA  0.060246 -0.129507  0.400641  0.051103  0.007308
j7    0.077836 -0.124045        NA  0.082420 -0.119530  0.051103  0.543407  0.121709
j8    0.049272 -0.132096        NA  0.125786 -0.054881  0.007308  0.121709  0.481395


This propagation is the result of applying the standard formula for the covariance: \[cov_{j,k} = \frac{1}{N-1} \sum_i (x_{i,j}-\mu_j)(x_{i,k}-\mu_k) \] This is of course difficult to fix in the covariance matrix. Just too much damage has been done.

A second problem with our SDP model is that we are not staying close to reasonable values for missing correlations. The model only looks at the PSD constraint.

Basically we need to look at the original data.

A simple remedy is just to throw away the records with NAs. If you have lots of data and relatively few NAs in the data, this is a reasonable approach. However, there is a trick we can use. Instead of throwing away a whole row of observations when we have an NA, we inspect pairs of columns \((j,k)\) individually. For the two columns \(j\) and \(k\), throw away the observations with NAs in these columns and then calculate the covariance \(cov_{j,k}\). Repeat for all combinations \((j,k)\) with \(j \lt k\). R has this built-in:


> cov(a)
              [,1]          [,2]          [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922   0.024062915  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259   0.036135413  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]  0.0240629148   0.036135413   0.308443182  0.003663338  0.0014232064 -0.0158431246  0.0308769925 -0.0177244600
[4,]  0.0264606771   0.027115454   0.003663338  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594   0.001423206  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933  -0.015843125  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084   0.030876993  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038  -0.017724460  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
> a[2,3]=NA
> cov(a)
              [,1]          [,2]  [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922    NA  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259    NA  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]            NA            NA    NA           NA            NA            NA            NA            NA
[4,]  0.0264606771   0.027115454    NA  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594    NA  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933    NA  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084    NA  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038    NA  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
> cov(a,use="pairwise")
              [,1]          [,2]          [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922   0.024077969  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259   0.036895996  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]  0.0240779693   0.036895996   0.311573754  0.003377392  0.0013694087 -0.0162231609  0.0310202082 -0.0180617800
[4,]  0.0264606771   0.027115454   0.003377392  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594   0.001369409  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933  -0.016223161  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084   0.031020208  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038  -0.018061780  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
>
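On the Python side, pandas offers the same pairwise behavior: DataFrame.cov() computes pairwise covariances, skipping NaNs per column pair. A small sketch:

import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.normal(size=(100, 8)))
a.iloc[1, 2] = np.nan    # introduce a single missing value
C = a.cov()              # pairwise covariances: no NA propagation
print(C)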


The disadvantage of pairwise covariances is that it is possible (even theoretically) that the final covariance matrix is not positive semi-definite. We can repair this with R's nearPD function. Essentially, this performs an eigen-decomposition, replaces the negative eigenvalues by positive ones, and then reassembles the covariance matrix (this is just matrix multiplications).
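In Python we can mimic the basic idea with numpy (a simplified sketch of my own; the real nearPD does more, such as iterating and keeping the diagonal intact):

import numpy as np

def make_psd(C, eps=1e-8):
    C = 0.5*(C + C.T)             # make sure the matrix is symmetric
    w, V = np.linalg.eigh(C)      # eigen-decomposition
    w = np.maximum(w, eps)        # replace negative eigenvalues
    return V @ np.diag(w) @ V.T   # reassemble the matrix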

Conclusion


The model presented in [1] is interesting: it is not quite obvious how to implement it in CVXPY (and the code below the example in [1] is not directly related). However, it should be mentioned that better methods are available to address the underlying problem: how to handle missing values in a covariance matrix.

References


  1. Semidefinite program, https://www.cvxpy.org/examples/basic/sdp.html
  2. Missing Data Functionality in NumPy, https://docs.scipy.org/doc/numpy-1.10.1/neps/missing-data.html
  3. Covariance matrix not positive definite in portfolio models, https://yetanothermathprogrammingconsultant.blogspot.com/2018/04/covariance-matrix-not-positive-definite.html

Gurobi v9.0.


Now including elevator music!


Major new features:

  • Non-convex quadratic solver
    • Supports non-convexities both in objective and constraints
    • Not quite sure if MIQCP is supported (I assume it is, but I think this was not mentioned explicitly)
    • Cplex already had support for (some) non-convex quadratic models, so Gurobi is catching up here.
  • Performance improvements
    • MIP: 18% faster overall and 26% on difficult models (more than 100 seconds)
    • MIQP: 24% faster
    (these numbers are from the email I received, not from the movie). Quite impressive numbers. Performance does not seem to plateau (yet). Of course, we see this also for Cplex: it keeps improving. There is some healthy competition here.


Opt Art



New book by TSP and Domino art creator Robert Bosch [1].

Content:

  1. Optimization and the Visual Arts?
  2. Truchet Tiles
  3. Linear Optimization and the Lego Problem
  4. The Linear Assignment problem and Cartoon Mosaics
  5. Domino Mosaics
  6. From the TSP to Continuous Line Drawings
  7. TSP Art with Side Constraints
  8. Knight's Tours
  9. Labyrinth Design with Tiling and Pattern Matching
  10. Mosaics with Side Constraints
  11. Game-of-life Mosaics
Yes, it contains color pictures.

An example of a TSP drawing (from [2]):

25,000-city TSPortrait of George Dantzig [2,3]
Original photograph


I guess more cities are needed to prevent his teeth from disappearing.

References

Elementwise vs matrix multiplication


Introduction


There are two often used methods to perform the multiplication of matrices (and vectors). The first is simply elementwise multiplication: \[c_{i,j} = a_{i,j} \cdot b_{i,j}\] In mathematics, this is sometimes referred to as the Hadamard product. The notation is typically: \[C = A \circ B\] but sometimes we see: \[C = A \odot B\] This product only works when \(A\) and \(B\) have the same shape. I.e. \[\begin{matrix} C&=&A&\circ&B\\ (m \times n) &&(m \times n)&&(m \times n)\end{matrix}\]

The standard matrix multiplication \(C=A\cdot B\) or \[c_{i,j} = \sum_k a_{i,k} b_{k,j}\] has a different rule for conformance:  \[\begin{matrix} C&=&A&\cdot&B\\ (m \times n) &&(m \times k)&&(k \times n)\end{matrix}\] Most of the time, the dot operator is dropped and we write \(C = A B\).

In optimization modeling both forms are used a lot.

R, CVXR


R has two operators for multiplication:
  • * for elementwise multiplication
  • %*% for matrix multiplication

A vector is considered a column vector (i.e. an \(n\)-vector is like an \(n \times 1\) matrix). There is one special thing in R, as shown here:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
>
> x <- c(1,2,3)
>

#
# matrix multiplication
#
> A %*% x
     [,1]
[1,]   22
[2,]   28
>

#
# elementwise multiplication
#
> A * A
     [,1] [,2] [,3]
[1,]    1    9   25
[2,]    4   16   36

#
# but this also works
#
> A * x
     [,1] [,2] [,3]
[1,]    1    9   10
[2,]    4    4   18
>


The last multiplication is surprising as \(A\) is a \(2 \times 3\) matrix and \(x\) is different in shape. Well, R may extend and recycle vectors to make them as large as needed. In this case \(x\) is duplicated and then considered as a  \(2 \times 3\) matrix. More or less like:


> x
[1] 1 2 3
> matrix(x,2,3)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3


The modeling tool CVXR follows R notation and implements both * and %*% (for elementwise and matrix multiplication). However, CVXR does not implement the extending and recycling of vectors that are too small.

Concept: recycling


Just to emphasize the concept of recycling: if an operation requires two vectors of the same length, R may make the shorter vector longer by recycling (duplicating) it. Here is an example:


> a <- 1:10
> b <- 1:2
> c <- a + b
> c
 [1]  2  4  4  6  6  8  8 10 10 12


In this example the vector a has elements 1 through 10. The vector b is too short, so it is recycled. When added to a, b is functionally equal to rep(c(1,2),5). When multiples of b do not exactly fit a, we effectively have a fractional duplication number. E.g. when we use b <- 1:3, we get a message:

Warning message:
In a + b : longer object length is not a multiple of shorter object length

This recycling trick is somewhat unique to R (I don't know of other languages doing this).

Example


In [2] the product: \[b_{i,j} = a_{i,j} \cdot x_i \] is implemented in R with the elementwise product:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x <- c(1,2)
> B <- A*x
> B
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


This again uses recycling of \(x\). CVXR is not doing this automatically. We can see this here:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+ list(x == c(1,2),
+ B == A * x ))
Error in sum_shapes(lapply(object@args, function(arg) { :
Incompatible dimensions



Here \(x\) is now a CVXR variable. As the vector \(x\) is not recycled, we end up with two different shapes, and the elementwise multiplication is refused. So how do we do something like this in CVXR?

The recycling operation:


> x
[1] 1 2
> matrix(x,m,n)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2


can be expressed in matrix notation as \[x \cdot e^T\] where \(e\) is a (column) vector of ones. This is sometimes called an outer product. I.e. we can write our assignment as \[B = A \circ (x \cdot e^T)\] In a CVXR model this can look like:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+ list(x == c(1,2),
+ B == A * (x %*% t(e))))
> sol <- solve(problem)
> sol$status
[1] "optimal"
> sol$getValue(x)
     [,1]
[1,]    1
[2,]    2
> sol$getValue(B)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


Note: the constraint B == A * (x %*% t(e)) contains both a matrix multiplication and an elementwise multiplication. This is rather funky.

Conclusion: if your matrix operations rely on recycling, you will need to rework things a bit to have this work correctly in CVXR. CVXR does not do recycling.

Python, CVXPY


In the previous section we saw that there are subtle differences between R's and CVXR's elementwise multiplication semantics.

Let's now look at Python and CVXPY.

Since Python 3.5 we have two multiplication operators:

  • * for elementwise multiplication 
  • @ for matrix multiplication 

CVXPY has different rules:

  • *, @  and matmul for matrix multiplication
  • multiply for elementwise multiplication

Example



import numpy as np
import cvxpy as cp

#
# In Python/numpy * indicates elementwise multiplication
#
A = np.array([[1,2],[3,4]])
B = np.array([[1,1],[2,2]])
C = A*B
print(A)
# output:
# [[ 1 2]
# [ 3 4]]
print(C)
# output:
# [[ 1 2]
# [ 6 8]]

#
# In CVXPY * indicates matrix multiplication
#
A = cp.Variable((2,2))
C = cp.Variable((2,2))
prob = cp.Problem(cp.Minimize(0),
[A == [[1,2],[3,4]],
C == A*B])
prob.solve(verbose=True)
print(A.value)
# output:
# [[1. 3.]
# [2. 4.]]
print(C.value)
# [[ 7. 7.]
# [10. 10.]]

Here we see some differences between Python/numpy and CVXPY. First, the interpretation of Python lists (the values for A) is different. And second, the semantics of * are different. This may cause some confusion.

References


  1. CVXR, Convex Optimization in R, https://cvxr.rbind.io/
  2. CVXR Elementwise multiplication of matrix with vector, https://stackoverflow.com/questions/59224555/cvxr-elementwise-multiplication-of-matrix-with-vector

Nonlinear variant of a knapsack problem

In [1] a problem is posed:

Original Problem
\[\begin{align}\max & \sum_i \log_{100}(\color{darkblue}w_i) \cdot \color{darkred}x_i \\ & \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{\sqrt{\color{darkred}k}} \le 10,000\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

The data vector \(w_i\) is assumed to be integer valued with \(w_i\ge 1\). Hence the logarithm can be evaluated without a problem. Also, we can assume that the number of items is around 1,000.

I don't think I have ever seen \(\log_{100}()\) being used in a model. Most programming (and modeling) languages only support natural logarithms \(\ln()\) and may be \(\log_{10}()\). We can convert things by: \[\log_{100}(x) = \frac{\ln(x)}{\ln(100)}\] This means the objective can be written as \[\max \frac{1}{\ln(100)} \sum_i \ln(w_i) x_i\] Essentially, the \(\log_{100}()\) function just adds a scaling factor. We can simplify the objective to \[\max \sum_i \ln(w_i)x_i\] (The objective value will be different, but the optimal solution will be the same).
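A quick numeric check of this base conversion:

import math

x = 42.0
print(math.log(x, 100))               # log base 100
print(math.log(x) / math.log(100))    # the same value via natural logs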

The constraint can be rewritten as: \[\sum_i w_i x_i \le 10,000 \sqrt{k}\] If we ignore the all zero solution, we can assume \(k\ge 1\). This bound will make sure the square root function is always differentiable. With this, we have a standard MINLP (Mixed Integer Nonlinear Programming) model.

MINLP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred}x_i \le 10,000\sqrt{\color{darkred}k}\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1 \end{align}\]

This model has only a single, well behaved, non-linearity. Literally, there is only one nonlinear nonzero element.  In addition, the model is convex. So we don't expect any problems. For \(w_i\),  I generated 1000 random integer values from the interval \([1,10000]\).

MINLP Results

Solver   Obj        Time (s)  Notes
Dicopt   1057.1355  0.6       3 NLP, 2 MIP subproblems
SBB      1056.9361  95        Node limit exceeded
Bonmin   1056.9472  600       Time limit exceeded
Bonmin   1057.1355  16        Option: bonmin.algorithm B-OA

The outer-approximation based algorithms (Dicopt, Bonmin with B-OA option) do much better than the branch & bound algorithms (SBB, default Bonmin). Even the global solvers do better:

Global Solver Results

Solver    Obj        Time (s)
Baron     1057.1355  2
Antigone  1057.1355  1
Couenne   1057.1355  9

The problem can also be formulated as a convex MIQCP (Mixed Integer Quadratically Constrained Programming) model:


MIQCP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred}y^2 \le \color{darkred}k\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}y = \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{10,000} \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1, \color{darkred}y \ge 0 \end{align}\]


Solvers like Cplex may convert this into a Cone problem (MISOCP).

Finally, we can also linearize this model by observing that \(k\in \{1,\dots,1000\}\). So we don't really have a continuous function \(f(k)=\sqrt{k}\), but rather only need function values at the integer points. We can exploit this by making this explicit. We can write: \[\begin{align} & k = \sum_i i\cdot \delta_i \\ & \sqrt{k} = \sum_i \sqrt{i}\cdot \delta_i \\ & \sum_i \delta_i = 1 \\ & \delta_i \in \{0,1\}\end{align}\] This is essentially a SOS1 (Special Ordered Set of Type 1) structure implementing a table lookup. The MIP model looks like:


MIP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred} x_i \le 10,000 \color{darkred}q \\ & \color{darkred} k = \sum_i i \cdot \color{darkred}\delta_i \\ & \color{darkred} q = \sum_i \sqrt{i} \cdot \color{darkred}\delta_i \\ & \sum_i \color{darkred} \delta_i = 1 \\ & \color{darkred}x_i, \color{darkred}\delta_i \in \{0,1\} \\ & \color{darkred} k, \color{darkred} q \ge 1 \end{align}\]

When we solve this problem we see:

MIQCP and MIP Results

Model  Solver  Obj        Time (s)
MIQCP  Cplex   1057.1355  0.3
MIP    Cplex   1057.1355  0.6
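For reference, the MIP model is easy to transcribe into Python. A PuLP sketch of my own (w is generated randomly here, and n is kept small to be kind to the default CBC solver, so the numbers will differ from the table above):

import math
import random
import pulp

n = 100
random.seed(123)
w = [random.randint(1, 10000) for _ in range(n)]

x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
d = pulp.LpVariable.dicts("delta", range(1, n+1), cat="Binary")

prob = pulp.LpProblem("knapsack_variant", pulp.LpMaximize)
prob += pulp.lpSum(math.log(w[i])*x[i] for i in range(n))
# exactly one delta(k) selects the value of k
prob += pulp.lpSum(d[k] for k in range(1, n+1)) == 1
# k = number of selected items
prob += pulp.lpSum(x[i] for i in range(n)) == pulp.lpSum(k*d[k] for k in range(1, n+1))
# capacity constraint with the sqrt(k) table lookup
prob += pulp.lpSum(w[i]*x[i] for i in range(n)) <= \
        10000*pulp.lpSum(math.sqrt(k)*d[k] for k in range(1, n+1))

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))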

Conclusion: the question in the original post was: how to solve this problem? Here we proposed three different models: an MINLP, a MIQCP and a MIP model. All these models can be solved quickly. It is noted that the quadratic and linear models are not approximations: they give the same solution as the original MINLP model. Pure nonlinear branch & bound methods have a bit of a problem with the MINLP model, but outer approximation works very well.

References


  1. How do we solve a variant of the knapsack problem in which the capacity of the knapsack keeps increasing as we add more items into the knapsack?, https://stackoverflow.com/questions/59242370/how-do-we-solve-a-variant-of-the-knapsack-problem-in-which-the-capacity-of-the-k

CVXPY large memory allocation

In [1] a simple regression problem was stated and solved with CVXPY. The number of observations is very large (\(200,000\)), while the number of coefficients to estimate is moderate (\(100\)). The first formulation is simply:

Regression I
\[\begin{align}\min_{\color{darkred}w}\>& ||\color{darkblue}y-\color{darkblue}X\color{darkred}w||^2_2 \end{align}\]

In Python code, this can look like:


import cvxpy as cp
import numpy as np

N = 200000
M = 100

X = np.random.normal(0, 1, size=(N, M))
y = np.random.normal(0, 1, size=N)

w = cp.Variable(M)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
prob.solve()
print("status:",prob.status)
print("obj:",prob.value)

Unfortunately, this gives the following runtime error:



[ec2-user@ip-172-30-0-79 etc]$ python3 ls0.py 
Traceback (most recent call last):
File "ls0.py", line 12, in <module>
prob.solve()
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
return solve_func(self, *args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
self._construct_chains(solver=solver, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
raise e
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 501, in _construct_chains
self._intermediate_chain.apply(self)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/chain.py", line 65, in apply
problem, inv = r.apply(problem)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/qp2symbolic_qp.py", line 60, in apply
return super(Qp2SymbolicQp, self).apply(problem)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 58, in apply
problem.objective)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 96, in canonicalize_tree
canon_arg, c = self.canonicalize_tree(arg)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 99, in canonicalize_tree
canon_expr, c = self.canonicalize_expr(expr, canon_args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 108, in canonicalize_expr
return self.canon_methods[type(expr)](expr, args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/atom_canonicalizers/quad_over_lin_canon.py", line 29, in quad_over_lin_canon
return SymbolicQuadForm(t, eye(affine_expr.size)/y, expr), [affine_expr == t]
File "/home/ec2-user/.local/lib/python3.7/site-packages/numpy/lib/twodim_base.py", line 201, in eye
m = zeros((N, M), dtype=dtype, order=order)
MemoryError: Unable to allocate array with shape (200000, 200000) and data type float64
[ec2-user@ip-172-30-0-79 etc]$


This is interesting: CVXPY seems to allocate a \(200,000\times 200,000\) matrix here. It is actually an identity matrix, as the function name eye in the traceback reveals. This appears to be related to the reformulation into a QP model.

To get this a bit more under control, let's make the size a bit smaller and use a memory profiler. At least we then get some information about the memory usage.
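
For reference, the profiled script (called ls1.py in the logs below) is just the model above wrapped in a function decorated with @profile; the decorator is injected by memory_profiler when the script is run as python3 -m memory_profiler ls1.py. A sketch:

import cvxpy as cp
import numpy as np

@profile
def f():
    # N = 200000
    N = 20000
    M = 100

    np.random.seed(123)
    X = np.random.normal(0, 1, size=(N, M))
    y = np.random.normal(0, 1, size=N)

    w = cp.Variable(M)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    prob.solve(verbose=True)
    print("status:", prob.status)
    print("obj:", prob.value)

f()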


[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
-----------------------------------------------------------------
OSQP v0.6.0 - Operator Splitting QP Solver
(c) Bartolomeo Stellato, Goran Banjac
University of Oxford - Stanford University 2019
-----------------------------------------------------------------
problem: variables n = 20100, constraints m = 20000
nnz(P) + nnz(A) = 2040000
settings: linear system solver = qdldl,
eps_abs = 1.0e-05, eps_rel = 1.0e-05,
eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
rho = 1.00e-01 (adaptive),
sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
check_termination: on (interval 25),
scaling: on, scaled_termination: off
warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1   0.0000e+00   4.38e+00   2.98e+04   1.00e-01   1.03e+00s
  50   1.9804e+04   2.13e-09   4.01e-07   1.00e-01   1.65e+00s
plsh   1.9804e+04   4.01e-15   1.03e-12   --------   2.45e+00s

status: solved
solution polish: successful
number of iterations: 50
optimal objective: 19804.0226
run time: 2.45e+00s
optimal rho estimate: 1.35e-02

status: optimal
obj: 19804.022648294507
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.188 MiB   50.188 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.188 MiB    0.000 MiB       N = 20000
     8   50.188 MiB    0.000 MiB       M = 100
     9
    10   50.188 MiB    0.000 MiB       np.random.seed(123)
    11   65.477 MiB   15.289 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.707 MiB    0.230 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.707 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.707 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16  412.086 MiB  346.379 MiB       prob.solve(verbose=True)
    17  412.086 MiB    0.000 MiB       print("status:",prob.status)
    18  412.086 MiB    0.000 MiB       print("obj:",prob.value)


We see that this is indeed solved as a QP model (by the QP solver OSQP), and that the solve statement produces the spike in memory usage: about 346 MiB even for this reduced instance.

The allocation is measured in MiBs or Mebibytes. A mebibyte is equal to \(2^{20}=1,048,576\) bytes.

It is probably a bad idea to allocate this identity matrix as a fully populated dense matrix. The best approach would be not to use the matrix at all; a second-best approach would be to store it as a sparse matrix.
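
A quick back-of-the-envelope check (my addition, not from the original post) confirms why the dense allocation must fail, and how cheap a sparse identity would be:

import scipy.sparse as sp

n = 200_000

# Dense identity: n*n float64 entries -- the allocation in the traceback above.
print(n * n * 8 / 2**30, "GiB")       # roughly 298 GiB

# Sparse identity: only n stored nonzeros.
I = sp.eye(n, format="csr")
nbytes = I.data.nbytes + I.indices.nbytes + I.indptr.nbytes
print(nbytes / 2**20, "MiB")          # a few MiB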

Approach 1: use a conic programming solver


This is an easy fix. Just use a solver like ECOS:



[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It    pcost        dcost        gap     pres   dres   k/t     mu      step    sigma   IR     |  BT
 0   +0.000e+00  -2.601e-05   +2e+04  5e-01  7e-06   1e+00   1e+04    ---     ---    1 2 -  |  - -
 1   -9.312e-01  +2.050e-02   +3e+02  2e-02  1e-07   1e+00   2e+02   0.9829  1e-04   1 2 2  |  0 0
 2   -5.192e+00  +2.947e+00   +1e+02  7e-03  5e-08   8e+00   7e+01   0.6186  7e-02   2 3 3  |  0 0
 3   +6.518e+02  +8.866e+02   +3e+02  6e-02  4e-07   2e+02   1e+02   0.1800  9e-01   2 2 2  |  0 0
 4   +9.646e+02  +9.849e+02   +4e+00  1e-03  8e-09   2e+01   2e+00   0.9826  1e-04   7 7 6  |  0 0
 5   +1.130e+03  +1.144e+03   +7e-01  2e-03  1e-09   1e+01   4e-01   0.8836  2e-02   2 3 2  |  0 0
 6   +1.493e+03  +1.521e+03   +3e-01  1e-02  4e-10   3e+01   2e-01   0.9890  1e-01   1 1 1  |  0 0
 7   +2.057e+03  +2.080e+03   +5e-02  8e-03  9e-11   2e+01   3e-02   0.9271  5e-02   1 2 1  |  0 0
 8   +2.134e+03  +2.152e+03   +4e-02  4e-03  4e-11   2e+01   2e-02   0.8136  2e-01   2 2 2  |  0 0
 9   +2.244e+03  +2.256e+03   +3e-02  4e-02  2e-11   1e+01   1e-02   0.7107  1e-01   1 1 1  |  0 0
10   +1.995e+03  +2.004e+03   +4e-03  6e-02  2e-11   9e+00   4e-03   0.0005  8e-01   1 1 1  |  0 0
11   +6.609e+02  +6.615e+02   +8e+00  8e-03  2e-12   6e-01   4e+00   0.9890  8e-01   0 0 0  |  0 0
12   +6.293e+02  +6.423e+02   +1e+01  5e-03  2e-12   1e+01   8e+00   0.5325  3e-01   1 1 1  |  0 0
13   +5.873e+02  +7.017e+02   +3e+01  2e-02  3e-12   1e+02   2e+01   0.4675  5e-01   2 3 3  |  0 0
14   +7.549e+02  +1.687e+03   +4e+00  7e-03  4e-12   9e+02   3e+00   0.9890  1e-01   1 1 1  |  0 0
15   +1.292e+03  +1.300e+03   +2e+00  2e-03  3e-12   8e+00   9e-01   0.8813  2e-01   3 2 2  |  0 0
16   +2.141e+03  +2.149e+03   +2e-01  2e-02  2e-12   8e+00   9e-02   0.9890  5e-03   1 1 1  |  0 0
17   +2.819e+03  +2.827e+03   +2e-02  4e-02  4e-12   8e+00   1e-02   0.9052  3e-02   1 1 1  |  0 0
18   +2.757e+03  +2.774e+03   +5e-02  2e-02  2e-12   2e+01   2e-02   0.9746  4e-01   1 2 2  |  0 0
19   +2.615e+03  +2.631e+03   +1e-02  8e-02  2e-12   2e+01   8e-03   0.1350  2e-01   1 1 1  |  0 0
20   +2.002e+03  +2.009e+03   +2e-03  6e-02  3e-13   7e+00   3e-03   0.0029  9e-01   1 1 1  |  0 0
21   +2.884e+03  +2.959e+03   +2e-03  8e-02  3e-12   8e+01   9e-03   0.7080  7e-01   0 0 0  |  0 0
22   +2.861e+03  +2.934e+03   +2e-03  9e-02  3e-12   7e+01   9e-03   0.0007  9e-01   1 1 1  |  0 0
23   +1.847e+03  +1.857e+03   +3e-02  5e-02  2e-13   1e+01   2e-02   0.0026  1e+00   0 0 0  |  0 0
24   +6.545e+02  +6.548e+02   +1e+01  9e-03  2e-12   3e-01   6e+00   0.9890  6e-01   0 0 0  |  0 0
25   +7.003e+02  +7.142e+02   +2e+01  4e-03  2e-12   1e+01   1e+01   0.6078  2e-01   1 1 0  |  0 0
26   +6.232e+02  +7.771e+02   +5e+01  2e-02  3e-12   2e+02   3e+01   0.5706  4e-01   2 3 3  |  0 0
27   +5.916e+02  +1.396e+03   +8e+00  3e-03  2e-12   8e+02   6e+00   0.9890  1e-01   1 1 1  |  0 0
28   +1.468e+03  +1.658e+03   +1e+00  2e-03  3e-12   2e+02   7e-01   0.9890  9e-02   1 1 1  |  0 0
29   +2.046e+03  +2.089e+03   +4e-01  1e-02  4e-13   4e+01   2e-01   0.9890  3e-02   1 1 1  |  0 0
30   +1.720e+03  +1.744e+03   +7e-02  5e-02  3e-14   2e+01   6e-02   0.1873  2e-01   1 1 1  |  0 0
31   +6.708e+02  +6.722e+02   +2e+01  7e-03  2e-12   1e+00   1e+01   0.9890  6e-01   0 0 0  |  0 0
32   +5.985e+02  +6.158e+02   +5e+01  3e-03  9e-13   2e+01   3e+01   0.9260  3e-01   1 1 1  |  0 0
33   +7.045e+02  +9.047e+02   +1e+02  2e-02  2e-12   2e+02   6e+01   0.6591  5e-01   1 2 3  |  0 0
34   +4.246e+02  +8.139e+02   +3e+01  2e-03  1e-13   4e+02   3e+01   0.9890  2e-01   1 1 1  |  0 0
35   +1.140e+03  +1.436e+03   +7e-01  3e-03  7e-13   3e+02   3e+00   0.9600  1e-02   1 1 1  |  0 0
36   +1.597e+03  +1.707e+03   +1e+00  4e-03  3e-14   1e+02   9e-01   0.9890  2e-01   2 2 2  |  0 0
37   +2.184e+03  +2.226e+03   +1e-01  5e-03  1e-12   4e+01   1e-01   0.9890  6e-02   1 2 2  |  0 0
38   +2.540e+03  +2.556e+03   +2e-01  4e-03  2e-12   2e+01   1e-01   0.9890  2e-01   1 1 2  |  0 0
39   +3.055e+03  +3.065e+03   +5e-02  6e-03  2e-12   1e+01   3e-02   0.9890  8e-02   1 2 2  |  0 0
40   +3.450e+03  +3.458e+03   +4e-02  5e-03  2e-12   8e+00   2e-02   0.9890  1e-01   1 1 2  |  0 0
41   +3.932e+03  +3.939e+03   +1e-02  7e-03  2e-12   6e+00   8e-03   0.9890  1e-01   1 1 2  |  0 0
42   +4.347e+03  +4.352e+03   +1e-02  6e-03  2e-12   6e+00   5e-03   0.9890  1e-01   1 1 2  |  0 0
43   +4.796e+03  +4.801e+03   +5e-03  8e-03  2e-12   5e+00   3e-03   0.9890  1e-01   1 1 2  |  0 0
44   +5.206e+03  +5.211e+03   +3e-03  7e-03  2e-12   4e+00   2e-03   0.9890  1e-01   1 1 2  |  0 0
45   +5.633e+03  +5.637e+03   +2e-03  1e-02  2e-12   4e+00   1e-03   0.9890  1e-01   1 1 2  |  0 0
46   +6.028e+03  +6.032e+03   +1e-03  9e-03  2e-12   4e+00   8e-04   0.9890  1e-01   1 1 2  |  0 0
47   +6.430e+03  +6.433e+03   +9e-04  1e-02  2e-12   3e+00   5e-04   0.9890  1e-01   1 1 2  |  0 0
48   +6.811e+03  +6.814e+03   +7e-04  1e-02  2e-12   3e+00   4e-04   0.9890  1e-01   1 1 2  |  0 0
49   +7.188e+03  +7.191e+03   +5e-04  1e-02  2e-12   3e+00   3e-04   0.9890  1e-01   1 1 2  |  0 0
50   +7.551e+03  +7.553e+03   +3e-04  1e-02  2e-12   2e+00   2e-04   0.9890  1e-01   1 1 2  |  0 0
51   +7.908e+03  +7.910e+03   +2e-04  2e-02  2e-12   2e+00   1e-04   0.9890  2e-01   1 1 2  |  0 0
52   +8.251e+03  +8.253e+03   +2e-04  2e-02  2e-12   2e+00   1e-04   0.9890  2e-01   1 1 2  |  0 0
53   +8.586e+03  +8.588e+03   +1e-04  2e-02  2e-12   2e+00   8e-05   0.9890  2e-01   1 1 2  |  0 0
54   +8.911e+03  +8.913e+03   +1e-04  2e-02  2e-12   2e+00   6e-05   0.9890  2e-01   1 1 2  |  0 0
55   +9.228e+03  +9.230e+03   +8e-05  2e-02  2e-12   2e+00   4e-05   0.9890  2e-01   1 1 2  |  0 0
56   +9.534e+03  +9.535e+03   +6e-05  2e-02  2e-12   1e+00   4e-05   0.9890  2e-01   1 1 2  |  0 0
57   +9.779e+03  +9.780e+03   +5e-05  1e-01  2e-12   1e+00   3e-05   0.8357  2e-01   1 1 1  |  0 0
58   +9.766e+03  +9.768e+03   +2e-05  3e-01  2e-12   1e+00   1e-05   0.0001  9e-01   1 1 1  |  0 0
59   +3.518e+03  +3.517e+03   -6e+00  7e+00  9e-13  -5e-01  -3e+00   0.0000  1e+00   1 1 1  |  0 0
Unreliable search direction detected, recovering best iterate (58) and stopping.

Close to PRIMAL INFEASIBLE (within feastol=5.7e-06).
Runtime: 20.262543 seconds.

status: infeasible_inaccurate
obj: None
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.312 MiB   50.312 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.312 MiB    0.000 MiB       N = 20000
     8   50.312 MiB    0.000 MiB       M = 100
     9
    10   50.312 MiB    0.000 MiB       np.random.seed(123)
    11   65.578 MiB   15.266 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.836 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.836 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.836 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.824 MiB   10.988 MiB       prob.solve(solver=cp.ECOS,verbose=True)
    17   76.824 MiB    0.000 MiB       print("status:",prob.status)
    18   76.824 MiB    0.000 MiB       print("obj:",prob.value)


The results are mixed. We certainly don't see the crazy memory allocation anymore: the increment for the solve is now a very modest 11 MiB. However, the solver is not numerically stable enough to solve this problem.

There is another solver that comes with CVXPY that we can try:


[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2000002
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 101, constraints m = 20002
Cones: soc vars: 20002, soc blks: 1
WARN: aa_init returned NULL, no acceleration applied.
Setup time: 7.44e-01s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
   0| 1.22e+18  7.87e+18  1.00e+00 -9.64e+18  4.27e+21  4.18e+21  1.92e-02
 100| 8.68e+17  7.02e+17  5.09e-01  1.71e+21  5.25e+21  3.54e+21  6.67e-01
 200| 8.04e+17  5.33e+17  4.23e-01  2.16e+21  5.33e+21  3.16e+21  1.31e+00
 300| 7.37e+17  4.39e+17  3.72e-01  2.39e+21  5.22e+21  2.83e+21  1.96e+00
 400| 6.73e+17  3.74e+17  3.35e-01  2.51e+21  5.04e+21  2.53e+21  2.59e+00
 500| 6.14e+17  3.23e+17  3.05e-01  2.57e+21  4.83e+21  2.26e+21  3.22e+00
 600| 5.60e+17  2.82e+17  2.80e-01  2.59e+21  4.60e+21  2.01e+21  3.85e+00
 700| 5.11e+17  2.48e+17  2.57e-01  2.58e+21  4.37e+21  1.79e+21  4.50e+00
 800| 4.67e+17  2.20e+17  2.36e-01  2.56e+21  4.14e+21  1.58e+21  5.15e+00
 900| 4.27e+17  1.96e+17  2.17e-01  2.52e+21  3.92e+21  1.40e+21  5.79e+00
1000| 3.91e+17  1.74e+17  1.99e-01  2.47e+21  3.70e+21  1.23e+21  6.46e+00
1100| 3.58e+17  1.56e+17  1.81e-01  2.42e+21  3.49e+21  1.07e+21  7.09e+00
1200| 3.28e+17  1.40e+17  1.64e-01  2.36e+21  3.29e+21  9.29e+20  7.72e+00
1300| 3.01e+17  1.26e+17  1.48e-01  2.30e+21  3.10e+21  7.98e+20  8.36e+00
1400| 2.77e+17  1.14e+17  1.31e-01  2.24e+21  2.92e+21  6.78e+20  9.00e+00
1500| 2.55e+17  1.03e+17  1.15e-01  2.18e+21  2.75e+21  5.67e+20  9.65e+00
1600| 2.35e+17  9.31e+16  9.88e-02  2.12e+21  2.59e+21  4.65e+20  1.03e+01
1700| 2.17e+17  8.44e+16  8.25e-02  2.06e+21  2.43e+21  3.71e+20  1.09e+01
1800| 2.00e+17  7.67e+16  6.62e-02  2.01e+21  2.29e+21  2.84e+20  1.16e+01
1900| 1.85e+17  6.98e+16  4.97e-02  1.95e+21  2.15e+21  2.03e+20  1.23e+01
2000| 1.72e+17  6.37e+16  3.30e-02  1.89e+21  2.02e+21  1.29e+20  1.29e+01
2100| 1.59e+17  5.81e+16  1.60e-02  1.84e+21  1.90e+21  5.98e+19  1.36e+01
2200| 8.80e-03  1.43e-01  1.60e-05  1.20e+04  1.20e+04  1.94e-16  1.42e+01
2300| 6.45e-03  1.37e-01  1.62e-05  1.23e+04  1.23e+04  6.83e-17  1.49e+01
2400| 6.25e-03  1.34e-01  1.40e-05  1.26e+04  1.26e+04  2.15e-16  1.55e+01
2500| 6.06e-03  1.30e-01  1.21e-05  1.28e+04  1.28e+04  2.25e-16  1.62e+01
2600| 5.89e-03  1.27e-01  1.04e-05  1.30e+04  1.30e+04  7.78e-17  1.68e+01
2700| 5.71e-03  1.24e-01  8.98e-06  1.33e+04  1.33e+04  8.14e-17  1.75e+01
2800| 5.55e-03  1.21e-01  7.70e-06  1.35e+04  1.35e+04  8.46e-17  1.82e+01
2900| 5.39e-03  1.18e-01  6.56e-06  1.37e+04  1.37e+04  8.78e-17  1.88e+01
3000| 5.24e-03  1.15e-01  5.57e-06  1.39e+04  1.39e+04  9.04e-17  1.95e+01
3100| 5.09e-03  1.12e-01  4.68e-06  1.41e+04  1.41e+04  9.40e-17  2.01e+01
3200| 4.94e-03  1.09e-01  3.90e-06  1.42e+04  1.42e+04  2.91e-16  2.08e+01
3300| 4.80e-03  1.07e-01  3.21e-06  1.44e+04  1.44e+04  9.96e-17  2.14e+01
3400| 4.67e-03  1.04e-01  2.60e-06  1.46e+04  1.46e+04  1.03e-16  2.21e+01
3500| 4.54e-03  1.01e-01  2.05e-06  1.48e+04  1.48e+04  3.18e-16  2.27e+01
3600| 4.41e-03  9.87e-02  1.57e-06  1.49e+04  1.49e+04  1.09e-16  2.33e+01
3700| 4.29e-03  9.62e-02  1.14e-06  1.51e+04  1.51e+04  1.12e-16  2.40e+01
3800| 4.16e-03  9.37e-02  7.64e-07  1.52e+04  1.52e+04  1.14e-16  2.47e+01
3900| 4.05e-03  9.12e-02  4.28e-07  1.54e+04  1.54e+04  1.17e-16  2.53e+01
4000| 3.93e-03  8.88e-02  1.30e-07  1.55e+04  1.55e+04  1.20e-16  2.60e+01
4100| 3.82e-03  8.65e-02  1.34e-07  1.57e+04  1.57e+04  1.23e-16  2.66e+01
4200| 3.72e-03  8.42e-02  3.67e-07  1.58e+04  1.58e+04  1.25e-16  2.73e+01
4300| 3.61e-03  8.20e-02  5.72e-07  1.59e+04  1.59e+04  1.28e-16  2.80e+01
4400| 3.51e-03  7.98e-02  7.52e-07  1.60e+04  1.60e+04  3.92e-16  2.86e+01
4500| 3.41e-03  7.77e-02  9.11e-07  1.62e+04  1.62e+04  1.33e-16  2.93e+01
4600| 3.32e-03  7.56e-02  1.05e-06  1.63e+04  1.63e+04  1.36e-16  3.00e+01
4700| 3.22e-03  7.35e-02  1.17e-06  1.64e+04  1.64e+04  4.15e-16  3.07e+01
4800| 3.13e-03  7.15e-02  1.27e-06  1.65e+04  1.65e+04  1.41e-16  3.13e+01
4900| 3.04e-03  6.96e-02  1.36e-06  1.66e+04  1.66e+04  4.30e-16  3.20e+01
5000| 2.96e-03  6.77e-02  1.44e-06  1.67e+04  1.67e+04  4.37e-16  3.27e+01
----------------------------------------------------------------------------
Status: Solved/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 3.27e+01s
	Lin-sys: nnz in L factor: 2025055, avg solve time: 5.82e-03s
	Cones: avg projection time: 3.02e-05s
	Acceleration: avg step time: 7.51e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 0.0000e+00, dist(y, K*) = 3.6380e-12, s'y/|s||y| = -6.2180e-16
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.9568e-03
dual res:   |A'y + c|_2 / (1 + |c|_2) = 6.7689e-02
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.4408e-06
----------------------------------------------------------------------------
c'x = 16701.6612, -b'y = 16701.6131
============================================================================
status: optimal_inaccurate
obj: 16701.661202598487
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.438 MiB   50.438 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.438 MiB    0.000 MiB       N = 20000
     8   50.438 MiB    0.000 MiB       M = 100
     9
    10   50.438 MiB    0.000 MiB       np.random.seed(123)
    11   65.699 MiB   15.262 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.957 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.957 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.957 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.953 MiB   10.996 MiB       prob.solve(solver=cp.SCS,verbose=True)
    17   76.953 MiB    0.000 MiB       print("status:",prob.status)
    18   76.953 MiB    0.000 MiB       print("obj:",prob.value)


That solver also has problems: here we run out of iterations (max_iters = 5000) and end up with status optimal_inaccurate.


Approach 2: use a different formulation


We can use a different formulation:


Regression II
\[\begin{align}\min_{\color{darkred}w,\color{darkred}r}\>& ||\color{darkred}r||^2_2 \\ & \color{darkred}r = \color{darkblue}y - \color{darkblue}X\color{darkred}w \end{align}\]

Here we add a bunch of (free) variables \(r\) for the residuals, along with the corresponding linear constraints. The payback is a much simpler quadratic objective, whose Hessian (the identity) is positive definite. A reasonable implementation can look like:


import cvxpy as cp
import numpy as np

# N = 200000
N = 20000
M = 100

X = np.random.normal(0, 1, size=(N,M))
y = np.random.normal(0, 1, size=(N,1))

w = cp.Variable((M,1))
r = cp.Variable((N,1))
prob = cp.Problem(cp.Minimize(r.T @ r), [r == y - X @ w])
prob.solve(solver=cp.OSQP,verbose=True)
print("status:",prob.status)
print("obj:",prob.value)


This should work, but it doesn't. We see:


[ec2-user@ip-172-30-0-79 etc]$ python3 ls2a.py 
Traceback (most recent call last):
File "ls2a.py", line 14, in <module>
prob.solve(solver=cp.OSQP,verbose=True)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
return solve_func(self, *args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
self._construct_chains(solver=solver, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
raise e
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 499, in _construct_chains
construct_intermediate_chain(self, candidate_solvers, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/solvers/intermediate_chain.py", line 70, in construct_intermediate_chain
raise DCPError("Problem does not follow DCP rules. Specifically:\n" + append)
cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
var1.T * var1

Bummer. This is a perfectly convex problem!

Some alternative formulations to generate a QP do not help:

  • cp.sum_squares(r) yields a large allocation when forming a QP
  • sum([r[i]**2 for i in range(N)]) is very slow and very memory hungry
  • quad_form with an identity matrix is not efficient


This did not work out so well: I am not really able to solve relatively large but easy least squares problems using these standard formulations.
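
For perspective (my addition, not part of the original post): outside an optimization-modeling framework, this least squares problem is easy at full size. A plain numpy call handles \(N=200,000\) quickly:

import numpy as np

N, M = 200_000, 100
rng = np.random.default_rng(123)
X = rng.normal(size=(N, M))
y = rng.normal(size=N)

# SVD-based least squares on the raw 200000 x 100 data matrix.
w, res, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(res)  # sum of squared residuals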

Conclusion


  • CVXPY thinks \(r^Tr = \sum_i r_i^2\) is not convex: the DCP rules reject any product of two variable expressions (see the snippet below the list).
  • We cannot always generate large problems for the OSQP QP solver due to large dense memory allocations in CVXPY. I probably would consider this a bug: it is never a good idea to physically create and allocate a large dense identity matrix in any somewhat serious code.
  • CVXPY can generate large instances for the conic solvers ECOS and SCS. But they have some troubles solving this problem.
  • A commercial solver like Cplex has no problem with this; 2 iterations, 0.2 seconds. 
  • We can conclude that CVXPY (and its collection of standard solvers) is better for smaller problems. 
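
The first bullet is easy to demonstrate (a small sketch; exact warnings and behavior may vary by CVXPY version):

import cvxpy as cp

r = cp.Variable((5, 1))
print((r.T @ r).is_dcp())          # False: a product of two variables is not DCP
print(cp.sum_squares(r).is_dcp())  # True: the dedicated atom is recognized as convex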

References


  1. Least squares problem run out of memory, https://stackoverflow.com/questions/59315300/least-squares-problem-run-out-of-memory-in-cvxpy






