
Scheduling teachers

From [1]:

I am trying to find a solution to the following problem: 
  • There are 6 teachers
  • Each teacher can only work for 8 hours a day
  • Each teacher must have a 30 minute break ideally near the middle of his shift
  • The following table shows the number of teachers needed in the room at certain times:
  • 7am - 8am: 1 teacher
  • 8am - 9am: 2 teachers
  • 10am - 11am: 5 teachers
  • 11am - 5pm: 6 teachers
  • 5pm - 6pm: 2 teachers 
What is a good way to solve this (ideally in python and Google OR-Tools) ? 
Thank you

Initial analysis

From the demand data we see that we need all 6 teachers from 11am to 5pm, working without a lunch break. That is 6 hours, so there is no way to allow a lunch break near the middle of the shift.


We have more problems. The picture of the demand data indicates we have no demand between 9am and 10am. That does not look right.

I believe that looking critically at your data is essential for successful optimization applications. You can learn a lot from just a bit of staring.

Alternative problem 1

We can ask a few different questions. If we only allow shifts of the form: 4 hours work, 0.5 hour lunch, 4 hours work (i.e. lunch perfectly in the middle of the shift), how many teachers do we need to meet the demand? To model this, we can assume time periods of half an hour. The number of possible shifts is small:


With an enumeration of the shifts, we can model this as a covering problem:

Covering Model
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_{s|\color{darkblue}{\mathit cover}(s,t)} \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]

Here \(\mathit{cover}(s,t)=\text{True}\) if shift \(s\) covers time period \(t\). If we interpret \(\mathit cover(s,t)\) as a binary (data) matrix \[\mathit cover_{s,t} = \begin{cases} 1 & \text{if shift $s$ covers time period $t$}\\ 0 & \text{otherwise}\end{cases}\] we can also write:

Covering Model (alternative interpretation)
\[\begin{align} \min\> &\color{darkblue}z= \sum_s \color{darkred} x_s \\ & \sum_s \color{darkblue}{\mathit cover}_{s,t} \cdot \color{darkred} x_s \ge \color{darkblue}{\mathit demand}_{t} && \forall t \\ &\color{darkred} x_s \in \{0,1,2,\dots\}\end{align} \]

Our new assumptions are:

  • we only allow shifts with a lunch break in the middle
  • we add to our demand data: 4 teachers needed between 9am and 10am
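Since the question asked for Python and Google OR-Tools, here is a minimal sketch of this covering model in CP-SAT. The half-hour discretization and the fixed 4+0.5+4 shift shape follow the description above; the per-shift upper bound of 10 teachers is an arbitrary assumption.

# Half-hour periods 0..21 represent 7am-6pm. Demand includes the
# assumed 4 teachers for 9am-10am.
from ortools.sat.python import cp_model

demand = [1]*2 + [2]*2 + [4]*2 + [5]*2 + [6]*12 + [2]*2

# A shift starting in period s: work s..s+7, lunch s+8, work s+9..s+16.
n_shifts = 6  # start times 0..5 keep the whole shift inside the day

def covers(s, t):
    return s <= t <= s + 7 or s + 9 <= t <= s + 16

model = cp_model.CpModel()
x = [model.NewIntVar(0, 10, f"x{s}") for s in range(n_shifts)]  # 10 is an arbitrary cap
for t, dem in enumerate(demand):
    model.Add(sum(x[s] for s in range(n_shifts) if covers(s, t)) >= dem)
model.Minimize(sum(x))

solver = cp_model.CpSolver()
solver.Solve(model)
print(int(solver.ObjectiveValue()), [solver.Value(v) for v in x])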
After solving this easy MIP model we see:

---- 55 VARIABLE x.L  number of shifts needed

shift1 2.000, shift4 2.000, shift5 2.000, shift6 2.000


---- 55 VARIABLE z.L  =  8.000  total number of shifts

I.e. we need 8 teachers to handle this workload.


The picture shows we are overshooting demand in quite a few periods. Note: this picture illustrates the left-hand side (orange bars) and right-hand side (blue line) of the demand equation in the Covering Model.

Alternative problem 2

Let's make things a bit more complicated. We now allow the following rules for a single shift:


  • The work period before the lunch break is between 3 and 5 hours (or between 6 and 10 time periods).
  • There is a lunch break of 0.5 or 1 hour.
  • After lunch there is another work period of between 3 and 5 hours. The total number of working hours is 8.
When we enumerate the shifts, we see:



We now have 55 different shifts to consider.

The results look like:


---- 72 VARIABLE x.L  number of shifts needed

shift1 1.000, shift22 1.000, shift26 1.000, shift39 1.000, shift49 1.000, shift52 1.000
shift55 1.000


---- 72 VARIABLE z.L  =  7.000  total number of shifts


We see the number of teachers needed is 7. We are closer to the demand curve:



We see that if we add more flexibility we can do a bit better. Achieving 6 teachers is almost impossible. We would need to introduce shifts like: work 2 hours, lunch, work 6 hours. The teachers union would object.

Note that there are quite a few other methods to solve models like this that do not require enumerating all possible shifts [2,3]. For larger problems it may not be feasible to employ the shift enumeration scheme we used here.

References



R/Python + C++

In some recent projects, I was working on using algorithms implemented in C++ from R and Python. Basically the idea is: Python and R are great languages for scripting, but they are slow as molasses. So, it may make sense to develop the time-consuming algorithms in C++ while driving the algorithm from R or Python.

R and C++: Rcpp


The standard way to build interfaces between R and C++ code is to use Rcpp [1,2].




It is possible to interface R directly with low-level C code, but this requires a lot of code and knowledge of R internals. Rcpp automates a lot of this. E.g. Rcpp will take care of translating an R vector into a C++ vector.

Rcpp supports everything from small fragments of C++ code passed as an R string, to a more coarse-grained file-based approach [3]. For Windows, you need to download the GNU compilers [4].

If you are new to both Rcpp and building your own R packages [5], things may be a bit overwhelming.

Rstudio can help a lot. It supports a lot of very useful tasks:

  • Syntax coloring for C++ code. 
  • Building projects.
  • Git version control.
  • Documentation tools (rmarkdown and bookdown). My documentation is also a good test: it executes almost all of the code when building the document.

Editing C++ code in RStudio


Basically I never have to leave RStudio.

I have added an alternative driver file for my C++ code so I can debug it in Visual Studio. I used it only a few times: most of the time I just used RStudio.


Python and C++: pybind11


pybind11 [6] is in many respects similar to Rcpp, although it requires a little bit more programming to bridge the gap between Python and C++.



In the beginning of the above YouTube video [7], the presenter compares pybind11 with some of the alternatives:

  • SWIG: the author of SWIG says: don't use it
  • ctypes: calls C functions but not C++
  • CFFI: calls C functions
  • Boost.Python: supports older C++ standards, but not much maintained
  • pybind11: modern 

As with Rcpp, calling the compiler is done by running a build or setup script. For Rcpp I used the GNU compilers, while pybind11/pip install supports the Visual Studio C++ compiler. This also means that if you have little experience with pybind11 and with creating packages, the learning curve may be steep.


References


  1. http://www.rcpp.org
  2. Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp, Springer, 2013
  3. Rewriting R code in C++, chapter in Hadley Wickham, Advanced R, https://adv-r.hadley.nz/rcpp.html
  4. https://cran.r-project.org/bin/windows/Rtools/
  5. Hadley Wickham, R packages, O'Reilly, 2015
  6. https://pybind11.readthedocs.io/en/master/
  7. Robert Smallshire, Integrate Python and C++ with pybind11, https://www.youtube.com/watch?v=YReJ3pSnNDo

Python-MIP

This is another modeling tool for Python.

There are quite a few modeling tools available for Python: Pyomo, PuLP, and most commercial LP/MIP solvers come with some Python modeling layer.

This is what caught my eye when reading about Python-MIP:


  • The name is rather unimaginative.
  • Looks like the authors are from Brazil.
  • Supported solvers are CBC and Gurobi.
  • Python-MIP is compatible with the just-in-time compiler PyPy, which can lead to substantial performance improvements. 
  • It is claimed that with the PyPy JIT, Python-MIP can be 25 times as fast as the Gurobi modeling tool.
  • There are some interesting facilities supported by Python-MIP:
    • Cuts can be provided using a call-back mechanism
    • Support for MIPSTART (initial integer solution)
    • Solution pool


Question


In [3] an interesting question came up. The value of an integer variable is often slightly non-integer, e.g. something like 0.0000011625. This is the result of the integer feasibility tolerance that a solver applies. In the discussion [3] the remark is made:


I believe there is more to this. Rounding integer solutions can lead to larger infeasibilities. With some aggressive presolve/scaling these infeasibilities can sometimes be large after postsolve/unscaling. Also: some equations may have long summations of binary variables. This would accumulate a lot of rounding errors. And then there are these big-M constraints....

It also means that using the steps:

solve
fix solution (or fix integers) to optimal solution
solve

may lead to "feasible" for the first solve, but "infeasible" for the second solve. E.g. when we want duals for the fixed LP we use this "fix integers" step.

Safer would be to tighten the integer feasibility tolerance. Cplex even allows epint=0 (epint is Cplex's integer feasibility tolerance). Of course tightening the integer feasibility tolerance will likely lead to longer solution times.
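To make this concrete, here is a tiny Python-MIP sketch (a made-up model, just to show the postprocessing issue):

# A made-up knapsack-style model to illustrate (data is not from the post).
from mip import Model, xsum, maximize, BINARY

m = Model()  # uses CBC by default
x = [m.add_var(var_type=BINARY) for i in range(5)]
m.objective = maximize(xsum((i + 1) * x[i] for i in range(5)))
m += xsum(x[i] for i in range(5)) <= 3
m.optimize()

# Values come back within the solver's integer feasibility tolerance,
# e.g. 0.9999998 instead of 1. Truncating with int() would turn
# 0.9999998 into 0; rounding is the safer postprocessing step.
print([v.x for v in x])
print([round(v.x) for v in x])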

Indeed, modeling systems and solvers currently handle this by offloading the problem to the user. The solver is probably the right place to deal with this, but I am not sure developers are eager to work on it. Taking responsibility by simple-minded rounding may be asking for more problems than it solves.

On the other hand, these slightly fractional values certainly cause confusion. Especially for beginners in MIP modeling.

So the question remains: is rounding integer variables a good idea?

References





Octeract

I don't know what or who this is.

This seems to be a parallel deterministic solver for non-convex MINLPs.  Some things I noticed:


  • !np.easy : cute (but nonsense of course: some problems just remain difficult).
  • "The first massively parallel Deterministic Global Optimization solver for general non-convex MINLPs."
  • Symbolic manipulation: like some other global solvers they need the symbolic form of the problem so they can reason about this. I.e. no black-box problems.
  • Support for AMPL and Pyomo
  •  "Octeract Engine has implementations of algorithms that guarantee convergence to a global optimum in finite time. The correctness of the majority of the calculations are ensured through a combination of interval arithmetic and infinite precision arithmetic."
  • It looks like the benchmarks [3] compare the solver against itself (so it is always a winner).
  • I don't see any names on the web site. The About Company section is unusually vague.

Some of the competing solvers are Baron, Couenne, and Antigone.

References


  1. https://octeract.com/ 
  2. Manual: https://octeract.com/wp-content/uploads/2019/08/user_manual.pdf
  3. Benchmarks: https://octeract.com/benchmarks/

Demo problem with constraint on standard deviation

In [1] a hypothetical demo problem is shown. I don't think it is a real problem, but rather contrived as an example. Nevertheless, there are things to say about it.

The problem is:
Original Problem
\[\begin{align}\min\>&\sum_i \color{darkred}x_i\\ & \mathbf{sd}(\color{darkred}x) \lt \color{darkblue}\alpha\\ & \color{darkred}x_i \in \{0,1\}\end{align}\]


Notes:

  • Here sd is the standard deviation
  • We assume \(x\) has \(n\) components.
  • Of course, \(\lt\) is problematic in optimization. So the equation should become a \(\le\) constraint.
  • The standard formula for the standard deviation is: \[ \sqrt{\frac{\sum_i (x_i-\bar{x})^2}{n-1}}\] where \(\bar{x}\) is the average of \(x\).
  • This is an easy problem. Just choose \(x_i=0\).
  • When we use  \(\max \sum_i x_i\) things are equally simple. In that case choose \(x_i = 1\).
  • There is symmetry: \(\mathbf{sd}(x) = \mathbf{sd}(1-x)\).
  • A more interesting problem is to have \(\mathbf{sd}(x)\ge\alpha\).


Updated problem


A slightly different and somewhat reformulated problem is:


MIQCP problem
\[\begin{align}\min\>&\bar{\color{darkred}x} \\ & \bar{\color{darkred}x}= \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n}\\ & \frac{\sum_i (\color{darkred}x_i - \bar{\color{darkred}x})^2}{\color{darkblue}n-1} \ge \color{darkblue}\alpha^2 \\ & \color{darkred}x_i \in \{0,1\}\end{align}\]

First, we replaced \(\lt\) by \(\ge\) to make the problem more interesting. Furthermore I got rid of the square root. This removes a possible problem with non-differentiability at zero. The remaining problem is a non-convex quadratically constrained problem (MIQCP = Mixed Integer Quadratically Constrained Problem). The non-convexity implies we want a global solver.

This model solves easily with solvers like Baron or Couenne.


Integer variable


When we look at the problem a bit more, we see we are not really interested in which \(x_i\)'s are zero or one. Rather, we only need to worry about how many. Let \(k = \sum_i x_i\). Obviously \(\bar{x}=k/n\). But more interestingly: \[\mathbf{sd}(x) = \sqrt{\frac{k (1-\bar{x})^2+(n-k)(0-\bar{x})^2}{n-1}}\] The integer variable \(k\) is restricted to \(k=0,1,\dots,n\).

Thus we can write:

MINLP problem
\[\begin{align}\min\>&\color{darkred}k \\  & \frac{\color{darkred}k (1-\color{darkred}k/\color{darkblue}n)^2+(\color{darkblue}n-\color{darkred}k) (\color{darkred}k/\color{darkblue}n)^2}{\color{darkblue}n-1} \ge \color{darkblue}\alpha^2 \\ & \color{darkred}k = 0,1,\dots,\color{darkblue}n \end{align}\]

The constraint can be simplified into \[\frac{k-k^2/n}{n-1}\ge \alpha^2\] This is now so simple we can do this by enumerating \(k=0,\dots,n\), check the constraint, and pick the best.
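A minimal sketch of this enumeration, with assumed values for \(n\) and \(\alpha\):

# Enumerate k = 0..n and pick the smallest k satisfying the constraint.
n, alpha = 20, 0.4   # assumed example values

def feasible(k):
    # (k - k^2/n)/(n-1) >= alpha^2
    return (k - k * k / n) / (n - 1) >= alpha ** 2

best = next((k for k in range(n + 1) if feasible(k)), None)
print(best)  # smallest feasible k (None if alpha is too large)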



Because of the form of the standard deviation curve (note the symmetry), we can specialize the enumeration loop and restrict the loop to \(k=1,\dots,\lfloor n/2 \rfloor\). Pick the first \(k\) that does not violate the constraint (and when found exit the loop). For very large \(n\) we can use something like a bisection to speed things up even further.

So this example optimization problem does not really need to use optimization at all.

References



  1. Constrained optimisation with function in the constraint and binary variable, https://stackoverflow.com/questions/57850149/constrained-optimisation-with-function-in-the-constraint-and-binary-variable
  2. Another problem that minimizes the standard deviation, https://yetanothermathprogrammingconsultant.blogspot.com/2017/09/minimizing-standard-deviation.html



Duplicate constraints in Pyomo model


Introduction


Pyomo [1] is a popular Python-based modeling tool. In [2] a question is posed about a situation where a certain constraint takes more than 8 hours to generate. As we shall see, the reason is that extra indices are used.

A simple example


The constraint \[y_i = \sum_j x_{i,j} \>\>\>\forall i,j\] is really malformed. The extra \(\forall j\) is problematic. What does this mean? One could say, this is wrong. We can also interpret this differently. Assume the inner \(j\) is scoped (i.e. local). Then we could read this as: repeat the constraint \(y_i = \sum_j x_{i,j}\), \(n\) times. Here \(n=|J|\) is the cardinality of set \(J\).

The GAMS fragment corresponding to this example, shows GAMS will object to this construct:

  11  equation e(i,j);
  12  e(i,j)..  y(i) =e= sum(j, x(i,j));
****                          $125
**** 125  Set is under control already
  13  

**** 1 ERROR(S)   0 WARNING(S)


The Pyomo equivalent can look like:

def eqRule(m,i,j):
    return m.Y[i] == sum(m.X[i,j] for j in m.J);
model.Eq = Constraint(model.I,model.J,rule=eqRule)

This fragment is a bit more difficult to read, largely due to syntactic clutter. But in any case: Python and Pyomo accept this constraint as written. To see what is generated, we can use

model.Eq.pprint()

This will show something like:

Eq : Size=6, Index=Eq_index, Active=True
    Key          : Lower : Body                                     : Upper : Active
    ('i1', 'j1') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i1', 'j2') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i1', 'j3') :   0.0 : Y[i1] - (X[i1,j1] + X[i1,j2] + X[i1,j3]) :   0.0 :   True
    ('i2', 'j1') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True
    ('i2', 'j2') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True
    ('i2', 'j3') :   0.0 : Y[i2] - (X[i2,j1] + X[i2,j2] + X[i2,j3]) :   0.0 :   True

We see for each \(i\) we have three duplicates. The way to fix this is to remove the function argument \(j\) from eqRule:

def eqRule(m,i):
    return m.Y[i] == sum(m.X[i,j] for j in m.J);
model.Eq = Constraint(model.I,rule=eqRule)

After this, model.Eq.pprint() produces

Eq : Size=2, Index=I, Active=True
    Key : Lower : Body                                     : Upper : Active
     i1 :   0.0 : Y[i1] - (X[i1,j3] + X[i1,j2] + X[i1,j1]) :   0.0 :   True
     i2 :   0.0 : Y[i2] - (X[i2,j3] + X[i2,j2] + X[i2,j1]) :   0.0 :   True

This looks much better.
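For reference, a self-contained version of the toy model above (with assumed two- and three-element sets) that can be run without a solver, as we only build and print the constraint:

# Minimal runnable version of the toy model (assumed 2x3 sets).
from pyomo.environ import ConcreteModel, Constraint, Set, Var

model = ConcreteModel()
model.I = Set(initialize=["i1", "i2"])
model.J = Set(initialize=["j1", "j2", "j3"])
model.X = Var(model.I, model.J)
model.Y = Var(model.I)

def eqRule(m, i):
    return m.Y[i] == sum(m.X[i, j] for j in m.J)
model.Eq = Constraint(model.I, rule=eqRule)

model.Eq.pprint()   # two rows, one per element of I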

The original problem


The constraint in the original question was:

def period_capacity_dept(m, e, j, t, dp):
    return sum(a[e, j, dp, t]*m.y[e,j,t] for (e,j) in model.EJ)<= K[dp,t] + m.R[t,dp]
model.period_capacity_dept = Constraint(E, J, T, DP, rule=period_capacity_dept)

Using the knowledge of the previous paragraph we know this should really be:

def period_capacity_dept(m, t, dp):
    return sum(a[e, j, dp, t]*m.y[e,j,t] for (e,j) in model.EJ)<= K[dp,t] + m.R[t,dp]
model.period_capacity_dept = Constraint(T, DP, rule=period_capacity_dept)

Pyomo mixes mathematical notation with programming. I think that is one of the reasons this bug is more difficult to see. In normal programming, adding an argument to a function has an obvious meaning. However in this case, adding e,j means in effect: \(\forall e,j\). If \(e\) and \(j\) belong to large sets, we can easily create a large number of duplicates.

References


Running a MIP solver on Raspberry Pi


Raspberry Pi



The Raspberry Pi [1] is a small single-board computer. It comes with an ARM-based CPU (64 bit, quad core). You can buy it for $35 (no case included). The 4GB RAM version retails for $55. Raspberry Pi runs some form of Linux. It is mainly used for educational purposes.


SCIP


SCIP [2] is a solver for MIP (and related) models. It is only easily available to academics, under a somewhat non-standard license, so it cannot really be called open source. As a result, I don't see it used much outside academic circles.

SCIP on Raspberry Pi


In [3] SCIP is used on the Raspberry Pi with 4GB of RAM. They call it an example of "Edge Computing": bring the algorithm to where it is needed [4] (as opposed to moving the data to, say, a server).


On average SCIP is 3 to 5 times slower on a (standard or overclocked) Raspberry Pi than on a MacBook Pro laptop.

Of course the small amount of RAM means we can only solve relatively small problems. (These days what we call a small MIP problem is actually not so small).

References


  1. https://www.raspberrypi.org/
  2. https://scip.zib.de/
  3. http://www.pokutta.com/blog/random/2019/09/29/scipberry.html
  4. https://en.wikipedia.org/wiki/Edge_computing

Scipy linear programming: a large but easy LP

Scipy.optimize.linprog [1] recently added a sparse interior point solver [2]. In theory we should be able to solve some larger problems with this solver. However, the input format is matrix based. This makes it difficult to express LP models without much tedious programming. Of course, if the LP model is very structured, things are a bit easier. In [3] the question came up whether we can solve some reasonably sized transportation problems with this solver. As transportation problems translate into large but easy LPs (very sparse, network structure), this is a good example to try out.

An LP model for the transportation problem can look like:

Transportation Model
\[ \begin{align} \min \> & \sum_{i,j} \color{darkblue}c_{i,j} \color{darkred} x_{i,j} \\ & \sum_j \color{darkred} x_{i,j} \le \color{darkblue}s_i &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \ge \color{darkblue}d_j &&\forall j\\ & \color{darkred}x_{i,j}\ge 0\end{align} \]

Here \(i\) indicates the supply nodes and \(j\) the demand nodes. The problem is feasible if total demand does not exceed total supply (i.e. \(\sum_i s_i \ge \sum_j d_j\)).

Even if the transportation problem is dense (that is each supply node can serve all demand nodes or in other words each link \( i \rightarrow j\) exists), the LP matrix is sparse. There are 2 nonzeros per column.

LP Matrix


The documentation mentions we can pass on the LP matrix as a sparse matrix. Here are some estimates of the difference in memory usage:

                            100x100    500x500    1000x1000
Source Nodes                    100        500        1,000
Destination Nodes               100        500        1,000
LP Variables                 10,000    250,000    1,000,000
LP Constraints                  200      1,000        2,000
LP Nonzero Elements          20,000    500,000    2,000,000
Dense Memory Usage (MB)          15      1,907       15,258
Sparse Memory Usage (MB)        0.3        7.6         30.5

For the \(1000\times 1000\) case we see that a sparse storage scheme will be about 500 times as efficient.

Solving a 1000x1000 transportation problem: Implementation


  • The package scipy.sparse [4] is used to form a sparse matrix (a sketch of this setup follows below).  
  • Scipy.optimize.linprog does not allow for \(\ge\) constraints. So our model becomes: \[\begin{align} \min &\sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j  x_{i,j} \le s_i &&\forall i \\ & \sum_i  -x_{i,j} \le  -d_j &&\forall j\\ & x_{i,j}\ge 0\end{align}\]
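The transport.py used below is not reproduced in the post; a rough sketch of how this setup could look (my own reconstruction with random data, not the original code):

import numpy as np
import scipy.optimize as opt
import scipy.sparse as sp

M, N = 1000, 1000                        # sources, destinations
rng = np.random.default_rng(123)
c = rng.uniform(1.0, 10.0, (M, N))       # unit transport cost
d = rng.uniform(0.0, 1.0, N)             # demand
s = rng.uniform(0.0, 1.0, M)             # supply
s *= 1.1 * d.sum() / s.sum()             # make total supply exceed total demand

# Column k = i*N + j (variable x[i,j]) has exactly two nonzeros:
# +1 in supply row i, and -1 in demand row M+j (the flipped >= constraint).
cols = np.arange(M * N)
i_idx, j_idx = cols // N, cols % N
A = sp.coo_matrix(
    (np.concatenate([np.ones(M * N), -np.ones(M * N)]),
     (np.concatenate([i_idx, M + j_idx]), np.concatenate([cols, cols]))),
    shape=(M + N, M * N)).tocsr()
b = np.concatenate([s, -d])

res = opt.linprog(c.reshape(M * N), A_ub=A, b_ub=b,
                  method="interior-point",
                  options={"sparse": True, "disp": True})
print(res.status, res.fun)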

When I run this, I see:


Primal Feasibility  Dual Feasibility    Duality Gap         Step             Path Parameter      Objective
1.0                 1.0                 1.0                 -                1.0                 4999334.387281
0.01096610265509    0.01096610265504    0.01096610265504    1.0              0.01096610265523    3423127.924532
0.007470719084731   0.007470719084695   0.007470719084695   0.3369198212982  0.007470719084826   1045138.710249
0.007375696439705   0.007375696439669   0.007375696439669   0.01405171378191 0.007375696439798   946062.4541516
0.006900523710037   0.006900523710004   0.006900523710004   0.07151611989327 0.006900523710125   631457.8940984
0.003392688227185   0.003392688227169   0.003392688227169   0.5542765654086  0.003392688227229   106030.5627759
0.002716216726218   0.002716216726205   0.002716216726205   0.2210823772546  0.002716216726252   77660.93708537
0.00151605426328    0.001516054263272   0.001516054263272   0.4706161702772  0.001516054263299   39012.6976106
0.001238382883199   0.001238382883193   0.001238382883193   0.2007381529847  0.001238382883215   31262.77924434
0.0006888763719364  0.000688876371933   0.0006888763719331  0.4711955496918  0.0006888763719452  16884.5788155
0.0004045311601541  0.0004045311601521  0.0004045311601522  0.4504577243574  0.0004045311601593  9812.570668161
0.0003278435563858  0.0003278435563842  0.0003278435563842  0.2062071599936  0.00032784355639    7943.50442653
0.0001938174872602  0.0001938174872593  0.0001938174872593  0.4304958950603  0.0001938174872627  4718.01892459
0.0001272127336263  0.0001272127336257  0.0001272127336257  0.371775562858   0.000127212733628   3126.320160308
7.325610966318e-05  7.325610966282e-05  7.325610966283e-05  0.4526986333113  7.325610966411e-05  1837.061691682
6.047737643405e-05  6.047737643373e-05  6.047737643375e-05  0.1896942778068  6.047737643482e-05  1530.292617672
3.301112106729e-05  3.301112106712e-05  3.301112106713e-05  0.4758440911431  3.301112106771e-05  870.6399411648
2.231615463384e-05  2.231615463375e-05  2.231615463374e-05  0.3562669388094  2.231615463413e-05  613.0954966036
1.300693055479e-05  1.300693055474e-05  1.300693055474e-05  0.4437694284722  1.300693055496e-05  388.3160007487
7.533045251385e-06  7.533045251368e-06  7.533045251357e-06  0.4485635094836  7.533045251489e-06  255.9636413848
3.799832196644e-06  3.799832196622e-06  3.799832196633e-06  0.5264643380152  3.7998321967e-06    165.5742065953
2.01284588862e-06   2.012845888624e-06  2.012845888615e-06  0.5006416028336  2.01284588865e-06   122.2520897954
1.143491145379e-06  1.143491145387e-06  1.143491145377e-06  0.4712206751612  1.143491145397e-06  101.1678772704
5.277850584407e-07  5.277850584393e-07  5.277850584402e-07  0.5711139027487  5.277850584494e-07  86.20125171613
3.125695105059e-07  3.125695105195e-07  3.125695105058e-07  0.4315945026363  3.125695105113e-07  80.96090171621
1.118500099738e-07  1.118500099884e-07  1.118500099743e-07  0.6743118743189  1.118500099763e-07  76.06812425522
4.412565084911e-08  4.412565086951e-08  4.412565085053e-08  0.6297257004579  4.412565085131e-08  74.41374033755
6.833044779903e-09  6.833044770544e-09  6.833044776856e-09  0.8682333453577  6.833044776965e-09  73.50145019804
3.3755500974e-10    3.375549807043e-10  3.375549865145e-10  0.9528386773998  3.375549866004e-10  73.34206256371
1.066148223577e-13  1.065916625724e-13  1.066069704785e-13  0.9998776987355  1.066069928771e-13  73.3337765897
7.763476236577e-18  3.543282811637e-17  5.469419174887e-18  0.9999500035089  5.330350298476e-18  73.3337739491
Optimization terminated successfully.
Current function value: 73.333774
Iterations: 30
Filename: transport.py

Line #    Mem usage    Increment   Line Contents
================================================
    59     70.6 MiB     70.6 MiB   @profile
    60                             def run():
    61                                 # dimensions
    62     70.6 MiB      0.0 MiB       M = 1000  # sources
    63     70.6 MiB      0.0 MiB       N = 1000  # destinations
    64     78.3 MiB      7.7 MiB       data = GenerateData(M,N)
    65    108.9 MiB     30.5 MiB       lpdata = FormLPData(data)
    66    122.6 MiB     13.7 MiB       res = opt.linprog(c=np.reshape(data['c'],M*N), A_ub=lpdata['A'], b_ub=lpdata['rhs'], options={'sparse':True, 'disp':True})


This proves we can actually solve a \(1000 \times 1000\) transportation problem (leading to an LP with a million variables) using standard Python tools.

References


  1. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html
  2. https://docs.scipy.org/doc/scipy/reference/optimize.linprog-interior-point.html
  3. Maximum number of decision variables in scipy linear programming module in Python, https://stackoverflow.com/questions/57579147/maximum-number-of-decision-variables-in-scipy-linear-programming-module-in-pytho
  4. https://docs.scipy.org/doc/scipy/reference/sparse.html

The gas station problem: where to pump gas and how much


Problem


The problem (from [1]) is to determine where to pump gasoline (and how much) during a trip, where prices between gas stations fluctuate.


We consider some different objectives:

  • minimize cost
  • minimize number of stops
  • minimize number of stops followed by minimize cost

Data


I invented some data:


----34 SET i  locations

start , Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7
Station8 , Station9 , Station10, Station11, Station12, Station13, Station14, Station15
Station16, Station17, Station18, Station19, Station20, finish


----34 SET g gas stations

Station1 , Station2 , Station3 , Station4 , Station5 , Station6 , Station7 , Station8
Station9 , Station10, Station11, Station12, Station13, Station14, Station15, Station16
Station17, Station18, Station19, Station20


---- 34 PARAMETER efficiency  =    18.000  [miles/gallon]
     PARAMETER capacity    =    50.000  tank-capacity [gallons]
     PARAMETER initgas     =    25.000  initial amount of gasoline in tank [gallons]
     PARAMETER finalgas    =    10.000  minimum final amount of gas in tank [gallons]
     PARAMETER triplen     =  2000.000  length of trip [miles]

---- 34 PARAMETER price  [$/gallon]

Station1 3.002, Station2 3.630, Station3 3.616, Station4 3.126, Station5 3.167, Station6 3.603
Station7 2.067, Station8 3.281, Station9 3.748, Station10 3.783, Station11 3.065, Station12 2.349
Station13 3.135, Station14 2.928, Station15 3.527, Station16 3.585, Station17 3.305, Station18 3.460
Station19 3.320, Station20 2.375


---- 34 PARAMETER distance  [miles]

          Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8   Station9

incr        88.074    174.227      3.250    157.671      1.847    140.247     79.275    166.355    117.030
cumul       88.074    262.301    265.550    423.221    425.068    565.315    644.590    810.945    927.975

      +  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17  Station18

incr       136.787     16.517     23.499    167.570    127.418     31.520     91.885     91.363     74.604
cumul     1064.762   1081.279   1104.778   1272.348   1399.767   1431.286   1523.172   1614.535   1689.139

      +  Station19  Station20     finish

incr       148.788     18.876    143.198
cumul     1837.926   1856.802   2000.000


Prices and distances were produced using a random number generator.

Note that I added the requirement that a little bit of gas should be left in the tank when arriving at the finish. That requirement was not in the original problem [1]. We can drop it by just setting the parameter \(\mathit{finalgas}=0\).

We also have some derived data: the amount of gas we use for each leg of the trip. This is just the length of the leg divided by the efficiency of the car:


---- 34 PARAMETER use  gas usage from previous location [gallons]

Station1 4.893, Station2 9.679, Station3 0.181, Station4 8.760, Station5 0.103, Station6 7.791
Station7 4.404, Station8 9.242, Station9 6.502, Station10 7.599, Station11 0.918, Station12 1.305
Station13 9.309, Station14 7.079, Station15 1.751, Station16 5.105, Station17 5.076, Station18 4.145
Station19 8.266, Station20 1.049, finish 7.955



Problem 1: minimize cost


The first problem is to minimize fuel cost. I have modeled this by observing three stages at each way point:

  1. First is the amount of gas in the tank when arriving at point \(i\). This amount should be non-negative: we cannot drive when the tank is empty. This variable is denoted by \(f_{\mathit{before},i}\ge 0\).
  2. The amount we pump is the second stage. This amount is bounded by \([0,\mathrm{capacity}]\). This variable is denoted by \(f_{\mathit{pumped},g}\).
  3. The amount in the tank after pumping. This amount cannot exceed the capacity of the tank. This is \(f_{\mathit{after},i} \in [0,\mathrm{capacity}]\). 

This problem is a little bit like modeling inventory: keep track of what is going out and what is added. The LP model can look like:

Min Cost Model
\[\begin{align} \min \> & \color{darkred}{\mathit{cost}}\\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g} \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \end{align}\]

Note that the set \(g\) is a subset of set \(i\): \(g\) indicates the locations with gas stations between \(\mathit{start}\) and \(\mathit{finish}\). Also note that we cannot just substitute out the variable \(f_{\mathit{before},i}\): we need to make sure this quantity is non-negative. Similarly, we cannot substitute out the variable \(f_{\mathit{after},i}\): this must obey the tank capacity bound.
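The model itself was implemented in GAMS. As an illustration, here is a rough Python translation of the min cost LP using PuLP (my choice here, not the original implementation), with the data from the listings above:

import pulp

# Gallons burned on each leg (previous point -> this point), from the
# "use" listing above; the last entry is the leg into the finish.
use = [4.893, 9.679, 0.181, 8.760, 0.103, 7.791, 4.404, 9.242, 6.502,
       7.599, 0.918, 1.305, 9.309, 7.079, 1.751, 5.105, 5.076, 4.145,
       8.266, 1.049, 7.955]
# $/gallon at Station1..Station20, from the "price" listing above.
price = [3.002, 3.630, 3.616, 3.126, 3.167, 3.603, 2.067, 3.281, 3.748,
         3.783, 3.065, 2.349, 3.135, 2.928, 3.527, 3.585, 3.305, 3.460,
         3.320, 2.375]
capacity, initgas, finalgas = 50.0, 25.0, 10.0
n = len(price)  # 20 gas stations; point 0 is the start

m = pulp.LpProblem("mincost", pulp.LpMinimize)
before = pulp.LpVariable.dicts("before", range(1, n + 1), 0, capacity)
after = pulp.LpVariable.dicts("after", range(0, n + 1), 0, capacity)
pumped = pulp.LpVariable.dicts("pumped", range(1, n + 1), 0, capacity)

m += pulp.lpSum(price[g - 1] * pumped[g] for g in range(1, n + 1))  # fuel cost
m += after[0] == initgas
for g in range(1, n + 1):
    m += before[g] == after[g - 1] - use[g - 1]   # fuel on arrival
    m += after[g] == before[g] + pumped[g]        # fuel after pumping
m += after[n] - use[n] >= finalgas                # reserve at the finish

m.solve()
print(pulp.value(m.objective))  # about 219 for this data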

The results look like:


---- 66 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     21.238     21.058     12.298     12.196      4.404                40.758
pumped                 10.811                                                                50.000
after       25.000     30.918     21.238     21.058     12.298     12.196      4.404     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     40.691     33.612     31.861     26.756     21.680
pumped                                       25.566
after       34.256     26.657     25.740     50.000     40.691     33.612     31.861     26.756     21.680

     +   Station18  Station19  Station20     finish

before      17.535      9.270      8.221     10.000
pumped                             9.735
after       17.535      9.270     17.955


---- 66 VARIABLE cost.L  =  218.998

We see we pump the most at station 7. Looking at the prices this makes sense: gasoline is cheapest at that gas station.

The number of stops where we pump gas is 4, and the total gas bill is $219.


Problem 2: minimize number of stops


In the previous section we solved the minimize cost problem. This gave us 4 stops to refuel with total fuel cost of $219. Now, let's try to minimize the number of times we visit a gas station. Counting in general needs binary variables, and this is no exception. The model can look like:


Min Number of Stops Model
\[\begin{align} \min \> & \color{darkred}{\mathit{numstops}}\\ & \color{darkred}{\mathit{numstops}} = \sum_g \color{darkred} \delta_g \\ & \color{darkred}{\mathit{cost}} = \sum_g \color{darkred}f_{\mathit{pumped},g}  \cdot \color{darkblue}{\mathit{price}}_g \\ & \color{darkred}f_{\mathit{before},i} = \color{darkred}f_{\mathit{after},i-1} - \color{darkblue}{\mathit{use}}_i && \forall i \ne \mathit{start} \\ & \color{darkred}f_{\mathit{after},g} = \color{darkred}f_{\mathit{before},g} + \color{darkred}f_{\mathit{pumped},g} && \forall g \\ & \color{darkred}f_{\mathit{after},\mathit{start}} = \color{darkblue}{\mathit{initgas}} \\ & \color{darkred}f_{\mathit{before},\mathit{finish}} \ge \color{darkblue}{\mathit{finalgas}} \\ & \color{darkred} f_{\mathit{pumped},g} \le \color{darkred} \delta_g  \cdot \color{darkblue}{\mathit{capacity}} && \forall g \\ & \color{darkred}f_{k,i} \in [0,\color{darkblue}{\mathit{capacity}}] \\ & \color{darkred} \delta_g \in \{0,1\} \end{align}\]

Because we have binary variables, this is now a MIP model. The constraint \(f_{\mathit{pumped},g} \le  \delta_g \cdot \mathit{capacity}\)  implements the implication: \[\delta_g=0 \Rightarrow f_{\mathit{pumped},g}=0\]When we solve this we see:



---- 71 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     10.428     10.247      1.488      1.385                41.954     32.712
pumped                                                               6.406     46.358
after       25.000     20.107     10.428     10.247      1.488      7.791     46.358     41.954     32.712

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      26.211     18.611     17.694     16.388      7.079                41.595     36.490     31.415
pumped                                                              43.346
after       26.211     18.611     17.694     16.388      7.079     43.346     41.595     36.490     31.415

     +   Station18  Station19  Station20     finish

before      27.270     19.004     17.955     10.000
after       27.270     19.004     17.955


---- 71 VARIABLE delta.L

Station5 1.000, Station6 1.000, Station14 1.000


---- 71 VARIABLE cost.L      =  314.254
     VARIABLE numstops.L  =    3.000


So instead of 4 stops, now we only need 3 stops. We ignored the cost in this model. This causes the fuel cost to skyrocket to $314 (from $219 in the min cost model).

I kept the cost constraint in the problem for two reasons. First, it functions as an accounting constraint. Such a constraint is just for informational purposes (it is not meant to change or restrict the solution). A second reason is that we use the cost variable in a second solve in order to minimize cost while keeping the number of stops optimal. This is explained in the next section.

Problem 3: minimize number of stops followed by minimizing cost


After solving the min number of stops problem (previous section), we can fix the number of stops variable \(\mathit{numstops}\) to the optimal value and resolve minimizing the cost. This is essentially a lexicographic approach to solving the multi-objective problem min numstops, min cost. If we do this we get as solution:


---- 76 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     21.238     21.058     12.298     12.196      4.404                40.758
pumped                 10.811                                                                50.000
after       25.000     30.918     21.238     21.058     12.298     12.196      4.404     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     15.125      8.046     41.595     36.490     31.415
pumped                                                             35.301
after       34.256     26.657     25.740     24.434     15.125     43.346     41.595     36.490     31.415

     +   Station18  Station19  Station20     finish

before      27.270     19.004     17.955     10.000
after       27.270     19.004     17.955


---- 76 VARIABLE delta.L

Station1 1.000, Station7 1.000, Station14 1.000


---- 76 VARIABLE cost.L      =  239.188
     VARIABLE numstops.L  =    3.000


Now we have a solution with 3 stops and a fuel cost of $239. This is my proposal for a solution strategy for the problem stated in [1].

An alternative would be to create a single optimization problem with a weighted sum objective: \[\min \> \mathit{numstops} + w \cdot  \mathit{cost}\] with \(w\) a small constant to make sure that \(\mathit{numstops}\) is the most important variable. As the value of \(w\) requires some thought, it may be better to use the lexicographic approach.
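In terms of the PuLP sketch from the min cost section, the two solves could look like this (again just a sketch; delta and the cost accounting variable are the new elements):

# Stage 1: minimize the number of stops.
delta = pulp.LpVariable.dicts("delta", range(1, n + 1), cat="Binary")
cost = pulp.LpVariable("cost")  # accounting variable for the fuel bill
m += cost == pulp.lpSum(price[g - 1] * pumped[g] for g in range(1, n + 1))
for g in range(1, n + 1):
    m += pumped[g] <= capacity * delta[g]   # delta[g]=0 => pumped[g]=0
m.setObjective(pulp.lpSum(delta.values()))
m.solve()

# Stage 2: fix the number of stops and minimize cost.
m += pulp.lpSum(delta.values()) <= round(pulp.value(m.objective))
m.setObjective(cost)
m.solve()
print(pulp.value(cost), sum(round(d.value()) for d in delta.values()))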


Filling up the gas tank


Suppose that when pumping gas we always fill up the tank completely. This alternative is not too difficult to handle. We need to add the implication: \[\delta_g=1 \Rightarrow f_{\mathit{pumped},g}=\mathit{capacity}-f_{\mathit{before},g}\] This can be handled using the inequality: \[f_{\mathit{pumped},g} \ge \delta_g \cdot \mathit{capacity}-f_{\mathit{before},g}\]
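In the PuLP sketch this adds one more set of constraints:

# delta[g]=1 => the tank is filled: pumped[g] >= capacity - before[g]
# (with the objective set back to cost for the min cost variant).
for g in range(1, n + 1):
    m += pumped[g] >= capacity * delta[g] - before[g]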


If we add this constraint and solve the min cost model we see:


---- 84 VARIABLE f.L  amounts of fuel

             start   Station1   Station2   Station3   Station4   Station5   Station6   Station7   Station8

before                 20.107     40.321     40.140     31.381     31.278     23.487     19.082     40.758
pumped                 29.893                                                            30.918
after       25.000     50.000     40.321     40.140     31.381     31.278     23.487     50.000     40.758

     +    Station9  Station10  Station11  Station12  Station13  Station14  Station15  Station16  Station17

before      34.256     26.657     25.740     24.434     40.691     33.612     48.249     43.144     38.068
pumped                                       25.566                16.388
after       34.256     26.657     25.740     50.000     40.691     50.000     48.249     43.144     38.068

     +   Station18  Station19  Station20     finish

before      33.924     25.658     24.609     16.654
after       33.924     25.658     24.609


---- 84 VARIABLE delta.L

Station1 1.000, Station7 1.000, Station12 1.000, Station14 1.000


---- 84 VARIABLE cost.L      =  261.709
     VARIABLE numstops.L  =    4.000


In this case we have a little bit more gasoline left in the tank at the finish than strictly needed. Notice how each time we pump gas, we end up with a full tank. This fill-up strategy is surprisingly expensive.

Conclusion


Here we see the advantages of using an optimization model compared to a tailored algorithm. We can adapt the optimization model to different situations. From the basic min cost model, we can quickly react to new questions.

References


  1. Gas Station Problem - cheapest and least amount of stations, https://stackoverflow.com/questions/58289424/gas-station-problem-cheapest-and-least-amount-of-stations
  2. Samir Khuller, Azarakhsh Malekian, Julián Mestre, To Fill or not to Fill: The Gas Station Problem, ACM Transactions on Algorithms, Volume 7, Issue 3, July 2011.

Sometimes a commercial solver is really better...

Solving a model with an integer valued objective, with 500 binary variables, gave some interesting results.

  • Cplex. Optimal solution \(z=60\) found in about 8 seconds. Using 4 threads on an old laptop.
  • CBC. Hit timelimit of 2 hours. Objective = 62 (non-optimal). Also using 4 threads on the same machine. 

The strange thing is that with CBC the best possible bound is not changing at all. Not by a millimeter. See the repeated "best possible 55.416026" numbers in the CBC log below.

CBC is a very good solver, but sometimes I see things like this.


Cplex Log



Tried aggregator 2 times.
MIP Presolve eliminated 4 rows and 5 columns.
MIP Presolve modified 3 coefficients.
Aggregator did 500 substitutions.
Reduced MIP has 999 rows, 1499 columns, and 2497 nonzeros.
Reduced MIP has 500 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.02 sec. (7.38 ticks)
Found incumbent of value 500.000000 after 0.03 sec. (8.85 ticks)
Probing time = 0.00 sec. (0.20 ticks)
Tried aggregator 1 time.
Reduced MIP has 999 rows, 1499 columns, and 2497 nonzeros.
Reduced MIP has 500 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.00 sec. (5.26 ticks)
Probing time = 0.00 sec. (0.19 ticks)
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 4 threads.
Parallel mode: deterministic, using up to 3 threads for concurrent optimization.
Tried aggregator 1 time.
LP Presolve eliminated 999 rows and 1499 columns.
All rows and columns eliminated.
Presolve time = 0.02 sec. (0.71 ticks)
Initializing dual steep norms . . .
Root relaxation solution time = 0.02 sec. (1.15 ticks)

Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap

* 0+ 0 500.0000 0.0000 100.00%
Found incumbent of value 500.000000 after 0.06 sec. (16.78 ticks)
0 0 55.2664 490 500.0000 55.2664 0 88.95%
* 0+ 0 75.0000 55.2664 26.31%
Found incumbent of value 75.000000 after 0.08 sec. (21.35 ticks)
0 0 55.2825 353 75.0000 Cuts: 349 222 26.29%
0 0 55.4160 311 75.0000 Cuts: 349 455 26.11%
0 0 55.4160 275 75.0000 Cuts: 349 775 26.11%
* 0+ 0 72.0000 55.4160 23.03%
Found incumbent of value 72.000000 after 0.59 sec. (116.10 ticks)
0 0 55.4160 182 72.0000 Cuts: 349 1032 23.03%
0 0 55.4160 141 72.0000 Cuts: 349 1208 23.03%
0 0 55.4160 121 72.0000 Cuts: 192 1336 23.03%
0 0 55.4160 105 72.0000 Cuts: 128 1410 23.03%
0 0 55.4160 98 72.0000 Cuts: 85 1472 23.03%
0 0 55.4160 61 72.0000 Cuts: 74 1494 23.03%
0 0 55.4160 67 72.0000 Cuts: 160 1571 23.03%
0 2 55.4160 56 72.0000 55.4160 1571 23.03%
Elapsed time = 1.33 sec. (273.54 ticks, tree = 0.01 MB, solutions = 3)
1239 854 55.4160 59 72.0000 55.4160 3845 23.03%
Cuts: 38
* 1471+ 1230 71.0000 55.4160 21.95%
Cuts: 16
Found incumbent of value 71.000000 after 3.39 sec. (827.52 ticks)
* 1474+ 1230 70.0000 55.4160 20.83%
Found incumbent of value 70.000000 after 3.39 sec. (828.63 ticks)
* 1481+ 1230 69.0000 55.4160 19.69%
Found incumbent of value 69.000000 after 3.41 sec. (831.79 ticks)
* 1490+ 1230 68.0000 55.4160 18.51%
Found incumbent of value 68.000000 after 3.44 sec. (834.87 ticks)
* 1731+ 1033 67.0000 58.7491 12.31%
Found incumbent of value 67.000000 after 6.63 sec. (1824.17 ticks)
* 1731+ 688 65.0000 59.0933 9.09%
Found incumbent of value 65.000000 after 7.25 sec. (2079.31 ticks)
* 1731+ 458 62.0000 60.0000 3.23%
Found incumbent of value 62.000000 after 7.47 sec. (2162.68 ticks)
* 1731+ 305 61.0000 60.0000 1.64%
Found incumbent of value 61.000000 after 8.20 sec. (2495.45 ticks)
* 1731+ 0 60.0000 60.0000 0.00%
Found incumbent of value 60.000000 after 8.58 sec. (2603.40 ticks)
* 1731 0 integral 0 60.0000 60.0000 10141 0.00%
Found incumbent of value 60.000000 after 8.61 sec. (2604.50 ticks)

Cover cuts applied: 150
Implied bound cuts applied: 14
Flow cuts applied: 37
Mixed integer rounding cuts applied: 249
Gomory fractional cuts applied: 26

Root node processing (before b&c):
Real time = 1.31 sec. (273.18 ticks)
Parallel b&c, 4 threads:
Real time = 7.30 sec. (2331.56 ticks)
Sync time (average) = 0.25 sec.
Wait time (average) = 0.01 sec.
------------
Total (root+branch&cut) =
8.61 sec. (2604.75 ticks)
MIP status(101):
integer optimal solution


CBC Log



Calling CBC main solution routine...
Integer solution of 74 found by feasibility pump after 0 iterations and 0 nodes (2.78 seconds)
Integer solution of 72 found by RINS after 0 iterations and 0 nodes (2.96 seconds)
128 added rows had average density of 31.601563
At root node, 128 cuts changed objective from 55.26645 to 55.416026 in 10 passes
Cut generator 0 (Probing) - 367 row cuts average 2.1 elements, 0 column cuts (12 active) in 0.022 seconds - new frequency is 1
Cut generator 1 (Gomory) - 598 row cuts average 26.1 elements, 0 column cuts (0 active) in 0.090 seconds - new frequency is 1
Cut generator 2 (Knapsack) - 14 row cuts average 11.9 elements, 0 column cuts (0 active) in 0.037 seconds - new frequency is -100
Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.003 seconds - new frequency is -100
Cut generator 4 (MixedIntegerRounding2) - 368 row cuts average 11.2 elements, 0 column cuts (0 active) in 0.023 seconds - new frequency is 1
Cut generator 5 (FlowCover) - 394 row cuts average 2.8 elements, 0 column cuts (0 active) in 0.041 seconds - new frequency is 1
Cut generator 6 (TwoMirCuts) - 598 row cuts average 35.5 elements, 0 column cuts (0 active) in 0.072 seconds - new frequency is -100
After 0 nodes, 1 on tree, 72 best solution,
best possible 55.416026 (3.94 seconds)
Integer solution of 70 found by heuristic after 4138 iterations and 57 nodes (14.24 seconds)
Integer solution of 69 found by heuristic after 12712 iterations and 287 nodes (23.32 seconds)
Integer solution of 66 found by heuristic after 18149 iterations and 487 nodes (25.94 seconds)
After 1005 nodes, 555 on tree, 66 best solution, best possible 55.416026 (31.73 seconds)
Integer solution of 65 found by heuristic after 32591 iterations and 1015 nodes (33.93 seconds)
Integer solution of 64 found by heuristic after 43252 iterations and 1405 nodes (39.79 seconds)
Integer solution of 63 found by heuristic after 52409 iterations and 1805 nodes (44.62 seconds)
After 2017 nodes, 1104 on tree, 63 best solution, best possible 55.416026 (45.77 seconds)
After 3045 nodes, 1653 on tree, 63 best solution, best possible 55.416026 (52.29 seconds)
After 4099 nodes, 2212 on tree, 63 best solution, best possible 55.416026 (55.02 seconds)
After 5163 nodes, 2769 on tree, 63 best solution, best possible 55.416026 (57.08 seconds)

. . .

After 131221 nodes, 37229 on tree, 63 best solution, best possible 55.416026 (610.94 seconds)
After 132282 nodes, 37230 on tree, 63 best solution, best possible 55.416026 (613.18 seconds)
After 133298 nodes, 37231 on tree, 63 best solution, best possible 55.416026 (615.50 seconds)
After 134320 nodes, 37318 on tree, 63 best solution, best possible 55.416026 (621.40 seconds)
Integer solution of 62 found by heuristic after 2839018 iterations and 134502 nodes (621.99 seconds)
After 135334 nodes, 37422 on tree, 62 best solution, best possible 55.416026 (627.07 seconds)
After 136352 nodes, 37407 on tree, 62 best solution, best possible 55.416026 (631.83 seconds)
After 137391 nodes, 37395 on tree, 62 best solution, best possible 55.416026 (637.74 seconds)
After 138400 nodes, 37385 on tree, 62 best solution, best possible 55.416026 (642.25 seconds)

. . .

After 1566320 nodes, 37408 on tree, 62 best solution, best possible 55.416026 (7177.35 seconds)
After 1567325 nodes, 37412 on tree, 62 best solution, best possible 55.416026 (7182.33 seconds)
After 1568336 nodes, 37421 on tree, 62 best solution, best possible 55.416026 (7187.07 seconds)
After 1569358 nodes, 37402 on tree, 62 best solution, best possible 55.416026 (7192.93 seconds)
After 1570410 nodes, 37400 on tree, 62 best solution, best possible 55.416026 (7199.07 seconds)
Thread 0 used 30010 times, waiting to start 2696, 0 locks, 0 locked, 0 waiting for locks
Thread 1 used 30010 times, waiting to start 2480, 0 locks, 0 locked, 0 waiting for locks
Thread 2 used 30010 times, waiting to start 2083, 0 locks, 0 locked, 0 waiting for locks
Thread 3 used 30010 times, waiting to start 1770, 0 locks, 0 locked, 0 waiting for locks
Main thread 6737 waiting for threads, 0 locks, 0 locked, 0 waiting for locks
Exiting on maximum time
Partial search -
best objective 62 (best possible 55.416026), took 19832056 iterations and 1570713 nodes (7202.49 seconds)
Strong branching done 3540964 times (6897 iterations), fathomed 137838 nodes and fixed 779663 variables
Maximum depth 198, 2057372 variables fixed on reduced cost

Time limit reached. Have feasible solution.
MIP solution: 6.200000e+01 (1570713 nodes, 7202.62 CPU seconds, 7202.62 wall clock seconds)

Rolling horizon approach for scheduling model

I need to run a (large) scheduling model with a planning horizon of several months: October 2019 through March 2020. This leads to a very large MIP model. One way to split this up into smaller problems is to solve it one month at a time.

Simple approach


This is easier said than done. We don't have a nice break in the schedule at the end of the month. Assignments are bleeding into the next month:

Schedule (partial view)


One way to deal with this problem is to shift the window in a bit more complicated way:

Tailored rolling horizon algorithm

We basically solve the problem as a MIP two months at a time, but shift the window by only one month. The binary variables further in the future are relaxed to continuous variables; that part becomes like an LP. I.e. the green parts are easy. Past binary variables are fixed. Of course, the MIP presolver will remove all fixed variables from the model, so the orange parts are also easy.
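Schematically the loop looks like this; solve_window and fix_solution are hypothetical placeholders for building and solving the windowed MIP:

# months of the planning horizon
months = ["2019-10", "2019-11", "2019-12", "2020-01", "2020-02", "2020-03"]

def solve_window(binary_month, relaxed_month, fixed):
    """Hypothetical: build and solve the two-month window MIP with
    binaries in binary_month, the LP relaxation in relaxed_month, and
    everything before binary_month fixed to `fixed`."""
    return {binary_month: f"assignments for {binary_month}"}

def fix_solution(sol, month):
    """Hypothetical: extract the part of the solution to lock in."""
    return {month: sol[month]}

fixed = {}
for k, month in enumerate(months):
    nxt = months[k + 1] if k + 1 < len(months) else None
    sol = solve_window(month, nxt, fixed)
    fixed.update(fix_solution(sol, month))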

This approach only really works if we don't have global constraints over all months. The danger is that we push the bad stuff into the future. That can even lead to infeasible sub-problems at the end.  Luckily this model has no constraints that span all six months.

If we want we can solve the big model at the end using the solution we built up in parts as a starting point (using the mipstart option). If the algorithm is working as expected, this last big MIP model should not find solutions that are much better. This is indeed the case for my model. (Note that not all sub-problems are solved to optimality -- sometimes a small gap remains).


CVXPY matrix style modeling limits

CVXPY [1] is a popular modeling tool for convex models. It rigorously checks that the model is convex, which is very convenient: many convex solvers are thoroughly confused when passed a non-convex model. This is a bit different from, say, passing a non-convex model to a local NLP solver. In that case the solver will accept it and try to find local solutions.

In addition, CVXPY provides many high-level functions (e.g. for different norms) and a very compact matrix-based modeling paradigm. Although matrix notation can be very powerful, there are limits. When overdoing things, the notation becomes less intuitive.

Example: Transportation model


In this section I'll discuss some modeling issues when implementing a simple transportation model in CVXPY, and compare this to a standard GAMS implementation.

As an example consider the standard transportation model.

Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\ & \sum_i\color{darkred}x_{i,j} \ge \color{darkblue} d_j && \forall j\\ &\sum_j\color{darkred}x_{i,j}\le \color{darkblue} s_i && \forall i \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]\[\begin{align}\min\>\>&\mathbf{tr}(\color{darkblue}C^T \color{darkred}X) \\ &\color{darkred}X^T \color{darkblue}e \ge \color{darkblue} d \\ &\color{darkred}X \color{darkblue}e \le \color{darkblue}s\\ & \color{darkred}X\ge 0\end{align}\]

Here \(e\) indicates a column vector of ones of appropriate size. The model in matrix notation is identical to the equation-based model on the left. The matrix-based model is more compact, but arguably a bit more difficult to read for most readers (me included).

One important way to help matrix models to be more readable, is to add a summation function. As a matrix model does not have indices, we need other ways to indicate what to sum over. This is expressed as the (optional) axis argument. This means the model above can be expressed in CVXPY as


import numpy as np
import cvxpy as cp

#----- data -------
capacity = np.array([350, 600])
demand = np.array([325, 300, 275])
distance = np.array([[2.5, 1.7, 1.8],
                     [2.5, 1.8, 1.4]])
freight = 90
cost = freight*distance/1000

#------ set up LP data --------
C = cost
d = demand
s = capacity

#---- matrix formulation ----

ei = np.ones(s.shape)
ej = np.ones(d.shape)

X = cp.Variable(C.shape, "X")
prob = cp.Problem(
    cp.Minimize(cp.trace(C.T @ X)),
    [X.T @ ei >= d,
     X @ ej <= s,
     X >= 0])
prob.solve(verbose=True)
print("status:", prob.status)
print("objective:", prob.value)
print("levels:", X.value)

#---- summations ----

prob2 = cp.Problem(
    cp.Minimize(cp.sum(cp.multiply(C, X))),
    [cp.sum(X, axis=0) >= d,
     cp.sum(X, axis=1) <= s,
     X >= 0])
prob2.solve(verbose=True)
print("status:", prob2.status)
print("objective:", prob2.value)
print("levels:", X.value)


The matrix model follows the mathematical model closely. The second model with the summations requires some explanation. In Python * is for elementwise multiplication and @ for matrix multiplication. In CVXPY however, @, * and matmul are for matrix multiplication while multiply is for elementwise multiplication. A sum without axis argument sums over all elements, while axis=0 sums over the first index, and axis=1 sums over the second index.

The data for this model is from [2]. The lack of explicit indexing has another consequence: all data is determined by its position. I.e. we need to remember that position 0 means a canning plant in Seattle or a demand market in New York.

I am not so sure about the readability of these two CVXPY models. I think I still prefer the indexed model.

CVXPY uses OSQP [3] as default LP and QP solver. Here is the log:


-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 6, constraints m = 11
          nnz(P) + nnz(A) = 18
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1  -2.4113e+00   3.32e+02   7.31e+00   1.00e-01   5.47e-05s
 200   1.5368e+02   2.77e-03   2.44e-07   1.23e-03   4.30e-02s

status:               solved
solution polish:      unsuccessful
number of iterations: 200
optimal objective:    153.6751
run time:             4.31e-02s
optimal rho estimate: 2.60e-03

status: optimal
objective: 153.67514323621972
levels: [[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

The levels indicate OSQP is pretty sloppy here: we see some negative values of the order -1e-3. The real optimal objective is 153.675000 (so even though the solution is slightly infeasible, this did not help in achieving a better objective). Sometimes OSQP does better: if the solution polishing step works. This polishing is a bit like a poor man's crossover: it tries to guess the active constraints. In this case polishing did not work.

The number of iterations is large. This is normal: this solver uses a first order algorithm. They typically require a lot of iterations.


For completeness, the original GAMS model looks like:


Set
i 'canning plants' / seattle, san-diego /
j 'markets' / new-york, chicago, topeka /;

Parameter
a(i) 'capacity of plant i in cases'
/ seattle 350
san-diego 600 /

b(j) 'demand at market j in cases'
/ new-york 325
chicago 300
topeka 275 /;

Table d(i,j) 'distance in thousands of miles'
              new-york  chicago  topeka
   seattle         2.5      1.7     1.8
   san-diego       2.5      1.8     1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f*d(i,j)/1000;

Variable
x(i,j) 'shipment quantities in cases'
z 'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
cost 'define objective function'
supply(i) 'observe supply limit at plant i'
demand(j) 'satisfy demand at market j';

cost.. z =e= sum((i,j), c(i,j)*x(i,j));

supply(i).. sum(j, x(i,j)) =l= a(i);

demand(j).. sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;


The main differences with the CVXPY model are:

  • GAMS indexes by names (set elements), CVXPY uses positions in a matrix or vector
  • GAMS is equation based while CVXPY uses matrices
  • For this model, the CVXPY representation is very compact, even terse, but depending on the formulation it requires familiarity with matrix notation. 
  • Arguably, the most important feature of a modeling tool is readability. I have a preference for the GAMS notation here: it is closer to the original mathematical model, in a notation I am used to.
  • When printing the results, GAMS is a bit more intuitive:  


GAMS:

---- 66 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle        50.000     300.000
san-diego     275.000                 275.000


Python:

[[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

Example: non-convex binary quadratic optimization


Consider the binary quadratic programming problem (in indexed format and in matrix notation):

Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkred}x_i \color{darkblue}q_{i,j}\color{darkred}x_j\\ & \color{darkred}x_i \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \color{darkred}x^T\color{darkblue}Q\color{darkred}x \\ &\color{darkred}x \in \{0,1\} \end{align}\]


We don't assume the \(Q\) matrix is positive (semi-)definite or even symmetric. This makes the problem non-convex. Interestingly, when we feed this problem to solvers like Cplex or Gurobi, they have no problem finding the optimal solution. The reason is that they apply a trick to make this problem linear.

We can linearize the binary product \(y_{i,j} = x_i x_j\) by \[\begin{align} & y_{i,j} \le x_i \\& y_{i,j} \le x_j \\& y_{i,j} \ge x_i + x_j -1 \\ & x_i, y_{i,j} \in \{0,1\}\end{align}\] If we want we can relax \(y\) to be continuous between 0 and 1.

After applying this linearization, we have:

Linearized Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkblue}q_{i,j}\color{darkred}y_{i,j}\\ & \color{darkred}y_{i,j} \le \color{darkred}x_i\\ & \color{darkred}y_{i,j} \le \color{darkred}x_j\\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i+ \color{darkred}x_j - 1\\ & \color{darkred}x_i,\color{darkred}y_{i,j} \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \mathbf{tr}(\color{darkblue}Q^T\color{darkred}Y) \\ & \color{darkred}Y \le \color{darkred}x \cdot\color{darkblue}e^T \\ & \color{darkred}Y \le \color{darkblue}e \cdot\color{darkred}x^T \\& \color{darkred}Y \ge \color{darkred}x \cdot\color{darkblue}e^T + \color{darkblue}e \cdot\color{darkred}x^T - \color{darkblue}e \cdot \color{darkblue}e^T\\ &\color{darkred}x, \color{darkred}Y \in \{0,1\} \end{align}\]

The matrix form of the objective is similar to the one we saw in the section on the transportation problem. The constraints are a little bit more complicated due to the outer products.


Although solvers like Cplex and Gurobi can solve the quadratic formulation directly, CVXPY will complain with the message:


cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
x * [[-6.56505736e+00  6.86533416e+00  1.00750712e+00 -3.97724192e+00
  -4.15575766e+00 -5.51894266e+00 -3.00338992e+00  7.12540694e+00
  -8.65772554e+00  4.21338000e-03]
 [ 9.96235254e+00  1.57466756e+00  9.82266078e+00  5.24500934e+00
  -7.38615034e+00  2.79437518e+00 -6.80964272e+00 -4.99838934e+00
   3.37857218e+00 -1.29287238e+00]
 [-2.80599468e+00 -2.97117264e+00 -7.37016820e+00 -6.99796424e+00
   1.78227300e+00  6.61785624e+00 -5.38368524e+00  3.31468920e+00
   5.51715212e+00 -3.92683046e+00]
 [-7.79015418e+00  4.76973200e-02 -6.79654476e+00  7.44924622e+00
  -4.69770910e+00 -4.28371356e+00  1.87911844e+00  4.45438142e+00
   2.56497354e+00 -7.24042700e-01]
 [-1.73386012e+00 -7.64609286e+00 -3.71575466e+00 -9.06896972e+00
  -3.22899456e+00 -6.35800814e+00  2.91454254e+00  1.21491094e+00
   5.39923440e+00 -4.04388272e+00]
 [ 3.22212522e+00  5.11643348e+00  2.54894998e+00 -4.32271604e+00
  -8.27150752e+00 -7.94970662e+00  2.82502302e+00  9.06189960e-01
  -9.36950296e+00  5.84721284e+00]
 [-8.54466004e+00 -6.48677902e+00  5.12652260e-01  5.00415338e+00
  -6.43752572e+00 -9.31718028e+00  1.70262346e+00  2.42459968e+00
  -2.21276200e+00 -2.82571694e+00]
 [-5.13930766e+00 -5.07156922e+00 -7.38994394e+00  8.66899440e+00
  -2.40124188e+00  5.66800922e+00 -3.99931484e+00 -7.49033556e+00
   4.97748210e+00 -8.61535074e+00]
 [-5.95968886e+00 -9.89868284e+00 -4.60773896e+00 -2.97050000e-03
  -6.97428262e+00 -6.51661090e+00 -3.38724532e+00 -3.66187892e+00
  -3.55826090e+00  9.27953282e+00]
 [ 9.87204410e+00 -2.60193890e+00 -2.54222866e+00  5.43956660e+00
  -2.06631716e+00  8.26192650e+00 -7.60844540e+00  4.70957778e+00
  -8.89163050e+00  1.52599610e+00]] * x


A linearized formulation can look like:


import numpy as np
import cvxpy as cp


# -------- data ---------


Q = np.array([
[-6.56505736, 6.86533416, 1.00750712, -3.97724192, -4.15575766, -5.51894266, -3.00338992, 7.12540694, -8.65772554, 0.00421338],
[ 9.96235254, 1.57466756, 9.82266078, 5.24500934, -7.38615034, 2.79437518, -6.80964272, -4.99838934, 3.37857218, -1.29287238],
[-2.80599468, -2.97117264, -7.3701682 , -6.99796424, 1.782273 , 6.61785624, -5.38368524, 3.3146892 , 5.51715212, -3.92683046],
[-7.79015418, 0.04769732, -6.79654476, 7.44924622, -4.6977091 , -4.28371356, 1.87911844, 4.45438142, 2.56497354, -0.7240427 ],
[-1.73386012, -7.64609286, -3.71575466, -9.06896972, -3.22899456, -6.35800814, 2.91454254, 1.21491094, 5.3992344 , -4.04388272],
[ 3.22212522, 5.11643348, 2.54894998, -4.32271604, -8.27150752, -7.94970662, 2.82502302, 0.90618996, -9.36950296, 5.84721284],
[-8.54466004, -6.48677902, 0.51265226, 5.00415338, -6.43752572, -9.31718028, 1.70262346, 2.42459968, -2.212762 , -2.82571694],
[-5.13930766, -5.07156922, -7.38994394, 8.6689944 , -2.40124188, 5.66800922, -3.99931484, -7.49033556, 4.9774821 , -8.61535074],
[-5.95968886, -9.89868284, -4.60773896, -0.0029705 , -6.97428262, -6.5166109 , -3.38724532, -3.66187892, -3.5582609 , 9.27953282],
[ 9.8720441 , -2.6019389 , -2.54222866, 5.4395666 , -2.06631716, 8.2619265 , -7.6084454 , 4.70957778, -8.8916305 , 1.5259961 ]])


n = Q.shape[0]


# ---- linearized model, matrix format -----

x = cp.Variable((n,1),"x",boolean=True)
Y = cp.Variable((n,n),"Y")
e = np.ones((n,1))


prob = cp.Problem(cp.Minimize(cp.trace(Q.T@Y)),
                  [Y <= x@e.T,
                   Y <= e@x.T,
                   Y >= x@e.T + e@x.T - e@e.T,
                   Y >= 0,
                   Y <= 1])
prob.solve(solver=cp.GLPK_MI,verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",x.value)
prob.solve(solver=cp.GLPK_MI,verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",x.value)


Notes:

  • The objective can also be written as cp.Minimize(cp.sum(cp.multiply(Q,Y))) (elementwise multiplication followed by a summation)
  • I relaxed the Y variables to be continuous between 0 and 1
  • We use GLPK as the MIP solver
  • CVXPY comes with an integer solver called ECOS_BB. This solver seems to choke on this problem.


The original GAMS model for this problem was as follows:


set i /i1*i10/;
alias(i,j);

parameter q(i,j);
q(i,j) = uniform(-10,10);

binary variable x(i);
variable z;

equation obj;

obj.. z =e= sum((i,j), x(i)*q(i,j)*x(j));

model m /obj/;
option miqcp=cplex,optcr=0;
solve m minimizing z using miqcp;
display z.l,x.l;


Cplex will automatically linearize this model.

CVXPY Sparse Variables


CVXPY has some severe limitations on what variables can look like.

First, we cannot use three (or more) dimensional variables. So something like x[i,j,k] is not supported. A declaration like:

X = cp.Variable((n,n,n),"X")

gives:

     ValueError: Expressions of dimension greater than 2 are not supported.

This is a rather severe restriction: many practical models have variables exceeding 2 dimensions. Of course, matrix notation becomes impractical for symbols with more than 2 dimensions, which is probably the reason why CVXPY only wants to handle scalars, vectors and matrices.
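A possible workaround (my own sketch, not a CVXPY feature) is to flatten two of the indices into one, so a 3-dimensional variable x[i,j,k] is emulated by a 2-dimensional one:

import cvxpy as cp

n = 5
X = cp.Variable((n, n*n), "X")   # stand-in for a 3-dimensional x[i,j,k]

def x3(i, j, k):
    # scalar expression playing the role of x[i,j,k]
    return X[i, j*n + k]

# example usage: a constraint on x[1,2,3]
prob = cp.Problem(cp.Minimize(0), [x3(1, 2, 3) == 1])

Of course, this does nothing about the dense storage issue discussed below: all \(n^3\) variables are still passed to the solver.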

Furthermore sparse variables are not supported either. Everything is fully allocated. As an example consider the toy model:



n = 100
X = cp.Variable((n,n),"X")
prob = cp.Problem(
    cp.Minimize(0),
    [X[0,0] == 1])


Solver Log:
problem:  variables n = 10000, constraints m = 1


My hope was that I could just declare a large variable matrix and that CVXPY would only export the used variables to the solver. Instead of the solver seeing a model with one variable, it receives a model with 10,000 variables.


Example: a Sparse Network Model


A linear programming formulation for a max-flow network problem can look like:

Max-flow Sparse Network Model
\[\begin{align}\max\>\>&\color{darkred}f\\ & \sum_{j|\color{darkblue}{\mathit{arc}}(j,i)}\color{darkred}x_{j,i} = \sum_{j|\color{darkblue}{\mathit{arc}}(i,j)}\color{darkred}x_{i,j} + \color{darkred}f\cdot \color{darkblue}b_i && \forall i \\ & 0 \le \color{darkred}x_{i,j} \le \color{darkblue}{\mathit{capacity}_i} && \forall i,j|\color{darkblue}{\mathit{arc}}(i,j) \end{align}\]

Here \(arc(i,j)\) indicates whether link \(i \rightarrow j\) exists. The sparse data vector \(b_i\) is defined by \[b_i = \begin{cases} -1 & \text{if node $i$ is the source node}\\ +1 & \text{if node $i$ is the sink node}\\ 0 & \text{otherwise} \end{cases}\]

This mathematical model translates directly into a GAMS model:


$ontext

max flow network example

Data from example in
Mitsuo Gen, Runwei Cheng, Lin Lin
Network Models and Optimization: Multiobjective Genetic Algorithm Approach
Springer, 2008

Erwin Kalvelagen, Amsterdam Optimization, May 2008

$offtext


sets
i 'nodes' /node1*node11/
source(i) /node1/
sink(i) /node11/
;

alias(i,j);

parameter capacity(i,j) /
node1.node2 60
node1.node3 60
node1.node4 60
node2.node3 30
node2.node5 40
node2.node6 30
node3.node4 30
node3.node6 50
node3.node7 30
node4.node7 40
node5.node8 60
node6.node5 20
node6.node8 30
node6.node9 40
node6.node10 30
node7.node6 20
node7.node10 40
node8.node9 30
node8.node11 60
node9.node10 30
node9.node11 50
node10.node11 50
/;



set arcs(i,j);
arcs(i,j)$capacity(i,j) = yes;
display arcs;

parameter rhs(i);
rhs(source) = -1;
rhs(sink) = 1;

variables
x(i,j) 'flow along arcs'
f 'total flow'
;

positive variables x;
x.up(i,j) = capacity(i,j);

equations
flowbal(i) 'flow balance'
;

flowbal(i).. sum(arcs(j,i), x(j,i)) - sum(arcs(i,j), x(i,j)) =e= f*rhs(i);

model m /flowbal/;

solve m maximizing f using lp;


The GAMS model exploits that GAMS stores data sparsely. The variables x(i,j) are only allocated when they are used inside the equations, and this usage is restricted to cases where arcs(i,j) exist. I.e. the number of variables x(i,j) is 22 instead of \(11 \times 11 = 121\).

As we discussed in the previous section, CVXPY does not support sparse variables like GAMS. So instead of variables x(i,j) we'll use x[k] where k indicates the arc number. CVXPY supports sparse data matrices through scipy.sparse. In the code below we set up a sparse matrix A with entries as follows:

  • A[i,k] = -1 if arc \(k\) represents an outgoing link \(i \rightarrow j\)
  • A[i,k] = +1 if arc \(k\) represents an incoming link \(j \rightarrow i\)
With this we can formulate the model:



import numpy as np
import scipy.sparse as sparse
import cvxpy as cp


# ------ data --------
data = {
'nodes':['A','B','C','D','E','F','G','H','I','J','K'],
'from':['A','A','A','B','B','B','C','C','C','D','E',
'F','F','F','F','G','G','H','H','I','I','J'],
'to': ['B','C','D','C','E','F','D','F','G','G','H',
'E','H','I','J','F','J','I','K','J','K','K'],
'capacity': [60,60,60,30,40,30,30,50,30,40,60,20,30,40,30,20,40,30,60,30,50,50],
'source' : 'A',
'sink' : 'K'
}

numnodes = len(data['nodes'])
numarcs = len(data['capacity'])

print("Number of nodes: {}".format(numnodes))
print("Number of arcs: {}".format(numarcs))

# ------ lp data --------

# map node name to index
map = dict(zip(data['nodes'],range(numnodes)))

# coefficients
irow = np.zeros(2*numarcs,int)
jcol = np.zeros(2*numarcs,int)
val = np.zeros(2*numarcs)

# arc k: i->j has coefficient -1 in row i, column k
#                         and +1 in row j, column k
for k in range(numarcs):
    i = map[data['from'][k]]
    j = map[data['to'][k]]
    kk = 2*k
    irow[kk] = i
    jcol[kk] = k
    val[kk] = -1
    kk = 2*k+1
    irow[kk] = j
    jcol[kk] = k
    val[kk] = 1

A = sparse.csc_matrix((val,(irow,jcol)))

b = np.zeros(numnodes)
b[map[data['source']]] = -1
b[map[data['sink']]] = 1

cap = data['capacity']

# ------ lp model --------

x = cp.Variable(numarcs,"x")
f = cp.Variable(1,"f")

prob = cp.Problem(cp.Maximize(f),
[A@x == f*b, x >= 0, x <= cap])
prob.solve(verbose=True)
print(prob)


With this experiment, we have confirmed that sparse data matrices work just fine with CVXPY. However, the model is no longer that straightforward: it is more complicated than the corresponding GAMS model.

See [4] for an alternative approach.

Example: Matrix Balancing


In this example we want to estimate the inner part of a matrix subject to row- and column-total constraints. This problem is frequently encountered in economic modeling. An additional constraint is that we want to maintain the sparsity pattern of the matrix. Basically the model is:

Matrix Balancing Model
\[\begin{align}\min\>\>&\mathbf{dist}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i\color{darkred}a_{i,j} = \color{darkblue} v_j && \forall j\\ &\sum_j\color{darkred}a_{i,j} = \color{darkblue} u_i && \forall i \\ & \color{darkblue}a^0_{i,j}=0 \Rightarrow\color{darkred}a_{i,j} = 0 \end{align}\]

There are different possibilities for the distance function. E.g. it can be a quadratic function \[\mathbf{dist}(A,A^0) = \sum_{i,j} (a_{i,j}-a^0_{i,j})^2\] or in this case an entropy function \[\mathbf{dist}(A,A^0) =\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right)\]
Often the implication is enforced by just ignoring or skipping all elements \(a_{i,j}\) where \(a^0_{i,j}=0\). This leads again to a sparse representation of the variables. In GAMS this is quite easy.


$ontext

Example from

Using PROC IML to do Matrix Balancing
Carol Alderman, University of Kansas
Institute for Public Policy and Business Research
MidWest SAS Users Group MWSUG 1992

$offtext


sets
p 'products' /pA*pI/
s 'salesmen' /s1*s10/
;

table A0(*,*) 'estimated matrix, known totals'

            s1    s2    s3    s4    s5    s6    s7    s8    s9   s10  rowTotal
pA         230   375   375   100         685   215          50            2029
pB         330   405   419   175    90   504   515         240   105     2798
pC         268   225   242          30   790   301    44   100           1998
pD         595   380   638   275    30   685   605    88   100   160     3566
pE         340   360   440   200    30   755   475    44   150           2794
pF         132   190   200               432   130                        1071
pG         309   330   350   125         612   474          50    50     2305
pH         365   400   330   150    50   575   600    44   150   110     2747
pI         210   250   308   125         720   256         100    50     2015

colTotal  2772  2910  3300  1150   240  5760  3526   220   950   495
;

alias (p,i);
alias (s,j);

variables
A(i,j) 'new values'
z 'objective (minimized)'
;

equations
objective
rowsum(i)
colsum(j)
;

objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j)));
rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal');
colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j);

A.L(i,j) = A0(i,j);
A.lo(i,j)$A0(i,j) = 0.0001;

model m /all/;
solve m minimizing z using nlp;

display A.L,z.l;

When solved with the general purpose NLP solver CONOPT, we get the following results:


----     58 VARIABLE A.L  new values

            s1       s2       s3       s4       s5       s6       s7       s8       s9
pA     229.652  374.869  375.099  100.028           686.238  212.542            50.572
pB     330.587  406.192  420.492  175.626   94.300  506.575  510.790           243.545
pC     267.313  224.685  241.809            31.297  790.595  297.246   44.017  101.037
pD     595.262  380.610  639.416  275.614   31.391  687.580  599.252   88.299  101.341
pE     339.659  360.058  440.341  200.158   31.346  756.750  469.809   44.086  151.793
pF     130.330  187.815  197.821                    427.953  127.080
pG     309.478  330.895  351.165  125.418           614.984  470.016            50.727
pH     360.594  395.631  326.596  148.455   51.665  569.947  586.867   43.598  150.111
pI     209.123  249.246  307.260  124.701           719.378  252.398           100.874

   +       s10
pB     109.894
pD     167.233
pG      52.318
pH     113.535
pI      52.019


----     58 VARIABLE z.L                   =      -15.769  objective (minimized)


CVXPY does not allow skipping elements like GAMS does. Well, unless we build the whole model in scalar mode: element by element. That is not very attractive, so let's try another way. It will be a bit of a struggle. Let's define a (data) matrix D as \[d_{i,j} = \begin{cases} 1 & \text{if $a^0_{i,j} \ne 0$}\\ 0 & \text{if $a^0_{i,j} = 0$}\end{cases}\] This matrix can be used to skip linear terms that are not needed. The objective is another story. CVXPY has the function entr() which is defined by \(-x\log(x)\). We expand the objective as: \[\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right) = \sum_{i,j} - \mathbf{entr}(a_{i,j})  - a_{i,j}\log(a^0_{i,j})\] Finally we insert \(d\) to ignore the zeros: \[ \min \sum_{i,j} - d_{i,j} \mathbf{entr}(a_{i,j}) - d_{i,j} a_{i,j}\log(a^0_{i,j}+1-d_{i,j})\] This is quite some gymnastics to shoehorn our model into an acceptable CVXPY format.



import numpy as np
import cvxpy as cp

# -------- data ----------
A0 = [[ 230 , 375 , 375 , 100 , 0 , 685 , 215 , 0 , 50 , 0 ],
[ 330 , 405 , 419 , 175 , 90 , 504 , 515 , 0 , 240 , 105 ],
[ 268 , 225 , 242 , 0 , 30 , 790 , 301 , 44 , 100 , 0 ],
[ 595 , 380 , 638 , 275 , 30 , 685 , 605 , 88 , 100 , 160 ],
[ 340 , 360 , 440 , 200 , 30 , 755 , 475 , 44 , 150 , 0 ],
[ 132 , 190 , 200 , 0 , 0 , 432 , 130 , 0 , 0 , 0 ],
[ 309 , 330 , 350 , 125 , 0 , 612 , 474 , 0 , 50 , 50 ],
[ 365 , 400 , 330 , 150 , 50 , 575 , 600 , 44 , 150 , 110 ],
[ 210 , 250 , 308 , 125 , 0 , 720 , 256 , 0 , 100 , 50 ]]

u = [2029,2798,1998,3566,2794,1071,2305,2747,2015]
v = [2772,2910,3300,1150,240,5760,3526,220,950,495]

m = len(u)
n = len(v)

# -------- model ----------

D = np.sign(A0)

Dloga0 = D * np.log(A0+np.ones_like(D)-D)

A = cp.Variable((m,n),"A")

obj = cp.Minimize(cp.sum(cp.multiply(D,-cp.entr(A)) - cp.multiply(A,Dloga0)))
cons = [cp.sum(cp.multiply(D,A),axis=1) == u,
        cp.sum(cp.multiply(D,A),axis=0) == v,
        A >= 0]
prob = cp.Problem(obj,cons)
prob.solve(solver=cp.SCS,verbose=True,max_iters=200000)


Solving this tiny model turns out to be very difficult. It uses exponential cones, and those are not implemented very robustly in the solvers. With SCS and a lot of iterations, we finally see:



101400| 7.74e-05  9.08e-05  7.16e-04 -1.56e+01 -1.56e+01  3.30e-11  1.14e+02
101500| 7.72e-05  9.10e-05  5.74e-04 -1.54e+01 -1.54e+01  3.30e-11  1.14e+02
101600| 7.69e-05  9.13e-05  5.30e-04 -1.51e+01 -1.51e+01  7.77e-11  1.14e+02
101700| 7.66e-05  9.17e-05  4.02e-04 -1.49e+01 -1.49e+01  5.64e-11  1.14e+02
101800| 7.62e-05  9.22e-05  3.96e-04 -1.46e+01 -1.46e+01  3.30e-11  1.14e+02
101900| 7.59e-05  9.28e-05  2.10e-04 -1.44e+01 -1.44e+01  3.30e-11  1.14e+02
101980| 7.56e-05  9.33e-05  3.35e-05 -1.42e+01 -1.42e+01  1.17e-11  1.14e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.14e+02s
Lin-sys: nnz in L factor: 1089, avg solve time: 1.87e-05s
Cones: avg projection time: 1.06e-03s
Acceleration: avg step time: 2.62e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.8795e-06, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.3311e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 7.5554e-05
dual res: |A'y + c|_2 / (1 + |c|_2) = 9.3272e-05
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 3.3548e-05
----------------------------------------------------------------------------
c'x = -14.2093, -b'y = -14.2103
============================================================================

The objective is in the neighborhood of what CONOPT found for the GAMS model (CONOPT required just a few iterations).

This model is not exactly a showcase for CVXPY.

Conclusion


CVXPY can be very convenient for certain classes of models: convex and easy to write in matrix notation. For other models it is not very suited. We saw some examples where things become really hairy when using CVXPY, while the underlying mathematical model is actually quite simple. CVXPY is really a special purpose modeling tool, and you may want to consider other tools when the model does not really fit CVXPY's matrix philosophy.

References


  1. https://www.cvxpy.org/
  2. https://www.gams.com/products/simple-example/
  3. https://osqp.org/
  4. https://www.cvxpy.org/examples/applications/OOCO.html

Interesting constraint

In [1] the following problem is proposed:

I'm trying to solve a knapsack-style optimization problem with additional complexity.
Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the items must be orange, one must be blue, one must be yellow, and one must be red. This is straightforward. However, I want to add a constraint that the selected yellow, red, and orange items can only have one shape in common with the selected blue item.

The example data looks like:


 item   color     shape     value
 A      blue      circle    0.454
 B      yellow    square    0.570
 C      red       triangle  0.789
 D      red       circle    0.718
 E      red       square    0.828
 F      orange    square    0.709
 G      blue      circle    0.696
 H      orange    square    0.285
 I      orange    square    0.698
 J      orange    triangle  0.861
 K      blue      triangle  0.658
 L      yellow    circle    0.819
 M      blue      square    0.352
 N      orange    circle    0.883
 O      yellow    triangle  0.755


Let's see if we can model this. First we slice and dice the data a bit to make the modeling a bit easier. Here is some derived data:


----     58 SET i  item

A, B, C, D, E, F, G, H, I, J, K, L, M, N, O


----     58 SET c  color

blue  , yellow, red   , orange


----     58 SET s  shape

circle  , square  , triangle


----     58 SET ICS(i,c,s)

                circle      square    triangle

A.blue             YES
B.yellow                       YES
C.red                                      YES
D.red              YES
E.red                          YES
F.orange                       YES
G.blue             YES
H.orange                       YES
I.orange                       YES
J.orange                                   YES
K.blue                                     YES
L.yellow           YES
M.blue                         YES
N.orange           YES
O.yellow                                   YES


----     58 SET IC(i,c)

          blue      yellow         red      orange

A          YES
B                      YES
C                                  YES
D                                  YES
E                                  YES
F                                              YES
G          YES
H                                              YES
I                                              YES
J                                              YES
K          YES
L                      YES
M          YES
N                                              YES
O                      YES


----     58 SET CS(c,s)

            circle      square    triangle

blue           YES         YES         YES
yellow         YES         YES         YES
red            YES         YES         YES
orange         YES         YES         YES


----     58 PARAMETER value  (i): value of item

A 0.454, B 0.570, C 0.789, D 0.718, E 0.828, F 0.709, G 0.696, H 0.285, I 0.698, J 0.861
K 0.658, L 0.819, M 0.352, N 0.883, O 0.755


----     58 SET YRO  (c): excludes blue

yellow, red   , orange


Note that the set CS(c,s) is complete. However, I will assume that there is a possibility that this set has some missing entries. In other words, I will not assume that all combinations of colors and shapes exist in the data.

Let's introduce the following zero-one variables:\[\begin{align} & x_i = \begin{cases} 1 & \text{if item $i$ is selected}\\ 0 & \text{otherwise}\end{cases} \\ & y_{c,s} = \begin{cases} 1 & \text{if items with color $c$ and shape $s$ are selected}\\ 0 & \text{otherwise} \end{cases}\end{align}\]


My high-level model is:

High-level Model
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} = \max_{i|\color{darkblue}{\mathit{ICS}}(i,c,s)} \color{darkred}x_i && \forall c,s|\color{darkblue}{\mathit{CS}}(c,s)\\  & \color{darkred}y_{\color{darkblue}{\mathit{blue}},s} = 1 \Rightarrow \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint implements the definition of the \(y_{c,s}\) variables: if any of the selected items has color/shape combination \((c,s)\), then \(y_{c,s}=1\) (and otherwise it stays zero). The implication constraint says: if a blue item of shape \(s\) is selected, then at most one other color may have shape \(s\). The model is a bit complicated because I wanted to be precise. No hand-waving. This helps when implementing it.

This is not yet a MIP model, but translation of the above model into a normal MIP is not too difficult.


Mixed Integer Programming Model
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} \ge \color{darkred}x_i && \forall i,c,s|\color{darkblue}{\mathit{ICS}}(i,c,s)\\  & \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 + \color{darkblue}M(1-\color{darkred}y_{\color{darkblue}{\mathit{blue}},s}) && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint has been replaced by a single inequality. This works here because we are only interested in \(y\)'s that are forced to be one; those are the ones that matter in the blue shape constraint. The blue shape implication itself is rewritten as a big-M inequality. A good value for \(M\) is not very difficult to establish: the number of colors, minus blue, minus 1 (here \(M=2\)).

For comparison, let's first run the model without the complex "blue" shape constraints (the last two constraints). That gives:


----     90 VARIABLE z.L                   =        4.087  obj


----     90 VARIABLE x.L  select item

E 1.000, G 1.000, J 1.000, L 1.000, N 1.000


----     90 SET selected

                circle      square    triangle

E.red                          YES
G.blue             YES
J.orange                                   YES
L.yellow           YES
N.orange           YES

We see that the blue shape (circle) is also selected as orange and yellow.

With the additional constraints, we see:


----     95 VARIABLE z.L                   =        4.049  obj


----     95 VARIABLE x.L  select item

E 1.000, J 1.000, K 1.000, L 1.000, N 1.000


----     95 VARIABLE y.L  color/shape combos in solution (bound)

            circle      square    triangle

blue                                 1.000
yellow       1.000
red                      1.000
orange       1.000                   1.000


----     95 SET selected

                circle      square    triangle

E.red                          YES
J.orange                                   YES
K.blue                                     YES
L.yellow           YES
N.orange           YES

Now we see the blue shape is a triangle. We only have one other selected triangle, of color orange.

The complete GAMS model looks like:

$ontext

  I'm trying to solve a knapsack-style optimization problem with additional complexity.

  Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the
  items must be orange, one must be blue, one must be yellow, and one must be red.
  This is straightforward. However, I want to add a constraint that the selected yellow,
  red, and orange items can only have one shape in common with the selected blue item.

$offtext


set
   i 'item' /A*O/
   c 'color' /blue,yellow,red,orange/
   s 'shape' /circle,square,triangle/
;

parameters
   data(i,c,s) 'value' /
      A . blue   .  circle     0.454
      B . yellow .  square     0.570
      C . red    .  triangle   0.789
      D . red    .  circle     0.718
      E . red    .  square     0.828
      F . orange .  square     0.709
      G . blue   .  circle     0.696
      H . orange .  square     0.285
      I . orange .  square     0.698
      J . orange .  triangle   0.861
      K . blue   .  triangle   0.658
      L . yellow .  circle     0.819
      M . blue   .  square     0.352
      N . orange .  circle     0.883
      O . yellow .  triangle   0.755
      /
   NumItems 'number of items to select' /5/
   NumColor(c) 'required number of each color' /
      orange 2
      red    1
      blue   1
      yellow 1
      /
;

sets
   YRO(c)     '(c): excludes blue' /yellow,red,orange/
   ICS(i,c,s) "(i,c,s)"
   IC(i,c)    "(i,c)"
   CS(c,s)    "(c,s)"
;
parameter value(i) "(i): value of item";
ICS(i,c,s) = data(i,c,s);
IC(i,c) = sum(ICS(i,c,s),1);
CS(c,s) = sum(ICS(i,c,s),1);
value(i) = sum((c,s),data(i,c,s));
display i,c,s,ICS,IC,CS,value,YRO;

binary variable x(i) 'select item';
variable z 'obj';
binary variable y(c,s) 'color/shape combos in solution (bound)';

equations
   obj             'objective'
   count           'count number of selected items'
   countcolor(c)   'count selected items for each color'
   shapecol(i,c,s) 'bound on y(c,s)'
   impl(s)         'rewritten implication'
;
obj.. z =e= sum(i, value(i)*x(i));
count.. sum(i, x(i)) =e= numitems;
countcolor(c).. sum(IC(i,c), x(i)) =e= numcolor(c);
shapecol(ICS(i,c,s)).. y(c,s) =g= x(i);

scalar M 'big-M: number of colors minus blue minus 1';
M = card(c)-2;

impl(s).. sum(yro,y(yro,s)) =l= 1 + M*(1-y("blue",s));

set selected(i,c,s);

option optcr=0;
model m1 /obj,count,countcolor/;
solve m1 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,selected;

model m2 /all/;
solve m2 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,y.l,selected;





A second exercise would be to write this model in Python using PuLP.
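A sketch of how that could look (my own transcription of the MIP model; the item data is from the table above):

import pulp

items = {  # item: (color, shape, value)
    'A':('blue','circle',0.454),   'B':('yellow','square',0.570),
    'C':('red','triangle',0.789),  'D':('red','circle',0.718),
    'E':('red','square',0.828),    'F':('orange','square',0.709),
    'G':('blue','circle',0.696),   'H':('orange','square',0.285),
    'I':('orange','square',0.698), 'J':('orange','triangle',0.861),
    'K':('blue','triangle',0.658), 'L':('yellow','circle',0.819),
    'M':('blue','square',0.352),   'N':('orange','circle',0.883),
    'O':('yellow','triangle',0.755)}
numcolor = {'orange':2, 'red':1, 'blue':1, 'yellow':1}
shapes = ['circle','square','triangle']
yro = ['yellow','red','orange']
M = len(numcolor) - 2      # number of colors minus blue minus 1

x = pulp.LpVariable.dicts('x', list(items), cat='Binary')
y = pulp.LpVariable.dicts('y', [(c,s) for c in numcolor for s in shapes], cat='Binary')

prob = pulp.LpProblem('items', pulp.LpMaximize)
prob += pulp.lpSum(items[i][2]*x[i] for i in items)          # objective
prob += pulp.lpSum(x[i] for i in items) == 5                 # count
for c in numcolor:                                           # countcolor
    prob += pulp.lpSum(x[i] for i in items if items[i][0] == c) == numcolor[c]
for i,(c,s,v) in items.items():                              # shapecol
    prob += y[(c,s)] >= x[i]
for s in shapes:                                             # impl
    prob += pulp.lpSum(y[(c,s)] for c in yro) <= 1 + M*(1 - y[('blue',s)])

prob.solve()
print([i for i in items if x[i].value() > 0.5])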

References


MIP solver stopping criteria

For larger MIP models we often don't wait for a proven optimal solution. This just takes too long: we would spend a lot of time proving optimality without much return in terms of better solutions. There are a number of stopping criteria that are typically available:

  1. Time Limit : stop after \(x\) seconds (or hours)
  2. Relative Gap: stop if gap between best possible bound and best found integer solution becomes less than \(x\%\).  Different solvers use different definitions (especially regarding the denominator).
  3. Absolute Gap: similar to relative gap, but can be used when the relative gap cannot be computed (division by zero or small number).
  4. Node Limit: stop on number of explored branch & bound nodes.
  5. Iteration Limit: stop on number of Simplex iterations. This number can be huge.

I have ordered these stopping criteria by how useful they are (to me). A time limit is by far the most important: just tell the solver how long we are willing to wait. Stopping on an achieved gap is also useful. I don't remember ever using a node or iteration limit.

If you specify several limits, typically a solver will stop as soon as it hits any one of the specified limits. In other words: multiple stopping criteria are combined in an "or" fashion.

When stopping on a time limit, it is still important to inspect the final gap. A small gap gives us a guaranteed quality assurance about the solution.




For large models the tail is often very long, and we probably see hardly any movement: no new integer solutions are found and the best bound is moving very slowly (and moving less over time). So I really want to stop if there is not much hope for a better solution.

I would suggest another possible stopping criterion:

stop if the time since the last new (and improving) integer solution exceeds a time limit

If the time since that last new integer solution is large, we can infer that the probability of finding a better solution is small. We can also interpret this as resetting the clock after each new integer solution. I don't think any solver has this. Of course, for some solvers, we can implement this ourselves using some callback function.
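For instance, with Gurobi's Python API a rough sketch could look like this (the 600 second threshold and the model.mps file are placeholders of mine):

import gurobipy as gp
from gurobipy import GRB

IMPROVE_LIMIT = 600   # seconds allowed since the last improving solution

def no_improvement_cb(model, where):
    if where == GRB.Callback.MIPSOL:
        # new incumbent found: reset the clock
        model._last_incumbent = model.cbGet(GRB.Callback.RUNTIME)
    elif where == GRB.Callback.MIP:
        elapsed = model.cbGet(GRB.Callback.RUNTIME)
        if elapsed - model._last_incumbent > IMPROVE_LIMIT:
            model.terminate()

model = gp.read("model.mps")   # placeholder model file
model._last_incumbent = 0.0
model.optimize(no_improvement_cb)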

SDP Model: imputing a covariance matrix

Missing values are a well-known problem in statistics. The simplest approach is just to delete all data cases that have missing values. Another approach is to repair things by filling in reasonable values. This is called imputation. Imputation strategies can be very sophisticated (and complex).

Statistical tools often have direct support for representing missing values. E.g. the R language has NA (not available). GAMS also has NA. Python has no explicit support for missing values. By convention, the special floating point value NaN (Not a Number) is used to indicate missing values for floating point numbers. It is noted that the numpy library has some facilities to deal with missing data, but it is not really like R's NA [2].

In [1] a semi-definite programming (SDP) model is proposed to deal with a covariance matrix with some missing values by imputation. The constraint to be added is that the covariance matrix should remain positive semi-definite (PSD). A covariance matrix should in theory be PSD, but in practice it can happen that it is not. The resulting model is stated as:

Impute missing values in Covariance Matrix (from [1])
\[\begin{align} \text{minimize}\>& 0\\ \text{subject to}\>&\color{darkred}{\Sigma}_{i,j} = \widetilde{\color{darkblue}{\Sigma}}_{i,j} && (i,j)\notin \color{darkblue}M\\ & \color{darkred}{\Sigma} \succeq 0 \end{align} \]

Here \(\widetilde{\Sigma}\) is the covariance matrix with missing data in locations \((i,j)\in M\). The  variable \(\Sigma\) is the new covariance matrix with missing data filled in such that \(\Sigma\) is positive semi-definite. This last condition is denoted by \(\Sigma \succeq 0\). In this model there is no objective, as indicated by minimizing zero.

CVXPY implementation


There is no code provided for this model in [1]. So let me give it a try. CVXPY does not have good support for things like \(\forall (i,j) \notin M\).  I can see two approaches:

  • Expand the constraint into scalar form. Essentially, a DIY approach.
  • Use a binary data matrix \(M_{i,j} \in \{0,1\}\) indicating the missing values and write \[(e\cdot e^T-M) \circ \Sigma = \widetilde{\Sigma}_0\] where \(\circ\) is elementwise multiplication (a.k.a. Hadamard product), \(e\) is a column vector of ones of appropriate size, and \(\widetilde{\Sigma}_0\) is \(\widetilde{\Sigma}\) but with NaN's replaced by zeros.

In addition, let's add a regularizing objective: minimize sum of squares of \(\Sigma_{i,j}\).

The Python code for these two models is:


import numpy as np
import pandas as pd
import cvxpy as cp

#------------ data ----------------

cov = np.array([
[ 0.300457, -0.158889, 0.080241, -0.143750, 0.072844, -0.032968, 0.077836, 0.049272],
[-0.158889, 0.399624, np.nan, 0.109056, 0.082858, -0.045462, -0.124045, -0.132096],
[ 0.080241, np.nan, np.nan, -0.031902, -0.081455, 0.098212, 0.243131, 0.120404],
[-0.143750, 0.109056, -0.031902, 0.386109, -0.058051, 0.060246, 0.082420, 0.125786],
[ 0.072844, 0.082858, -0.081455, -0.058051, np.nan, np.nan, -0.119530, -0.054881],
[-0.032968, -0.045462, 0.098212, 0.060246, np.nan, 0.400641, 0.051103, 0.007308],
[ 0.077836, -0.124045, 0.243131, 0.082420, -0.119530, 0.051103, 0.543407, 0.121709],
[ 0.049272, -0.132096, 0.120404, 0.125786, -0.054881, 0.007308, 0.121709, 0.481395]
])
print("Covariance data with NaNs")
print(pd.DataFrame(cov))

M = 1*np.isnan(cov)
print("M (indicator for missing values)")
print(pd.DataFrame(M))

dim = np.shape(cov)
n = dim[0]

#----------- model 1 -----------------

Sigma = cp.Variable(dim, symmetric=True)

prob = cp.Problem(
cp.Minimize(cp.sum_squares(Sigma)),
[ Sigma[i,j] == cov[i,j] for i in range(n) for j in range(n) if M[i,j]==0 ] +
[ Sigma >> 0 ]
)
prob.solve(solver=cp.SCS,verbose=True)

print("Status:",prob.status)
print("Objective:",prob.value)
print(pd.DataFrame(Sigma.value))

#----------- model 2 -----------------

e = np.ones((n,1))
cov0 = np.nan_to_num(cov,copy=True)

prob2 = cp.Problem(
# cp.Minimize(cp.trace(Sigma.T@Sigma)), <--- not recognized as convex
cp.Minimize(cp.norm(Sigma,"fro")**2),
[ cp.multiply(e@e.T - M,Sigma) == cov0,
Sigma >> 0 ]
)
prob2.solve(solver=cp.SCS,verbose=True)

print("Status:",prob2.status)
print("Objective:",prob2.value)
print(pd.DataFrame(Sigma.value))


Notes:

  • Model 1 has a (long) list of scalar constraints. The objective is \[\min\>\sum_{i,j} \Sigma_{i,j}^2\] Sorry for the possible confusion between the symbols for summation and covariance.
  • CVXPY uses the notation Sigma >> 0 to indicate \(\Sigma \succeq 0\) (i.e. \(\Sigma\) should be positive semi-definite).
  • We added the condition that \(\Sigma\) should be symmetric in the variable statement. This seems to be needed. Without this, the solver may return a non-symmetric matrix. I suspect that in that case, the matrix \(0.5(\Sigma+\Sigma^T)\) rather than \(\Sigma\) itself is required to be positive definite.
  • Model 2 is an attempt to use matrix notation. The objective can be stated as \[\min\>\mathbf{tr}(\Sigma^T\Sigma)\] but that is not recognized as being convex. As alternative I used the Frobenius norm: \[||A||_F =\sqrt{ \sum_{i,j} a_{i,j}^2}\]
  • The function np.nan_to_num converts NaN values to zeros.
  • The function cp.multiply performs elementwise multiplication (as opposed to matrix multiplication).
  • I don't think we can easily pass only the upper triangular part of the covariance matrix to the solver. For large problems this would save some effort (CPU time and memory).
  • In a traditional optimization model we would have just \(|M|\) decision variables (corresponding to the missing values). Here, in the scalar model, we have \(n^2\) variables and \(n^2-|M|\) constraints.


The results are:


Covariance data with NaNs
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624       NaN  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241       NaN       NaN -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051       NaN       NaN -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246       NaN  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395
M (indicator for missing values)
   0  1  2  3  4  5  6  7
0  0  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0
2  0  1  1  0  0  0  0  0
3  0  0  0  0  0  0  0  0
4  0  0  0  0  1  1  0  0
5  0  0  0  0  1  0  0  0
6  0  0  0  0  0  0  0  0
7  0  0  0  0  0  0  0  0
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 160
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 37, constraints m = 160
Cones: primal zero / dual free vars: 58
soc vars: 66, soc blks: 1
sd vars: 36, sd blks: 1
Setup time: 9.69e-03s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
    0| 4.05e+19  7.57e+19  1.00e+00 -3.12e+19  1.92e+20  1.21e+20  1.53e-02
   40| 2.74e-10  1.01e-09  4.51e-11  1.71e+00  1.71e+00  1.96e-17  1.88e-02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.89e-02s
Lin-sys: nnz in L factor: 357, avg solve time: 1.54e-06s
Cones: avg projection time: 1.52e-04s
Acceleration: avg step time: 1.66e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.7898e-17, dist(y, K*) = 1.5753e-09, s'y/|s||y| = 3.7338e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.7439e-10
dual res: |A'y + c|_2 / (1 + |c|_2) = 1.0103e-09
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5078e-11
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544257213233
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 162
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 168
Cones: primal zero / dual free vars: 64
soc vars: 68, soc blks: 2
sd vars: 36, sd blks: 1
Setup time: 1.02e-02s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
    0| 3.67e+19  5.42e+19  1.00e+00 -2.45e+19  1.28e+20  1.04e+20  9.84e-03
   40| 5.85e-10  1.47e-09  7.31e-10  1.71e+00  1.71e+00  8.09e-17  1.29e-02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.31e-02s
Lin-sys: nnz in L factor: 368, avg solve time: 2.56e-06s
Cones: avg projection time: 3.03e-05s
Acceleration: avg step time: 2.45e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.4409e-16, dist(y, K*) = 1.5216e-09, s'y/|s||y| = 4.2866e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 5.8496e-10
dual res: |A'y + c|_2 / (1 + |c|_2) = 1.4729e-09
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 7.3074e-10
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544261472336
          0         1         2         3         4         5         6         7
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836  0.049272
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045 -0.132096
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131  0.120404
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420  0.125786
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530 -0.054881
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103  0.007308
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407  0.121709
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709  0.481395

As a sanity check we can confirm that the eigenvalues of the solution matrix are non-negative:


w,v = np.linalg.eig(Sigma.value)
print(w)

[9.46355900e-01 6.34465779e-01 2.35993549e-10 5.30366506e-02
 1.69999646e-01 2.29670882e-01 4.36623248e-01 3.75907704e-01]


Practice


I don't think this is a practical way of dealing with missing values. First of all, missing values in the original data will propagate into the covariance matrix: a single NA in the data leads to lots of NAs in the covariance matrix.


----     28 PARAMETER cov  effect of a single NA in the data

            j1        j2        j3        j4        j5        j6        j7        j8
j1    0.300457 -0.158889        NA -0.143750  0.072844 -0.032968  0.077836  0.049272
j2   -0.158889  0.399624        NA  0.109056  0.082858 -0.045462 -0.124045 -0.132096
j3          NA        NA        NA        NA        NA        NA        NA        NA
j4   -0.143750  0.109056        NA  0.386109 -0.058051  0.060246  0.082420  0.125786
j5    0.072844  0.082858        NA -0.058051  0.354627 -0.129507 -0.119530 -0.054881
j6   -0.032968 -0.045462        NA  0.060246 -0.129507  0.400641  0.051103  0.007308
j7    0.077836 -0.124045        NA  0.082420 -0.119530  0.051103  0.543407  0.121709
j8    0.049272 -0.132096        NA  0.125786 -0.054881  0.007308  0.121709  0.481395


This propagation is the result of applying the standard formula for the covariance: \[cov_{j,k} = \frac{1}{N-1} \sum_i (x_{i,j}-\mu_j)(x_{i,k}-\mu_k) \] This is of course difficult to fix in the covariance matrix. Just too much damage has been done.

A second problem with our SDP model is that we are not staying close to reasonable values for missing correlations. The model only looks at the PSD constraint.

Basically we need to look at the original data.

A simple remedy is just to throw away the records with NAs. If you have lots of data and relatively few NAs in the data, this is a reasonable approach. However, there is a trick we can use. Instead of throwing away a whole row of observations when we have an NA, we inspect pairs of columns \((j,k)\) individually. For the two columns \(j\) and \(k\), throw away the observations with NAs in these columns and then calculate the covariance \(cov_{j,k}\). Repeat for all combinations \((j,k)\) with \(j \lt k\). R has this built-in:


> cov(a)
              [,1]          [,2]          [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922   0.024062915  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259   0.036135413  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]  0.0240629148   0.036135413   0.308443182  0.003663338  0.0014232064 -0.0158431246  0.0308769925 -0.0177244600
[4,]  0.0264606771   0.027115454   0.003663338  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594   0.001423206  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933  -0.015843125  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084   0.030876993  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038  -0.017724460  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
> a[2,3]=NA
> cov(a)
              [,1]          [,2]  [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922    NA  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259    NA  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]            NA            NA    NA           NA            NA            NA            NA            NA
[4,]  0.0264606771   0.027115454    NA  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594    NA  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933    NA  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084    NA  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038    NA  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
> cov(a,use="pairwise")
              [,1]          [,2]          [,3]         [,4]          [,5]          [,6]          [,7]          [,8]
[1,]  0.3269524261  -0.022335922   0.024077969  0.026460677 -0.0003735916  0.0021383397  0.0544727640 -0.0008417817
[2,] -0.0223359222   0.313444259   0.036895996  0.027115454 -0.0045955942  0.0286659334  0.0558610843  0.0222590384
[3,]  0.0240779693   0.036895996   0.311573754  0.003377392  0.0013694087 -0.0162231609  0.0310202082 -0.0180617800
[4,]  0.0264606771   0.027115454   0.003377392  0.322801448  0.0057221934  0.0175051722  0.0152804438  0.0034349411
[5,] -0.0003735916  -0.004595594   0.001369409  0.005722193  0.2920368646 -0.0069213567  0.0227153919  0.0163823701
[6,]  0.0021383397   0.028665933  -0.016223161  0.017505172 -0.0069213567  0.3095935603  0.0009359271  0.0506571760
[7,]  0.0544727640   0.055861084   0.031020208  0.015280444  0.0227153919  0.0009359271  0.3635080311  0.0322080200
[8,] -0.0008417817   0.022259038  -0.018061780  0.003434941  0.0163823701  0.0506571760  0.0322080200  0.2992700098
>
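On the Python side, pandas offers the same pairwise behavior: DataFrame.cov() computes pairwise covariances, skipping NaNs per column pair. A small sketch:

import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.normal(size=(100, 8)))
a.iloc[1, 2] = np.nan    # introduce a single missing value
C = a.cov()              # pairwise covariances: no NA propagation
print(C)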


The disadvantage of pairwise covariances is that it is possible (even theoretically) that the final covariance matrix is not positive semi-definite. We can repair this with R's nearPD function. Essentially, this performs an eigen-decomposition, replaces the negative eigenvalues by positive ones, and then reassembles the covariance matrix (this is just matrix multiplications).
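In Python we can mimic the basic idea with numpy (a simplified sketch of my own; the real nearPD does more, such as iterating and keeping the diagonal intact):

import numpy as np

def make_psd(C, eps=1e-8):
    C = 0.5*(C + C.T)             # make sure the matrix is symmetric
    w, V = np.linalg.eigh(C)      # eigen-decomposition
    w = np.maximum(w, eps)        # replace negative eigenvalues
    return V @ np.diag(w) @ V.T   # reassemble the matrix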

Conclusion


The model presented in [1] is interesting: it is not quite obvious how to implement it in CVXPY (and the code below the example in [1] is not directly related). However, it should be mentioned that better methods are available to address the underlying problem: how to handle missing values in a covariance matrix.

References


  1. Semidefinite program, https://www.cvxpy.org/examples/basic/sdp.html
  2. Missing Data Functionality in NumPy, https://docs.scipy.org/doc/numpy-1.10.1/neps/missing-data.html
  3. Covariance matrix not positive definite in portfolio models, https://yetanothermathprogrammingconsultant.blogspot.com/2018/04/covariance-matrix-not-positive-definite.html

Gurobi v9.0.


Now including elevator music!


Major new features:

  • Non-convex quadratic solver
    • Supports non-convexities both in objective and constraints
    • Not quite sure if MIQCP is supported (I assume it is, but I think this was not mentioned explicitly)
    • Cplex already had support for (some) non-convex quadratic models, so Gurobi is catching up here.
  • Performance improvements
    • MIP: 18% faster overall and 26% on difficult models (more than 100 seconds)
    • MIQP: 24% faster
    (these numbers are from the email I received, not from the movie). Quite impressive numbers. Performance does not seem to plateau (yet). Of course, we see this also for Cplex: it keeps improving. There is some healthy competition here.


Opt Art



New book by TSP and Domino art creator Robert Bosch [1].

Content:

  1. Optimization and the Visual Arts?
  2. Truchet Tiles
  3. Linear Optimization and the Lego Problem
  4. The Linear Assignment problem and Cartoon Mosaics
  5. Domino Mosaics
  6. From the TSP to Continuous Line Drawings
  7. TSP Art with Side Constraints
  8. Knight's Tours
  9. Labyrinth Design with Tiling and Pattern Matching
  10. Mosaics with Side Constraints
  11. Game-of-life Mosaics
Yes, it contains color pictures.

An example of a TSP drawing (from [2]):

25,000-city TSPortrait of George Dantzig [2,3]
Original photograph


I guess more cities are needed to prevent his teeth from disappearing.

References

Elementwise vs matrix multiplication


Introduction


There are two often used methods to perform the multiplication of matrices (and vectors). The first is simply elementwise multiplication: \[c_{i,j} = a_{i,j} \cdot b_{i,j}\] In mathematics, this is sometimes referred to as the Hadamard product. The notation is typically: \[C = A \circ B\] but sometimes we see: \[C = A \odot B\] This product only works when \(A\) and \(B\) have the same shape. I.e. \[\begin{matrix} C&=&A&\circ&B\\ (m \times n) &&(m \times n)&&(m \times n)\end{matrix}\]

The standard matrix multiplication \(C=A\cdot B\) or \[c_{i,j} = \sum_k a_{i,k} b_{k,j}\] has a different rule for conformance:  \[\begin{matrix} C&=&A&\cdot&B\\ (m \times n) &&(m \times k)&&(k \times n)\end{matrix}\] Most of the time, the dot operator is dropped and we write \(C = A B\).

In optimization modeling both forms are used a lot.

R, CVXR


R has two operators for multiplication:
  • * for elementwise multiplication
  • %*% for matrix multiplication

A vector is considered a column vector (i.e. an \(n\)-vector is like an \(n \times 1\) matrix). There is one special thing in R, as shown here:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
>
> x <- c(1,2,3)
>

#
# matrix multiplication
#
> A %*% x
     [,1]
[1,]   22
[2,]   28
>

#
# elementwise multiplication
#
> A * A
     [,1] [,2] [,3]
[1,]    1    9   25
[2,]    4   16   36

#
# but this also works
#
> A * x
     [,1] [,2] [,3]
[1,]    1    9   10
[2,]    4    4   18
>


The last multiplication is surprising as \(A\) is a \(2 \times 3\) matrix and \(x\) is different in shape. Well, R may extend and recycle vectors to make them as large as needed. In this case \(x\) is duplicated and then considered as a  \(2 \times 3\) matrix. More or less like:


> x
[1] 1 2 3
> matrix(x,2,3)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3


The modeling tool CVXR follows R notation and implements both * and %*% (for elementwise and matrix multiplication). However, CVXR does not implement the extending and recycling of vectors that are too small.

Concept: recycling


Just to emphasize the concept of recycling: if an operation requires two vectors of the same length, R may make the shorter vector longer by recycling (duplicating) it. Here is an example:


> a <- 1:10
> b <- 1:2
> c <- a + b
> c
 [1]  2  4  4  6  6  8  8 10 10 12


In this example the vector a has elements 1 through 10. The vector b is too short, so it is recycled. When added to a, b is functionally equal to rep(c(1,2),5). When multiples of b do not exactly fit a, we effectively have a fractional duplication number. E.g. when we use b <- 1:3, we get a message:

Warning message:
In a + b : longer object length is not a multiple of shorter object length

This recycling trick is somewhat unique to R (I don't know of other languages doing this).

Example


In [2] the product: \[b_{i,j} = a_{i,j} \cdot x_i \] is implemented in R with the elementwise product:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x <- c(1,2)
> B <- A*x
> B
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


This again uses recycling of \(x\). CVXR is not doing this automatically. We can see this here:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+ list(x == c(1,2),
+ B == A * x ))
Error in sum_shapes(lapply(object@args, function(arg) { :
Incompatible dimensions



Here \(x\) is now a CVXR variable. As the vector \(x\) is not recycled, we end up with two different shapes, and the elementwise multiplication is refused. So how do we do something like this in CVXR?

The recycling operation:


> x
[1] 1 2
> matrix(x,m,n)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2


can be expressed in matrix notation as \[x \cdot e^T\] where \(e\) is a (column) vector of ones. This is sometimes called an outer product. I.e. we can write our assignment as \[B = A \circ (x \cdot e^T)\] In a CVXR model this can look like:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+ list(x == c(1,2),
+ B == A * (x %*% t(e))))
> sol <- solve(problem)
> sol$status
[1] "optimal"
> sol$getValue(x)
     [,1]
[1,]    1
[2,]    2
> sol$getValue(B)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


Note: the constraint B == A * (x %*% t(e)) contains both a matrix multiplication and an elementwise multiplication. This is rather funky.

Conclusion: if your matrix operations rely on recycling, you will need to rework things a bit to have this work correctly in CVXR. CVXR does not do recycling.

Python, CVXPY


In the previous section we saw that there are subtle differences between R's and CVXR's elementwise multiplication semantics.

Let's now look at Python and CVXPY.

Since Python 3.5 we have two multiplication operators:

  • * for elementwise multiplication 
  • @ for matrix multiplication 

CVXPY has different rules:

  • *, @  and matmul for matrix multiplication
  • multiply for elementwise multiplication

Example



import numpy as np
import cvxpy as cp

#
# In Python/numpy * indicates elementwise multiplication
#
A = np.array([[1,2],[3,4]])
B = np.array([[1,1],[2,2]])
C = A*B
print(A)
# output:
# [[ 1 2]
# [ 3 4]]
print(C)
# output:
# [[ 1 2]
# [ 6 8]]

#
# In CVXPY * indicates matrix multiplication
#
A = cp.Variable((2,2))
C = cp.Variable((2,2))
prob = cp.Problem(cp.Minimize(0),
[A == [[1,2],[3,4]],
C == A*B])
prob.solve(verbose=True)
print(A.value)
# output:
# [[1. 3.]
# [2. 4.]]
print(C.value)
# [[ 7. 7.]
# [10. 10.]]

Here we see some differences between Python/numpy and CVXPY. First, the interpretation of Python lists (the values for A) is different. And second, the semantics of * are different. This may cause some confusion.

References


  1. CVXR, Convex Optimization in R, https://cvxr.rbind.io/
  2. CVXR Elementwise multiplication of matrix with vector, https://stackoverflow.com/questions/59224555/cvxr-elementwise-multiplication-of-matrix-with-vector

Nonlinear variant of a knapsack problem

In [1] a problem is posed:

Original Problem
\[\begin{align}\max & \sum_i \log_{100}(\color{darkblue}w_i) \cdot \color{darkred}x_i \\ & \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{\sqrt{\color{darkred}k}} \le 10,000\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

The data vector \(w_i\) is assumed to be integer valued with \(w_i\ge 1\). Hence the logarithm can be evaluated without a problem. Also, we can assume that the number of items is around 1,000.

I don't think I have ever seen \(\log_{100}()\) being used in a model. Most programming (and modeling) languages only support natural logarithms \(\ln()\) and may be \(\log_{10}()\). We can convert things by: \[\log_{100}(x) = \frac{\ln(x)}{\ln(100)}\] This means the objective can be written as \[\max \frac{1}{\ln(100)} \sum_i \ln(w_i) x_i\] Essentially, the \(\log_{100}()\) function just adds a scaling factor. We can simplify the objective to \[\max \sum_i \ln(w_i)x_i\] (The objective value will be different, but the optimal solution will be the same).
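A quick numeric check of this base conversion:

import math

x = 42.0
print(math.log(x, 100))               # log base 100
print(math.log(x) / math.log(100))    # the same value via natural logs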

The constraint can be rewritten as: \[\sum_i w_i x_i \le 10,000 \sqrt{k}\] If we ignore the all zero solution, we can assume \(k\ge 1\). This bound will make sure the square root function is always differentiable. With this, we have a standard MINLP (Mixed Integer Nonlinear Programming) model.

MINLP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred}x_i \le 10,000\sqrt{\color{darkred}k}\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1 \end{align}\]

This model has only a single, well behaved, non-linearity. Literally, there is only one nonlinear nonzero element.  In addition, the model is convex. So we don't expect any problems. For \(w_i\),  I generated 1000 random integer values from the interval \([1,10000]\).

MINLP Results

Solver   Obj        Time (s)  Notes
Dicopt   1057.1355  0.6       3 NLP, 2 MIP subproblems
SBB      1056.9361  95        Node limit exceeded
Bonmin   1056.9472  600       Time limit exceeded
Bonmin   1057.1355  16        Option: bonmin.algorithm B-OA

The outer-approximation based algorithms (Dicopt, Bonmin with B-OA option) do much better than the branch & bound algorithms (SBB, default Bonmin). Even the global solvers do better:

Global Solver Results

Solver    Obj        Time (s)
Baron     1057.1355  2
Antigone  1057.1355  1
Couenne   1057.1355  9

The problem can also be formulated as a convex MIQCP (Mixed Integer Quadratically Constrained Programming) model:


MIQCP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred}y^2 \le \color{darkred}k\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}y = \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{10,000} \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1, \color{darkred}y \ge 0 \end{align}\]


Solvers like Cplex may convert this into a Cone problem (MISOCP).

Finally, we can also linearize this model by observing that \(k\in \{1,\dots,1000\}\). So we don't really have a continuous function \(f(k)=\sqrt{k}\), but rather only need function values at the integer points. We can exploit this by making this explicit. We can write: \[\begin{align} & k = \sum_i i\cdot \delta_i \\ & \sqrt{k} = \sum_i \sqrt{i}\cdot \delta_i \\ & \sum_i \delta_i = 1 \\ & \delta_i \in \{0,1\}\end{align}\] This is essentially a SOS1 (Special Ordered Set of Type 1) structure implementing a table lookup. The MIP model looks like:


MIP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred} x_i \le 10,000 \color{darkred}q \\ & \color{darkred} k = \sum_i i \cdot \color{darkred}\delta_i \\ & \color{darkred} q = \sum_i \sqrt{i} \cdot \color{darkred}\delta_i \\ & \sum_i \color{darkred} \delta_i = 1 \\ & \color{darkred}x_i, \color{darkred}\delta_i \in \{0,1\} \\ & \color{darkred} k, \color{darkred} q \ge 1 \end{align}\]

When we solve this problem we see:

MIQCP and MIP Results

Model  Solver  Obj        Time (s)
MIQCP  Cplex   1057.1355  0.3
MIP    Cplex   1057.1355  0.6
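For reference, the MIP model is easy to transcribe into Python. A PuLP sketch of my own (w is generated randomly here, and n is kept small to be kind to the default CBC solver, so the numbers will differ from the table above):

import math
import random
import pulp

n = 100
random.seed(123)
w = [random.randint(1, 10000) for _ in range(n)]

x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
d = pulp.LpVariable.dicts("delta", range(1, n+1), cat="Binary")

prob = pulp.LpProblem("knapsack_variant", pulp.LpMaximize)
prob += pulp.lpSum(math.log(w[i])*x[i] for i in range(n))
# exactly one delta(k) selects the value of k
prob += pulp.lpSum(d[k] for k in range(1, n+1)) == 1
# k = number of selected items
prob += pulp.lpSum(x[i] for i in range(n)) == pulp.lpSum(k*d[k] for k in range(1, n+1))
# capacity constraint with the sqrt(k) table lookup
prob += pulp.lpSum(w[i]*x[i] for i in range(n)) <= \
        10000*pulp.lpSum(math.sqrt(k)*d[k] for k in range(1, n+1))

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))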

Conclusion: the question in the original post was: how to solve this problem? Here we proposed three different models: an MINLP, a MIQCP and a MIP model. All these models can be solved quickly. It is noted that the quadratic and linear models are not approximations: they give the same solution as the original MINLP model. Pure nonlinear branch & bound methods have a bit of a problem with the MINLP model, but outer approximation works very well.

References


  1. How do we solve a variant of the knapsack problem in which the capacity of the knapsack keeps increasing as we add more items into the knapsack?, https://stackoverflow.com/questions/59242370/how-do-we-solve-a-variant-of-the-knapsack-problem-in-which-the-capacity-of-the-k

CVXPY large memory allocation

In [1] a simple regression problem was stated and solved with CVXPY. The number of observations is very large (\(200,000\)), while the number of coefficients to estimate is moderate (\(100\)). The first formulation is simply:

Regression I
\[\begin{align}\min_{\color{darkred}w}\>& ||\color{darkblue}y-\color{darkblue}X\color{darkred}w||^2_2 \end{align}\]

In Python code, this can look like:


import cvxpy as cp
import numpy as np

N = 200000
M = 100

X = np.random.normal(0, 1, size=(N, M))
y = np.random.normal(0, 1, size=N)

w = cp.Variable(M)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
prob.solve()
print("status:",prob.status)
print("obj:",prob.value)

Unfortunately, this gives the following runtime error:



[ec2-user@ip-172-30-0-79 etc]$ python3 ls0.py 
Traceback (most recent call last):
File "ls0.py", line 12, in <module>
prob.solve()
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
return solve_func(self, *args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
self._construct_chains(solver=solver, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
raise e
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 501, in _construct_chains
self._intermediate_chain.apply(self)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/chain.py", line 65, in apply
problem, inv = r.apply(problem)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/qp2symbolic_qp.py", line 60, in apply
return super(Qp2SymbolicQp, self).apply(problem)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 58, in apply
problem.objective)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 96, in canonicalize_tree
canon_arg, c = self.canonicalize_tree(arg)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 99, in canonicalize_tree
canon_expr, c = self.canonicalize_expr(expr, canon_args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 108, in canonicalize_expr
return self.canon_methods[type(expr)](expr, args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/atom_canonicalizers/quad_over_lin_canon.py", line 29, in quad_over_lin_canon
return SymbolicQuadForm(t, eye(affine_expr.size)/y, expr), [affine_expr == t]
File "/home/ec2-user/.local/lib/python3.7/site-packages/numpy/lib/twodim_base.py", line 201, in eye
m = zeros((N, M), dtype=dtype, order=order)
MemoryError: Unable to allocate array with shape (200000, 200000) and data type float64
[ec2-user@ip-172-30-0-79 etc]$


This is interesting: CVXPY seems to allocate a \(200,000\times 200,000\) matrix here. It is actually an identity matrix, as the function name eye in the traceback reveals. This appears to be related to the reformulation into a QP model.

To get this a bit more under control, let's make the size a bit smaller and use a memory profiler. At least we then get some information about the memory usage.
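
For reference, the profiled script (called ls1.py in the logs below) is just the model above wrapped in a function decorated with @profile; the decorator is injected by memory_profiler when the script is run as python3 -m memory_profiler ls1.py. A sketch:

import cvxpy as cp
import numpy as np

@profile
def f():
    # N = 200000
    N = 20000
    M = 100

    np.random.seed(123)
    X = np.random.normal(0, 1, size=(N, M))
    y = np.random.normal(0, 1, size=N)

    w = cp.Variable(M)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    prob.solve(verbose=True)
    print("status:", prob.status)
    print("obj:", prob.value)

f()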


[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
-----------------------------------------------------------------
OSQP v0.6.0 - Operator Splitting QP Solver
(c) Bartolomeo Stellato, Goran Banjac
University of Oxford - Stanford University 2019
-----------------------------------------------------------------
problem: variables n = 20100, constraints m = 20000
nnz(P) + nnz(A) = 2040000
settings: linear system solver = qdldl,
eps_abs = 1.0e-05, eps_rel = 1.0e-05,
eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
rho = 1.00e-01 (adaptive),
sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
check_termination: on (interval 25),
scaling: on, scaled_termination: off
warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1   0.0000e+00   4.38e+00   2.98e+04   1.00e-01   1.03e+00s
  50   1.9804e+04   2.13e-09   4.01e-07   1.00e-01   1.65e+00s
plsh   1.9804e+04   4.01e-15   1.03e-12   --------   2.45e+00s

status: solved
solution polish: successful
number of iterations: 50
optimal objective: 19804.0226
run time: 2.45e+00s
optimal rho estimate: 1.35e-02

status: optimal
obj: 19804.022648294507
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.188 MiB   50.188 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.188 MiB    0.000 MiB       N = 20000
     8   50.188 MiB    0.000 MiB       M = 100
     9
    10   50.188 MiB    0.000 MiB       np.random.seed(123)
    11   65.477 MiB   15.289 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.707 MiB    0.230 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.707 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.707 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16  412.086 MiB  346.379 MiB       prob.solve(verbose=True)
    17  412.086 MiB    0.000 MiB       print("status:",prob.status)
    18  412.086 MiB    0.000 MiB       print("obj:",prob.value)


We see that this is indeed solved as a QP model (by the QP solver OSQP), and that the solve statement produces the spike in memory usage: about 346 MiB even for this reduced instance.

The allocation is measured in MiBs or Mebibytes. A mebibyte is equal to \(2^{20}=1,048,576\) bytes.

It is probably a bad idea to allocate this identity matrix as a fully populated dense matrix. The best approach would be not to use the matrix at all; a second-best approach would be to store it as a sparse matrix.
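
A quick back-of-the-envelope check (my addition, not from the original post) confirms why the dense allocation must fail, and how cheap a sparse identity would be:

import scipy.sparse as sp

n = 200_000

# Dense identity: n*n float64 entries -- the allocation in the traceback above.
print(n * n * 8 / 2**30, "GiB")       # roughly 298 GiB

# Sparse identity: only n stored nonzeros.
I = sp.eye(n, format="csr")
nbytes = I.data.nbytes + I.indices.nbytes + I.indptr.nbytes
print(nbytes / 2**20, "MiB")          # a few MiB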

Approach 1: use a conic programming solver


This is an easy fix. Just use a solver like ECOS:



[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It    pcost        dcost        gap     pres   dres   k/t     mu      step    sigma   IR     |  BT
 0   +0.000e+00  -2.601e-05   +2e+04  5e-01  7e-06   1e+00   1e+04    ---     ---    1 2 -  |  - -
 1   -9.312e-01  +2.050e-02   +3e+02  2e-02  1e-07   1e+00   2e+02   0.9829  1e-04   1 2 2  |  0 0
 2   -5.192e+00  +2.947e+00   +1e+02  7e-03  5e-08   8e+00   7e+01   0.6186  7e-02   2 3 3  |  0 0
 3   +6.518e+02  +8.866e+02   +3e+02  6e-02  4e-07   2e+02   1e+02   0.1800  9e-01   2 2 2  |  0 0
 4   +9.646e+02  +9.849e+02   +4e+00  1e-03  8e-09   2e+01   2e+00   0.9826  1e-04   7 7 6  |  0 0
 5   +1.130e+03  +1.144e+03   +7e-01  2e-03  1e-09   1e+01   4e-01   0.8836  2e-02   2 3 2  |  0 0
 6   +1.493e+03  +1.521e+03   +3e-01  1e-02  4e-10   3e+01   2e-01   0.9890  1e-01   1 1 1  |  0 0
 7   +2.057e+03  +2.080e+03   +5e-02  8e-03  9e-11   2e+01   3e-02   0.9271  5e-02   1 2 1  |  0 0
 8   +2.134e+03  +2.152e+03   +4e-02  4e-03  4e-11   2e+01   2e-02   0.8136  2e-01   2 2 2  |  0 0
 9   +2.244e+03  +2.256e+03   +3e-02  4e-02  2e-11   1e+01   1e-02   0.7107  1e-01   1 1 1  |  0 0
10   +1.995e+03  +2.004e+03   +4e-03  6e-02  2e-11   9e+00   4e-03   0.0005  8e-01   1 1 1  |  0 0
11   +6.609e+02  +6.615e+02   +8e+00  8e-03  2e-12   6e-01   4e+00   0.9890  8e-01   0 0 0  |  0 0
12   +6.293e+02  +6.423e+02   +1e+01  5e-03  2e-12   1e+01   8e+00   0.5325  3e-01   1 1 1  |  0 0
13   +5.873e+02  +7.017e+02   +3e+01  2e-02  3e-12   1e+02   2e+01   0.4675  5e-01   2 3 3  |  0 0
14   +7.549e+02  +1.687e+03   +4e+00  7e-03  4e-12   9e+02   3e+00   0.9890  1e-01   1 1 1  |  0 0
15   +1.292e+03  +1.300e+03   +2e+00  2e-03  3e-12   8e+00   9e-01   0.8813  2e-01   3 2 2  |  0 0
16   +2.141e+03  +2.149e+03   +2e-01  2e-02  2e-12   8e+00   9e-02   0.9890  5e-03   1 1 1  |  0 0
17   +2.819e+03  +2.827e+03   +2e-02  4e-02  4e-12   8e+00   1e-02   0.9052  3e-02   1 1 1  |  0 0
18   +2.757e+03  +2.774e+03   +5e-02  2e-02  2e-12   2e+01   2e-02   0.9746  4e-01   1 2 2  |  0 0
19   +2.615e+03  +2.631e+03   +1e-02  8e-02  2e-12   2e+01   8e-03   0.1350  2e-01   1 1 1  |  0 0
20   +2.002e+03  +2.009e+03   +2e-03  6e-02  3e-13   7e+00   3e-03   0.0029  9e-01   1 1 1  |  0 0
21   +2.884e+03  +2.959e+03   +2e-03  8e-02  3e-12   8e+01   9e-03   0.7080  7e-01   0 0 0  |  0 0
22   +2.861e+03  +2.934e+03   +2e-03  9e-02  3e-12   7e+01   9e-03   0.0007  9e-01   1 1 1  |  0 0
23   +1.847e+03  +1.857e+03   +3e-02  5e-02  2e-13   1e+01   2e-02   0.0026  1e+00   0 0 0  |  0 0
24   +6.545e+02  +6.548e+02   +1e+01  9e-03  2e-12   3e-01   6e+00   0.9890  6e-01   0 0 0  |  0 0
25   +7.003e+02  +7.142e+02   +2e+01  4e-03  2e-12   1e+01   1e+01   0.6078  2e-01   1 1 0  |  0 0
26   +6.232e+02  +7.771e+02   +5e+01  2e-02  3e-12   2e+02   3e+01   0.5706  4e-01   2 3 3  |  0 0
27   +5.916e+02  +1.396e+03   +8e+00  3e-03  2e-12   8e+02   6e+00   0.9890  1e-01   1 1 1  |  0 0
28   +1.468e+03  +1.658e+03   +1e+00  2e-03  3e-12   2e+02   7e-01   0.9890  9e-02   1 1 1  |  0 0
29   +2.046e+03  +2.089e+03   +4e-01  1e-02  4e-13   4e+01   2e-01   0.9890  3e-02   1 1 1  |  0 0
30   +1.720e+03  +1.744e+03   +7e-02  5e-02  3e-14   2e+01   6e-02   0.1873  2e-01   1 1 1  |  0 0
31   +6.708e+02  +6.722e+02   +2e+01  7e-03  2e-12   1e+00   1e+01   0.9890  6e-01   0 0 0  |  0 0
32   +5.985e+02  +6.158e+02   +5e+01  3e-03  9e-13   2e+01   3e+01   0.9260  3e-01   1 1 1  |  0 0
33   +7.045e+02  +9.047e+02   +1e+02  2e-02  2e-12   2e+02   6e+01   0.6591  5e-01   1 2 3  |  0 0
34   +4.246e+02  +8.139e+02   +3e+01  2e-03  1e-13   4e+02   3e+01   0.9890  2e-01   1 1 1  |  0 0
35   +1.140e+03  +1.436e+03   +7e-01  3e-03  7e-13   3e+02   3e+00   0.9600  1e-02   1 1 1  |  0 0
36   +1.597e+03  +1.707e+03   +1e+00  4e-03  3e-14   1e+02   9e-01   0.9890  2e-01   2 2 2  |  0 0
37   +2.184e+03  +2.226e+03   +1e-01  5e-03  1e-12   4e+01   1e-01   0.9890  6e-02   1 2 2  |  0 0
38   +2.540e+03  +2.556e+03   +2e-01  4e-03  2e-12   2e+01   1e-01   0.9890  2e-01   1 1 2  |  0 0
39   +3.055e+03  +3.065e+03   +5e-02  6e-03  2e-12   1e+01   3e-02   0.9890  8e-02   1 2 2  |  0 0
40   +3.450e+03  +3.458e+03   +4e-02  5e-03  2e-12   8e+00   2e-02   0.9890  1e-01   1 1 2  |  0 0
41   +3.932e+03  +3.939e+03   +1e-02  7e-03  2e-12   6e+00   8e-03   0.9890  1e-01   1 1 2  |  0 0
42   +4.347e+03  +4.352e+03   +1e-02  6e-03  2e-12   6e+00   5e-03   0.9890  1e-01   1 1 2  |  0 0
43   +4.796e+03  +4.801e+03   +5e-03  8e-03  2e-12   5e+00   3e-03   0.9890  1e-01   1 1 2  |  0 0
44   +5.206e+03  +5.211e+03   +3e-03  7e-03  2e-12   4e+00   2e-03   0.9890  1e-01   1 1 2  |  0 0
45   +5.633e+03  +5.637e+03   +2e-03  1e-02  2e-12   4e+00   1e-03   0.9890  1e-01   1 1 2  |  0 0
46   +6.028e+03  +6.032e+03   +1e-03  9e-03  2e-12   4e+00   8e-04   0.9890  1e-01   1 1 2  |  0 0
47   +6.430e+03  +6.433e+03   +9e-04  1e-02  2e-12   3e+00   5e-04   0.9890  1e-01   1 1 2  |  0 0
48   +6.811e+03  +6.814e+03   +7e-04  1e-02  2e-12   3e+00   4e-04   0.9890  1e-01   1 1 2  |  0 0
49   +7.188e+03  +7.191e+03   +5e-04  1e-02  2e-12   3e+00   3e-04   0.9890  1e-01   1 1 2  |  0 0
50   +7.551e+03  +7.553e+03   +3e-04  1e-02  2e-12   2e+00   2e-04   0.9890  1e-01   1 1 2  |  0 0
51   +7.908e+03  +7.910e+03   +2e-04  2e-02  2e-12   2e+00   1e-04   0.9890  2e-01   1 1 2  |  0 0
52   +8.251e+03  +8.253e+03   +2e-04  2e-02  2e-12   2e+00   1e-04   0.9890  2e-01   1 1 2  |  0 0
53   +8.586e+03  +8.588e+03   +1e-04  2e-02  2e-12   2e+00   8e-05   0.9890  2e-01   1 1 2  |  0 0
54   +8.911e+03  +8.913e+03   +1e-04  2e-02  2e-12   2e+00   6e-05   0.9890  2e-01   1 1 2  |  0 0
55   +9.228e+03  +9.230e+03   +8e-05  2e-02  2e-12   2e+00   4e-05   0.9890  2e-01   1 1 2  |  0 0
56   +9.534e+03  +9.535e+03   +6e-05  2e-02  2e-12   1e+00   4e-05   0.9890  2e-01   1 1 2  |  0 0
57   +9.779e+03  +9.780e+03   +5e-05  1e-01  2e-12   1e+00   3e-05   0.8357  2e-01   1 1 1  |  0 0
58   +9.766e+03  +9.768e+03   +2e-05  3e-01  2e-12   1e+00   1e-05   0.0001  9e-01   1 1 1  |  0 0
59   +3.518e+03  +3.517e+03   -6e+00  7e+00  9e-13  -5e-01  -3e+00   0.0000  1e+00   1 1 1  |  0 0
Unreliable search direction detected, recovering best iterate (58) and stopping.

Close to PRIMAL INFEASIBLE (within feastol=5.7e-06).
Runtime: 20.262543 seconds.

status: infeasible_inaccurate
obj: None
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.312 MiB   50.312 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.312 MiB    0.000 MiB       N = 20000
     8   50.312 MiB    0.000 MiB       M = 100
     9
    10   50.312 MiB    0.000 MiB       np.random.seed(123)
    11   65.578 MiB   15.266 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.836 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.836 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.836 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.824 MiB   10.988 MiB       prob.solve(solver=cp.ECOS,verbose=True)
    17   76.824 MiB    0.000 MiB       print("status:",prob.status)
    18   76.824 MiB    0.000 MiB       print("obj:",prob.value)


The results are mixed. We certainly don't see the crazy memory allocation anymore: the increment for the solve is now a very modest 11 MiB. However, the solver is not numerically stable enough to solve this problem.

There is another solver that comes with CVXPY that we can try:


[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2000002
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 101, constraints m = 20002
Cones: soc vars: 20002, soc blks: 1
WARN: aa_init returned NULL, no acceleration applied.
Setup time: 7.44e-01s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
   0| 1.22e+18  7.87e+18  1.00e+00 -9.64e+18  4.27e+21  4.18e+21  1.92e-02
 100| 8.68e+17  7.02e+17  5.09e-01  1.71e+21  5.25e+21  3.54e+21  6.67e-01
 200| 8.04e+17  5.33e+17  4.23e-01  2.16e+21  5.33e+21  3.16e+21  1.31e+00
 300| 7.37e+17  4.39e+17  3.72e-01  2.39e+21  5.22e+21  2.83e+21  1.96e+00
 400| 6.73e+17  3.74e+17  3.35e-01  2.51e+21  5.04e+21  2.53e+21  2.59e+00
 500| 6.14e+17  3.23e+17  3.05e-01  2.57e+21  4.83e+21  2.26e+21  3.22e+00
 600| 5.60e+17  2.82e+17  2.80e-01  2.59e+21  4.60e+21  2.01e+21  3.85e+00
 700| 5.11e+17  2.48e+17  2.57e-01  2.58e+21  4.37e+21  1.79e+21  4.50e+00
 800| 4.67e+17  2.20e+17  2.36e-01  2.56e+21  4.14e+21  1.58e+21  5.15e+00
 900| 4.27e+17  1.96e+17  2.17e-01  2.52e+21  3.92e+21  1.40e+21  5.79e+00
1000| 3.91e+17  1.74e+17  1.99e-01  2.47e+21  3.70e+21  1.23e+21  6.46e+00
1100| 3.58e+17  1.56e+17  1.81e-01  2.42e+21  3.49e+21  1.07e+21  7.09e+00
1200| 3.28e+17  1.40e+17  1.64e-01  2.36e+21  3.29e+21  9.29e+20  7.72e+00
1300| 3.01e+17  1.26e+17  1.48e-01  2.30e+21  3.10e+21  7.98e+20  8.36e+00
1400| 2.77e+17  1.14e+17  1.31e-01  2.24e+21  2.92e+21  6.78e+20  9.00e+00
1500| 2.55e+17  1.03e+17  1.15e-01  2.18e+21  2.75e+21  5.67e+20  9.65e+00
1600| 2.35e+17  9.31e+16  9.88e-02  2.12e+21  2.59e+21  4.65e+20  1.03e+01
1700| 2.17e+17  8.44e+16  8.25e-02  2.06e+21  2.43e+21  3.71e+20  1.09e+01
1800| 2.00e+17  7.67e+16  6.62e-02  2.01e+21  2.29e+21  2.84e+20  1.16e+01
1900| 1.85e+17  6.98e+16  4.97e-02  1.95e+21  2.15e+21  2.03e+20  1.23e+01
2000| 1.72e+17  6.37e+16  3.30e-02  1.89e+21  2.02e+21  1.29e+20  1.29e+01
2100| 1.59e+17  5.81e+16  1.60e-02  1.84e+21  1.90e+21  5.98e+19  1.36e+01
2200| 8.80e-03  1.43e-01  1.60e-05  1.20e+04  1.20e+04  1.94e-16  1.42e+01
2300| 6.45e-03  1.37e-01  1.62e-05  1.23e+04  1.23e+04  6.83e-17  1.49e+01
2400| 6.25e-03  1.34e-01  1.40e-05  1.26e+04  1.26e+04  2.15e-16  1.55e+01
2500| 6.06e-03  1.30e-01  1.21e-05  1.28e+04  1.28e+04  2.25e-16  1.62e+01
2600| 5.89e-03  1.27e-01  1.04e-05  1.30e+04  1.30e+04  7.78e-17  1.68e+01
2700| 5.71e-03  1.24e-01  8.98e-06  1.33e+04  1.33e+04  8.14e-17  1.75e+01
2800| 5.55e-03  1.21e-01  7.70e-06  1.35e+04  1.35e+04  8.46e-17  1.82e+01
2900| 5.39e-03  1.18e-01  6.56e-06  1.37e+04  1.37e+04  8.78e-17  1.88e+01
3000| 5.24e-03  1.15e-01  5.57e-06  1.39e+04  1.39e+04  9.04e-17  1.95e+01
3100| 5.09e-03  1.12e-01  4.68e-06  1.41e+04  1.41e+04  9.40e-17  2.01e+01
3200| 4.94e-03  1.09e-01  3.90e-06  1.42e+04  1.42e+04  2.91e-16  2.08e+01
3300| 4.80e-03  1.07e-01  3.21e-06  1.44e+04  1.44e+04  9.96e-17  2.14e+01
3400| 4.67e-03  1.04e-01  2.60e-06  1.46e+04  1.46e+04  1.03e-16  2.21e+01
3500| 4.54e-03  1.01e-01  2.05e-06  1.48e+04  1.48e+04  3.18e-16  2.27e+01
3600| 4.41e-03  9.87e-02  1.57e-06  1.49e+04  1.49e+04  1.09e-16  2.33e+01
3700| 4.29e-03  9.62e-02  1.14e-06  1.51e+04  1.51e+04  1.12e-16  2.40e+01
3800| 4.16e-03  9.37e-02  7.64e-07  1.52e+04  1.52e+04  1.14e-16  2.47e+01
3900| 4.05e-03  9.12e-02  4.28e-07  1.54e+04  1.54e+04  1.17e-16  2.53e+01
4000| 3.93e-03  8.88e-02  1.30e-07  1.55e+04  1.55e+04  1.20e-16  2.60e+01
4100| 3.82e-03  8.65e-02  1.34e-07  1.57e+04  1.57e+04  1.23e-16  2.66e+01
4200| 3.72e-03  8.42e-02  3.67e-07  1.58e+04  1.58e+04  1.25e-16  2.73e+01
4300| 3.61e-03  8.20e-02  5.72e-07  1.59e+04  1.59e+04  1.28e-16  2.80e+01
4400| 3.51e-03  7.98e-02  7.52e-07  1.60e+04  1.60e+04  3.92e-16  2.86e+01
4500| 3.41e-03  7.77e-02  9.11e-07  1.62e+04  1.62e+04  1.33e-16  2.93e+01
4600| 3.32e-03  7.56e-02  1.05e-06  1.63e+04  1.63e+04  1.36e-16  3.00e+01
4700| 3.22e-03  7.35e-02  1.17e-06  1.64e+04  1.64e+04  4.15e-16  3.07e+01
4800| 3.13e-03  7.15e-02  1.27e-06  1.65e+04  1.65e+04  1.41e-16  3.13e+01
4900| 3.04e-03  6.96e-02  1.36e-06  1.66e+04  1.66e+04  4.30e-16  3.20e+01
5000| 2.96e-03  6.77e-02  1.44e-06  1.67e+04  1.67e+04  4.37e-16  3.27e+01
----------------------------------------------------------------------------
Status: Solved/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 3.27e+01s
	Lin-sys: nnz in L factor: 2025055, avg solve time: 5.82e-03s
	Cones: avg projection time: 3.02e-05s
	Acceleration: avg step time: 7.51e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 0.0000e+00, dist(y, K*) = 3.6380e-12, s'y/|s||y| = -6.2180e-16
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.9568e-03
dual res:   |A'y + c|_2 / (1 + |c|_2) = 6.7689e-02
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.4408e-06
----------------------------------------------------------------------------
c'x = 16701.6612, -b'y = 16701.6131
============================================================================
status: optimal_inaccurate
obj: 16701.661202598487
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.438 MiB   50.438 MiB   @profile
     5                             def f():
     6                                 # N = 200000
     7   50.438 MiB    0.000 MiB       N = 20000
     8   50.438 MiB    0.000 MiB       M = 100
     9
    10   50.438 MiB    0.000 MiB       np.random.seed(123)
    11   65.699 MiB   15.262 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.957 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13
    14   65.957 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.957 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.953 MiB   10.996 MiB       prob.solve(solver=cp.SCS,verbose=True)
    17   76.953 MiB    0.000 MiB       print("status:",prob.status)
    18   76.953 MiB    0.000 MiB       print("obj:",prob.value)


That solver also has problems: here we run out of iterations (max_iters = 5000) and end up with status optimal_inaccurate.


Approach 2: use a different formulation


We can use a different formulation:


Regression II
\[\begin{align}\min_{\color{darkred}w,\color{darkred}r}\>& ||\color{darkred}r||^2_2 \\ & \color{darkred}r = \color{darkblue}y - \color{darkblue}X\color{darkred}w \end{align}\]

Here we add a bunch of (free) variables \(r\) for the residuals, along with the corresponding linear constraints. The payback is a much simpler quadratic objective, whose Hessian (the identity) is positive definite. A reasonable implementation can look like:


import cvxpy as cp
import numpy as np

# N = 200000
N = 20000
M = 100

X = np.random.normal(0, 1, size=(N,M))
y = np.random.normal(0, 1, size=(N,1))

w = cp.Variable((M,1))
r = cp.Variable((N,1))
prob = cp.Problem(cp.Minimize(r.T @ r), [r == y - X @ w])
prob.solve(solver=cp.OSQP,verbose=True)
print("status:",prob.status)
print("obj:",prob.value)


This should work, but it doesn't. We see:


[ec2-user@ip-172-30-0-79 etc]$ python3 ls2a.py 
Traceback (most recent call last):
File "ls2a.py", line 14, in <module>
prob.solve(solver=cp.OSQP,verbose=True)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
return solve_func(self, *args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
self._construct_chains(solver=solver, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
raise e
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 499, in _construct_chains
construct_intermediate_chain(self, candidate_solvers, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/solvers/intermediate_chain.py", line 70, in construct_intermediate_chain
raise DCPError("Problem does not follow DCP rules. Specifically:\n" + append)
cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
var1.T * var1

Bummer. This is a perfectly convex problem!

Some alternative formulations to generate a QP do not help:

  • cp.sum_squares(r) yields a large allocation when forming a QP
  • sum([r[i]**2 for i in range(N)]) is very slow and very memory hungry
  • quad_form with an identity matrix is not efficient


This did not work out so well: I am not really able to solve relatively large but easy least squares problems using these standard formulations.
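
For perspective (my addition, not part of the original post): outside an optimization-modeling framework, this least squares problem is easy at full size. A plain numpy call handles \(N=200,000\) quickly:

import numpy as np

N, M = 200_000, 100
rng = np.random.default_rng(123)
X = rng.normal(size=(N, M))
y = rng.normal(size=N)

# SVD-based least squares on the raw 200000 x 100 data matrix.
w, res, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(res)  # sum of squared residuals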

Conclusion


  • CVXPY thinks \(r^Tr = \sum_i r_i^2\) is not convex: the DCP rules reject any product of two variable expressions (see the snippet below the list).
  • We cannot always generate large problems for the OSQP QP solver due to large dense memory allocations in CVXPY. I probably would consider this a bug: it is never a good idea to physically create and allocate a large dense identity matrix in any somewhat serious code.
  • CVXPY can generate large instances for the conic solvers ECOS and SCS. But they have some troubles solving this problem.
  • A commercial solver like Cplex has no problem with this; 2 iterations, 0.2 seconds. 
  • We can conclude that CVXPY (and its collection of standard solvers) is better for smaller problems. 
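
The first bullet is easy to demonstrate (a small sketch; exact warnings and behavior may vary by CVXPY version):

import cvxpy as cp

r = cp.Variable((5, 1))
print((r.T @ r).is_dcp())          # False: a product of two variables is not DCP
print(cp.sum_squares(r).is_dcp())  # True: the dedicated atom is recognized as convex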

References


  1. Least squares problem run out of memory, https://stackoverflow.com/questions/59315300/least-squares-problem-run-out-of-memory-in-cvxpy






