RAS Method
RAS is a well-known algorithm to perform matrix balancing:
\[\begin{align} \min \>& \mathit{dist}(A,A^0)\\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \gt 0 \end{align} \]
i.e. find a matrix \(A\) that is as close as possible to a given matrix \(A^0\) while obeying row and column sum constraints. This method is often used to "clean up" data sets that contain data from different sources. The goal is to achieve a consistent data set, as is often required by subsequent modelling efforts.
The RAS method looks like:
RAS Algorithm | | |
---|---|---|
Step 1 | Initialization | \[A := A^0\] |
Step 2 | Row Scaling | \[\begin{align}&\rho_i := \frac{u_i}{\displaystyle\sum_j a_{i,j}}\\ &a_{i,j} := \rho_i a_{i,j}\end{align}\] |
Step 3 | Column Scaling | \[\begin{align}&\sigma_j := \frac{v_j}{\displaystyle\sum_i a_{i,j}}\\ &a_{i,j} := \sigma_j a_{i,j}\end{align}\] |
Step 4 | Convergence Test | If not converged, go to Step 2 |
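As an illustration, here is a minimal sketch of this iteration in Python (NumPy); the function name, tolerance, and convergence test are my own choices:

import numpy as np

def ras(A0, u, v, tol=1e-9, max_iter=1000):
    # alternately scale rows and columns of A0 until the row sums match u;
    # this assumes sum(u) == sum(v), otherwise the iteration cannot converge
    A = np.array(A0, dtype=float)
    for _ in range(max_iter):
        A *= (u / A.sum(axis=1))[:, None]          # Step 2: row scaling
        A *= v / A.sum(axis=0)                     # Step 3: column scaling
        if np.abs(A.sum(axis=1) - u).max() < tol:  # Step 4: convergence test
            break
    return A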
This algorithm works very well and is widely used. However, it does not allow adding side constraints on \(A\), which many applications require. The entropy method will allow you to do that.
Entropy Method
The following optimization model gives the same results as the RAS method:
\[\begin{align} \min \>& \sum_{i,j} a_{i,j} \log \left( \frac{a_{i,j}}{a^0_{i,j}} \right) \\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \gt 0 \end{align} \]
Notes:
- The objective can be rewritten as: \(\sum_{i,j} [a_{i,j} \log(a_{i,j}) - a_{i,j} \log(a^0_{i,j}) ]\). The second term is linear.
- Usually we set a lower bound: \(a_{i,j} \ge \varepsilon\) for some \(\varepsilon>0\).
- If we really want to allow \(a_{i,j}=0\) we can replace \(\log(a_{i,j})\) by \(\log(a_{i,j}+\varepsilon ) \).
- We can add extra constraints to this model. An example is when we deal with world trade flows: the sum of all net exports (i.e. exports \(-\) imports) should be zero. (A code sketch of the model follows these notes.)
- This is a convex problem, but it leads to many superbasic variables. This means NLP solvers like MINOS, SNOPT, and CONOPT struggle a bit: they prefer models with relatively few superbasic variables. Interior point solvers like IPOPT and KNITRO solve this problem more quickly.
- Mosek versions from before 2018 can also solve this as a general convex NLP. From 2018 on, we need to reformulate the problem (see below) [1].
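As a quick illustration of the model above, here is the entropy formulation on made-up \(2 \times 2\) data in Python, using scipy's SLSQP as a general-purpose NLP solver (all data and names here are mine):

import numpy as np
from scipy.optimize import minimize

A0 = np.array([[1.0, 2.0], [3.0, 4.0]])  # priors
u = np.array([2.5, 7.5])                 # desired row sums
v = np.array([4.0, 6.0])                 # desired column sums (sum(u) == sum(v))
m, n = A0.shape

def f(x):
    # entropy distance: sum a*log(a/a0)
    return np.sum(x * np.log(x / A0.ravel()))

cons = [{'type': 'eq', 'fun': lambda x: x.reshape(m, n).sum(axis=1) - u},
        {'type': 'eq', 'fun': lambda x: x.reshape(m, n).sum(axis=0) - v}]
res = minimize(f, x0=A0.ravel(), method='SLSQP',
               bounds=[(1e-4, None)] * (m * n), constraints=cons)
A = res.x.reshape(m, n)

Extra constraints, such as the net-export condition mentioned above, would simply be additional entries in cons.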
In addition, we may have some zeroes in the matrix \(A^0\) that we want to keep zero. In other words, we want to preserve the sparsity pattern. In the world of entropy estimation, the elements of \(A^0\) are called priors. In GAMS such a model can look like:
variables A(i,j), z;
equations
   objective  'entropy distance to the priors A0'
   rowsum(i)  'row sum constraints'
   colsum(j)  'column sum constraints'
;

* the $A0(i,j) conditions only generate terms for cells with a
* nonzero prior, so zero cells stay zero
objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j)));
rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal');
colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j);

* start from the priors and keep the nonzero cells away from zero
A.L(i,j) = A0(i,j);
A.lo(i,j)$A0(i,j) = 0.0001;

model m1 /all/;
solve m1 minimizing z using nlp;
Exponential Cone
Conic programming has become an important way to model and solve certain convex problems. Some solvers support an Exponential Cone. This is a constraint of the form:
\[ x \ge y \exp\left(\frac{z}{y}\right) \]
with \(y > 0\). With a little bit of effort we can show that \(\min (x \log x)\) can be implemented using this construct. The first step is to write \(\min z\) subject to \(z \ge x \log x\). Now we can derive:
\[\begin{align} & z \ge x \log x \Rightarrow \\ & -z \le x \log\left( \frac{1}{x} \right) \Rightarrow \\ & - \frac{z}{x} \le \log\left( \frac{1}{x} \right) \Rightarrow \\ & \exp\left( - \frac{z}{x} \right) \le \frac{1}{x} \Rightarrow \\ & x \exp\left( \frac{-z}{x} \right) \le 1 \end{align}\]
and we have our Exponential Cone.
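Before building a full model, we can sanity-check the algebra numerically: for \(x>0\), the point \(z = x\log x\) should lie exactly on the cone boundary \(x\exp(-z/x)=1\). A minimal check in Python (illustrative only):

import numpy as np

x = np.linspace(0.1, 5.0, 50)
z = x * np.log(x)                          # the entropy term
assert np.allclose(x * np.exp(-z / x), 1)  # points (x,z) sit on the cone boundary

To verify the derivation inside an optimization model, we can solve the convex NLP with the cone simulated by a nonlinear constraint: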
\[\begin{align} \min \>& \sum_{i,j} z_{i,j} - \sum_{i,j} a_{i,j} \log (a^0_{i,j}) \\ & \sum_j a_{i,j} = u_i &&\forall i\\&\sum_i a_{i,j} = v_j &&\forall j \\ & a_{i,j} \exp \left( \frac{-z_{i,j}}{a_{i,j}} \right) \le 1 && \forall i,j\\ & a_{i,j} \gt 0\end{align} \]
This is only to check that our derivation is correct: the formulation has little practical use. With this model we added extra variables \(z_{i,j}\) and extra equations, and we moved the non-linearities from the objective into the constraints. For a general-purpose NLP solver this is bad news. We can see that with a small test model:
Model | Solver | Iterations | Objective | Evaluation Errors |
---|---|---|---|---|
Entropy | Conopt | 15 | -15.7687 | 0 |
Simulated Exponential Cone | Conopt | 47 | -15.7687 | 4 |
Entropy | IPOPT | 5 | -15.7687 | 0 |
Simulated Exponential Cone | IPOPT | 15 | -15.7687 | 0 |
Entropy | Mosek 8 | 8 | -15.7687 | 0 |
Simulated Exponential Cone | Mosek 8 | 13 | -15.7687 | 0 |
Note that iteration counts are not comparable between different solvers. Obviously, and as expected, this conic reformulation makes no sense when using a general NLP solver: iteration counts go up, and we see some overflows in the evaluation of the nonlinear constraint functions (or their gradients). The expression \(\exp(z/y)\) is inherently dangerous. Note that conic solvers handle this very differently internally.
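To see why \(\exp(z/y)\) is dangerous in floating point, a one-line Python experiment is enough (the values are arbitrary, but of a kind a line search can easily visit):

import numpy as np

# a moderate z combined with a small y overflows double precision immediately
np.exp(np.float64(50.0) / np.float64(0.01))  # exp(5000) -> inf, with an overflow warning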
Conic Modeling
Mosek will drop support for general convex NLPs [1]; a conic formulation is suggested instead. So let's try out some other solvers that support exponential cones.
To model this for use with a convex solver, one would typically use a built-in function to express the entropy function. In CVXPY [2] we can use entr, which is defined as \(-x\log(x)\). Here is an attempt to solve this small problem with CVXPY:
from cvxpy import *
from numpy import sign, reshape, log as nplog

A0 = [[ 230 , 375 , 375 , 100 , 0 , 685 , 215 , 0 , 50 , 0 ],
[ 330 , 405 , 419 , 175 , 90 , 504 , 515 , 0 , 240 , 105 ],
[ 268 , 225 , 242 , 0 , 30 , 790 , 301 , 44 , 100 , 0 ],
[ 595 , 380 , 638 , 275 , 30 , 685 , 605 , 88 , 100 , 160 ],
[ 340 , 360 , 440 , 200 , 30 , 755 , 475 , 44 , 150 , 0 ],
[ 132 , 190 , 200 , 0 , 0 , 432 , 130 , 0 , 0 , 0 ],
[ 309 , 330 , 350 , 125 , 0 , 612 , 474 , 0 , 50 , 50 ],
[ 365 , 400 , 330 , 150 , 50 , 575 , 600 , 44 , 150 , 110 ],
[ 210 , 250 , 308 , 125 , 0 , 720 , 256 , 0 , 100 , 50 ]]
u = [2029,2798,1998,3566,2794,1071,2305,2747,2015]   # row totals
v = [2772,2910,3300,1150,240,5760,3526,220,950,495]  # column totals
m = len(u)   # number of rows
n = len(v)   # number of columns
# we need to exclude cases with A0[i,j]=0 in the objective and constraints
indic = [[sign(A0[i][j]) for j in range(n)] for i in range(m)]
# precompute log(A0), masking the zero cells to avoid log(0)
logA0 = [[nplog(A0[i][j]) if A0[i][j] else 0.0 for j in range(n)] for i in range(m)]
a = Variable(m,n)
# these bounds pin the zero cells (indic=0) to zero
lb = a >= mul_elemwise(0.0001,indic)
ub = a <= mul_elemwise(10000.0,indic)
rowsum = sum_entries(a, axis=1) == reshape(u,(m,1))
colsum = sum_entries(a, axis=0) == reshape(v,(1,n))
cons = [lb,ub,rowsum,colsum]
# sum a*log(a/A0) = sum(-entr(a) - a*log(A0)) over the nonzero cells
obj = Minimize(sum_entries(mul_elemwise(indic,-entr(a)-mul_elemwise(logA0,a))))
prob = Problem(obj, cons)
#prob.solve(solver=SCS, verbose=True)
prob.solve(solver=ECOS, verbose=True)
This may not be the best way to implement things: I was struggling a bit with handling the zeroes in the matrix \(A^0\). Instead of fixing \(a_{i,j}=0\), I probably should not even generate those variables (in the GAMS model above this was easy). If the solver has good presolve capabilities this matters less. If you have ideas on how to model this properly in CVXPY, let me know. One possibility is sketched below.
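One possible direction, sketched here with the current CVXPY (1.x) interface rather than the old one used above: collect the nonzero cells of \(A^0\) in a flat vector and only create variables for those, mirroring the $-conditions in the GAMS model. This is just a sketch under the assumption that A0, u, v hold the data from the script above:

import cvxpy as cp
import numpy as np

A0 = np.array(A0, dtype=float)
rows, cols = A0.nonzero()            # coordinates of the nonzero cells
a0 = A0[rows, cols]                  # priors for those cells only
x = cp.Variable(a0.size, pos=True)   # one variable per nonzero cell

cons = []
for i in range(A0.shape[0]):         # row sums over the cells present in row i
    cons.append(cp.sum(x[np.where(rows == i)[0]]) == u[i])
for j in range(A0.shape[1]):         # column sums over the cells present in column j
    cons.append(cp.sum(x[np.where(cols == j)[0]]) == v[j])

# sum x*log(x/a0) = sum(-entr(x)) - sum(log(a0)*x)
obj = cp.Minimize(cp.sum(-cp.entr(x) - cp.multiply(np.log(a0), x)))
cp.Problem(obj, cons).solve(solver=cp.ECOS)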
The results are mixed. The output of ECOS is:
ECOS 2.0.4 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It     pcost       dcost      gap    pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +0.000e+00  -7.401e+05  +5e+02  1e+00  7e-01  1e+00  1e+00    ---     ---   0  0  - |  -  -
 1  +1.202e+03  -7.382e+05  +5e+01  1e+00  7e-01  8e+00  1e-01  0.8875  1e-02  1  1  1 |  0  0
 2  +5.098e+03  -7.301e+05  +1e+01  1e+00  8e-01  4e+01  3e-02  0.8190  4e-02  1  1  1 |  0  0
 3  +8.759e+03  -7.219e+05  +6e+00  1e+00  8e-01  7e+01  1e-02  0.5013  1e-01  1  1  1 |  1  3
 4  +2.579e+04  -6.784e+05  +2e+00  1e+00  8e-01  2e+02  4e-03  0.7151  3e-02  1  1  1 |  0  1
 5  +4.993e+04  -6.066e+05  +8e-01  9e-01  7e-01  4e+02  2e-03  0.5789  4e-02  1  1  1 |  1  2
 6  +1.094e+05  -3.718e+05  +2e-01  7e-01  5e-01  9e+02  5e-04  0.7295  2e-03  1  1  1 |  0  1
 7  +9.292e+04  -2.105e+05  +1e-01  4e-01  4e-01  7e+02  2e-04  0.5738  5e-02  1  1  1 |  1  2
 8  +4.286e+04  -6.659e+04  +3e-02  1e-01  1e-01  3e+02  8e-05  0.7020  1e-02  1  1  1 |  0  1
 9  +1.456e+04  -1.859e+04  +1e-02  4e-02  4e-02  8e+01  2e-05  0.7187  2e-02  1  1  1 |  0  1
10  +4.757e+03  -5.169e+03  +3e-03  1e-02  1e-02  2e+01  7e-06  0.7136  1e-02  1  0  0 |  0  1
11  +1.895e+03  -2.133e+03  +1e-03  5e-03  5e-03  9e+00  3e-06  0.6266  5e-02  2  0  0 |  1  2
12  +3.885e+02  -5.152e+02  +3e-04  1e-03  1e-03  2e+00  6e-07  0.7736  3e-03  2  0  0 |  0  1
13  +1.566e+02  -2.156e+02  +1e-04  5e-04  4e-04  9e-01  3e-07  0.6266  5e-02  2  0  0 |  2  2
14  +2.161e+01  -6.152e+01  +3e-05  1e-04  1e-04  2e-01  6e-08  0.7833  1e-02  1  0  0 |  1  1
15  -3.765e-02  -3.416e+01  +1e-05  5e-05  4e-05  8e-02  2e-08  0.6266  5e-02  1  0  0 |  2  2
16  -1.240e+01  -2.001e+01  +2e-06  1e-05  9e-06  2e-02  5e-09  0.7833  1e-02  1  0  0 |  1  1
17  -1.437e+01  -1.747e+01  +9e-07  4e-06  4e-06  8e-03  2e-09  0.6266  5e-02  0  0  0 |  2  2
18  -1.546e+01  -1.616e+01  +2e-07  9e-07  8e-07  2e-03  5e-10  0.7833  9e-03  1  0  0 |  1  1
19  -1.564e+01  -1.593e+01  +8e-08  4e-07  3e-07  7e-04  2e-10  0.6266  5e-02  1  0  0 |  2  2
20  -1.574e+01  -1.580e+01  +2e-08  8e-08  7e-08  2e-04  4e-11  0.7833  9e-03  1  0  0 |  1  1
21  -1.576e+01  -1.578e+01  +8e-09  3e-08  3e-08  7e-05  2e-11  0.6266  5e-02  0  0  0 |  2  2
22  -1.577e+01  -1.577e+01  +2e-09  8e-09  7e-09  1e-05  4e-12  0.7833  9e-03  1  0  0 |  1  1
OPTIMAL (within feastol=7.7e-09, reltol=1.1e-10, abstol=1.7e-09).
Runtime: 0.012709 seconds.
Obj = -15.76630664400414
We get the correct objective, although with more iterations than IPOPT needs to solve the model with the entropy objective directly as an NLP. With the SCS solver I see more problems:
----------------------------------------------------------------------------
SCS v1.2.6 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012-2016
----------------------------------------------------------------------------
Lin-sys: sparse-indirect, nnz in A = 540, CG tol ~ 1/iter^(2.00)
eps = 1.00e-03, alpha = 1.50, max_iters = 2500, normalize = 1, scale = 1.00
Variables n = 180, constraints m = 469
Cones: primal zero / dual free vars: 19
linear vars: 180
exp vars: 270, dual exp vars: 0
Setup time: 2.41e-03s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 1.#Je+00 1.#Je+00 -1.#Je+00 -1.#Je+00 -1.#Je+00 1.#Je+00 6.79e-03
100| 5.58e-03 3.02e-03 1.44e-01 -6.58e+04 -4.92e+04 2.61e-10 1.07e-01
200| 3.31e-03 2.28e-03 1.82e-01 -5.56e+04 -3.85e+04 3.70e-10 2.36e-01
300| 2.52e-03 1.97e-03 2.11e-01 -4.69e+04 -3.06e+04 2.61e-10 3.53e-01
400| 1.92e-03 1.92e-03 2.43e-01 -3.94e+04 -2.40e+04 3.61e-10 4.70e-01
500| 1.55e-03 1.80e-03 2.76e-01 -3.27e+04 -1.86e+04 2.44e-10 5.84e-01
600| 1.25e-03 1.64e-03 3.14e-01 -2.68e+04 -1.40e+04 3.22e-10 6.98e-01
700| 1.10e-03 1.44e-03 3.54e-01 -2.16e+04 -1.03e+04 0.00e+00 8.07e-01
800| 1.14e-03 1.23e-03 3.84e-01 -1.71e+04 -7.63e+03 2.55e-10 9.14e-01
900| 8.94e-04 1.07e-03 4.16e-01 -1.36e+04 -5.59e+03 3.05e-10 1.02e+00
1000| 6.46e-04 8.99e-04 4.34e-01 -1.07e+04 -4.22e+03 3.54e-10 1.12e+00
1100| 5.35e-04 7.22e-04 4.47e-01 -8.39e+03 -3.20e+03 3.99e-10 1.21e+00
1200| 5.01e-04 5.90e-04 4.39e-01 -6.62e+03 -2.58e+03 2.19e-10 1.32e+00
1300| 3.90e-04 4.84e-04 4.21e-01 -5.29e+03 -2.16e+03 2.36e-10 1.43e+00
1400| 2.60e-04 4.10e-04 3.83e-01 -4.31e+03 -1.92e+03 2.49e-10 1.54e+00
1500| 3.38e-04 3.31e-04 3.38e-01 -3.52e+03 -1.74e+03 2.60e-10 1.64e+00
1600| 1.96e-04 3.09e-04 2.90e-01 -3.00e+03 -1.65e+03 2.69e-10 1.74e+00
1700| 2.13e-04 2.43e-04 2.32e-01 -2.61e+03 -1.62e+03 2.75e-10 1.83e+00
1800| 2.05e-04 2.24e-04 1.81e-01 -2.28e+03 -1.59e+03 2.80e-10 1.92e+00
1900| 1.49e-04 2.19e-04 1.42e-01 -2.07e+03 -1.56e+03 2.83e-10 2.01e+00
2000| 1.44e-04 2.12e-04 1.13e-01 -1.92e+03 -1.53e+03 2.86e-10 2.12e+00
2100| 1.41e-04 2.09e-04 8.66e-02 -1.79e+03 -1.51e+03 2.88e-10 2.22e+00
2200| 1.32e-04 2.06e-04 6.30e-02 -1.71e+03 -1.50e+03 2.89e-10 2.31e+00
2300| 1.35e-04 2.04e-04 4.97e-02 -1.64e+03 -1.49e+03 2.90e-10 2.40e+00
2400| 1.30e-04 2.05e-04 3.88e-02 -1.59e+03 -1.47e+03 2.91e-10 2.49e+00
2500| 1.26e-04 2.03e-04 3.02e-02 -1.56e+03 -1.47e+03 2.92e-10 2.57e+00
----------------------------------------------------------------------------
Status: Solved/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 2.57e+00s
Lin-sys: avg # CG iterations: 3.88, avg solve time: 5.10e-05s
Cones: avg projection time: 9.44e-04s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 1.0027e-03, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.5744e-11
|Ax + s - b|_2 / (1 + |b|_2) = 1.2581e-04
|A'y + c|_2 / (1 + |c|_2) = 2.0276e-04
|c'x + b'y| / (1 + |c'x| + |b'y|) = 3.0169e-02
----------------------------------------------------------------------------
c'x = -1560.6383, -b'y = -1469.2022
============================================================================
Obj = -1560.638313679461
This does not look very good.
We are working with very large data sets on similar (but more complicated) models (size indication: the number of cells to estimate varies between 1e4 and 1e6). These models currently use Mosek's general convex NLP functionality. I am curious how this will work out with the new exponential cones. As a backup, we can always resort to NLP solvers like IPOPT and Knitro.
References
- Mosek, Version 9 Roadmap, https://themosekblog.blogspot.de/2018/01/version-9-roadmap_31.html
- CVXPY, http://www.cvxpy.org/en/latest/