In [1] the following problem is posed:
One approach to attack this problem is:
This step is just data manipulation.
We are closer to the row sums as expected.
One way to model a permutation of a column vector \(a\) inside an optimization model is to find a square permutation matrix \(P\) [2] with
\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]
A permutation matrix has exactly one one in each row and in each column. The rest of the elements are zero. One could say a permutation matrix is a (row or column) permuted identity matrix. The constraints can also be interpreted as assignment constraints (related to the assignment problem).
For this I used a relative measure: minimize the sum of squared relative deviations:
\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]
where \(\mu_j = (L_j +U_j)/2\). This is to ensure that the deviations are distributed according to the average magnitude of a column. This is to make sure column 1 gets larger deviations than column 4.
I believe we need to do steps 2 and 3 simultaneously to find a global optimal solution. Here is an attempt:
This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.
Problem
I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).
Goal:
C1 C2 C3 C4 sum range
1 low 100^
2..|
3..|
4..|
5..|
6..|
7 high _
c1_high =98
c1_low =75
c2_high =15
c2_low =6
c3_high =8
c3_low =2
c4_low =0.05
c4_high =0.5In addition to this, i need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).
Note: this description seems to hint the values in the columns must be ordered from \(L_j\) to \(U_j\). That is actually not the case: we don't assume any ordering.
Solution Outline
This problem somewhat resembles a matrix balancing problem. Let's see how we can model this.One approach to attack this problem is:
Step 1: Create data with perfect spread
It is easy to generate data with perfect spread wile ignoring that the row sums are 100. We just start at \(L_j\) and increase a by fixed step size:---- 40 PARAMETER L low
c1 75.000, c2 6.000, c3 2.000, c4 0.050
---- 40 PARAMETER U high
c1 98.000, c2 15.000, c3 8.000, c4 0.500
---- 40 PARAMETER init initial matrix
c1 c2 c3 c4 rowsum
r1 75.0006.0002.0000.05083.050
r2 78.8337.5003.0000.12589.458
r3 82.6679.0004.0000.20095.867
r4 86.50010.5005.0000.275102.275
r5 90.33312.0006.0000.350108.683
r6 94.16713.5007.0000.425115.092
r7 98.00015.0008.0000.500121.500
This step is just data manipulation.
Step 2: Permute values in each column
The second step is to permute the values within a column to achieve a solution that has a close match to the row sums. This gives:---- 75 VARIABLE y.L after permutation
c1 c2 c3 c4 rowsum
r1 86.50010.5005.0000.275102.275
r2 94.1677.5003.0000.125104.792
r3 98.0006.0002.0000.050106.050
r4 90.3339.0004.0000.200103.533
r5 82.66712.0006.0000.350101.017
r6 75.00015.0008.0000.50098.500
r7 78.83313.5007.0000.42599.758
We are closer to the row sums as expected.
One way to model a permutation of a column vector \(a\) inside an optimization model is to find a square permutation matrix \(P\) [2] with
\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]
A permutation matrix has exactly one one in each row and in each column. The rest of the elements are zero. One could say a permutation matrix is a (row or column) permuted identity matrix. The constraints can also be interpreted as assignment constraints (related to the assignment problem).
Step 3: Add minimal relative deviations
The final step is to find small deviations from \(y\) such that the row sums are exactly 100.---- 75 VARIABLE x.L final values
c1 c2 c3 c4 rowsum
r1 84.26510.4674.9930.275100.000
r2 89.4607.4312.9840.125100.000
r3 92.0575.9121.9800.050100.000
r4 86.8638.9493.9880.200100.000
r5 81.66811.9855.9970.350100.000
r6 76.47315.0228.0050.500100.000
r7 79.07113.5037.0010.425100.000
---- 75 VARIABLE d.L deviations
c1 c2 c3 c4
r1 -2.235-0.033-0.007-2.25855E-5
r2 -4.707-0.069-0.016-4.75702E-5
r3 -5.943-0.088-0.020-6.00626E-5
r4 -3.471-0.051-0.012-3.50779E-5
r5 -0.999-0.015-0.003-1.00932E-5
r6 1.4730.0220.0051.489155E-5
r7 0.2370.0037.931220E-42.399194E-6
For this I used a relative measure: minimize the sum of squared relative deviations:
\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]
where \(\mu_j = (L_j +U_j)/2\). This is to ensure that the deviations are distributed according to the average magnitude of a column. This is to make sure column 1 gets larger deviations than column 4.
A complete model
I believe we need to do steps 2 and 3 simultaneously to find a global optimal solution. Here is an attempt:
This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.
Notes
- We could split the model in two phases (step 2 and step 3 in the above description). I am not sure how this would impact the optimal solution.
- This model can be linearized by using a different objective: minimize the sum of the absolute values of the (relative) deviations. You would end up with a linear MIP model.
- Instead of dividing by the average column value \(\mu_j\) we can divide by the "real" value \(y_{i,j}\). Unfortunately, that would make this a general MINLP model.
References
- Algorithm for generation of number matrix with specified boundaries, https://stackoverflow.com/questions/47754210/algorithm-for-generation-of-number-matrix-with-specified-boundaries
- Permutation Matrix, https://en.wikipedia.org/wiki/Permutation_matrix