In [1] a problem is posed that looks simple at first sight. Looking a bit further, there are some interesting angles.
The example data set is:
----     27 PARAMETER a

           j1      j2      j3
i1      0.870   0.730   0.410
i2      0.820   0.730   0.850
i3      0.820   0.370   0.850
i4      0.580   0.950   0.420
i5      1.000   1.000   0.900
The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.
Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket. Let's see if we can formalize this with a model.
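To make this concrete: with \(w=(0,1,0)\) the final score is simply the second column of \(a\): \[F = (0.730,\; 0.730,\; 0.370,\; 0.950,\; 1.000)\] so records i4 and i5 land in the bucket \([0.9, 1]\).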
MIP Model
Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket boundaries.
A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\ & F_i = \sum_j w_j a_{i,j}\\& \sum_j w_j = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_j \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.
The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.
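The listings in this post come from a GAMS model, but the formulation is easy to reproduce in other tools. Below is a minimal sketch of this first model in Python with PuLP; the data, the bucket \([L,U]=[0.9,1]\) and the value \(M=1\) (discussed in the next section) come from the example, while the solver call (CBC) is an assumption.

```python
# Minimal sketch of the first MIP model (assumes the PuLP package with its bundled CBC solver).
import pulp

# example data from the post: 5 records, 3 criteria
a = {"i1": [0.870, 0.730, 0.410],
     "i2": [0.820, 0.730, 0.850],
     "i3": [0.820, 0.370, 0.850],
     "i4": [0.580, 0.950, 0.420],
     "i5": [1.000, 1.000, 0.900]}
I, J = list(a), range(3)
L, U, M = 0.9, 1.0, 1.0          # bucket [L,U] and a safe big-M (see the next section)

m = pulp.LpProblem("max_bucket_count", pulp.LpMaximize)
w = pulp.LpVariable.dicts("w", J, lowBound=0, upBound=1)
delta = pulp.LpVariable.dicts("delta", I, cat="Binary")

# final scores F_i = sum_j w_j * a_{i,j} as linear expressions
F = {i: pulp.lpSum(w[j] * a[i][j] for j in J) for i in I}

m += pulp.lpSum(delta[i] for i in I)          # objective: number of records in the bucket
m += pulp.lpSum(w[j] for j in J) == 1         # weights sum to one
for i in I:
    m += F[i] >= L - M * (1 - delta[i])       # delta_i = 1  ==>  F_i >= L
    m += F[i] <= U + M * (1 - delta[i])       # delta_i = 1  ==>  F_i <= U

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", pulp.value(m.objective))
print("weights :", [w[j].value() for j in J])
```

In this sketch \(F_i\) is substituted directly into the constraints instead of being a separate variable; either form works.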
Big-M
The data seems to suggest \(0 \le a_{i,j} \le 1\), which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \max_j a_{i,j}\] (\(F_i\) is a convex combination of the \(a_{i,j}\)). We can also assume \(0 \le L \le U \le 1\). We can conclude that the largest possible difference between \(F_i\) and \(L\) (and between \(F_i\) and \(U\)) is one. So, in this case an obvious value for \(M\) is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step: \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have: \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\). We will get back to this later.
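For the small data set this works out to \[M = \max\{a_{i,j}\} - \min\{a_{i,j}\} = 1.000 - 0.370 = 0.630\] which is the value reported below in the section Big-M revisited.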
More preprocessing
We can optimize things further by observing that not both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are always needed. With our small example we have \(U=1\), but we know already that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).
We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]
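For example, record i5 has \(\ell_5 = 0.900\) and \(u_5 = 1.000\). With \(L=0.9\) and \(U=1\), neither \(L > \ell_5\) nor \(U < u_5\) holds, so no big-M constraints are generated for this record at all: its final score is guaranteed to land in the bucket.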
Even more preprocessing
The first three records in the example data set are really not candidates for having \(F_i \in [0.9, 1]\): for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\). These records will never have \(\delta_i=1\).
Combining things
I extended the \(a\) matrix with the following columns:
- lo: \(\ell_i=\min_j a_{i,j}\).
- up: \(u_i = \max_j a_{i,j}\).
- cand: boolean, indicates if this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
- chkL: boolean, indicates if we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
- chkU: boolean, indicates if we need to check the right "branch". This is equal to one if \(u_i\gt U\). No records have this equal to 1 here, so this column does not show up in the table below.
----     41 PARAMETER a

           j1      j2      j3      lo      up    cand    chkL
i1      0.870   0.730   0.410   0.410   0.870           1.000
i2      0.820   0.730   0.850   0.730   0.850           1.000
i3      0.820   0.370   0.850   0.370   0.850           1.000
i4      0.580   0.950   0.420   0.420   0.950   1.000   1.000
i5      1.000   1.000   0.900   0.900   1.000   1.000
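The same bookkeeping is easy to reproduce outside the modeling system. Here is a rough sketch in plain Python (reusing the `a` dictionary from the earlier sketch; the flag definitions follow the bullet list above):

```python
# Preprocessing columns per record: bounds on F_i plus the cand/chkL/chkU flags.
L, U = 0.9, 1.0

def preprocess(a, L, U):
    info = {}
    for i, row in a.items():
        lo, up = min(row), max(row)       # l_i <= F_i <= u_i for any feasible weights
        info[i] = {
            "lo": lo,
            "up": up,
            "cand": up >= L and lo <= U,  # can this record ever land in the bucket?
            "chkL": lo < L,               # lower big-M constraint needed?
            "chkU": up > U,               # upper big-M constraint needed?
        }
    return info

# Non-candidate records can be dropped from the model; for the remaining records we
# generate the lower constraint only when chkL holds and the upper one only when chkU holds.
```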
For a small data set this all does not make much difference, but for large ones, we make the model much smaller.
Results
The optimal weights for this small data set are not unique: the solver returns a different \(w\) than our earlier guess \(w=(0,1,0)\), but of course the same optimal number of selected records:
---- 71 VARIABLE w.L weights
j1 0.135, j2 0.865
---- 71 VARIABLE f.L final scores
i1 0.749, i2 0.742, i3 0.431, i4 0.900, i5 1.000
---- 71 VARIABLE delta.L selected
i4 1.000, i5 1.000
---- 71 VARIABLE z.L = 2.000 objective
Big-M revisited
In the model above I did not pay too much attention to the big-M's. I just used the calculation \(M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\), which yielded:
----     46 PARAMETER M  =        0.630  big-M
We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bound our inequality looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we need \(M^L_i \ge L - F_i\). We have bounds on \(F_i\) (we stored these in \(a_{i,lo}\) and \(a_{i,up}\)). So we can set \(M^L_i = L - a_{i,lo}\). This gives:
---- 78 PARAMETER ML
i4 0.480
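This matches a quick hand calculation: \(M^L_{i4} = L - a_{i4,lo} = 0.9 - 0.420 = 0.480\). Record i5 gets no entry because \(a_{i5,lo} = 0.900 = L\) (so \(M^L_{i5}=0\)), and records i1 through i3 were already dropped as non-candidates.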
Similarly for the \(U\) inequality. For our small example this is not so important, but for larger instances with a wider range of data this may be essential.
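For completeness, the analogous upper-branch constraint would read \[F_i \le U + M^U_i(1-\delta_i)\] with \(M^U_i = a_{i,up} - U\), since we need \(M^U_i \ge F_i - U\) and we know \(F_i \le a_{i,up}\).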
Larger problem
For a larger random problem:
----     43 PARAMETER a  data

           j1      j2      j3      lo      up    cand    chkL    chkU
i1      0.737   0.866   0.893   0.737   0.893           1.000
i2      0.434   0.654   0.606   0.434   0.654           1.000
i3      0.721   0.987   0.648   0.648   0.987   1.000   1.000
i4      0.800   0.618   0.869   0.618   0.869           1.000
i5      0.668   0.703   1.177   0.668   1.177   1.000   1.000   1.000
i6      0.656   0.540   0.525   0.525   0.656           1.000
i7      0.864   1.037   0.569   0.569   1.037   1.000   1.000   1.000
i8      0.942   1.003   0.655   0.655   1.003   1.000   1.000   1.000
i9      0.600   0.801   0.601   0.600   0.801           1.000
i10     0.619   0.933   1.114   0.619   1.114   1.000   1.000   1.000
i11     1.243   0.675   0.756   0.675   1.243   1.000   1.000   1.000
i12     0.608   0.778   0.747   0.608   0.778           1.000
i13     0.695   0.593   1.196   0.593   1.196   1.000   1.000   1.000
i14     0.965   1.001   0.650   0.650   1.001   1.000   1.000   1.000
i15     0.951   1.040   0.969   0.951   1.040   1.000           1.000
i16     0.514   1.220   0.935   0.514   1.220   1.000   1.000   1.000
i17     0.834   1.130   1.054   0.834   1.130   1.000   1.000   1.000
i18     1.161   0.550   0.481   0.481   1.161   1.000   1.000   1.000
i19     0.638   0.640   0.691   0.638   0.691           1.000
i20     1.057   0.724   0.657   0.657   1.057   1.000   1.000   1.000
i21     0.814   0.775   0.448   0.448   0.814           1.000
i22     0.800   0.737   0.741   0.737   0.800           1.000
i23     0.573   0.555   0.789   0.555   0.789           1.000
i24     0.829   0.679   1.059   0.679   1.059   1.000   1.000   1.000
i25     0.761   0.956   0.687   0.687   0.956   1.000   1.000
i26     0.870   0.673   0.822   0.673   0.870           1.000
i27     0.304   0.900   0.920   0.304   0.920   1.000   1.000
i28     0.544   0.659   0.495   0.495   0.659           1.000
i29     0.755   0.589   0.520   0.520   0.755           1.000
i30     0.492   0.617   0.681   0.492   0.681           1.000
i31     0.657   0.767   0.634   0.634   0.767           1.000
i32     0.752   0.684   0.596   0.596   0.752           1.000
i33     0.616   0.846   0.803   0.616   0.846           1.000
i34     0.677   1.098   1.132   0.677   1.132   1.000   1.000   1.000
i35     0.454   1.037   1.204   0.454   1.204   1.000   1.000   1.000
i36     0.815   0.605   0.890   0.605   0.890           1.000
i37     0.814   0.587   0.939   0.587   0.939   1.000   1.000
i38     1.019   0.675   0.613   0.613   1.019   1.000   1.000   1.000
i39     0.547   0.946   0.843   0.547   0.946   1.000   1.000
i40     0.724   0.571   0.757   0.571   0.757           1.000
i41     0.611   0.916   0.891   0.611   0.916   1.000   1.000
i42     0.680   0.624   1.111   0.624   1.111   1.000   1.000   1.000
i43     1.015   0.870   0.823   0.823   1.015   1.000   1.000   1.000
i44     0.587   0.866   0.691   0.587   0.866           1.000
i45     0.789   1.090   0.649   0.649   1.090   1.000   1.000   1.000
i46     1.436   0.747   0.805   0.747   1.436   1.000   1.000   1.000
i47     0.791   0.885   0.723   0.723   0.885           1.000
i48     0.718   1.028   0.869   0.718   1.028   1.000   1.000   1.000
i49     0.876   0.772   0.918   0.772   0.918   1.000   1.000
i50     0.498   0.882   0.599   0.498   0.882           1.000
We see that we can remove a significant number of records and big-M constraints. The model solves instantaneously and shows:
---- 74 VARIABLE w.L weights
j1 0.007, j2 0.792, j3 0.202
---- 74 VARIABLE f.L final scores
i1 0.870, i2 0.643, i3 0.917, i4 0.670, i5 0.798, i6 0.538, i7 0.942, i8 0.933
i9 0.760, i10 0.967, i11 0.695, i12 0.771, i13 0.715, i14 0.930, i15 1.025, i16 1.158
i17 1.112, i18 0.541, i19 0.650, i20 0.713, i21 0.710, i22 0.738, i23 0.602, i24 0.757
i25 0.900, i26 0.705, i27 0.900, i28 0.625, i29 0.576, i30 0.629, i31 0.739, i32 0.667
i33 0.836, i34 1.102, i35 1.067, i36 0.664, i37 0.660, i38 0.665, i39 0.923, i40 0.609
i41 0.909, i42 0.722, i43 0.861, i44 0.829, i45 0.999, i46 0.764, i47 0.851, i48 0.994
i49 0.802, i50 0.822
---- 74 VARIABLE delta.L selected
i3 1.000, i7 1.000, i8 1.000, i10 1.000, i14 1.000, i15 1.000, i16 1.000, i17 1.000
i25 1.000, i27 1.000, i34 1.000, i35 1.000, i39 1.000, i41 1.000, i45 1.000, i48 1.000
---- 74 VARIABLE z.L = 16.000 objective
Conclusion
A simple model becomes not that simple once we start "optimizing" it. Unfortunately this is typical for large MIP models.
References
- Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups, https://stackoverflow.com/questions/50843023/simulation-optimization-package-in-r-for-tuning-weights-to-achieve-maximum-alloc