In [1] a problem is posed that looks simple at first sight. Looking a bit further, there are some interesting angles.
The example data set is:
----     27 PARAMETER a

           j1      j2      j3
i1      0.870   0.730   0.410
i2      0.820   0.730   0.850
i3      0.820   0.370   0.850
i4      0.580   0.950   0.420
i5      1.000   1.000   0.900
The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.
Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket. Let's see if we can formalize this with a model.
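To make this concrete: with \(w=(0,1,0)\) the final score is simply the second column of \(a\): \[F = (0.730,\; 0.730,\; 0.370,\; 0.950,\; 1.000)\] so records i4 and i5 land in the bucket \([0.9, 1]\).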
MIP Model
Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket boundaries.
A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\ & F_i = \sum_j w_j a_{i,j}\\& \sum_j w_j = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_j \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.
The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.
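The listings in this post come from a GAMS model, but the formulation is easy to reproduce in other tools. Below is a minimal sketch of this first model in Python with PuLP; the data, the bucket \([L,U]=[0.9,1]\) and the value \(M=1\) (discussed in the next section) come from the example, while the solver call (CBC) is an assumption.

```python
# Minimal sketch of the first MIP model (assumes the PuLP package with its bundled CBC solver).
import pulp

# example data from the post: 5 records, 3 criteria
a = {"i1": [0.870, 0.730, 0.410],
     "i2": [0.820, 0.730, 0.850],
     "i3": [0.820, 0.370, 0.850],
     "i4": [0.580, 0.950, 0.420],
     "i5": [1.000, 1.000, 0.900]}
I, J = list(a), range(3)
L, U, M = 0.9, 1.0, 1.0          # bucket [L,U] and a safe big-M (see the next section)

m = pulp.LpProblem("max_bucket_count", pulp.LpMaximize)
w = pulp.LpVariable.dicts("w", J, lowBound=0, upBound=1)
delta = pulp.LpVariable.dicts("delta", I, cat="Binary")

# final scores F_i = sum_j w_j * a_{i,j} as linear expressions
F = {i: pulp.lpSum(w[j] * a[i][j] for j in J) for i in I}

m += pulp.lpSum(delta[i] for i in I)          # objective: number of records in the bucket
m += pulp.lpSum(w[j] for j in J) == 1         # weights sum to one
for i in I:
    m += F[i] >= L - M * (1 - delta[i])       # delta_i = 1  ==>  F_i >= L
    m += F[i] <= U + M * (1 - delta[i])       # delta_i = 1  ==>  F_i <= U

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", pulp.value(m.objective))
print("weights :", [w[j].value() for j in J])
```

In this sketch \(F_i\) is substituted directly into the constraints instead of being a separate variable; either form works.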
Big-M
The data seems to suggest \(0 \le a_{i,j} \le 1\), which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \max_j a_{i,j}\] (\(F_i\) is a convex combination of the \(a_{i,j}\)). We can also assume \(0 \le L \le U \le 1\). We can conclude that the largest possible difference between \(F_i\) and \(L\) (and between \(F_i\) and \(U\)) is one. So, in this case an obvious value for \(M\) is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step: \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have: \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\). We will get back to this later.
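For the small data set this works out to \[M = \max\{a_{i,j}\} - \min\{a_{i,j}\} = 1.000 - 0.370 = 0.630\] which is the value reported below in the section Big-M revisited.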
More preprocessing
We can optimize things further by observing that not both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are always needed. With our small example we have \(U=1\), but we know already that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).
We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]
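For example, record i5 has \(\ell_5 = 0.900\) and \(u_5 = 1.000\). With \(L=0.9\) and \(U=1\), neither \(L > \ell_5\) nor \(U < u_5\) holds, so no big-M constraints are generated for this record at all: its final score is guaranteed to land in the bucket.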
Even more preprocessing
The first three records in the example data set are really not candidates for having \(F_i \in [0.9, 1]\): for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\). These records will never have \(\delta_i=1\).
Combining things
I extended the \(a\) matrix with the following columns:
- lo: \(\ell_i=\min_j a_{i,j}\).
- up: \(u_i = \max_j a_{i,j}\).
- cand: boolean, indicates if this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
- chkL: boolean, indicates if we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
- chkU: boolean, indicates if we need to check the right "branch". This is equal to one if \(u_i\gt U\). No records have this equal to 1 here, so this column does not show up in the table below.
----     41 PARAMETER a

           j1      j2      j3      lo      up    cand    chkL
i1      0.870   0.730   0.410   0.410   0.870           1.000
i2      0.820   0.730   0.850   0.730   0.850           1.000
i3      0.820   0.370   0.850   0.370   0.850           1.000
i4      0.580   0.950   0.420   0.420   0.950   1.000   1.000
i5      1.000   1.000   0.900   0.900   1.000   1.000
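The same bookkeeping is easy to reproduce outside the modeling system. Here is a rough sketch in plain Python (reusing the `a` dictionary from the earlier sketch; the flag definitions follow the bullet list above):

```python
# Preprocessing columns per record: bounds on F_i plus the cand/chkL/chkU flags.
L, U = 0.9, 1.0

def preprocess(a, L, U):
    info = {}
    for i, row in a.items():
        lo, up = min(row), max(row)       # l_i <= F_i <= u_i for any feasible weights
        info[i] = {
            "lo": lo,
            "up": up,
            "cand": up >= L and lo <= U,  # can this record ever land in the bucket?
            "chkL": lo < L,               # lower big-M constraint needed?
            "chkU": up > U,               # upper big-M constraint needed?
        }
    return info

# Non-candidate records can be dropped from the model; for the remaining records we
# generate the lower constraint only when chkL holds and the upper one only when chkU holds.
```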
For a small data set this all does not make much difference, but for large ones, we make the model much smaller.
Results
The optimal weights for this small data set are not unique: the solver returns a different \(w\) than our earlier guess \(w=(0,1,0)\), but of course the same optimal number of selected records:
---- 71 VARIABLE w.L weights
j1 0.135, j2 0.865
---- 71 VARIABLE f.L final scores
i1 0.749, i2 0.742, i3 0.431, i4 0.900, i5 1.000
---- 71 VARIABLE delta.L selected
i4 1.000, i5 1.000
---- 71 VARIABLE z.L = 2.000 objective
Big-M revisited
In the model above I did not pay too much attention to the big-M's. I just used the calculation \(M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\), which yielded:
----     46 PARAMETER M  =        0.630  big-M
We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bound our inequality looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we need \(M^L_i \ge L - F_i\). We have bounds on \(F_i\) (we stored these in \(a_{i,lo}\) and \(a_{i,up}\)). So we can set \(M^L_i = L - a_{i,lo}\). This gives:
---- 78 PARAMETER ML
i4 0.480
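This matches a quick hand calculation: \(M^L_{i4} = L - a_{i4,lo} = 0.9 - 0.420 = 0.480\). Record i5 gets no entry because \(a_{i5,lo} = 0.900 = L\) (so \(M^L_{i5}=0\)), and records i1 through i3 were already dropped as non-candidates.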
Similarly for the \(U\) inequality. For our small example this is not so important, but for larger instances with a wider range of data this may be essential.
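For completeness, the analogous upper-branch constraint would read \[F_i \le U + M^U_i(1-\delta_i)\] with \(M^U_i = a_{i,up} - U\), since we need \(M^U_i \ge F_i - U\) and we know \(F_i \le a_{i,up}\).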
Larger problem
For a larger random problem:
----     43 PARAMETER a  data

           j1      j2      j3      lo      up    cand    chkL    chkU
i1      0.737   0.866   0.893   0.737   0.893           1.000
i2      0.434   0.654   0.606   0.434   0.654           1.000
i3      0.721   0.987   0.648   0.648   0.987   1.000   1.000
i4      0.800   0.618   0.869   0.618   0.869           1.000
i5      0.668   0.703   1.177   0.668   1.177   1.000   1.000   1.000
i6      0.656   0.540   0.525   0.525   0.656           1.000
i7      0.864   1.037   0.569   0.569   1.037   1.000   1.000   1.000
i8      0.942   1.003   0.655   0.655   1.003   1.000   1.000   1.000
i9      0.600   0.801   0.601   0.600   0.801           1.000
i10     0.619   0.933   1.114   0.619   1.114   1.000   1.000   1.000
i11     1.243   0.675   0.756   0.675   1.243   1.000   1.000   1.000
i12     0.608   0.778   0.747   0.608   0.778           1.000
i13     0.695   0.593   1.196   0.593   1.196   1.000   1.000   1.000
i14     0.965   1.001   0.650   0.650   1.001   1.000   1.000   1.000
i15     0.951   1.040   0.969   0.951   1.040   1.000           1.000
i16     0.514   1.220   0.935   0.514   1.220   1.000   1.000   1.000
i17     0.834   1.130   1.054   0.834   1.130   1.000   1.000   1.000
i18     1.161   0.550   0.481   0.481   1.161   1.000   1.000   1.000
i19     0.638   0.640   0.691   0.638   0.691           1.000
i20     1.057   0.724   0.657   0.657   1.057   1.000   1.000   1.000
i21     0.814   0.775   0.448   0.448   0.814           1.000
i22     0.800   0.737   0.741   0.737   0.800           1.000
i23     0.573   0.555   0.789   0.555   0.789           1.000
i24     0.829   0.679   1.059   0.679   1.059   1.000   1.000   1.000
i25     0.761   0.956   0.687   0.687   0.956   1.000   1.000
i26     0.870   0.673   0.822   0.673   0.870           1.000
i27     0.304   0.900   0.920   0.304   0.920   1.000   1.000
i28     0.544   0.659   0.495   0.495   0.659           1.000
i29     0.755   0.589   0.520   0.520   0.755           1.000
i30     0.492   0.617   0.681   0.492   0.681           1.000
i31     0.657   0.767   0.634   0.634   0.767           1.000
i32     0.752   0.684   0.596   0.596   0.752           1.000
i33     0.616   0.846   0.803   0.616   0.846           1.000
i34     0.677   1.098   1.132   0.677   1.132   1.000   1.000   1.000
i35     0.454   1.037   1.204   0.454   1.204   1.000   1.000   1.000
i36     0.815   0.605   0.890   0.605   0.890           1.000
i37     0.814   0.587   0.939   0.587   0.939   1.000   1.000
i38     1.019   0.675   0.613   0.613   1.019   1.000   1.000   1.000
i39     0.547   0.946   0.843   0.547   0.946   1.000   1.000
i40     0.724   0.571   0.757   0.571   0.757           1.000
i41     0.611   0.916   0.891   0.611   0.916   1.000   1.000
i42     0.680   0.624   1.111   0.624   1.111   1.000   1.000   1.000
i43     1.015   0.870   0.823   0.823   1.015   1.000   1.000   1.000
i44     0.587   0.866   0.691   0.587   0.866           1.000
i45     0.789   1.090   0.649   0.649   1.090   1.000   1.000   1.000
i46     1.436   0.747   0.805   0.747   1.436   1.000   1.000   1.000
i47     0.791   0.885   0.723   0.723   0.885           1.000
i48     0.718   1.028   0.869   0.718   1.028   1.000   1.000   1.000
i49     0.876   0.772   0.918   0.772   0.918   1.000   1.000
i50     0.498   0.882   0.599   0.498   0.882           1.000
We see that we can remove a significant number of records and big-M constraints. The model solves instantaneously and shows:
---- 74 VARIABLE w.L weights
j1 0.007, j2 0.792, j3 0.202
---- 74 VARIABLE f.L final scores
i1 0.870, i2 0.643, i3 0.917, i4 0.670, i5 0.798, i6 0.538, i7 0.942, i8 0.933
i9 0.760, i10 0.967, i11 0.695, i12 0.771, i13 0.715, i14 0.930, i15 1.025, i16 1.158
i17 1.112, i18 0.541, i19 0.650, i20 0.713, i21 0.710, i22 0.738, i23 0.602, i24 0.757
i25 0.900, i26 0.705, i27 0.900, i28 0.625, i29 0.576, i30 0.629, i31 0.739, i32 0.667
i33 0.836, i34 1.102, i35 1.067, i36 0.664, i37 0.660, i38 0.665, i39 0.923, i40 0.609
i41 0.909, i42 0.722, i43 0.861, i44 0.829, i45 0.999, i46 0.764, i47 0.851, i48 0.994
i49 0.802, i50 0.822
---- 74 VARIABLE delta.L selected
i3 1.000, i7 1.000, i8 1.000, i10 1.000, i14 1.000, i15 1.000, i16 1.000, i17 1.000
i25 1.000, i27 1.000, i34 1.000, i35 1.000, i39 1.000, i41 1.000, i45 1.000, i48 1.000
---- 74 VARIABLE z.L = 16.000 objective
Conclusion
A simple model becomes not that simple once we start "optimizing" it. Unfortunately this is typical for large MIP models.
References
- Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups, https://stackoverflow.com/questions/50843023/simulation-optimization-package-in-r-for-tuning-weights-to-achieve-maximum-alloc