Channel: Yet Another Math Programming Consultant

Select n items but optimize for the best k items


Problem Statement


This is a strange one [1]. Consider a standard objective function: \[\max\>\sum_j p_j\cdot x_j\] where \(p\) are coefficients and \(x\) are binary decision variables. This will look at all \(n\) objective coefficients. Now the question is: 

Can I optimize just for the \(k\) best ones?

In effect, this means: \[\begin{align} \max & \sum_j \delta_j \cdot p_j\cdot x_j\\ & \sum_j \delta_j = k \\ & \delta_j \in \{0,1\}\end{align}\] We introduced extra binary decision variables \(\delta\), but we also made the objective non-linear: it now contains the products \(\delta_j \cdot x_j\).
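For a fixed \(x\), the inner maximization over \(\delta\) simply picks the \(k\) largest values of \(p_j \cdot x_j\). A minimal Python sketch of that idea, assuming nonnegative \(p_j\) as in the example data, compares a plain sort with brute-force enumeration of all \(\delta\) vectors. The function names are hypothetical, and the \(y_j = p_j \cdot x_j\) values are taken from the first solution reported further down (zeros for unselected items).

```python
from itertools import combinations

def best_k_sum(values, k):
    """Sum of the k largest values: what the delta variables end up selecting."""
    return sum(sorted(values, reverse=True)[:k])

def best_k_by_delta(values, k):
    """Brute force over all 0/1 vectors delta with sum(delta) == k."""
    return max(sum(values[j] for j in chosen)
               for chosen in combinations(range(len(values)), k))

# y_j = p_j * x_j for items j1..j10 of the first reported solution (assumes p_j >= 0).
y = [0.0, 0.0, 0.550, 0.0, 0.292, 0.224, 0.350, 0.856, 0.0, 0.0]
print(best_k_sum(y, 3), best_k_by_delta(y, 3))
```

Enumerating \(\delta\) is only feasible for tiny \(n\); the point is that both functions compute the same quantity, so "optimize for the best \(k\)" is a sort hidden inside the objective.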

The question is: how can we model this while keeping the model linear?


Example problem


Here is a small multi-dimensional knapsack-like problem:

Base MIP Model
\[ \begin{align} \max& \sum_j \color{darkblue}p_j \cdot \color{darkred}x_j \\ & \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}x_j \le \color{darkblue}b_i && \forall i \\ & \sum_j \color{darkred}x_j = \color{darkblue}m \\ & \color{darkred}x_j \in \{0,1\}\end{align}\]

The randomly generated data looks like:


----     16 PARAMETER p  objective coefficients

j1 0.172, j2 0.843, j3 0.550, j4 0.301, j5 0.292, j6 0.224, j7 0.350, j8 0.856
j9 0.067, j10 0.500


----     16 PARAMETER a  matrix coefficients

           j1     j2     j3     j4     j5     j6     j7     j8     j9    j10

i1      0.998  0.579  0.991  0.762  0.131  0.640  0.160  0.250  0.669  0.435
i2      0.360  0.351  0.131  0.150  0.589  0.831  0.231  0.666  0.776  0.304
i3      0.110  0.502  0.160  0.872  0.265  0.286  0.594  0.723  0.628  0.464
i4      0.413  0.118  0.314  0.047  0.339  0.182  0.646  0.561  0.770  0.298
i5      0.661  0.756  0.627  0.284  0.086  0.103  0.641  0.545  0.032  0.792


---- 16 PARAMETER b rhs values

i1 2.500, i2 2.500, i3 2.500, i4 2.500, i5 2.500


---- 16 PARAMETER m = 5.000 number of x to select


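To see which of the selected items are the "best" ones (i.e., contribute the most to the objective), I reformulated the problem with auxiliary variables \(y_j = p_j \cdot x_j\):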

Base MIP Model with y variables
\[ \begin{align} \max& \sum_j \color{darkred}y_j \\ & \color{darkred}y_j = \color{darkblue}p_j \cdot \color{darkred}x_j \\ & \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}x_j \le \color{darkblue}b_i && \forall i \\ & \sum_j \color{darkred}x_j = \color{darkblue}m \\ & \color{darkred}x_j \in \{0,1\}\end{align}\]


When we solve this problem we see:


----     41 VARIABLE x.L  binary decision variables

j3 1.000, j5 1.000, j6 1.000, j7 1.000, j8 1.000


---- 41 VARIABLE y.L auxiliary variables (y=p*x)

j3 0.550, j5 0.292, j6 0.224, j7 0.350, j8 0.856


---- 41 VARIABLE z.L = 2.273 objective

---- 61 PARAMETER yordered ordered version of y

k1.j8 0.856
k2.j3 0.550
k3.j7 0.350
k4.j5 0.292
k5.j6 0.224


The model selects items j3, j5, j6, j7, and j8.
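Because the instance is tiny (\(\binom{10}{5} = 252\) candidate selections), the base model can be cross-checked by brute force instead of a MIP solver. This is a sketch, not the GAMS model used here: it uses the data exactly as displayed above, rounded to three decimals, so in principle it could pick a different selection than the solver did on the unrounded data. Names such as `solve_base` are hypothetical.

```python
from itertools import combinations

# Data as displayed in the listing (rounded to 3 decimals).
ITEMS = [f"j{n}" for n in range(1, 11)]
P = dict(zip(ITEMS, [0.172, 0.843, 0.550, 0.301, 0.292,
                     0.224, 0.350, 0.856, 0.067, 0.500]))
A = {  # rows i1..i5, columns j1..j10
    "i1": [0.998, 0.579, 0.991, 0.762, 0.131, 0.640, 0.160, 0.250, 0.669, 0.435],
    "i2": [0.360, 0.351, 0.131, 0.150, 0.589, 0.831, 0.231, 0.666, 0.776, 0.304],
    "i3": [0.110, 0.502, 0.160, 0.872, 0.265, 0.286, 0.594, 0.723, 0.628, 0.464],
    "i4": [0.413, 0.118, 0.314, 0.047, 0.339, 0.182, 0.646, 0.561, 0.770, 0.298],
    "i5": [0.661, 0.756, 0.627, 0.284, 0.086, 0.103, 0.641, 0.545, 0.032, 0.792],
}
B = 2.5      # every b_i equals 2.5
M = 5        # number of items to select
TOL = 1e-9   # assumption: guards the <= test against floating-point round-off

def feasible(selection):
    """Check sum_j a[i,j] * x_j <= b_i for every row i."""
    idx = [ITEMS.index(j) for j in selection]
    return all(sum(row[c] for c in idx) <= B + TOL for row in A.values())

def solve_base():
    """Enumerate all selections of M items and keep the feasible one with the largest sum of p."""
    candidates = (s for s in combinations(ITEMS, M) if feasible(s))
    best = max(candidates, key=lambda s: sum(P[j] for j in s))
    return best, sum(P[j] for j in best)

selection, value = solve_base()
print(selection, round(value, 3))
```

Brute force is only a sanity check for an instance this small; the number of selections grows combinatorially, which is why the article uses a MIP.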

The three "best" items are j8 (0.856), j3 (0.550), and j7 (0.350). Let's optimize for just the best three. We change the model to:


MIQP Model
\[ \begin{align} \max& \sum_j \color{darkred}\delta_j \cdot \color{darkred}y_j \\ & \color{darkred}y_j = \color{darkblue}p_j \cdot \color{darkred}x_j \\ & \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}x_j \le \color{darkblue}b_i && \forall i \\ & \sum_j \color{darkred}x_j = \color{darkblue}m \\ & \sum_j \color{darkred}\delta_j = 3\\ & \color{darkred}x_j \in \{0,1\}\\& \color{darkred}\delta_j \in \{0,1\}\end{align}\]


We can solve this with a non-convex solver (the objective contains the products \(\delta_j \cdot y_j\)), or with a solver smart enough to reformulate such products automatically. The results look like:



----     75 VARIABLE delta.L  three best y variables

j3 1.000, j8 1.000, j10 1.000


---- 75 VARIABLE x.L binary decision variables

j3 1.000, j5 1.000, j8 1.000, j9 1.000, j10 1.000


---- 75 VARIABLE y.L auxiliary variables (y=p*x)

j3 0.550, j5 0.292, j8 0.856, j9 0.067, j10 0.500


---- 75 VARIABLE z.L = 1.907 objective

---- 89 PARAMETER yordered ordered version of y

k1.j8 0.856
k2.j3 0.550
k3.j10 0.500
k4.j5 0.292
k5.j9 0.067


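This has changed the best three: the third item is now j10 (contribution 0.500) instead of j7 (0.350). The selected items are now j3, j5, j8, j9, and j10. The sum of the best three increases from 0.856 + 0.550 + 0.350 = 1.756 to 1.907, while the total over all five selected items drops from 2.273 to about 2.265.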
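The same brute-force idea can cross-check the best-three objective: score each feasible selection by the sum of its three largest \(p_j\). Again a sketch on the rounded data (repeated so the snippet runs on its own), with hypothetical names; it needs no \(\delta\) variables because it sorts directly, which is exactly what is hard to do inside a MIP.

```python
from itertools import combinations

ITEMS = [f"j{n}" for n in range(1, 11)]
P = dict(zip(ITEMS, [0.172, 0.843, 0.550, 0.301, 0.292,
                     0.224, 0.350, 0.856, 0.067, 0.500]))
A = [  # rows i1..i5, columns j1..j10 (rounded data from the listing)
    [0.998, 0.579, 0.991, 0.762, 0.131, 0.640, 0.160, 0.250, 0.669, 0.435],
    [0.360, 0.351, 0.131, 0.150, 0.589, 0.831, 0.231, 0.666, 0.776, 0.304],
    [0.110, 0.502, 0.160, 0.872, 0.265, 0.286, 0.594, 0.723, 0.628, 0.464],
    [0.413, 0.118, 0.314, 0.047, 0.339, 0.182, 0.646, 0.561, 0.770, 0.298],
    [0.661, 0.756, 0.627, 0.284, 0.086, 0.103, 0.641, 0.545, 0.032, 0.792],
]
B, M, K = 2.5, 5, 3
TOL = 1e-9  # assumption: floating-point slack on the <= constraints

def feasible(selection):
    """All knapsack rows satisfied by the selected items."""
    idx = [ITEMS.index(j) for j in selection]
    return all(sum(row[c] for c in idx) <= B + TOL for row in A)

def top_k(selection):
    """Objective of the best-K model: sum of the K largest p_j among the selected items."""
    return sum(sorted((P[j] for j in selection), reverse=True)[:K])

def solve_top_k():
    """Enumerate all feasible selections of M items and maximize the best-K objective."""
    candidates = (s for s in combinations(ITEMS, M) if feasible(s))
    best = max(candidates, key=top_k)
    return best, top_k(best)

selection, value = solve_top_k()
print(selection, round(value, 3))
```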

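There are a few linearizations we can consider:

  • Create ordered variables for \(y\) inside the model. Once we have these, we just add the first three to the objective. However, sorting variables inside a MIP is not completely trivial.
  • A better approach is to linearize \(\delta_j \cdot y_j = p_j \cdot \delta_j \cdot x_j\) directly: \(\delta_j \cdot x_j\) is a product of two binary variables. 
  • Even better is to note that \(\delta_j = 1 \Rightarrow x_j=1\), i.e., \(x_j\ge \delta_j\). This is actually the only thing we need: with \(\delta_j \le x_j\) the product \(\delta_j \cdot x_j\) equals \(\delta_j\), so the objective becomes \(\sum_j p_j \cdot \delta_j\), and maximizing it makes \(\delta\) pick the three largest \(p_j\) among the selected items. So we end up with: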


Linear MIP Model
\[ \begin{align} \max& \sum_j \color{darkblue}p_j \cdot \color{darkred}\delta_j \\ & \color{darkred}x_j \ge \color{darkred}\delta_j \\ & \color{darkred}y_j = \color{darkblue}p_j \cdot \color{darkred}x_j \\ & \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}x_j \le \color{darkblue}b_i && \forall i \\ & \sum_j \color{darkred}x_j = \color{darkblue}m \\ & \sum_j \color{darkred}\delta_j = 3\\ & \color{darkred}x_j \in \{0,1\}\\& \color{darkred}\delta_j \in \{0,1\}\end{align}\]
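To see how the second and third bullets relate, here is the standard linearization of a product of two binary variables, sketched with a hypothetical continuous variable \(w_j\) standing in for \(\delta_j \cdot x_j\) (the \(w_j\) is only an illustration; it is not part of the model above, and the remaining constraints stay unchanged):

\[ \begin{align} \max& \sum_j \color{darkblue}p_j \cdot \color{darkred}w_j \\ & \color{darkred}w_j \le \color{darkred}\delta_j \\ & \color{darkred}w_j \le \color{darkred}x_j \\ & \color{darkred}w_j \ge \color{darkred}\delta_j + \color{darkred}x_j - 1 \\ & \color{darkred}w_j \in [0,1]\end{align}\]

Adding \(x_j \ge \delta_j\) forces \(w_j = \delta_j\): if \(\delta_j = 0\) then \(w_j \le 0\), and if \(\delta_j = 1\) then \(x_j = 1\) and \(w_j \ge 1\). So \(w_j\) and its three constraints can be dropped and the objective becomes \(\sum_j p_j \cdot \delta_j\). Adding \(x_j \ge \delta_j\) does not cut off the optimum here because, with \(p_j \ge 0\) and \(m \ge 3\), a \(\delta_j = 1\) on an unselected item (contribution 0) can always be moved to a selected item that is not yet counted without losing value.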

The solution looks like:



----     52 VARIABLE x.L  binary decision variables

j3 1.000, j5 1.000, j8 1.000, j9 1.000, j10 1.000


---- 52 VARIABLE delta.L three best y variables

j3 1.000, j8 1.000, j10 1.000


---- 52 VARIABLE y.L auxiliary variables (y=p*x)

j3 0.550, j5 0.292, j8 0.856, j9 0.067, j10 0.500


---- 52 VARIABLE z.L = 1.907 objective

---- 72 PARAMETER yordered ordered version of y

k1.j8 0.856
k2.j3 0.550
k3.j10 0.500
k4.j5 0.292
k5.j9 0.067


This is the same solution as obtained by the MIQP model.
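The key step (that \(\delta_j \le x_j\) together with \(\sum_j \delta_j = 3\) makes \(\delta\) pick the three best selected items) can also be sanity-checked numerically. The sketch below assumes nonnegative \(p_j\) (true for the example data) and \(m \ge k\); it draws random \(p\) and random selections and compares the inner maximization over \(\delta\) in the MIQP and in the linear model with a direct sort. Function names and the random test setup are hypothetical.

```python
import random
from itertools import combinations

def miqp_inner(p, x, k):
    """MIQP objective for fixed x: max sum_j delta_j*p_j*x_j over 0/1 delta with sum(delta) == k."""
    return max(sum(p[j] * x[j] for j in chosen)
               for chosen in combinations(range(len(p)), k))

def linear_inner(p, x, k):
    """Linear-model objective for fixed x: max sum_j p_j*delta_j with delta_j <= x_j, sum(delta) == k."""
    selected = [j for j, xj in enumerate(x) if xj == 1]
    return max(sum(p[j] for j in chosen) for chosen in combinations(selected, k))

def check(trials=200, n=8, m=5, k=3, seed=1):
    """Compare both inner maximizations with the sum of the k largest selected p_j."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = [rng.random() for _ in range(n)]   # nonnegative, like the example data
        chosen = set(rng.sample(range(n), m))   # random selection with sum(x) == m
        x = [1 if j in chosen else 0 for j in range(n)]
        best_k = sum(sorted((p[j] for j in chosen), reverse=True)[:k])
        if abs(miqp_inner(p, x, k) - best_k) > 1e-12:
            return False
        if abs(linear_inner(p, x, k) - best_k) > 1e-12:
            return False
    return True

print(check())
```

With negative \(p_j\) the two inner problems can differ, since the MIQP could then prefer a \(\delta_j = 1\) on an unselected item (contribution 0); that is why the nonnegativity assumption is stated.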


Conclusion


It turns out that linearizing the problem of finding the "best three" is rather easy. This was a bit surprising to me: I expected to need some complicated sorting. Of course, we exploited here that the decision variables \(x_j\) are binary.

References


  1. With R and lpsolve is there a way of optimising for the best 11 elements but still pick 15 elements in total? https://stackoverflow.com/questions/62511975/with-r-and-lpsolve-is-there-a-way-of-optimising-for-the-best-11-elements-but-sti 
