Problem
Given a data matrix \(\color{darkblue}a_{i,j}\), find a submatrix such that the sum of the elements is maximized. The term "submatrix" can indicate a contiguous or a non-contiguous subset of rows and columns.
This is a generalization of the 1d maximum subarray problem [1].
Looks easy, but let's make sure it is.
Example data
Of course, the problem is trivial if all elements are non-negative: just select the whole matrix. It is only interesting if there are negative elements. So here is my random matrix:
---- 8 PARAMETER a  matrix (data)

          c1      c2      c3      c4      c5      c6      c7      c8      c9     c10
r1    -6.565   6.865   1.008  -3.977  -4.156  -5.519  -3.003   7.125  -8.658   0.004
r2     9.962   1.575   9.823   5.245  -7.386   2.794  -6.810  -4.998   3.379  -1.293
r3    -2.806  -2.971  -7.370  -6.998   1.782   6.618  -5.384   3.315   5.517  -3.927
r4    -7.790   0.048  -6.797   7.449  -4.698  -4.284   1.879   4.454   2.565  -0.724
r5    -1.734  -7.646  -3.716  -9.069  -3.229  -6.358   2.915   1.215   5.399  -4.044
r6     3.222   5.116   2.549  -4.323  -8.272  -7.950   2.825   0.906  -9.370   5.847
r7    -8.545  -6.487   0.513   5.004  -6.438  -9.317   1.703   2.425  -2.213  -2.826
r8    -5.139  -5.072  -7.390   8.669  -2.401   5.668  -3.999  -7.490   4.977  -8.615
r9    -5.960  -9.899  -4.608  -0.003  -6.974  -6.517  -3.387  -3.662  -3.558   9.280
r10    9.872  -2.602  -2.542   5.440  -2.066   8.262  -7.608   4.710  -8.892   1.526
r11   -8.972  -9.880  -1.975   0.398   2.578  -5.485  -2.078  -4.480  -6.953   8.726
r12   -1.547  -7.307  -2.279  -2.507  -4.630   8.967  -6.221  -4.050  -8.509  -1.973
r13   -7.966  -2.322  -3.518  -6.157  -7.753   1.931   0.229  -9.099   5.662   8.915
r14    1.929   2.147  -2.750   1.881   3.597   0.132  -6.815   3.138   0.478  -7.512
r15    9.734  -5.438   3.513   5.536   8.649  -5.975  -4.057  -6.055  -5.073   2.930
r16    4.699  -8.291  -6.993  -1.316  -6.261   3.854   5.259  -6.904  -2.212   3.909
r17    6.916   2.254   9.519  -9.462  -6.251  -8.258   0.808  -7.463   4.680  -7.735
r18   -0.233   5.912  -0.159   0.671  -9.788   0.877  -0.977   9.507  -6.323  -6.729
r19   -9.507  -6.444  -8.774  -9.667   6.713   2.033  -9.460  -6.078   9.014  -3.289
r20    1.885  -4.816   2.813  -6.895  -0.800  -2.133   6.109   0.820  -2.186   1.156
r21    8.655  -3.025  -9.834   8.977   1.438  -3.327   9.675   5.329  -7.798   9.896
r22    1.606  -6.672   2.867  -3.114   8.247   8.001  -9.675  -2.627   3.288   1.868
r23   -9.309   6.836   8.642   0.159  -4.008  -0.068  -9.101   5.474   0.659   4.935
r24    4.401   2.632  -7.702   9.423   4.135   9.725   7.096   2.429   4.026   4.018
r25    5.814   2.204  -8.914  -0.296  -8.949   3.972  -6.104  -5.479   6.273   9.835
r26    5.013   4.367  -9.988  -4.723   6.476   6.391   7.208  -5.746  -0.864  -9.233
r27   -3.540  -1.202  -3.693  -7.305   6.219  -1.664  -7.164  -0.689  -4.340   7.914
r28   -8.712  -1.708  -3.168  -0.634   2.853   2.872  -3.248  -7.984   8.117  -5.653
r29    8.377  -0.965  -8.201  -2.516  -1.700  -1.916  -7.767   5.023   6.068  -9.527
r30   -0.382  -4.428   8.032  -9.648   3.621   9.018   8.004   7.976   7.489  -2.180
The values are drawn from a uniform distribution, \(\color{darkblue}a_{i,j} \sim U(-10,10)\).
Non-contiguous submatrix
The non-convex MIQP formulation is easy. Just introduce binary variables that indicate if a row or column is selected. A cell is selected if both its corresponding row and column are selected.
| MIQP model A |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\} \end{align}\] |
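For small instances, model A can be sanity-checked by brute force. Below is a sketch in Python (my own illustrative code, not part of the original models): it enumerates all binary row and column selections and evaluates the bilinear objective directly.

```python
from itertools import product

def best_noncontiguous(a):
    """Brute-force maximum of sum_{i,j} a[i][j]*x[i]*y[j] over binary x, y.

    Only usable for small matrices: 2^(m+n) candidate selections.
    """
    m, n = len(a), len(a[0])
    best = 0.0  # x = y = 0 (empty selection) is always feasible
    for x in product((0, 1), repeat=m):
        for y in product((0, 1), repeat=n):
            best = max(best, sum(a[i][j] * x[i] * y[j]
                                 for i in range(m) for j in range(n)))
    return best

a = [[ 2, -1],
     [-3,  4]]
print(best_noncontiguous(a))  # prints 4 (select row 2 and column 2)
```

This is obviously hopeless for a 30x10 matrix (2^40 combinations), which is exactly why we hand the problem to a MIP solver.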
---- 28 VARIABLE z.L = 163.081 objective
---- 28 VARIABLE x.L selected rows
r2 1.000, r3 1.000, r8 1.000, r10 1.000, r14 1.000, r15 1.000, r21 1.000, r22 1.000, r24 1.000
r25 1.000, r26 1.000, r28 1.000, r29 1.000, r30 1.000
---- 28 VARIABLE y.L selected columns
c1 1.000, c4 1.000, c5 1.000, c6 1.000, c9 1.000
---- 32 PARAMETER sel  selected cells

          c1      c4      c5      c6      c9
r2     9.962   5.245  -7.386   2.794   3.379
r3    -2.806  -6.998   1.782   6.618   5.517
r8    -5.139   8.669  -2.401   5.668   4.977
r10    9.872   5.440  -2.066   8.262  -8.892
r14    1.929   1.881   3.597   0.132   0.478
r15    9.734   5.536   8.649  -5.975  -5.073
r21    8.655   8.977   1.438  -3.327  -7.798
r22    1.606  -3.114   8.247   8.001   3.288
r24    4.401   9.423   4.135   9.725   4.026
r25    5.814  -0.296  -8.949   3.972   6.273
r26    5.013  -4.723   6.476   6.391  -0.864
r28   -8.712  -0.634   2.853   2.872   8.117
r29    8.377  -2.516  -1.700  -1.916   6.068
r30   -0.382  -9.648   3.621   9.018   7.489
This is a non-convex problem, but Cplex solves it very quickly: just 982 simplex iterations and 0 branch-and-bound nodes. The reason is that Cplex linearizes the problem automatically. Of course, we can also do this linearization ourselves:
| MIP model B |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}\] |
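These three inequalities are the standard linearization of a product of binaries (a McCormick-style AND construct): for binary \(x\) and \(y\) they pin \(s\) to exactly \(x\cdot y\). A quick check in Python (illustrative, not part of the model):

```python
from itertools import product

# For each binary (x, y), the feasible interval for s collapses to one point:
# max(0, x + y - 1) <= s <= min(x, y), and that point equals x*y.
for x, y in product((0, 1), repeat=2):
    lo = max(0, x + y - 1)  # from s >= x + y - 1 combined with s >= 0
    hi = min(x, y)          # from s <= x and s <= y
    assert lo == hi == x * y
print("s is forced to x*y for every binary (x, y)")
```

This also explains why \(s_{i,j}\) can safely be declared continuous on \([0,1]\): the constraints leave it no freedom once \(x\) and \(y\) are fixed.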
Notes:
- This problem is also known as finding a maximum edge (weight) biclique in a bipartite graph.
- This model solves in 577 Simplex iterations and 0 nodes. This is even a bit faster than the automatically reformulated version. Sometimes we can beat Cplex!
- The variables \(\color{darkred}s_{i,j}\) can be declared to be binary, or they can be relaxed to be continuous between 0 and 1.
- If \(\color{darkred}s_{i,j}\) is relaxed, Cplex will make them binary again. We can see this in the log: "Reduced MIP has 40 binaries, 0 generals, 0 SOSs, and 0 indicators" is followed by "Reduced MIP has 340 binaries, 0 generals, 0 SOSs, and 0 indicators".
- In many models, we can drop either the \(\le\) constraints or the \(\ge\) constraint of the linearization. Here we need them both.
- Well, that is not completely true. We can drop \(\color{darkred}s_{i,j} \le \color{darkred}x_i\) and \(\color{darkred}s_{i,j} \le \color{darkred}y_j\) if \(\color{darkblue}a_{i,j}\le 0\), and we can drop \(\color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1\) if \(\color{darkblue}a_{i,j}\ge 0\). Indeed, we can drop all three constraints if \(\color{darkblue}a_{i,j}= 0\). If we apply this to our model, the number of constraints drops from 900 to 433. Interestingly, the number of Simplex iterations goes up to 862 (from 577). The number of nodes stays 0.
| MIP model C |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j| \color{darkblue}a_{i,j}\gt 0 \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j| \color{darkblue}a_{i,j}\gt 0 & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j| \color{darkblue}a_{i,j}\lt 0\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}\] |
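The bookkeeping behind the constraint counts is simple to reproduce. A small helper (my own sketch, not from the post): model B generates 3 constraints per cell, while model C keeps the two \(\le\) constraints only for positive cells and the single \(\ge\) constraint only for negative cells.

```python
def count_constraints(a, drop=False):
    """Count linearization constraints for a data matrix a.

    Model B (drop=False): 3 constraints per cell.
    Model C (drop=True):  2 per strictly positive cell,
                          1 per strictly negative cell,
                          0 per zero cell.
    """
    cells = [v for row in a for v in row]
    if not drop:
        return 3 * len(cells)
    return 2 * sum(v > 0 for v in cells) + sum(v < 0 for v in cells)

a = [[ 2.0, -1.0, 0.0],
     [-3.0,  4.0, 5.0]]
print(count_constraints(a))             # prints 18 (model B: 3 per cell)
print(count_constraints(a, drop=True))  # prints 8  (model C: 2*3 + 1*2)
```

Applied to the blog's 30x10 matrix this gives 900 constraints for model B and, per the text, 433 for model C (so the data has 133 positive and 167 negative entries).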
A different linearization is proposed in [4]. First, row sums are calculated: \[\color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j\] Then the objective \[\max \> \sum_i \color{darkred}r_i \cdot \color{darkred} x_i \] is linearized. This is done by observing that only positive row sums \(\color{darkred}r_i\) are ever selected. So the objective is rewritten as \[\max \> \sum_i \color{darkred}r_i^+\] where \(\color{darkred}r_i^+ = \max(0,\color{darkred}r_i)\). Linearization of this expression leads to the model:
| MIP model D |
|---|
| \[\begin{align} \max& \sum_{i} \color{darkred} r_i^+ \\ & \color{darkred} r_i^+ \le \color{darkred}x_i \cdot \color{darkblue}M_{1,i} \\ & \color{darkred} r_i^+ \le \color{darkred} r_i + (1-\color{darkred}x_i)\cdot \color{darkblue}M_{2,i}\\ & \color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j \\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}r_i^+ \ge 0\\ & \color{darkred}r_i \>\textrm{free} \end{align}\] |
The first two constraints implement the implications: \[\begin{align} & \color{darkred}x_i=0 \Rightarrow \color{darkred} r_i^+ = 0 \\ & \color{darkred}x_i=1 \Rightarrow \color{darkred} r_i^+ = \color{darkred} r_i\end{align}\] Good values for the big-M constants are given in [4]: \[\begin{align} &\color{darkblue}M_{1,i} = \sum_j \max(0,\color{darkblue}a_{i,j}) \\ &\color{darkblue}M_{2,i} = -\sum_j \min(0,\color{darkblue}a_{i,j})\end{align}\] The advantage of this model is that we need only vectors of decision variables. There is no variable matrix (in the other models we have \(\color{darkred}s_{i,j}\), either directly or indirectly). A disadvantage is that the model is a bit more complex and seems somewhat slower to solve.
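A small numeric check of these big-M values (illustrative Python; `big_ms` is my own helper, not from [4]): for every possible column selection \(y\), \(M_{1,i}\) must bound the attainable \(r_i^+\) from above, and \(r_i + M_{2,i} \ge 0\) must hold so that \(x_i=0\) (which forces \(r_i^+=0\)) never conflicts with the second constraint.

```python
from itertools import product

def big_ms(row):
    """Big-M values for one row of the data matrix:
    M1 = sum of positive entries (largest attainable row sum),
    M2 = -(sum of negative entries) (negated smallest row sum)."""
    m1 = sum(max(0.0, v) for v in row)
    m2 = -sum(min(0.0, v) for v in row)
    return m1, m2

row = [2.0, -3.0, 1.5]
m1, m2 = big_ms(row)
for y in product((0, 1), repeat=len(row)):
    r = sum(a * yj for a, yj in zip(row, y))
    # x = 1: the intended value r+ = max(0, r) must fit under M1
    assert max(0.0, r) <= m1
    # x = 0: r+ = 0 requires 0 <= r + M2, i.e. the relaxed second constraint
    assert r + m2 >= 0.0
print("big-M values are valid for every column selection")
```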
Contiguous submatrix
If we want to select an optimal contiguous submatrix, we can use a modeling trick borrowed from machine scheduling, where we sometimes want to limit the number of start-ups. For instance, manufacturers may require such limits to reduce wear and tear on equipment like generators. For both \( \color{darkred}x_i, \color{darkred}y_j\) we require that they can go from 0 to 1 just once. For this, we introduce binary variables: \[\begin{aligned} &\color{darkred}p_i = \begin{cases} 1 & \text{if row $i$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases} \\ & \color{darkred}q_j = \begin{cases} 1 & \text{if column $j$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases}\end{aligned}\] The implication \(\color{darkred}x_{i-1}=0\textbf{ and }\color{darkred}x_{i}=1 \Rightarrow \color{darkred}p_i = 1\) can be implemented as \( \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1}\). We can only have one \(\color{darkred}p_i =1\). The same applies in the other direction for \(\color{darkred}q_j\). So the model can look like:
| MIQP model E |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ & \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1} \\ & \color{darkred}q_j \ge \color{darkred}y_{j} - \color{darkred}y_{j-1} \\ & \sum_i \color{darkred}p_i \le 1 \\ & \sum_j \color{darkred}q_j \le 1\\ &\color{darkred}x_i, \color{darkred}y_j,\color{darkred}p_i, \color{darkred}q_j \in \{0,1\} \end{align}\] |
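The intent of the \(\color{darkred}p_i\) construction is easy to verify outside the model: a 0/1 vector describes a contiguous block if and only if it has at most one 0-to-1 transition. A quick illustrative check in Python (my own sketch; the boundary term \(\color{darkred}x_0\) is taken as 0, matching the model at \(i=1\)):

```python
def startups(x):
    """Number of 0 -> 1 transitions in a 0/1 vector, i.e. the minimal
    sum of p_i = max(0, x_i - x_{i-1}), with x_0 treated as 0."""
    prev = 0
    count = 0
    for xi in x:
        count += max(0, xi - prev)
        prev = xi
    return count

assert startups([0, 1, 1, 1, 0]) == 1  # one contiguous block
assert startups([1, 0, 1, 0, 0]) == 2  # two blocks: not contiguous
assert startups([0, 0, 0]) == 0        # empty selection is allowed
```

The constraint \(\sum_i \color{darkred}p_i \le 1\) simply forbids the second case.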
---- 38 VARIABLE z.L = 81.721 objective
---- 38 VARIABLE x.L selected rows
r20 1.000, r21 1.000, r22 1.000, r23 1.000, r24 1.000, r25 1.000, r26 1.000, r27 1.000
r28 1.000, r29 1.000, r30 1.000
---- 38 VARIABLE p.L start of submatrix
r20 1.000
---- 38 VARIABLE y.L selected columns
c5 1.000, c6 1.000, c7 1.000, c8 1.000, c9 1.000, c10 1.000
---- 38 VARIABLE q.L start of submatrix
c5 1.000
---- 42 PARAMETER sel  selected cells

          c5      c6      c7      c8      c9     c10
r20   -0.800  -2.133   6.109   0.820  -2.186   1.156
r21    1.438  -3.327   9.675   5.329  -7.798   9.896
r22    8.247   8.001  -9.675  -2.627   3.288   1.868
r23   -4.008  -0.068  -9.101   5.474   0.659   4.935
r24    4.135   9.725   7.096   2.429   4.026   4.018
r25   -8.949   3.972  -6.104  -5.479   6.273   9.835
r26    6.476   6.391   7.208  -5.746  -0.864  -9.233
r27    6.219  -1.664  -7.164  -0.689  -4.340   7.914
r28    2.853   2.872  -3.248  -7.984   8.117  -5.653
r29   -1.700  -1.916  -7.767   5.023   6.068  -9.527
r30    3.621   9.018   8.004   7.976   7.489  -2.180
Notes:
- This model is still solved completely during preprocessing: zero nodes were needed.
- We can relax \(\color{darkred}p_i\) and \(\color{darkred}q_j\) to be continuous between 0 and 1.
- Making \(\sum_i \color{darkred}p_i\le 1\) an equality \(\sum_i \color{darkred}p_i= 1\) does not change much: we can still select zero rows, because \(\color{darkred}p_i \ge \color{darkred}x_i - \color{darkred}x_{i-1}\) only bounds \(\color{darkred}p_i\) from below, so some \(\color{darkred}p_i\) can be 1 even when all \(\color{darkred}x_i\) are 0.
- For small data sets, it is easy to enumerate all possible submatrices. For our data set, we have 25,575 of them (excluding the special case of no selected rows and columns).
- We can use the same linearization techniques that were discussed before.
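The enumeration count in the notes follows from choosing a row range and a column range independently. A short sketch (my own code, not from the post) that reproduces the 25,575 figure and brute-forces small instances:

```python
def count_submatrices(m, n):
    """Number of contiguous submatrices of an m x n matrix:
    (m*(m+1)/2) row ranges times (n*(n+1)/2) column ranges."""
    return (m * (m + 1) // 2) * (n * (n + 1) // 2)

print(count_submatrices(30, 10))  # prints 25575, matching the note above

def best_contiguous(a):
    """Enumerate all row/column ranges; fine for small matrices only."""
    m, n = len(a), len(a[0])
    best = 0.0  # empty selection
    for i1 in range(m):
        for i2 in range(i1, m):
            for j1 in range(n):
                for j2 in range(j1, n):
                    best = max(best, sum(a[i][j]
                                         for i in range(i1, i2 + 1)
                                         for j in range(j1, j2 + 1)))
    return best
```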
Comparison
Here we solve some larger problems. Our models are:
- Non-contiguous MIQP model A. Let Cplex linearize.
- Non-contiguous MIP model B. Linearized while keeping all constraints.
- Non-contiguous MIP model C. Linearize with dropping unnecessary constraints.
- Non-contiguous MIP model D. Alternative formulation from [4].
- Contiguous MIQP model E. Let Cplex linearize.
| | Model A | Model B | Model C | Model D | Model E | Model E |
|---|---|---|---|---|---|---|
| Problem | non-contiguous | non-contiguous | non-contiguous | non-contiguous | contiguous | contiguous |
| Problem size (rows/columns) | 50/30 | 50/30 | 50/30 | 50/30 | 50/30 | 100/100 |
| Constraints/variables (generated) | 0/80 | 4,500/1,580 | 2,237/1,580 | 150/180 | 82/160 | 202/400 |
| Constraints/variables (after presolve) | 2,237/1,580 | 4,500/1,580 | 2,237/1,580 | 100/130 | 2,317/1,658 | 15,198/10,398 |
| Objective | 784.904 | 784.904 | 784.904 | 784.904 | 298.447 | 826.110 |
| Gap | optimal | optimal | optimal | 32% | optimal | optimal |
| Time (seconds) | 1,113 | 1,072 | 376 | > 3600 | 3 | 951 |
| Nodes | 199,525 | 77,967 | 5,273 | 3,034,448 | 0 | 8,174 |
| Simplex iterations | 32,497,491 | 17,956,170 | 5,569,196 | 104,396,873 | 2,519 | 3,654,174 |
Notes:
- The contiguous problem, as expected, is much easier to solve.
- As always: these MIP models find good and even optimal solutions relatively quickly. Most of the time is spent proving optimality.
- For the contiguous problem, there are well-known dynamic programming algorithms [2,3]. Not sure if there is one for the non-contiguous case.
- For more on the non-contiguous case, see [4].
- The above timings are on a slow laptop, so you can expect better performance on a more beefy machine.
- Model D performs much better on the data sets shown in [4]. My random data makes the problem combinatorially very difficult. Practical data sets may show natural "hills" and "valleys", which can make the problem much easier.
- It is quite possible that formulation D works well when the problem is very large and easy. It does not need a matrix of decision variables, which may be beneficial in that case.
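For reference, the dynamic-programming approach in [2,3] reduces the 2d contiguous problem to repeated 1d Kadane scans over column ranges, giving \(O(n^2 m)\) time. A compact sketch (my own adaptation; unlike the MIP models, it assumes a nonempty selection):

```python
def max_subarray(v):
    """1d Kadane: maximum sum over nonempty contiguous subarrays."""
    best = cur = v[0]
    for x in v[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_sum_rectangle(a):
    """Fix a left/right column pair, collapse each row to a partial sum,
    then run 1d Kadane on the resulting vector. O(n^2 * m) total."""
    m, n = len(a), len(a[0])
    best = a[0][0]
    for left in range(n):
        rowsum = [0.0] * m
        for right in range(left, n):
            for i in range(m):
                rowsum[i] += a[i][right]
            best = max(best, max_subarray(rowsum))
    return best

assert max_sum_rectangle([[2, -1], [-3, 4]]) == 4
```

This kind of polynomial algorithm is exactly why the contiguous MIQP model E solves so much more easily than the non-contiguous variants.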
Conclusions
Using random data, we can easily generate some challenging data sets.
Solvers like Cplex and Gurobi may automatically reformulate non-convex MIQP models into linear models. I like that: we can use quadratic forms to simplify models and not worry about tedious reformulations that move us away from a more natural expression of the problem. In a sense, quadratic terms resemble indicator constraints: the underlying solver engine may not know how to deal with them directly, but automatically applied reformulations make it look like it does. In theory, solvers are in a better position than the modeler to apply the best reformulation for a model.
In a subsequent post, I will discuss a generalization of this problem. Instead of matrix elements, we consider points on the 2d plane, each having a value. Find the rectangle such that the sum of the values of the points inside the rectangle is maximized.
References
1. Maximum subarray problem, https://en.wikipedia.org/wiki/Maximum_subarray_problem
2. Maximum sum rectangle in a 2d matrix, https://www.geeksforgeeks.org/maximum-sum-rectangle-in-a-2d-matrix-dp-27/
3. Maximum Sum Rectangle In A 2D Matrix - Kadane's Algorithm Applications (Dynamic Programming), https://www.youtube.com/watch?v=-FgseNO-6Gk
4. Vincent Branders, Pierre Schaus, and Pierre Dupont, Mining a Sub-Matrix of Maximal Sum, https://arxiv.org/abs/1709.08461