Problem
Given a data matrix \(\color{darkblue}a_{i,j}\), find a submatrix such that the sum of the elements is maximized. The term "submatrix" can indicate a contiguous or a non-contiguous subset of rows and columns.
This is a generalization of the 1d maximum subarray problem [1].
Looks easy, but let's make sure it is.
Example data
Of course, the problem is trivial if all elements are non-negative: just select the whole matrix. It is only interesting if there are negative elements. So here is my random matrix:
---- 8 PARAMETER a  matrix (data)

          c1      c2      c3      c4      c5      c6      c7      c8      c9     c10
r1    -6.565   6.865   1.008  -3.977  -4.156  -5.519  -3.003   7.125  -8.658   0.004
r2     9.962   1.575   9.823   5.245  -7.386   2.794  -6.810  -4.998   3.379  -1.293
r3    -2.806  -2.971  -7.370  -6.998   1.782   6.618  -5.384   3.315   5.517  -3.927
r4    -7.790   0.048  -6.797   7.449  -4.698  -4.284   1.879   4.454   2.565  -0.724
r5    -1.734  -7.646  -3.716  -9.069  -3.229  -6.358   2.915   1.215   5.399  -4.044
r6     3.222   5.116   2.549  -4.323  -8.272  -7.950   2.825   0.906  -9.370   5.847
r7    -8.545  -6.487   0.513   5.004  -6.438  -9.317   1.703   2.425  -2.213  -2.826
r8    -5.139  -5.072  -7.390   8.669  -2.401   5.668  -3.999  -7.490   4.977  -8.615
r9    -5.960  -9.899  -4.608  -0.003  -6.974  -6.517  -3.387  -3.662  -3.558   9.280
r10    9.872  -2.602  -2.542   5.440  -2.066   8.262  -7.608   4.710  -8.892   1.526
r11   -8.972  -9.880  -1.975   0.398   2.578  -5.485  -2.078  -4.480  -6.953   8.726
r12   -1.547  -7.307  -2.279  -2.507  -4.630   8.967  -6.221  -4.050  -8.509  -1.973
r13   -7.966  -2.322  -3.518  -6.157  -7.753   1.931   0.229  -9.099   5.662   8.915
r14    1.929   2.147  -2.750   1.881   3.597   0.132  -6.815   3.138   0.478  -7.512
r15    9.734  -5.438   3.513   5.536   8.649  -5.975  -4.057  -6.055  -5.073   2.930
r16    4.699  -8.291  -6.993  -1.316  -6.261   3.854   5.259  -6.904  -2.212   3.909
r17    6.916   2.254   9.519  -9.462  -6.251  -8.258   0.808  -7.463   4.680  -7.735
r18   -0.233   5.912  -0.159   0.671  -9.788   0.877  -0.977   9.507  -6.323  -6.729
r19   -9.507  -6.444  -8.774  -9.667   6.713   2.033  -9.460  -6.078   9.014  -3.289
r20    1.885  -4.816   2.813  -6.895  -0.800  -2.133   6.109   0.820  -2.186   1.156
r21    8.655  -3.025  -9.834   8.977   1.438  -3.327   9.675   5.329  -7.798   9.896
r22    1.606  -6.672   2.867  -3.114   8.247   8.001  -9.675  -2.627   3.288   1.868
r23   -9.309   6.836   8.642   0.159  -4.008  -0.068  -9.101   5.474   0.659   4.935
r24    4.401   2.632  -7.702   9.423   4.135   9.725   7.096   2.429   4.026   4.018
r25    5.814   2.204  -8.914  -0.296  -8.949   3.972  -6.104  -5.479   6.273   9.835
r26    5.013   4.367  -9.988  -4.723   6.476   6.391   7.208  -5.746  -0.864  -9.233
r27   -3.540  -1.202  -3.693  -7.305   6.219  -1.664  -7.164  -0.689  -4.340   7.914
r28   -8.712  -1.708  -3.168  -0.634   2.853   2.872  -3.248  -7.984   8.117  -5.653
r29    8.377  -0.965  -8.201  -2.516  -1.700  -1.916  -7.767   5.023   6.068  -9.527
r30   -0.382  -4.428   8.032  -9.648   3.621   9.018   8.004   7.976   7.489  -2.180
The values are drawn from a uniform distribution, \(\color{darkblue}a_{i,j} \sim U(-10,10)\).
Non-contiguous submatrix
The non-convex MIQP formulation is easy. Just introduce binary variables that indicate if a row or column is selected. A cell is selected if both its corresponding row and column are selected.
| MIQP model A |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\} \end{align}\] |
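For small instances, model A can be sanity-checked by brute force. Below is a sketch in Python (my own illustrative code, not part of the original models): it enumerates all binary row and column selections and evaluates the bilinear objective directly.

```python
from itertools import product

def best_noncontiguous(a):
    """Brute-force maximum of sum_{i,j} a[i][j]*x[i]*y[j] over binary x, y.

    Only usable for small matrices: 2^(m+n) candidate selections.
    """
    m, n = len(a), len(a[0])
    best = 0.0  # x = y = 0 (empty selection) is always feasible
    for x in product((0, 1), repeat=m):
        for y in product((0, 1), repeat=n):
            best = max(best, sum(a[i][j] * x[i] * y[j]
                                 for i in range(m) for j in range(n)))
    return best

a = [[ 2, -1],
     [-3,  4]]
print(best_noncontiguous(a))  # prints 4 (select row 2 and column 2)
```

This is obviously hopeless for a 30x10 matrix (2^40 combinations), which is exactly why we hand the problem to a MIP solver.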
---- 28 VARIABLE z.L = 163.081 objective
---- 28 VARIABLE x.L selected rows
r2 1.000, r3 1.000, r8 1.000, r10 1.000, r14 1.000, r15 1.000, r21 1.000, r22 1.000, r24 1.000
r25 1.000, r26 1.000, r28 1.000, r29 1.000, r30 1.000
---- 28 VARIABLE y.L selected columns
c1 1.000, c4 1.000, c5 1.000, c6 1.000, c9 1.000
---- 32 PARAMETER sel  selected cells

          c1      c4      c5      c6      c9
r2     9.962   5.245  -7.386   2.794   3.379
r3    -2.806  -6.998   1.782   6.618   5.517
r8    -5.139   8.669  -2.401   5.668   4.977
r10    9.872   5.440  -2.066   8.262  -8.892
r14    1.929   1.881   3.597   0.132   0.478
r15    9.734   5.536   8.649  -5.975  -5.073
r21    8.655   8.977   1.438  -3.327  -7.798
r22    1.606  -3.114   8.247   8.001   3.288
r24    4.401   9.423   4.135   9.725   4.026
r25    5.814  -0.296  -8.949   3.972   6.273
r26    5.013  -4.723   6.476   6.391  -0.864
r28   -8.712  -0.634   2.853   2.872   8.117
r29    8.377  -2.516  -1.700  -1.916   6.068
r30   -0.382  -9.648   3.621   9.018   7.489
This is a non-convex problem, but Cplex solves it very quickly: just 982 simplex iterations and 0 branch-and-bound nodes. The reason is that Cplex linearizes the problem automatically. Of course, we can also do this linearization ourselves:
| MIP model B |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}\] |
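These three inequalities are the standard linearization of a product of binaries (a McCormick-style AND construct): for binary \(x\) and \(y\) they pin \(s\) to exactly \(x\cdot y\). A quick check in Python (illustrative, not part of the model):

```python
from itertools import product

# For each binary (x, y), the feasible interval for s collapses to one point:
# max(0, x + y - 1) <= s <= min(x, y), and that point equals x*y.
for x, y in product((0, 1), repeat=2):
    lo = max(0, x + y - 1)  # from s >= x + y - 1 combined with s >= 0
    hi = min(x, y)          # from s <= x and s <= y
    assert lo == hi == x * y
print("s is forced to x*y for every binary (x, y)")
```

This also explains why \(s_{i,j}\) can safely be declared continuous on \([0,1]\): the constraints leave it no freedom once \(x\) and \(y\) are fixed.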
Notes:
- This problem is also known as finding a maximum edge (weight) biclique in a bipartite graph.
- This model solves in 577 Simplex iterations and 0 nodes. This is even a bit faster than the automatically reformulated version. Sometimes we can beat Cplex!
- The variables \(\color{darkred}s_{i,j}\) can be declared to be binary, or they can be relaxed to be continuous between 0 and 1.
- If \(\color{darkred}s_{i,j}\) is relaxed, Cplex will make them binary again. We can see this in the log: "Reduced MIP has 40 binaries, 0 generals, 0 SOSs, and 0 indicators" is followed by "Reduced MIP has 340 binaries, 0 generals, 0 SOSs, and 0 indicators".
- In many models, we can drop either the \(\le\) constraints or the \(\ge\) constraint of the linearization. Here we need them both.
- Well, that is not completely true. We can drop \(\color{darkred}s_{i,j} \le \color{darkred}x_i\) and \(\color{darkred}s_{i,j} \le \color{darkred}y_j\) if \(\color{darkblue}a_{i,j}\le 0\), and we can drop \(\color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1\) if \(\color{darkblue}a_{i,j}\ge 0\). Indeed, we can drop all three constraints if \(\color{darkblue}a_{i,j}= 0\). If we apply this to our model, the number of constraints drops from 900 to 433. Interestingly, the number of Simplex iterations goes up to 862 (from 577). The number of nodes stays 0.
| MIP model C |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}s_{i,j}\\ & \color{darkred}s_{i,j} \le \color{darkred}x_i && \forall i,j| \color{darkblue}a_{i,j}\gt 0 \\ &\color{darkred}s_{i,j} \le \color{darkred}y_j && \forall i,j| \color{darkblue}a_{i,j}\gt 0 & \\ & \color{darkred}s_{i,j} \ge \color{darkred}x_i+\color{darkred}y_j-1 && \forall i,j| \color{darkblue}a_{i,j}\lt 0\\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}s_{i,j} \in [0,1] \end{align}\] |
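The bookkeeping behind the constraint counts is simple to reproduce. A small helper (my own sketch, not from the post): model B generates 3 constraints per cell, while model C keeps the two \(\le\) constraints only for positive cells and the single \(\ge\) constraint only for negative cells.

```python
def count_constraints(a, drop=False):
    """Count linearization constraints for a data matrix a.

    Model B (drop=False): 3 constraints per cell.
    Model C (drop=True):  2 per strictly positive cell,
                          1 per strictly negative cell,
                          0 per zero cell.
    """
    cells = [v for row in a for v in row]
    if not drop:
        return 3 * len(cells)
    return 2 * sum(v > 0 for v in cells) + sum(v < 0 for v in cells)

a = [[ 2.0, -1.0, 0.0],
     [-3.0,  4.0, 5.0]]
print(count_constraints(a))             # prints 18 (model B: 3 per cell)
print(count_constraints(a, drop=True))  # prints 8  (model C: 2*3 + 1*2)
```

Applied to the blog's 30x10 matrix this gives 900 constraints for model B and, per the text, 433 for model C (so the data has 133 positive and 167 negative entries).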
A different linearization is proposed in [4]. First, row sums are calculated: \[\color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j\] Then the objective \[\max \> \sum_i \color{darkred}r_i \cdot \color{darkred} x_i \] is linearized. This is done by observing that only positive row sums \(\color{darkred}r_i\) are ever selected. So the objective is rewritten as \[\max \> \sum_i \color{darkred}r_i^+\] where \(\color{darkred}r_i^+ = \max(0,\color{darkred}r_i)\). Linearization of this expression leads to the model:
| MIP model D |
|---|
| \[\begin{align} \max& \sum_{i} \color{darkred} r_i^+ \\ & \color{darkred} r_i^+ \le \color{darkred}x_i \cdot \color{darkblue}M_{1,i} \\ & \color{darkred} r_i^+ \le \color{darkred} r_i + (1-\color{darkred}x_i)\cdot \color{darkblue}M_{2,i}\\ & \color{darkred}r_i = \sum_j \color{darkblue}a_{i,j} \cdot \color{darkred}y_j \\ &\color{darkred}x_i, \color{darkred}y_j \in \{0,1\}\\ & \color{darkred}r_i^+ \ge 0\\ & \color{darkred}r_i \>\textrm{free} \end{align}\] |
The first two constraints implement the implications: \[\begin{align} & \color{darkred}x_i=0 \Rightarrow \color{darkred} r_i^+ = 0 \\ & \color{darkred}x_i=1 \Rightarrow \color{darkred} r_i^+ = \color{darkred} r_i\end{align}\] Good values for the big-M constants are given in [4]: \[\begin{align} &\color{darkblue}M_{1,i} = \sum_j \max(0,\color{darkblue}a_{i,j}) \\ &\color{darkblue}M_{2,i} = -\sum_j \min(0,\color{darkblue}a_{i,j})\end{align}\] The advantage of this model is that we need only vectors of decision variables. There is no variable matrix (in the other models we have \(\color{darkred}s_{i,j}\), either directly or indirectly). A disadvantage is that the model is a bit more complex and seems somewhat slower to solve.
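A small numeric check of these big-M values (illustrative Python; `big_ms` is my own helper, not from [4]): for every possible column selection \(y\), \(M_{1,i}\) must bound the attainable \(r_i^+\) from above, and \(r_i + M_{2,i} \ge 0\) must hold so that \(x_i=0\) (which forces \(r_i^+=0\)) never conflicts with the second constraint.

```python
from itertools import product

def big_ms(row):
    """Big-M values for one row of the data matrix:
    M1 = sum of positive entries (largest attainable row sum),
    M2 = -(sum of negative entries) (negated smallest row sum)."""
    m1 = sum(max(0.0, v) for v in row)
    m2 = -sum(min(0.0, v) for v in row)
    return m1, m2

row = [2.0, -3.0, 1.5]
m1, m2 = big_ms(row)
for y in product((0, 1), repeat=len(row)):
    r = sum(a * yj for a, yj in zip(row, y))
    # x = 1: the intended value r+ = max(0, r) must fit under M1
    assert max(0.0, r) <= m1
    # x = 0: r+ = 0 requires 0 <= r + M2, i.e. the relaxed second constraint
    assert r + m2 >= 0.0
print("big-M values are valid for every column selection")
```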
Contiguous submatrix
If we want to select an optimal contiguous submatrix, we can use a modeling trick borrowed from machine scheduling, where we sometimes want to limit the number of start-ups. For instance, manufacturers may require such limits to reduce wear and tear on equipment like generators. For both \( \color{darkred}x_i, \color{darkred}y_j\) we require that they can go from 0 to 1 just once. For this, we introduce binary variables: \[\begin{aligned} &\color{darkred}p_i = \begin{cases} 1 & \text{if row $i$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases} \\ & \color{darkred}q_j = \begin{cases} 1 & \text{if column $j$ is the start of the selected submatrix}\\ 0 & \text{otherwise}\end{cases}\end{aligned}\] The implication \(\color{darkred}x_{i-1}=0\textbf{ and }\color{darkred}x_{i}=1 \Rightarrow \color{darkred}p_i = 1\) can be implemented as \( \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1}\). We can only have one \(\color{darkred}p_i =1\). The same applies in the other direction for \(\color{darkred}q_j\). So the model can look like:
| MIQP model E |
|---|
| \[\begin{align} \max& \sum_{i,j} \color{darkblue}a_{i,j} \cdot \color{darkred}x_i \cdot \color{darkred}y_j\\ & \color{darkred}p_i \ge \color{darkred}x_{i} - \color{darkred}x_{i-1} \\ & \color{darkred}q_j \ge \color{darkred}y_{j} - \color{darkred}y_{j-1} \\ & \sum_i \color{darkred}p_i \le 1 \\ & \sum_j \color{darkred}q_j \le 1\\ &\color{darkred}x_i, \color{darkred}y_j,\color{darkred}p_i, \color{darkred}q_j \in \{0,1\} \end{align}\] |
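The intent of the \(\color{darkred}p_i\) construction is easy to verify outside the model: a 0/1 vector describes a contiguous block if and only if it has at most one 0-to-1 transition. A quick illustrative check in Python (my own sketch; the boundary term \(\color{darkred}x_0\) is taken as 0, matching the model at \(i=1\)):

```python
def startups(x):
    """Number of 0 -> 1 transitions in a 0/1 vector, i.e. the minimal
    sum of p_i = max(0, x_i - x_{i-1}), with x_0 treated as 0."""
    prev = 0
    count = 0
    for xi in x:
        count += max(0, xi - prev)
        prev = xi
    return count

assert startups([0, 1, 1, 1, 0]) == 1  # one contiguous block
assert startups([1, 0, 1, 0, 0]) == 2  # two blocks: not contiguous
assert startups([0, 0, 0]) == 0        # empty selection is allowed
```

The constraint \(\sum_i \color{darkred}p_i \le 1\) simply forbids the second case.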
---- 38 VARIABLE z.L = 81.721 objective
---- 38 VARIABLE x.L selected rows
r20 1.000, r21 1.000, r22 1.000, r23 1.000, r24 1.000, r25 1.000, r26 1.000, r27 1.000
r28 1.000, r29 1.000, r30 1.000
---- 38 VARIABLE p.L start of submatrix
r20 1.000
---- 38 VARIABLE y.L selected columns
c5 1.000, c6 1.000, c7 1.000, c8 1.000, c9 1.000, c10 1.000
---- 38 VARIABLE q.L start of submatrix
c5 1.000
---- 42 PARAMETER sel  selected cells

          c5      c6      c7      c8      c9     c10
r20   -0.800  -2.133   6.109   0.820  -2.186   1.156
r21    1.438  -3.327   9.675   5.329  -7.798   9.896
r22    8.247   8.001  -9.675  -2.627   3.288   1.868
r23   -4.008  -0.068  -9.101   5.474   0.659   4.935
r24    4.135   9.725   7.096   2.429   4.026   4.018
r25   -8.949   3.972  -6.104  -5.479   6.273   9.835
r26    6.476   6.391   7.208  -5.746  -0.864  -9.233
r27    6.219  -1.664  -7.164  -0.689  -4.340   7.914
r28    2.853   2.872  -3.248  -7.984   8.117  -5.653
r29   -1.700  -1.916  -7.767   5.023   6.068  -9.527
r30    3.621   9.018   8.004   7.976   7.489  -2.180
Notes:
- This model is still solved completely during preprocessing: zero nodes were needed.
- We can relax \(\color{darkred}p_i\) and \(\color{darkred}q_j\) to be continuous between 0 and 1.
- Making \(\sum_i \color{darkred}p_i\le 1\) an equality \(\sum_i \color{darkred}p_i= 1\) does not change much: we can still select zero rows, because \(\color{darkred}p_i \ge \color{darkred}x_i - \color{darkred}x_{i-1}\) only bounds \(\color{darkred}p_i\) from below, so some \(\color{darkred}p_i\) can be 1 even when all \(\color{darkred}x_i\) are 0.
- For small data sets, it is easy to enumerate all possible submatrices. For our data set, we have 25,575 of them (excluding the special case of no selected rows and columns).
- We can use the same linearization techniques that were discussed before.
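The enumeration count in the notes follows from choosing a row range and a column range independently. A short sketch (my own code, not from the post) that reproduces the 25,575 figure and brute-forces small instances:

```python
def count_submatrices(m, n):
    """Number of contiguous submatrices of an m x n matrix:
    (m*(m+1)/2) row ranges times (n*(n+1)/2) column ranges."""
    return (m * (m + 1) // 2) * (n * (n + 1) // 2)

print(count_submatrices(30, 10))  # prints 25575, matching the note above

def best_contiguous(a):
    """Enumerate all row/column ranges; fine for small matrices only."""
    m, n = len(a), len(a[0])
    best = 0.0  # empty selection
    for i1 in range(m):
        for i2 in range(i1, m):
            for j1 in range(n):
                for j2 in range(j1, n):
                    best = max(best, sum(a[i][j]
                                         for i in range(i1, i2 + 1)
                                         for j in range(j1, j2 + 1)))
    return best
```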
Comparison
Here we solve some larger problems. Our models are:
- Non-contiguous MIQP model A. Let Cplex linearize.
- Non-contiguous MIP model B. Linearized while keeping all constraints.
- Non-contiguous MIP model C. Linearize with dropping unnecessary constraints.
- Non-contiguous MIP model D. Alternative formulation from [4].
- Contiguous MIQP model E. Let Cplex linearize.
| | Model A | Model B | Model C | Model D | Model E | Model E |
|---|---|---|---|---|---|---|
| Problem | non-contiguous | non-contiguous | non-contiguous | non-contiguous | contiguous | contiguous |
| Problem size (rows/columns) | 50/30 | 50/30 | 50/30 | 50/30 | 50/30 | 100/100 |
| Constraints/variables (generated) | 0/80 | 4,500/1,580 | 2,237/1,580 | 150/180 | 82/160 | 202/400 |
| Constraints/variables (after presolve) | 2,237/1,580 | 4,500/1,580 | 2,237/1,580 | 100/130 | 2,317/1,658 | 15,198/10,398 |
| Objective | 784.904 | 784.904 | 784.904 | 784.904 | 298.447 | 826.110 |
| Gap | optimal | optimal | optimal | 32% | optimal | optimal |
| Time (seconds) | 1,113 | 1,072 | 376 | > 3600 | 3 | 951 |
| Nodes | 199,525 | 77,967 | 5,273 | 3,034,448 | 0 | 8,174 |
| Simplex iterations | 32,497,491 | 17,956,170 | 5,569,196 | 104,396,873 | 2,519 | 3,654,174 |
Notes:
- The contiguous problem, as expected, is much easier to solve.
- As always: these MIP models find good and even optimal solutions relatively quickly. Most of the time is spent proving optimality.
- For the contiguous problem, there are well-known dynamic programming algorithms [2,3]. Not sure if there is one for the non-contiguous case.
- For more on the non-contiguous case, see [4].
- The above timings are on a slow laptop, so you can expect better performance on a more beefy machine.
- Model D performs much better on the data sets shown in [4]. My random data makes the problem combinatorially very difficult. Practical data sets may show natural "hills" and "valleys", which can make the problem much easier.
- It is quite possible that formulation D works well when the problem is very large and easy. It does not need a matrix of decision variables, which may be beneficial in that case.
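For reference, the dynamic-programming approach in [2,3] reduces the 2d contiguous problem to repeated 1d Kadane scans over column ranges, giving \(O(n^2 m)\) time. A compact sketch (my own adaptation; unlike the MIP models, it assumes a nonempty selection):

```python
def max_subarray(v):
    """1d Kadane: maximum sum over nonempty contiguous subarrays."""
    best = cur = v[0]
    for x in v[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_sum_rectangle(a):
    """Fix a left/right column pair, collapse each row to a partial sum,
    then run 1d Kadane on the resulting vector. O(n^2 * m) total."""
    m, n = len(a), len(a[0])
    best = a[0][0]
    for left in range(n):
        rowsum = [0.0] * m
        for right in range(left, n):
            for i in range(m):
                rowsum[i] += a[i][right]
            best = max(best, max_subarray(rowsum))
    return best

assert max_sum_rectangle([[2, -1], [-3, 4]]) == 4
```

This kind of polynomial algorithm is exactly why the contiguous MIQP model E solves so much more easily than the non-contiguous variants.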
Conclusions
Using random data, we can easily generate some challenging data sets.
Solvers like Cplex and Gurobi may automatically reformulate non-convex MIQP models into linear models. I like that: we can use quadratic forms to simplify models and not worry about tedious reformulations that move us away from a more natural expression of the problem. In a sense, quadratic terms resemble indicator constraints: the underlying solver engine may not know how to deal with them directly, but automatically applied reformulations make it look like it does. In theory, solvers are in a better position than the modeler to apply the best reformulation for a model.
In a subsequent post, I will discuss a generalization of this problem. Instead of matrix elements, we consider points on the 2d plane, each having a value. Find the rectangle such that the sum of the values of the points inside the rectangle is maximized.
References
1. Maximum subarray problem, https://en.wikipedia.org/wiki/Maximum_subarray_problem
2. Maximum sum rectangle in a 2d matrix, https://www.geeksforgeeks.org/maximum-sum-rectangle-in-a-2d-matrix-dp-27/
3. Maximum Sum Rectangle In A 2D Matrix - Kadane's Algorithm Applications (Dynamic Programming), https://www.youtube.com/watch?v=-FgseNO-6Gk
4. Vincent Branders, Pierre Schaus, and Pierre Dupont, Mining a Sub-Matrix of Maximal Sum, https://arxiv.org/abs/1709.08461