Assignment problem with a wrinkle formulated as a network problem

In a previous post [1], I discussed a problem that is simple to state but not so easy to solve for larger data sets. Basically, it was an assignment problem with an extra condition. The problem was as follows:

Consider two arrays \(\color{darkblue}a_i\) (length \(\color{darkblue}m\)) and \(\color{darkblue}b_j\) (length \(\color{darkblue}n\)) with \(\color{darkblue}m \lt \color{darkblue}n\). Assign all values \(\color{darkblue}a_i\) to a \(\color{darkblue}b_j\) such that:

Each \(\color{darkblue}b_j\) can have 0 or 1 \(\color{darkblue}a_i\) assigned to it.
The assignments need to maintain the original order of \(\color{darkblue}a_i\). I.e. if \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) then \(\color{darkblue}a_{i+1}\) must be assigned to a slot in \(\color{darkblue}b\) that is beyond slot \(j\). In the picture below that means that arrows cannot cross.
Do this while minimizing the sum of the products \(\color{darkblue}a_i\cdot\color{darkblue}b_j\) over the assigned pairs.

In [1], I attacked this as a mixed-integer programming problem. In this post, I want to see if we can solve this as a network problem. This was largely inspired by the comments in [1].

Graph

We start with the nodes. We denote the nodes by \(\color{darkblue}n_{i,j}\), representing: \(\color{darkblue}a_i\) is assigned to \(\color{darkblue}b_j\). Not all assignments are possible. For instance, we cannot assign \(\color{darkblue}a_2\) to \(\color{darkblue}b_1\), as that would leave no earlier slot for \(\color{darkblue}a_1\). That means: node \(\color{darkblue}n_{2,1}\) does not exist.

We also need a source node and a sink node.

The arcs indicate: after assigning \(\color{darkblue}a_i \rightarrow \color{darkblue}b_j\) we need to assign the next item \(\color{darkblue}a_{i+1}\rightarrow \color{darkblue}b_{j+k}\) for some \(k\ge 1\). In addition, we need to connect the source node to all nodes with \(i=1\), and all nodes with \(i=3\) (our last \(\color{darkblue}a_i\)) to the sink node. So our network looks like:

Network representation

Note how any arc that is not connected to the source or sink node goes to the right and downwards.

We have costs for visiting each node: \(\color{darkblue}a_i\cdot\color{darkblue}b_j\). As we want to formulate this as a shortest path problem, we need to allocate these costs to arcs. I used the incoming arcs for this. So any arc \(e_{i,j,i',j'}\) gets a cost \(\color{darkblue}a_{i'}\cdot\color{darkblue}b_{j'}\). The arcs to the sink node have a zero cost. As you can see: because the nodes have two indices, the arcs have four! That will be fun.

Implementation

I implemented the network model in GAMS and solved it as an LP.

The data for our tiny data set looks like:

----     15 SET i  

i1,    i2,    i3


----     15 SET j  

j1,    j2,    j3,    j4,    j5,    j6


----     15 PARAMETER a  

i1 1.000,    i2 2.000,    i3 3.000


----     15 PARAMETER b  

j1  4.000,    j2  9.000,    j3  5.000,    j4  3.000,    j5  2.000,    j6 10.000

From this, we generate our nodes and arcs:

----     31 SET n  nodes

                         j1          j2          j3          j4          j5          j6

src         YES
i1                      YES         YES         YES         YES
i2                                  YES         YES         YES         YES
i3                                              YES         YES         YES         YES
snk         YES


----     32 PARAMETER numnodes             =       14.000

----     41 SET e  arcs

src.  .i1 .j1,    src.  .i1 .j2,    src.  .i1 .j3,    src.  .i1 .j4,    i1 .j1.i2 .j2,    i1 .j1.i2 .j3
i1 .j1.i2 .j4,    i1 .j1.i2 .j5,    i1 .j2.i2 .j3,    i1 .j2.i2 .j4,    i1 .j2.i2 .j5,    i1 .j3.i2 .j4
i1 .j3.i2 .j5,    i1 .j4.i2 .j5,    i2 .j2.i3 .j3,    i2 .j2.i3 .j4,    i2 .j2.i3 .j5,    i2 .j2.i3 .j6
i2 .j3.i3 .j4,    i2 .j3.i3 .j5,    i2 .j3.i3 .j6,    i2 .j4.i3 .j5,    i2 .j4.i3 .j6,    i2 .j5.i3 .j6
i3 .j3.snk.  ,    i3 .j4.snk.  ,    i3 .j5.snk.  ,    i3 .j6.snk.  


----     42 PARAMETER numarcs              =       28.000

----     47 PARAMETER c  cost of arcs

src.  .i1.j1  4.000,    src.  .i1.j2  9.000,    src.  .i1.j3  5.000,    src.  .i1.j4  3.000,    i1 .j1.i2.j2 18.000
i1 .j1.i2.j3 10.000,    i1 .j1.i2.j4  6.000,    i1 .j1.i2.j5  4.000,    i1 .j2.i2.j3 10.000,    i1 .j2.i2.j4  6.000
i1 .j2.i2.j5  4.000,    i1 .j3.i2.j4  6.000,    i1 .j3.i2.j5  4.000,    i1 .j4.i2.j5  4.000,    i2 .j2.i3.j3 15.000
i2 .j2.i3.j4  9.000,    i2 .j2.i3.j5  6.000,    i2 .j2.i3.j6 30.000,    i2 .j3.i3.j4  9.000,    i2 .j3.i3.j5  6.000
i2 .j3.i3.j6 30.000,    i2 .j4.i3.j5  6.000,    i2 .j4.i3.j6 30.000,    i2 .j5.i3.j6 30.000

Our nodes are two-dimensional, so somewhat artificially, the source and the sink node are denoted by \(\color{darkblue}n_{'src',''}\) and \(\color{darkblue}n_{'snk',''}\). Again, the costs of visiting a node are allocated to the incoming arcs. Note that zero costs are not printed: the parameter is stored as a sparse data structure, so a zero entry and a missing entry are the same. This is why the arcs into the sink node do not appear in the cost listing.
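The node and arc sets in the listing above can be reproduced outside GAMS. The following is a minimal Python sketch, not the GAMS code used for this post. The function name `build_network` and the tuple encoding of the nodes are illustrative assumptions, and the rule that \(a_i\) can only use slots \(j=i,\dots,n-m+i\) is inferred from the node listing.

```python
def build_network(a, b):
    """Build nodes and arc costs for the shortest path network.

    a and b are dicts keyed by 1-based index (hypothetical encoding).
    Returns (nodes, arcs) where arcs maps (from_node, to_node) -> cost.
    """
    m, n = len(a), len(b)
    src, snk = ("src", ""), ("snk", "")

    # a_i needs i-1 free slots before it and m-i free slots after it
    inner = [(i, j) for i in range(1, m + 1) for j in range(i, n - m + i + 1)]
    nodes = [src] + inner + [snk]

    arcs = {}
    for (i, j) in inner:
        if i == 1:
            arcs[src, (i, j)] = a[i] * b[j]   # cost of visiting the first node
        if i == m:
            arcs[(i, j), snk] = 0.0           # arcs into the sink are free
        for (i2, j2) in inner:
            if i2 == i + 1 and j2 > j:        # next item, strictly later slot
                arcs[(i, j), (i2, j2)] = a[i2] * b[j2]
    return nodes, arcs


# Tiny data set from the listing
a = {1: 1.0, 2: 2.0, 3: 3.0}
b = {1: 4.0, 2: 9.0, 3: 5.0, 4: 3.0, 5: 2.0, 6: 10.0}
nodes, arcs = build_network(a, b)
print(len(nodes), len(arcs))
```

For the tiny data set this gives the same 14 nodes and 28 arcs as the GAMS listing. The double loop over the inner nodes is quadratic and only meant for small instances.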

The LP model for this problem can look like:

Shortest Path LP Model
\[\begin{align}\min& \sum_{i,j,i',j'}\color{darkred}f_{i,j,i',j'}\cdot\color{darkblue}c_{i,j,i',j'}\\ &\sum_{i',j'|e(i',j',i,j)} \color{darkred}f_{i',j',i,j} + \color{darkblue}g_{i,j} = \sum_{i',j'|e(i,j,i',j')} \color{darkred}f_{i,j,i',j'} &&\forall \color{darkblue}n_{i,j}\\ & \color{darkred}f_{i,j,i',j'} \in [0,1] \end{align}\]

where \(\color{darkred}f\) is our flow variable and \(\color{darkblue}g\) is exogenous inflow. In our case:\[\color{darkblue}g_{i,j} = \begin{cases} 1 & \text{for the source node}\\ -1 & \text{for the sink node} \\ 0 & \text{for all other nodes}\end{cases}\]
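To make the balance equation (inflow plus exogenous inflow equals outflow) concrete, here are three instances written out for the tiny data set. This is only an illustration based on the arc listing; to keep the notation short, the empty second index of the source and sink nodes is dropped.

\[\begin{align} \text{source:}\quad & 0 + 1 = {\color{darkred}f_{src,1,1}} + {\color{darkred}f_{src,1,2}} + {\color{darkred}f_{src,1,3}} + {\color{darkred}f_{src,1,4}}\\ \text{node } {\color{darkblue}n_{1,1}}:\quad & {\color{darkred}f_{src,1,1}} + 0 = {\color{darkred}f_{1,1,2,2}} + {\color{darkred}f_{1,1,2,3}} + {\color{darkred}f_{1,1,2,4}} + {\color{darkred}f_{1,1,2,5}}\\ \text{sink:}\quad & {\color{darkred}f_{3,3,snk}} + {\color{darkred}f_{3,4,snk}} + {\color{darkred}f_{3,5,snk}} + {\color{darkred}f_{3,6,snk}} - 1 = 0 \end{align}\]

One unit of flow leaves the source, is passed on unchanged by every intermediate node, and is absorbed by the sink. The arcs carrying that unit form the path.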

The results are:

----     75 VARIABLE cost.L                =       16.000  objective

----     75 VARIABLE f.L  flow

src.  .i1 .j1 1.000,    i1 .j1.i2 .j4 1.000,    i2 .j4.i3 .j5 1.000,    i3 .j5.snk.   1.000


----     79 SET assign  assignments recovered from flows

            j1          j4          j5

i1         YES
i2                     YES
i3                                 YES
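For a data set this small, the reported optimum can be cross-checked by brute force. The sketch below (an illustration, not part of the GAMS model; the function name `brute_force` is hypothetical) enumerates every increasing choice of three slots in \(b\) and keeps the cheapest one.

```python
from itertools import combinations

a = [1.0, 2.0, 3.0]
b = [4.0, 9.0, 5.0, 3.0, 2.0, 10.0]


def brute_force(a, b):
    """Try every increasing choice of len(a) slots in b.

    Returns (cost, slots) with 1-based slot numbers.
    """
    best = None
    # combinations() yields index tuples in increasing order, so the
    # "arrows cannot cross" condition holds automatically
    for slots in combinations(range(len(b)), len(a)):
        cost = sum(ai * b[j] for ai, j in zip(a, slots))
        if best is None or cost < best[0]:
            best = (cost, tuple(j + 1 for j in slots))
    return best


print(brute_force(a, b))  # (16.0, (1, 4, 5))
```

This agrees with the LP solution: objective 16 with assignments i1 to j1, i2 to j4 and i3 to j5. The number of combinations grows very quickly, so this check is only usable for tiny instances.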

There are different ways to solve a shortest path problem. Here we look at three:

Cplex default LP solver (dual simplex)
Cplex network solver
Sparse version of Dijkstra's algorithm from scipy.sparse.csgraph.
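To illustrate the third option, here is a minimal sketch of Dijkstra's algorithm on this network. It uses Python's `heapq` instead of scipy.sparse.csgraph, so it is not the code behind the timings in this post. The function name `shortest_assignment`, the 0-based node encoding, and generating arcs on the fly are illustrative assumptions. Dijkstra requires nonnegative arc costs, which holds when all \(a_i\) and \(b_j\) are nonnegative.

```python
import heapq


def shortest_assignment(a, b):
    """Dijkstra on the layered network; nodes are (i, j), 0-based.

    Returns (cost, [(i, j), ...]) with 1-based assignment pairs.
    Assumes all products a[i] * b[j] are nonnegative.
    """
    m, n = len(a), len(b)
    src, snk = (-1, -1), (m, n)

    def successors(node):
        i, j = node
        if i == m - 1:              # last item: only the free arc to the sink
            yield snk, 0.0
            return
        # item i+1 goes to a later slot and must leave room for the rest
        for j2 in range(j + 1, n - m + i + 2):
            yield (i + 1, j2), a[i + 1] * b[j2]

    dist, pred, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                # stale heap entry
        done.add(u)
        if u == snk:
            break
        for v, c in successors(u):
            if d + c < dist.get(v, float("inf")):
                dist[v], pred[v] = d + c, u
                heapq.heappush(heap, (d + c, v))

    # walk back from the sink to recover the assignment
    path, u = [], pred[snk]
    while u != src:
        path.append((u[0] + 1, u[1] + 1))
        u = pred[u]
    return dist[snk], path[::-1]


print(shortest_assignment([1.0, 2.0, 3.0], [4.0, 9.0, 5.0, 3.0, 2.0, 10.0]))
```

On the tiny data set this returns cost 16 with the pairs (1,1), (2,4), (3,5), the same path as the LP. Generating successors on the fly avoids storing all arcs, which matters once the arc count reaches the millions.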

The \(m=100, n=1000\) problem is a bit large for Cplex to handle as an LP, but let's do the \(m=50, n=500\) data set. Here are the results:

                   MIP Model            Network LP Model              Network LP Model              Dijkstra
solver             MIP                  LP default                    Network                       Sparse
a/b length         50/500               50/500                        50/500                        50/500
nodes/arcs                              22,552/4,995,276              22,552/4,995,276              22,552/4,995,276
rows/columns/nz    650/25,025/100,147   22,553/4,995,277/14,985,378   22,553/4,995,277/14,985,378
objective          6466.673             6466.673                      6466.673                      6466.673
time               59                   721                           38                            0.1
b&b nodes          5,023
iterations         60,199               11,535                        100,446

Time breakdown for the network solver: total 38, presolve 32, extract network 2, solve network 3.

The Dijkstra results look much better. My setup step is a bit slow because it uses unoptimized Python code, but the raw solve is very fast. When trying the \(m=100,n=1000\) data set, I got:

                   MIP Model        Dijkstra
a/b length         100/1000         100/1000
rows/columns       1,300/100,100
nodes/arcs                          90,102/40,230,551
objective          14,371.455       14,371.455
time               1,853            0.9

Note that the shortest path version would result in an LP with 90k rows and 40 million columns. That is very large. So here a specialized shortest path algorithm is far superior to more general tools.
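The node and arc counts in the tables follow from simple arithmetic. With \(w=n-m+1\) feasible slots per item, there are \(m\cdot w+2\) nodes; the source and the sink each contribute \(w\) arcs, and each of the \(m-1\) transitions between consecutive items contributes \(w(w+1)/2\) arcs. The sketch below is an illustration derived from the network structure described above (the function name `network_size` is hypothetical).

```python
def network_size(m, n):
    """Return (nodes, arcs) of the shortest path network for sizes m < n."""
    w = n - m + 1                     # feasible slots for each a_i
    nodes = m * w + 2                 # plus source and sink
    arcs = 2 * w + (m - 1) * w * (w + 1) // 2
    return nodes, arcs


for m, n in [(3, 6), (50, 500), (100, 1000)]:
    print(m, n, network_size(m, n))
```

This reproduces 14/28 for the tiny data set, 22,552/4,995,276 for \(m=50,n=500\) and 90,102/40,230,551 for \(m=100,n=1000\). The arc count grows roughly like \(m\cdot n^2/2\), which explains why the LP becomes so large.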

References

[1] An assignment problem with a wrinkle, https://yetanothermathprogrammingconsultant.blogspot.com/2021/01/an-assignment-problem-with-wrinkle.html
