A Set Partitioning Problem and Infeasible Data Sets

In [1], a problem is stated that can be interpreted as a Set Partitioning Problem [2].

Consider the following data:

----     16 PARAMETER data  input data

            start      length        cost

job1            2           5          47
job2           19           5          20
job3           11           2          38
job4            5           2          14
job5            5           8          40
job6            3          10          26
job7            6           3          68
job8           19           5          61
job9                       10          80
job10          10           4          37
job11          22           2          70
job12          12           7          78
job13          22           2          67
job14          17           7          35
job15           1           4          17
job16          13           4          19
job17           1           8          68
job18           4           9          59
job19          14           8          12
job20           8           6          82

Note: zeros are not printed.

The problem is:

Find a subset of non-overlapping jobs that covers each hour 0 through 23 and minimizes the total cost.

This data set is slightly different from the data in [1], in order to make the problem feasible.

Set Partitioning Model

Before we can work on the model itself, we need to develop some derived data. We introduce the boolean parameter $\mathit{Cover}_{i,t}$ defined by: \[\mathit{Cover}_{i,t}=\begin{cases} 1 & \text{if job $i$ covers hour $t$}\\ 0 & \text{otherwise}\end{cases}\] Job $i$ covers hour $t$ when $\mathit{Start}_i \le t < \mathit{Start}_i + \mathit{Length}_i$. This parameter looks like:

----     16 PARAMETER cover  coverage of hours by jobs

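hour   000000000011111111112222
       012345678901234567890123

job1   ..11111.................
job2   ...................11111
job3   ...........11...........
job4   .....11.................
job5   .....11111111...........
job6   ...1111111111...........
job7   ......111...............
job8   ...................11111
job9   1111111111..............
job10  ..........1111..........
job11  ......................11
job12  ............1111111.....
job13  ......................11
job14  .................1111111
job15  .1111...................
job16  .............1111.......
job17  .11111111...............
job18  ....111111111...........
job19  ..............11111111..
job20  ........111111..........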
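The coverage parameter is derived data, so it may help to see how it follows from the input. Below is a minimal Python sketch (not the GAMS code behind this post) that builds the same job-by-hour table from start and length. It assumes hours are numbered 0 through 23 and that no job wraps past midnight; the names `DATA` and `cover_matrix` are introduced here for illustration only.

```python
# Data set 1: job -> (start, length, cost). A start of 0 is the value GAMS did not print.
DATA = {
    "job1": (2, 5, 47), "job2": (19, 5, 20), "job3": (11, 2, 38), "job4": (5, 2, 14),
    "job5": (5, 8, 40), "job6": (3, 10, 26), "job7": (6, 3, 68), "job8": (19, 5, 61),
    "job9": (0, 10, 80), "job10": (10, 4, 37), "job11": (22, 2, 70), "job12": (12, 7, 78),
    "job13": (22, 2, 67), "job14": (17, 7, 35), "job15": (1, 4, 17), "job16": (13, 4, 19),
    "job17": (1, 8, 68), "job18": (4, 9, 59), "job19": (14, 8, 12), "job20": (8, 6, 82),
}
HOURS = range(24)


def cover_matrix(data):
    """Cover[i][t] = 1 if start_i <= t < start_i + length_i, else 0 (assumes no wrap-around)."""
    return {job: [1 if s <= t < s + n else 0 for t in HOURS] for job, (s, n, _) in data.items()}


if __name__ == "__main__":
    for job, row in cover_matrix(DATA).items():
        # One character per hour: '1' if the job covers that hour, '.' otherwise.
        print(f"{job:6} " + "".join("1" if v else "." for v in row))
```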

We further introduce the binary decision variables: \[x_i = \begin{cases} 1 & \text{if job $i$ is selected} \\ 0 & \text{otherwise}\end{cases}\] With this we can formulate:

Set Partitioning Model
\[\begin{align}\min & \sum_i \color{darkblue}{\mathit{Cost}}_i \cdot \color{darkred}x_i \\ & \sum_i \color{darkblue}{\mathit{Cover}}_{i,t} \cdot \color{darkred}x_i = 1 &&\forall t \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

When we solve this problem using the above data set, we get the solution:

----     31 VARIABLE x.L  selected jobs

job9  1,    job10 1,    job13 1,    job19 1


----     31 VARIABLE tcost.L               =          196  total cost
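Because each job covers a block of consecutive hours, this small instance can also be cross-checked without a MIP solver. A selection of non-overlapping jobs that covers hours 0 through 23 exactly once is the same as a path from hour boundary 0 to boundary 24, where job $i$ is an arc from $\mathit{Start}_i$ to $\mathit{Start}_i+\mathit{Length}_i$ with weight $\mathit{Cost}_i$. The Python sketch below uses this shortest-path view. It is a swapped-in technique that only works for interval-shaped coverage like this, not a replacement for the general set partitioning model; `solve_partition` is a hypothetical helper name.

```python
import math

# Data set 1: job -> (start, length, cost).
DATA = {
    "job1": (2, 5, 47), "job2": (19, 5, 20), "job3": (11, 2, 38), "job4": (5, 2, 14),
    "job5": (5, 8, 40), "job6": (3, 10, 26), "job7": (6, 3, 68), "job8": (19, 5, 61),
    "job9": (0, 10, 80), "job10": (10, 4, 37), "job11": (22, 2, 70), "job12": (12, 7, 78),
    "job13": (22, 2, 67), "job14": (17, 7, 35), "job15": (1, 4, 17), "job16": (13, 4, 19),
    "job17": (1, 8, 68), "job18": (4, 9, 59), "job19": (14, 8, 12), "job20": (8, 6, 82),
}


def solve_partition(data, horizon=24):
    """Shortest path over hour boundaries 0..horizon; returns (cost, jobs) or (inf, None)."""
    best = [math.inf] * (horizon + 1)
    pred = [None] * (horizon + 1)
    best[0] = 0
    for t in range(horizon):  # all arcs point forward, so increasing t is a valid DP order
        if best[t] == math.inf:
            continue
        for job, (s, n, c) in data.items():
            if s == t and s + n <= horizon and best[t] + c < best[s + n]:
                best[s + n] = best[t] + c
                pred[s + n] = (t, job)
    if best[horizon] == math.inf:
        return math.inf, None  # no exact partition exists
    jobs, t = [], horizon
    while t > 0:  # walk the predecessor links back from boundary 24
        t, job = pred[t]
        jobs.append(job)
    return best[horizon], jobs[::-1]


if __name__ == "__main__":
    print(solve_partition(DATA))
```

For this data set there is only one job starting at hour 0 (job9), so the path is largely forced, and the only real choice is between job11 and job13 for the last two hours.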

The model can also be stated using a set representation of the coverage data: $\mathit{Cover}$ is then the set of pairs $(i,t)$ such that job $i$ covers hour $t$. Such a model can look like:

Set Partitioning Model 2
\[\begin{align}\min & \sum_i \color{darkblue}{\mathit{Cost}}_i \cdot \color{darkred}x_i \\ & \sum_{i|\color{darkblue}{\mathit{Cover}}(i,t)} \color{darkred}x_i = 1 &&\forall t \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

Although this looks somewhat different, it is exactly the same model. This is the form I use most of the time.
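In code, the set representation corresponds to storing only the (job, hour) pairs where coverage is 1, instead of a full 0/1 matrix. A minimal Python sketch (the names `COVER`, `lhs`, and `is_partition` are hypothetical) that builds this sparse set for the first data set and checks the partitioning constraint for a given selection:

```python
# Data set 1: job -> (start, length, cost).
DATA = {
    "job1": (2, 5, 47), "job2": (19, 5, 20), "job3": (11, 2, 38), "job4": (5, 2, 14),
    "job5": (5, 8, 40), "job6": (3, 10, 26), "job7": (6, 3, 68), "job8": (19, 5, 61),
    "job9": (0, 10, 80), "job10": (10, 4, 37), "job11": (22, 2, 70), "job12": (12, 7, 78),
    "job13": (22, 2, 67), "job14": (17, 7, 35), "job15": (1, 4, 17), "job16": (13, 4, 19),
    "job17": (1, 8, 68), "job18": (4, 9, 59), "job19": (14, 8, 12), "job20": (8, 6, 82),
}

# Set representation: keep only the (job, hour) pairs where the job covers the hour.
COVER = {(job, t) for job, (s, n, _) in DATA.items() for t in range(s, s + n)}


def lhs(selected, t):
    """Sum of x_i over jobs i with (i, t) in COVER, counting only jobs with x_i = 1."""
    return sum(1 for job in selected if (job, t) in COVER)


def is_partition(selected, hours=range(24)):
    """True if every hour is covered exactly once by the selected jobs."""
    return all(lhs(selected, t) == 1 for t in hours)


if __name__ == "__main__":
    print(len(COVER), is_partition({"job9", "job10", "job13", "job19"}))
```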

Infeasibilities

It is quite easy to construct data sets that make the problem infeasible. Indeed, the data set shown in [1] yields a model that is not feasible.

There are two ways in which the model can be infeasible. The first is the easy one: no job at all covers some time period $t$. Consider the data:

----     16 PARAMETER data  input data

            start      length        cost

job1            2          12          63
job2           19           5          85
job3           11           2          31
job4            5           8          70
job5            5           2          80
job6            3           4          37
job7            6           9          20
job8           19           5          55
job9                        5          24
job10          10           5          89
job11          22           2          34
job12          12           2          36


----     16 PARAMETER cover  coverage of hours by jobs

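hour   000000000011111111112222
       012345678901234567890123

job1   ..111111111111..........
job2   ...................11111
job3   ...........11...........
job4   .....11111111...........
job5   .....11.................
job6   ...1111.................
job7   ......111111111.........
job8   ...................11111
job9   11111...................
job10  ..........11111.........
job11  ......................11
job12  ............11..........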

Hours 15, 16, 17, and 18 are not covered by any job. This is easy to check in advance. GAMS even complains when generating the model:

**** Exec Error at line 24: Equation infeasible due to rhs value

**** INFEASIBLE EQUATIONS ...

---- ecover  =E=  

ecover(h15)..  0 =E= 1 ; (LHS = 0, INFES = 1 ****)

ecover(h16)..  0 =E= 1 ; (LHS = 0, INFES = 1 ****)

ecover(h17)..  0 =E= 1 ; (LHS = 0, INFES = 1 ****)

ecover(h18)..  0 =E= 1 ; (LHS = 0, INFES = 1 ****)
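The check in advance can be as simple as taking the union of all job intervals. A minimal Python sketch, assuming the same (start, length, cost) layout as the data table and no wrap-around past midnight; `uncovered_hours` is a name introduced for illustration:

```python
# Data set 2: job -> (start, length, cost).
DATA2 = {
    "job1": (2, 12, 63), "job2": (19, 5, 85), "job3": (11, 2, 31), "job4": (5, 8, 70),
    "job5": (5, 2, 80), "job6": (3, 4, 37), "job7": (6, 9, 20), "job8": (19, 5, 55),
    "job9": (0, 5, 24), "job10": (10, 5, 89), "job11": (22, 2, 34), "job12": (12, 2, 36),
}


def uncovered_hours(data, hours=range(24)):
    """Hours that no job covers; any such hour makes the partitioning model infeasible."""
    covered = {t for s, n, _ in data.values() for t in range(s, s + n)}
    return [t for t in hours if t not in covered]


if __name__ == "__main__":
    print(uncovered_hours(DATA2))
```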

There is another form of infeasibility that is more difficult to diagnose. When we use the data set:

----     16 PARAMETER data  input data

            start      length        cost

job1            2           6          67
job2           19           5          52
job3           11           5          47
job4            5           2          20
job5            5           2          38
job6            3           8          14
job7            6          10          40
job8           19           3          26
job9                        8          68
job10          10          10          61
job11          22           2          80
job12          12           2          37
job13          22           2          70
job14          17           2          78
job15           1          11          67
job16          13           4          35
job17           1           4          17
job18           4           8          19
job19          14           9          68

we get:

               S O L V E      S U M M A R Y

     MODEL   m                   OBJECTIVE  tcost
     TYPE    MIP                 DIRECTION  MINIMIZE
     SOLVER  CPLEX               FROM LINE  28

**** SOLVER STATUS     1 Normal Completion         
**** MODEL STATUS      10 Integer Infeasible       
**** OBJECTIVE VALUE               NA

Most solvers provide little in the way of diagnostics. CPLEX at least reports:

Infeasibility row 'ecover(h08)':  0  = 1.
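One way to locate such a problem outside the solver, sketched here in Python under the assumption that each job is a block of consecutive hours that does not wrap past midnight: starting from hour boundary 0, follow every job that starts at a reachable boundary and mark the boundary where it ends. If boundary 24 is never reached, no exact partition exists, and the largest reachable boundary shows where the chain breaks. `reachable_boundaries` is a hypothetical helper, not a GAMS or CPLEX feature.

```python
# Data set 3: job -> (start, length, cost).
DATA3 = {
    "job1": (2, 6, 67), "job2": (19, 5, 52), "job3": (11, 5, 47), "job4": (5, 2, 20),
    "job5": (5, 2, 38), "job6": (3, 8, 14), "job7": (6, 10, 40), "job8": (19, 3, 26),
    "job9": (0, 8, 68), "job10": (10, 10, 61), "job11": (22, 2, 80), "job12": (12, 2, 37),
    "job13": (22, 2, 70), "job14": (17, 2, 78), "job15": (1, 11, 67), "job16": (13, 4, 35),
    "job17": (1, 4, 17), "job18": (4, 8, 19), "job19": (14, 9, 68),
}


def reachable_boundaries(data, horizon=24):
    """Boundaries t such that hours 0..t-1 can be covered exactly once by chaining jobs."""
    reach = {0}
    for t in range(horizon):  # every job ends after it starts, so one forward pass suffices
        if t not in reach:
            continue
        for s, n, _ in data.values():
            if s == t and s + n <= horizon:
                reach.add(s + n)
    return sorted(reach)


if __name__ == "__main__":
    r = reachable_boundaries(DATA3)
    print(r, "feasible" if 24 in r else "infeasible")
```

For this data set the chain stops at boundary 8, which is the hour CPLEX points at.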

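CPLEX points at the right spot. Every hour is covered by at least one job, but hour 0 is covered only by job9 (hours 0 through 7), and no job starts at hour 8. So once job9 is selected, hour 8 cannot be covered without an overlap.

Such a message is better than what most solvers offer, but it may be difficult for an end-user to digest. In applications, it is often better to make sure the model is never infeasible: allow the constraint to be violated, but at a price. This is sometimes called an elastic formulation. There are two ways to do this for this model: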

use a data-driven method: introduce expensive emergency jobs that can always be used to satisfy the constraint
change the constraint directly by adding a penalized slack variable

Data-driven Elastic model

We can add expensive filler jobs to our data set, one for each hour: job notcovHH starts at hour HH, has length 1, and costs 999:

----     21 PARAMETER data  input data

               start      length        cost

job1               2           6          67
job2              19           5          52
job3              11           5          47
job4               5           2          20
job5               5           2          38
job6               3           8          14
job7               6          10          40
job8              19           3          26
job9                           8          68
job10             10          10          61
job11             22           2          80
job12             12           2          37
job13             22           2          70
job14             17           2          78
job15              1          11          67
job16             13           4          35
job17              1           4          17
job18              4           8          19
job19             14           9          68
notcov00                       1         999
notcov01           1           1         999
notcov02           2           1         999
notcov03           3           1         999
notcov04           4           1         999
notcov05           5           1         999
notcov06           6           1         999
notcov07           7           1         999
notcov08           8           1         999
notcov09           9           1         999
notcov10          10           1         999
notcov11          11           1         999
notcov12          12           1         999
notcov13          13           1         999
notcov14          14           1         999
notcov15          15           1         999
notcov16          16           1         999
notcov17          17           1         999
notcov18          18           1         999
notcov19          19           1         999
notcov20          20           1         999
notcov21          21           1         999
notcov22          22           1         999
notcov23          23           1         999

When we solve this, we see the following solution:

----     36 VARIABLE x.L  selected jobs

job12    1,    job15    1,    job19    1,    notcov00 1,    notcov23 1


----     36 VARIABLE tcost.L               =         2170  total cost

So 2 hours (h00 and h23) are left to filler jobs. If we allow them to remain uncovered, the best solution uses jobs 12, 15, and 19 for the other hours.
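As a cross-check of the data-driven elastic model, here is a Python sketch that reuses the shortest-path view of this instance (a swapped-in technique that relies on every job being a block of consecutive hours, not the MIP solved in GAMS). It assumes the filler jobs exactly as in the table above; `FILLERS` and `solve_partition` are hypothetical names.

```python
import math

# Data set 3: job -> (start, length, cost).
DATA3 = {
    "job1": (2, 6, 67), "job2": (19, 5, 52), "job3": (11, 5, 47), "job4": (5, 2, 20),
    "job5": (5, 2, 38), "job6": (3, 8, 14), "job7": (6, 10, 40), "job8": (19, 3, 26),
    "job9": (0, 8, 68), "job10": (10, 10, 61), "job11": (22, 2, 80), "job12": (12, 2, 37),
    "job13": (22, 2, 70), "job14": (17, 2, 78), "job15": (1, 11, 67), "job16": (13, 4, 35),
    "job17": (1, 4, 17), "job18": (4, 8, 19), "job19": (14, 9, 68),
}
PENALTY = 999
# One filler job per hour, as in the table above: notcovHH = (HH, 1, 999).
FILLERS = {f"notcov{t:02d}": (t, 1, PENALTY) for t in range(24)}


def solve_partition(data, horizon=24):
    """Shortest path over hour boundaries; job i is an arc start_i -> start_i + length_i."""
    best = [math.inf] * (horizon + 1)
    pred = [None] * (horizon + 1)
    best[0] = 0
    for t in range(horizon):
        if best[t] == math.inf:
            continue
        for job, (s, n, c) in data.items():
            if s == t and s + n <= horizon and best[t] + c < best[s + n]:
                best[s + n] = best[t] + c
                pred[s + n] = (t, job)
    if best[horizon] == math.inf:
        return math.inf, None  # no exact partition exists
    jobs, t = [], horizon
    while t > 0:
        t, job = pred[t]
        jobs.append(job)
    return best[horizon], jobs[::-1]


if __name__ == "__main__":
    print(solve_partition(DATA3))                  # without fillers: infeasible
    print(solve_partition({**DATA3, **FILLERS}))   # with fillers: always feasible
```

Because a filler exists for every hour, boundary 24 is always reachable, which is exactly why this data-driven approach can never be infeasible.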

Elastic model: change constraint

Instead of changing the data, we can also do the same by changing the covering constraint:

Set Partitioning Model, elastic version
\[\begin{align}\min & \sum_i \color{darkblue}{\mathit{Cost}}_i \cdot \color{darkred}x_i + \color{darkblue}{\mathit{Penalty}} \cdot \sum_t\color{darkred}{\mathit{Uncov}}_t \\ & \sum_i \color{darkblue}{\mathit{Cover}}_{i,t} \cdot \color{darkred}x_i + \color{darkred}{\mathit{Uncov}}_t = 1 &&\forall t \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}{\mathit{Uncov}}_t \in [0,1] \end{align}\]

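We added a slack variable $\mathit{Uncov}_t$ and included it in the objective with a penalty of 999 per uncovered hour. The slack can be continuous: for binary $x_i$ the sum $\sum_i \mathit{Cover}_{i,t} x_i$ is a nonnegative integer, so the constraint forces $\mathit{Uncov}_t$ to be 0 or 1. When we run this model, we see: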

----     44 VARIABLE x.L  selected jobs

job12 1,    job15 1,    job19 1


----     44 VARIABLE uncov.L  uncovered hours

h00 1,    h23 1


----     44 VARIABLE tcost.L               =         2170  total cost (objective)
            PARAMETER jobcost              =          172  cost related to running jobs
            PARAMETER penalties            =         1998  for uncovered hours
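The reported objective can be reproduced by hand: for a fixed selection, the constraint determines $\mathit{Uncov}_t = 1 - \sum_i \mathit{Cover}_{i,t} x_i$, and the objective is the job cost plus 999 times the number of uncovered hours. A small Python check under those assumptions, using only the start, length, and cost of the three selected jobs from the data table (`elastic_objective` is a hypothetical name):

```python
PENALTY = 999
# job -> (start, length, cost) for the jobs selected in the elastic solution above.
SELECTED = {"job12": (12, 2, 37), "job15": (1, 11, 67), "job19": (14, 9, 68)}


def elastic_objective(selected, hours=range(24), penalty=PENALTY):
    """Return (jobcost, penalties, uncovered hours) for a fixed selection of jobs."""
    uncovered = []
    for t in hours:
        lhs = sum(1 for s, n, _ in selected.values() if s <= t < s + n)
        if lhs > 1:
            # Uncov_t = 1 - lhs would be negative: the selection itself violates the constraint.
            raise ValueError(f"hour {t} is covered {lhs} times")
        if lhs == 0:
            uncovered.append(t)  # Uncov_t = 1 for this hour
    jobcost = sum(c for _, _, c in selected.values())
    return jobcost, penalty * len(uncovered), uncovered


if __name__ == "__main__":
    jobcost, penalties, hours = elastic_objective(SELECTED)
    print(jobcost, penalties, jobcost + penalties, hours)
```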

I use this type of elastic formulation a lot. It prevents a user from seeing "sorry, model was infeasible". Instead, it gives back a meaningful result that shows which hours cannot be covered.

References

[1] Profit maximising job scheduling using Python, https://stackoverflow.com/questions/62297792/profit-maximising-job-scheduling-using-python
[2] Garfinkel, R. S., and G. L. Nemhauser. “The Set-Partitioning Problem: Set Covering with Equality Constraints.” Operations Research, vol. 17, no. 5, 1969, pp. 848–856.