I have a job scheduling problem with a twist- a minimization constraint. The task is- I have many jobs, each with various dependencies on other jobs, without cycles. These jobs have categories as well, and can be ran together in parallel for free if they belong to the same category. So, I want to order the jobs so that each job comes after its dependencies, but arranged in such a way that they are grouped by category (to run many in parallel) to minimize the number of serial jobs I run. That is, adjacent jobs of the same category count as a single serial job [1].It is always difficult to form a model from an informal description. For me it often helps to develop some example data. So here we go:
---- 28 SET j jobs
job1 , job2 , job3 , job4 , job5 , job6 , job7 , job8 , job9 , job10, job11, job12
job13, job14, job15, job16, job17, job18, job19, job20, job21, job22, job23, job24
job25, job26, job27, job28, job29, job30, job31, job32, job33, job34, job35, job36
job37, job38, job39, job40, job41, job42, job43, job44, job45, job46, job47, job48
job49, job50
---- 28 SET c category
cat1, cat2, cat3, cat4, cat5
---- 28 SET jc job-category mapping
cat1 cat2 cat3 cat4 cat5
job1 YES
job2 YES
job3 YES
job4 YES
job5 YES
job6 YES
job7 YES
job8 YES
job9 YES
job10 YES
job11 YES
job12 YES
job13 YES
job14 YES
job15 YES
job16 YES
job17 YES
job18 YES
job19 YES
job20 YES
job21 YES
job22 YES
job23 YES
job24 YES
job25 YES
job26 YES
job27 YES
job28 YES
job29 YES
job30 YES
job31 YES
job32 YES
job33 YES
job34 YES
job35 YES
job36 YES
job37 YES
job38 YES
job39 YES
job40 YES
job41 YES
job42 YES
job43 YES
job44 YES
job45 YES
job46 YES
job47 YES
job48 YES
job49 YES
job50 YES
---- 28 PARAMETER length job duration
job1 11.611, job2 12.558, job3 11.274, job4 7.839, job5 5.864, job6 6.025, job7 11.413
job8 10.453, job9 5.315, job10 12.924, job11 5.728, job12 6.757, job13 10.256, job14 12.502
job15 6.781, job16 5.341, job17 10.851, job18 11.212, job19 8.894, job20 8.587, job21 7.430
job22 7.464, job23 6.305, job24 14.334, job25 8.799, job26 12.834, job27 8.000, job28 6.255
job29 12.489, job30 5.692, job31 7.020, job32 5.051, job33 7.696, job34 9.999, job35 6.513
job36 6.742, job37 8.306, job38 8.169, job39 8.221, job40 14.640, job41 14.936, job42 8.699
job43 8.729, job44 12.720, job45 8.967, job46 14.131, job47 6.196, job48 12.355, job49 5.554
job50 10.763
---- 28 SET before dependencies
job3 job9 job13 job21 job23 job27 job32 job41 job42
job1 YES
job3 YES
job4 YES
job8 YES
job9 YES YES
job12 YES
job14 YES
job21 YES
job26 YES
job31 YES
+ job43 job46 job48
job10 YES YES
job11 YES
---- 28 PARAMETER due some jobs have a due date
job16 50.756, job19 57.757, job20 58.797, job25 74.443, job29 65.605, job32 55.928, job50 58.012
So we have 50 jobs and 5 categories. The set \(jc_{j,c}\) indicates if job \(j\) corresponds to category \(c\). Jobs have a duration or processing time, and some jobs have a precedence structure: the set \(\mathit{before}_{i,j}\) indicates that job \(i\) has to executed before job \(j\). In addition, some jobs have a due date.
The way I generated the due dates and the processing times suggests a continuous time model. The main variables are: \[\begin{cases} \mathit{start}_j \ge 0& \text{start of job $j$}\\ \mathit{end}_j \ge 0& \text{end of job $j$}\\ \delta_{i,j} \in \{0,1\} & \text{binary variable used to model no-overlap constraints}\end{cases}\]
As objective I use: minimize the total makespan. The idea is that this will automatically try to use as much as possible parallel jobs of the same category. We have the following constraints:
- A job finishes at start time plus job length.
- For some jobs we have a due date. This becomes a bound on the end variable in the model. In practical models you may want to allow a job to finish later than the due but at a cost. This would make it easier to generate meaningful results if not all due dates can be met.
- The precedence constraints are simple: job \(j\) can not start before job \(i\) has finished.
- No-overlap constraints require some thought. We use a binary variable to indicate that job \(i\) is executed before or after job \(j\). This has to hold for a subset of \((i,j)\) combinations: (a) only if \(i\lt j\): we don't want to check a pair twice, (b) only if there is no precedence constraint already in effect, and (c) only if jobs are of a different category. We can store the results of these three conditions in a set \(\mathit{NoOverlap}_{i,j}\) which indicates which elements \((i,j)\) need the no-overlap constraints. This set will be used in the model below.
With this in mind we can formalize this as:
Mixed Integer Programming Model |
---|
\[\begin{align} \min\> & \color{darkred}{\mathit{makespan}}\\ & \color{darkred}{\mathit{makespan}} \ge \color{darkred}{\mathit{end}}_j && \forall j\\ & \color{darkred}{\mathit{end}}_j = \color{darkred}{\mathit{start}}_j + \color{darkblue}{\mathit{length}}_j&& \forall j \\ & \color{darkred}{\mathit{end}}_j \le \color{darkblue}{\mathit{due}}_j && \forall j | \color{darkblue}{\mathit{due}}_j \text{ exists}\\ & \color{darkred}{\mathit{end}}_i \le \color{darkred}{\mathit{start}}_j && \forall i,j|\color{darkblue}{\mathit{Precedence}}_{i,j} \text{ exists}\\ & \color{darkred}{\mathit{end}}_i \le \color{darkred}{\mathit{start}}_j +\color{darkblue} M \color{darkred}\delta_{i,j} && \forall i,j|\color{darkblue}{\mathit{NoOverlap}}_{i,j} \\ &\color{darkred}{\mathit{end}}_j \le \color{darkred}{\mathit{start}}_i +\color{darkblue} M (1-\color{darkred}\delta_{i,j}) && \forall i,j|\color{darkblue}{\mathit{NoOverlap}}_{i,j} \\ & \color{darkred}{\mathit{start}}_j, \color{darkred}{\mathit{end}}_j \ge 0 \\ & \color{darkred}\delta_{i,j} \in \{0,1\}\end{align} \] |
Here \(M\) is a large enough constant, e.g. the planning window. There are many variations on this model (Constraint Programming formulations, alternative modeling to prevent the big-M constructs, such as SOS1 variables or indicator constraints), but as a first line of attack, this is not too bad.
We can substitute out \(\mathit{start}_j\) or \(\mathit{end}_j\). I would prefer to keep them both in the model to improve readability.
The model solves quickly (30 seconds on my laptop). It is not very large for modern solvers:
MODEL STATISTICS
BLOCKS OF EQUATIONS 5 SINGLE EQUATIONS 2,058
BLOCKS OF VARIABLES 4 SINGLE VARIABLES 1,073
NON ZERO ELEMENTS 6,060 DISCRETE VARIABLES 972
When we solve this we get the following results:
---- 67 VARIABLE start.L start of job
job1 36.099, job2 23.541, job3 61.714, job4 2.924, job7 87.323, job8 25.646, job9 72.989
job10 47.710, job11 23.265, job12 47.710, job13 23.265, job14 10.763, job15 36.099, job16 10.763
job17 36.859, job18 87.323, job19 10.763, job20 47.710, job21 87.323, job22 87.323, job23 78.304
job24 72.989, job25 47.710, job26 23.265, job27 94.753, job28 10.763, job29 10.776, job31 36.099
job32 47.710, job33 36.099, job34 23.265, job37 47.710, job38 10.763, job39 10.763, job40 47.710
job41 47.710, job42 36.099, job43 87.323, job44 72.989, job46 73.192, job47 10.763, job48 60.634
job49 10.763
---- 67 VARIABLE end.L end of job
job1 47.710, job2 36.099, job3 72.989, job4 10.763, job5 5.864, job6 6.025
job7 98.736, job8 36.099, job9 78.304, job10 60.634, job11 28.993, job12 54.467
job13 33.521, job14 23.265, job15 42.880, job16 16.104, job17 47.710, job18 98.535
job19 19.657, job20 56.297, job21 94.753, job22 94.787, job23 84.609, job24 87.323
job25 56.510, job26 36.099, job27 102.754, job28 17.018, job29 23.265, job30 5.692
job31 43.119, job32 52.761, job33 43.795, job34 33.264, job35 6.513, job36 6.742
job37 56.017, job38 18.932, job39 18.984, job40 62.350, job41 62.646, job42 44.798
job43 96.052, job44 85.708, job45 8.967, job46 87.323, job47 16.959, job48 72.989
job49 16.317, job50 10.763
(Some jobs have a start time of zero. They are not printed here.)
For scheduling models, this type of output is difficult to interpret by humans. Better is something like a Gantt chart:
![]() |
Results for example data set |
Indeed, we see that many jobs of the same category are executed in parallel. We have two category 1 and 2 periods. This is due to due dates and precedence constraints. (If we would not have due dates or precedence constraints, we would see just a single region for each category.) Also we see that some jobs have a little bit of freedom to float within a window. For instance job 4 could have started a bit earlier. We can fix this in post-processing or tell the model to move jobs to the left when possible (this would mean an extra term in the objective).
References
- Job scheduling with minimization by parallel grouping, https://stackoverflow.com/questions/59722383/job-scheduling-with-minimization-by-parallel-grouping