I am revisiting here a problem from [1]:
The variable \(\color{darkred}{\mathit{floc}}_{j,c}\) (the location of facility \(j\)) is a free variable. We can restrict this a bit: \(\color{darkred}{\mathit{floc}}_{j,c} \in [0,1]\) where \(c \in \{x,y\}\). Or, even more precisely, we can use the bounding box of the demand points: \[\min_i \color{darkblue}{\mathit{dloc}}_{i,c} \le \color{darkred}{\mathit{floc}}_{j,c} \le \max_i \color{darkblue}{\mathit{dloc}}_{i,c} \]
This is quite a difference. The results for Model 1 with the ordering constraint are:
So, we can cover each customer within the allowed distance using just 5 facilities. The model shows a feasible configuration for this. The results (location of facilities and assignment of customers to facilities) are not optimal in the sense of yielding the smallest possible distances between customers and facilities. We see this clearly in the purple and orange clusters. The only real thing this model delivers is: we need 5 facilities to meet the distance constraints.
The performance of the MISOCP model is a disappointment. For now, the conclusion is: stick to minimization of the squared distances.
We have \(n\) demand points and their locations. How many facilities do we need to service these customers? And where do we place them? We have a restriction: there is a maximum distance between customer and facility.
Data
We randomly generated 75 demand points inside the \([0,1]\times[0,1]\) square. The maximum allowed distance between a demand point and a facility is 0.25.
Data set
----     63 PARAMETER maxDist  =  0.250  maximum distance allowed between facility and demand point (for 1x1 map)

----     63 PARAMETER dloc  demand point locations

                    x            y

demand1         0.497        0.964
demand2         0.874        0.844
demand3         0.191        0.495
demand4         0.235        0.381
demand5         0.601        0.151
demand6         0.508        0.463
demand7         0.359        0.153
demand8         0.366        0.826
demand9         0.024        0.059
demand10        0.285        0.114
demand11        0.890        0.374
demand12        0.234        0.148
demand13        0.019        0.078
demand14        0.928        0.555
demand15        0.616        0.360
demand16        0.521        0.432
demand17        0.750        0.817
demand18        0.016        0.373
demand19        0.697        0.767
demand20        0.887        0.812
demand21        0.790        0.804
demand22        0.017        0.393
demand23        0.960        0.722
demand24        0.815        0.838
demand25        0.775        0.528
demand26        0.656        0.392
demand27        0.485        0.403
demand28        0.435        0.193
demand29        0.129        0.488
demand30        0.791        0.686
demand31        0.303  6.351700E-5
demand32        0.446        0.815
demand33        0.208        0.336
demand34        0.282        0.156
demand35        0.620        0.860
demand36        0.615        0.386
demand37        0.763        0.732
demand38        0.080        0.083
demand39        0.019        0.245
demand40        0.334        0.304
demand41        0.835        0.386
demand42        0.053        0.506
demand43        0.192        0.234
demand44        0.935        0.786
demand45        0.943        0.172
demand46        0.182        0.203
demand47        0.532        0.506
demand48        0.211        0.638
demand49        0.296        0.553
demand50        0.034        0.808
demand51        0.355        0.989
demand52        0.814        0.715
demand53        0.243        0.542
demand54        0.502        0.111
demand55        0.169        0.287
demand56        0.842        0.806
demand57        0.563        0.157
demand58        0.272        0.034
demand59        0.718        0.705
demand60        0.158        0.752
demand61        0.955        0.785
demand62        0.257        0.088
demand63        0.800        0.219
demand64        0.956        0.803
demand65        0.460        0.202
demand66        0.896        0.524
demand67        0.872        0.906
demand68        0.352        0.323
demand69        0.925        0.634
demand70        0.414        0.269
demand71        0.282        0.741
demand72        0.657        0.456
demand73        0.381        0.496
demand74        0.900        0.532
demand75        0.060        0.396
min             0.016  6.351700E-5
max             0.960        0.989
Model 1: minimize the number of facilities
In the first model, we try to determine how many facilities we need. This is our "sizing" model. We use this in the second model, where we try to find an optimal location for these facilities. (The size of the second model is heavily determined by the number of facilities).
A high-level model looks like:
Model 1: find number of facilities needed |
---|
\[\begin{align} \min & \sum_j \color{darkred}{\mathit{isOpen}}_j \\ & \color{darkred}{\mathit{assign}}_{i,j}=1 \implies \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{j,c}\right)^2} \le \color{darkblue}{\mathit{maxDist}} && \forall i,j \\ & \sum_j \color{darkred}{\mathit{assign}}_{i,j} = 1 && \forall i \\ & \color{darkred}{\mathit{isOpen}}_j =0 \implies \color{darkred}{\mathit{assign}}_{i,j} =0 && \forall i,j \\ & \color{darkred}{\mathit{isOpen}}_j\in \{0,1\} \\ & \color{darkred}{\mathit{assign}}_{i,j} \in \{0,1\} \end{align}\] |
Here \(i\) is the set of demand points, \(j\) is the set of potential facilities, and \(c \in \{x,y\}\) indexes the coordinates. The assignment constraint makes sure that each customer is assigned to exactly one facility. We must reformulate things a bit if we want to solve this with a standard convex MIQCP (Mixed-Integer Quadratically Constrained Program) solver. The reason is that solvers typically support indicator constraints (implications) only for linear constraints, not for quadratic ones.
Model 1: implementation |
---|
\[\begin{align} \min & \sum_j \color{darkred}{\mathit{isOpen}}_j \\ & \sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{j,c}\right)^2 \le \color{darkblue}{\mathit{maxDist}}^2 + \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,j}) && \forall i,j \\ & \sum_j \color{darkred}{\mathit{assign}}_{i,j} = 1 && \forall i \\ & \color{darkred}{\mathit{assign}}_{i,j} \le \color{darkred}{\mathit{isOpen}}_j && \forall i,j \\ & \color{darkred}{\mathit{isOpen}}_j\in \{0,1\} \\ & \color{darkred}{\mathit{assign}}_{i,j} \in \{0,1\} \end{align}\] |
The reformulation is to rewrite our quadratic implication as a big-M constraint. Here, the constant \(\color{darkblue}M\) is large enough to make the constraint non-binding when \(\color{darkred}{\mathit{assign}}_{i,j}=0\). It is an interesting exercise to see how small we can make \(\color{darkblue}M\). We also converted the second, linear implication. That one is rather trivial.
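As a small illustration of the "how small can \(M\) be" exercise, here is a minimal Python sketch. It assumes the facilities are confined to a box, so the largest possible squared distance from a demand point to any facility location is attained at a corner of that box, and it takes \(M_i\) as that value minus \(\mathit{maxDist}^2\). The function name `smallest_big_m` and the sample point are made up for this example.

```python
from itertools import product


def smallest_big_m(point, lo, hi, max_dist):
    """Big-M for one demand point when facilities are confined to the box [lo, hi].

    Assumption for this sketch: M_i = (largest possible squared distance) - maxDist^2.
    """
    # The squared distance from a fixed point to a point in a box is largest
    # at a corner, so it is enough to enumerate the corners (4 in 2-D).
    corners = product(*zip(lo, hi))
    farthest = max(
        sum((p - q) ** 2 for p, q in zip(point, corner)) for corner in corners
    )
    # With assign = 0 the right-hand side becomes maxDist^2 + M_i = farthest,
    # which no facility location inside the box can violate.
    return max(farthest - max_dist ** 2, 0.0)


if __name__ == "__main__":
    # Hypothetical demand point in the unit square, maxDist = 0.25 as in the text.
    print(smallest_big_m((0.5, 0.5), (0.0, 0.0), (1.0, 1.0), 0.25))
```

Note that this gives a separate \(M_i\) for each demand point, which is tighter than one global constant.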
An optional constraint is \[\color{darkred}{\mathit{isOpen}}_j \ge \color{darkred}{\mathit{isOpen}}_{j+1}\] This gives nicer solutions: the open facilities are the first ones. More importantly, it is a symmetry breaker and can help performance. Does it help? An experiment showed the following performance:
version | nodes | time (secs) |
---|---|---|
without ordering constraint | 909037 | 747 |
with ordering constraint | 14947 | 9 |
With the ordering constraint, the resulting set of open facilities is:
----    113 VARIABLE nOpen.L  =  5.000  number of open facilities

----    113 SET j  possible facilities

facility1 ,    facility2 ,    facility3 ,    facility4 ,    facility5 ,    facility6 ,    facility7 ,    facility8 ,    facility9
facility10

----    113 SET f  open facilities

facility1,    facility2,    facility3,    facility4,    facility5
If we remove the ordering constraint, we may see solutions like:
----    113 SET f  open facilities

facility1 ,    facility5 ,    facility6 ,    facility9 ,    facility10
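To get a feeling for how much symmetry the ordering constraint removes, we can count equivalent labelings. This is only a back-of-the-envelope sketch: it counts the ways one physical solution with 5 open facilities can be written down using 10 facility labels, and ignores the (free) locations of the closed facilities. The function name is hypothetical.

```python
from math import comb, perm


def symmetric_copies(n_potential, n_open):
    """Count relabelings of one physical solution.

    Returns (number of possible open sets, number of labelled placements).
    """
    # Which of the potential facilities are the open ones?
    open_sets = comb(n_potential, n_open)
    # Which label is attached to which physical location?
    labelings = perm(n_potential, n_open)
    return open_sets, labelings


if __name__ == "__main__":
    open_sets, labelings = symmetric_copies(10, 5)
    print(open_sets, labelings, labelings // open_sets)
```

The ordering constraint collapses the possible open sets to a single one (the first five). The remaining factor, the permutations of the open facilities among themselves, is not touched by this constraint.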
Model 2: minimize the sum of squared distances
After obtaining the optimal number of facilities in Model 1, we can try to find an optimal placement of these facilities, and at the same time, assign customers to their closest facility.
Model 2: optimal placement of facilities |
---|
\[\begin{align} \min & \sum_{i,f|\color{darkred}{\mathit{assign}}_{i,f}=1} \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2} \\&\color{darkred}{\mathit{assign}}_{i,f}=1 \implies \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2} \le\color{darkblue}{\mathit{maxDist}} \\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\ & \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \end{align}\] |
Here \(i\) are the demand points, and \(f\) are the (needed) facilities. In the previous model, we used \(j\) as the set of potential facilities. Here we use \(f\), which is a subset of \(j\). In the previous model, we used 10 potential facilities (so \(\color{darkred}{\mathit{assign}}_{i,j}\) was \(75 \times 10\)). In this model we have \(\color{darkred}{\mathit{assign}}_{i,f}\) which is \(75 \times 5\).
In this second model, we would like to minimize the sum of the distances between customers and facilities. It is much more convenient to change this slightly to minimize the sum of squared distances, i.e., we drop the square root in the distance calculation. We did the same thing in the previous model. There, the results are identical. Here, we really have a somewhat different model that will deliver, in general, slightly different solutions.
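A tiny one-dimensional example may help to see why the two objectives differ. The sketch below places a single facility on a line by grid search, once minimizing the sum of squared distances and once the sum of distances. The three points and the helper name `best_location` are invented for this illustration; this is not part of the models above.

```python
def best_location(points, cost, steps=1000):
    """Grid search on [0, 1] for the location x minimizing sum(cost(p - x))."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: sum(cost(p - x) for p in points))


# Hypothetical 1-D demand points: two close together and one far away.
points = [0.0, 0.1, 1.0]

by_squares = best_location(points, lambda t: t * t)  # pulled toward the mean
by_distance = best_location(points, abs)             # sits at the median

if __name__ == "__main__":
    print(by_squares, by_distance)
```

The squared objective is drawn toward the far-away point (the minimizer is the mean), while the distance objective stays at the median. In two dimensions with several facilities the effect is less extreme, but the same mechanism is at work.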
Here is how we can implement the model, using our new objective:
Model 2: implementation |
---|
\[\begin{align} \min & \sum_{i,f} \color{darkred}d_{i,f} \\ & \color{darkred}d_{i,f} \ge \sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2 - \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,f}) && \forall i,f \\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\ & \color{darkred}d_{i,f}\in [0,\color{darkblue}{\mathit{maxDist}}^2] \\ & \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \end{align}\] |
We can add the following symmetry breaker: \[\color{darkred}{\mathit{floc}}_{f,x} \le \color{darkred}{\mathit{floc}}_{f+1,x}\] i.e., order the facilities by their \(x\)-coordinate. Does this help? A test run shows:
In the previous section, we solved the "min sum of squared distances" problem. Of course, we can change just the objective function in model 2 to \[\min \sum_{i,f} \sqrt{\color{darkred}d_{i,f}}\] However, to find global solutions, we would need a global MINLP solver. There is a way to shoehorn this model into a convex solver, using a second-order cone (SOCP) constraint [2]. Such a constraint is usually specified as \[y^2 \ge x^TQx\] where \(y\) is a non-negative variable, and \(Q\) is a positive semi-definite matrix.
version | nodes | time (secs) |
---|---|---|
without ordering constraint | 94479 | 13 |
with ordering constraint | 6950 | 2.2 |
Again, nothing to sneeze at. The results for our Model 2 (with ordering constraint) are:
Indeed, we see a better assignment. This is evidenced by a decrease in the sum of the squared distances between customers and facilities from 2.56 to 2.18.
When the ordering constraint is turned on, we see that the facilities are nicely ordered by their \(x\)-coordinate:
----    181 PARAMETER floc2  model2 results

                    x           y

facility1       0.185       0.209
facility2       0.282       0.839
facility3       0.299       0.458
facility4       0.706       0.248
facility5       0.837       0.736
Notes:
- We have separated here the problem of finding the number of facilities (Model 1) and the optimal placement of them (Model 2). We can look at the combined problem as a multi-objective model. The simplest approach would be a weighted-objective implementation, with a relatively large weight assigned to the number of facilities. The advantage of this approach is that we only need to solve one model. However, this model is larger than Model 2, as we need to overestimate the number of facilities required (just as in Model 1).
- The pictures prompt us to consider \(k\)-means clustering [2]. A fast and straightforward heuristic is Lloyd's algorithm. This problem is also about minimizing the sum of the squared distances. The problem is that I don't know how to incorporate the maximum distance constraint. We could think: well, by minimizing the sum of squares, we'll see an automatic reduction of the largest distance. If we run R's kmeans on our data as is, we see:
clusters: 5 tot.withinss: 1.949382 maxdist: 0.3338258
clusters: 6 tot.withinss: 1.528166 maxdist: 0.2735675
clusters: 7 tot.withinss: 1.268441 maxdist: 0.2735675
clusters: 8 tot.withinss: 1.057716 maxdist: 0.2498851
So \(k\)-means would need 8 clusters before the maximum distance drops below 0.25, whereas Model 1 needs only 5 facilities. The resemblance between our problem and \(k\)-means is misleading.
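For readers who want to repeat the \(k\)-means experiment without R, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the implementation behind R's kmeans (which uses a different default algorithm and multiple starts are up to the user); the function name `lloyd`, the random initialization, and the test points are assumptions of this example. It reports the total within-cluster sum of squares and the largest point-to-center distance, which is the quantity our distance constraint cares about.

```python
import math
import random


def lloyd(points, k, iters=100, seed=0):
    """Plain Lloyd iterations; returns (centers, withinss, maxdist)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    withinss = sum(d * d for d in nearest)
    maxdist = max(nearest)  # not controlled by the algorithm, only reported
    return centers, withinss, maxdist


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0), (1.0, 0.9)]
    print(lloyd(pts, 2))
```

Nothing in the iterations looks at the maximum distance, which is why we can only check it afterwards.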
Model 3: minimize the sum of distances
It is not completely trivial to convert the previous model into a SOCP framework. Here is my attempt:
Model 3: MISOCP model implementation |
---|
\[\begin{align} \min & \sum_{i,f} \color{darkred}d_{i,f} \\ & \color{darkred}d_{i,f} \ge \color{darkred}\Delta_{i,f} - \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,f}) && \forall i,f \\& \color{darkred}\Delta^2_{i,f} \ge \sum_c \color{darkred}\delta^2_{i,f,c} && \forall i,f \\ & \color{darkred}\delta_{i,f,c} = \color{darkblue}{\mathit{dloc}}_{i,c}-\color{darkred}{\mathit{floc}}_{f,c} && \forall i,f,c\\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \\ & \color{darkred}d_{i,f},\color{darkred}\Delta_{i,f} \ge 0 \end{align}\] |
Obviously, this model has a boatload of extra variables and equations compared to our previous models. We can add some bounds, and of course, our ordering constraint. This model seems to perform much, much worse than model 2. We are talking about orders of magnitude slower. I did not have the patience to solve the problem using the \(n=75\) data set.
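To make the role of the cone constraint explicit, here is the reasoning step behind the formulation, written in the notation of the model (a sketch of the argument, not an additional constraint). Together with \(\color{darkred}\Delta_{i,f}\ge 0\), the cone constraint says that \(\color{darkred}\Delta_{i,f}\) is at least the Euclidean distance: \[\color{darkred}\Delta_{i,f} \ge 0,\quad \color{darkred}\Delta^2_{i,f} \ge \sum_c \color{darkred}\delta^2_{i,f,c} \iff \color{darkred}\Delta_{i,f} \ge \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2}\] The big-M constraint then distinguishes two cases: \[\begin{align} \color{darkred}{\mathit{assign}}_{i,f}=1 &\implies \color{darkred}d_{i,f} \ge \color{darkred}\Delta_{i,f} \\ \color{darkred}{\mathit{assign}}_{i,f}=0 &\implies \color{darkred}d_{i,f} \ge \color{darkred}\Delta_{i,f} - \color{darkblue}M \end{align}\] In the first case, minimization pushes \(\color{darkred}d_{i,f}\) down to the actual distance. In the second case, the constraint is non-binding as long as \(\color{darkblue}M\) is at least as large as any value \(\color{darkred}\Delta_{i,f}\) can take, so \(\color{darkred}d_{i,f}\) can drop to zero.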
To illustrate that indeed, model 2 and model 3 yield different results, I compared the results using a much smaller \(n=25\) data set.
Test with \(n=25\) | Sum squared distances | Sum distances |
---|---|---|
model 2: min sum squared distances | 0.700 (min) | 3.864 |
model 3: min sum distances | 0.838 | 3.660 (min) |
Model 4: a Medoids model
As an alternative to letting the model place facilities anywhere on the map, we can select a (possibly large) number of candidate points. This allows us to calculate distances (or squared distances) in advance. The resulting model is a linear MIP model, which is easier to solve (and solves more reliably).
A special case is when we choose the demand points as the candidate locations for the facilities. In clustering, this is called \(k\)-medoids [3]. Of course, we have the additional side constraint that no distance between demand point and facility can be more than \(\color{darkblue}{\mathit{maxDist}}\). A multi-objective version (min number of facilities, best facility location/best assignment) can look like:
Model 4: medoids model implementation |
---|
\[\begin{align} \min\> & \color{darkblue}w_1\cdot\color{darkred}{\mathit{totDist}} +\color{darkblue}w_2\cdot\color{darkred}{\mathit{numFacs}} \\ & \color{darkred}{\mathit{totDist}} = \sum_{(i,i')\in S}\color{darkblue}{\mathit{dist}}_{i,i'} \cdot \color{darkred}{\mathit{assign}}_{i,i'} \\ & \color{darkred}{\mathit{numFacs}} = \sum_i \color{darkred}{\mathit{facSelect}}_i \\ & \sum_{i'|(i,i')\in S} \color{darkred}{\mathit{assign}}_{i,i'} =1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,i'} \le \color{darkred}{\mathit{facSelect}}_{i'} && \forall (i,i')\in S \\ & \color{darkred}{\mathit{assign}}_{i,i'}, \color{darkred}{\mathit{facSelect}}_i \in \{0,1\} \end{align}\] |
Here, \(S\) is the subset of allowed assignments \(i\rightarrow i'\) (i.e., with distance at most \(\color{darkblue}{\mathit{maxDist}}\)). The parameter \(\color{darkblue}{\mathit{dist}}_{i,i'}\) is the distance measure used (it can be the squared distance or the actual distance, or any other measure).
We can run this model in one swoop (with \(\color{darkblue}w_2 \gg \color{darkblue}w_1\)), or in two steps:
- Solve with \(\color{darkblue}w_1=0,\color{darkblue}w_2=1\)
- Fix \(\color{darkred}{\mathit{numFacs}}\) and solve with \(\color{darkblue}w_1=1,\color{darkblue}w_2=0\)
The results are as follows:
We have given up quite some freedom in where to place the facilities. So, we needed 7 facilities, where we needed just 5 facilities before, to meet the maximum distance constraint. The two linear MIP models we solved here were very easy: zero nodes were needed in both cases.
There are good heuristics for the \(k\)-Medoid clustering problem (e.g. PAM: Partitioning Around Medoids). However, they don't support a max-distance restriction.
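The two-step logic of the medoids model (first the fewest facilities, then the smallest total distance for that number) can be mimicked by brute force on a toy instance. The sketch below is only meant to show the logic: it enumerates all subsets, so it is hopeless for \(n=75\), and the function name `medoid_two_step` and the four sample points are invented for this example.

```python
import math
from itertools import combinations


def medoid_two_step(points, max_dist):
    """Return (numFacs, selected indices, total distance) by enumeration."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    for k in range(1, n + 1):                      # step 1: fewest facilities
        best = None
        for facs in combinations(range(n), k):
            # Each demand point is served by its nearest selected point.
            nearest = [min(dist[i][f] for f in facs) for i in range(n)]
            if max(nearest) <= max_dist:           # only allowed assignments
                total = sum(nearest)               # step 2: min total distance
                if best is None or total < best[0]:
                    best = (total, facs)
        if best is not None:
            return k, best[1], best[0]
    return None  # only reached for an empty point set


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (1.0, 0.0)]
    print(medoid_two_step(pts, 0.25))
```

Selecting every demand point as a facility is always feasible, so the search terminates for any non-empty data set.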
Model 5: Trade-off: Max distance vs Number of facilities
If someone asked me to work on a problem like this, I would not stop at just providing the precise solutions I was asked for. I would always try to take a step back and investigate the underlying mechanisms. In this case, for instance, we have a tension between the number of facilities we can open and the maximum distance constraint. It is interesting to trace out the trade-off between the maximum distance we can support and the number of facilities we need to build. There are two ways to research this:
- Vary the max distance and observe the resulting number of facilities needed, or
- vary the number of facilities and see what max distance we can support with that.
The max distance is a real number, so varying it can be a bit costly (depending on the resolution we choose, e.g. 0.1 units). It is easier to loop over the number of facilities, as this is an integer. The model we use here is:
Model 5: maxDist vs numFacs |
---|
\[\begin{align} \min \> & \color{darkred}{\mathit{maxDist}} \\ & \color{darkred}{\mathit{assign}}_{i,i'}\cdot \color{darkblue}{\mathit{dist}}_{i,i'} \le \color{darkred}{\mathit{maxDist}}&& \forall i,i'\\ & \color{darkblue}{\mathit{numFacs}} = \sum_i \color{darkred}{\mathit{facSelect}}_i \\ & \sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'} =1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,i'} \le \color{darkred}{\mathit{facSelect}}_{i'} && \forall i,i' \\ & \color{darkred}{\mathit{assign}}_{i,i'}, \color{darkred}{\mathit{facSelect}}_i \in \{0,1\} \end{align}\] |
We run this model for \(\color{darkblue}{\mathit{numFacs}}=1,\dots,10\). The result is:
----    344 PARAMETER m5results

        numFacs     MaxDist        time       nodes

k1        1.000       0.622       0.969
k2        2.000       0.521       2.625
k3        3.000       0.375       4.407
k4        4.000       0.337       4.547
k5        5.000       0.291       4.563      76.000
k6        6.000       0.260       4.250
k7        7.000       0.245       3.985
k8        8.000       0.210       4.015
k9        9.000       0.208       3.657
k10      10.000       0.190       4.375
Note the comment by Rob Pratt: the first constraint can be strengthened by \[\sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'}\cdot \color{darkblue}{\mathit{dist}}_{i,i'} \le \color{darkred}{\mathit{maxDist}} \] When we do this, all ten sub-problems are solved with zero nodes.
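A short argument for why the aggregated constraint is valid and stronger (my own sketch of the reasoning, using the model's notation). For integer solutions, \(\sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'}=1\) means exactly one term in the sum is nonzero, so the sum is simply the distance to the selected facility and nothing is lost. For fractional solutions of the relaxation, all terms are non-negative, so for every \(i''\): \[\color{darkred}{\mathit{assign}}_{i,i''}\cdot \color{darkblue}{\mathit{dist}}_{i,i''} \le \sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'}\cdot \color{darkblue}{\mathit{dist}}_{i,i'} \le \color{darkred}{\mathit{maxDist}}\] In other words, the single aggregated constraint for \(i\) implies all the individual constraints for \((i,i'')\), which gives a tighter relaxation with fewer rows.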
A picture of the results makes the trade-off between the two objectives clearer:
Indeed, we recognize that for \(\color{darkblue}{\mathit{maxDist}}=0.25\), the Medoid model needs 7 facilities. If we only want to pay for 6 facilities, we need to allow \(\color{darkblue}{\mathit{maxDist}}=0.26\).
Note that the lines are a bit misleading here: only the discrete points are feasible. I keep the lines as they illuminate the shape of the curve. We see that on the left, the gradient is larger (i.e., the impact of adding or removing a facility is larger) than on the right. This is obvious (the marginal impact of a facility is larger if there are fewer of them), but it is always good to make this explicit with a picture.
I often find these types of pictures add value to the analyses. If you present this to a client, it can lead to good discussions. Clients typically know this behavior, but making it explicit is a big help.
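The trade-off curve itself is easy to reproduce on a toy instance by enumeration: for each number of facilities, find the smallest maximum distance over all choices of medoids. This is a brute-force sketch under my own naming (`tradeoff_curve`) with made-up points; it is not how the MIP model does it, and it only scales to a handful of points.

```python
import math
from itertools import combinations


def tradeoff_curve(points):
    """List of (numFacs, smallest achievable max distance), facilities at data points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    curve = []
    for k in range(1, n + 1):
        best = min(
            # Largest distance from any point to its nearest selected facility.
            max(min(dist[i][f] for f in facs) for i in range(n))
            for facs in combinations(range(n), k)
        )
        curve.append((k, best))
    return curve


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (1.0, 0.0)]
    for k, d in tradeoff_curve(pts):
        print(k, round(d, 3))
```

As in the picture, the curve is non-increasing: an extra facility can never make the best achievable maximum distance worse.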
GAMS Model
The GAMS code has all the above models. By default, we skip model 3 (the MISOCP model) as it solves too slowly. By default, generating the HTML output is turned on.
GAMS Model
$onText
Continuous Facility Location
Model 1:
Find number of facilities needed to service all customers with
constraint on the distance between facility and customer.
Model 2:
Given the number of facilities found in model 1, find an optimal
location of these facilities (by minimizing the sum of squared
distances) and the optimal assignment of customers to
facilities.
Model 3:
We can minimize the distances using a MISOCP (Mixed-Integer
Second Order Cone program). This takes forever for n=75
so we skip this here.
Model 4:
This is a Medoid version: the candidate locations are the
demand points. This is formulated as a multi-objective
model, but we can run the objectives (min facilities,
min sum distances) separately by changing the weights.
Model 5:
Produce efficient frontier between maxDist and numFacs
based on Medoid model 4.
Reporting is done using HTML + Plotly.
$offText
option reslim=1000;
option miqcp=cplex;
option seed=12345;
* third model (MISOCP) is very expensive
* it is better to skip this
* runmodel3=0 : skip model 3
* runmodel3<>0 : run model 3
$set runmodel3 0
* enable (1) or disable (0) symmetry breakers
$set symm 1
* set to 0 if no HTML report
$set runhtml 1
*-----------------------------------------------------------------------------------------
* data
*-----------------------------------------------------------------------------------------
Sets
dummy 'for ordering of displays' /numFacs,MaxDist,time/
i 'demand points' /demand1*demand75/
j 'possible facilities' /facility1*facility10/
c 'coordinates' /x,y/
;
Parameters
dloc(*,c) 'demand point locations'
maxDist 'maximum distance allowed between facility and demand point (for 1x1 map)' /0.25/
wh 'width and height of our region' /1/
symm "0:don't use 1:use symmetry breakers" /%symm%/
;
maxDist = maxDist*wh;
dloc(i,c) = uniform(0,wh);
dloc('min',c) = smin(i,dloc(i,c));
dloc('max',c) = smax(i,dloc(i,c));
display maxDist,dloc;
*-----------------------------------------------------------------------------------------
* model 1: find minimum number of facilities needed
* we put some effort into finding small big-M values
* this is a bit of overkill
*-----------------------------------------------------------------------------------------
*
* for proper big M calculation we need to know farthest possible distances
*
sets
LU 'lower or upper' /L,U/
b(LU,LU) 'box' /L.L, L.U, U.L, U.U/
;
alias (LU,LUx,LUy);
Parameters
corners(LU,LU,c) 'corners of box for facility locations'
farthest(i) 'max possible squared distance between demand point i and possible location of facility'
;
corners('L',LUy,'x') = dloc('min','x');
corners('U',LUy,'x') = dloc('max','x');
corners(LUx,'L','y') = dloc('min','y');
corners(LUx,'U','y') = dloc('max','y');
display corners;
farthest(i) = smax(b,sum(c,sqr(dloc(i,c)-corners(b,c))));
display farthest;
*
* model 1: MIQCP
*
variables
floc(j,c) 'facility locations'
isOpen(j) 'facility is being used'
assign(i,j) 'assign customers to facility'
nOpen 'number of open facilities'
;
binary variables isOpen,assign;
equations
distance(i,j) 'squared distance equation'
assignDemand(i) 'assign customer to exactly one facility'
closed(i,j) 'do not assign customers to closed facilities'
numFacilities 'number of open facilities'
order(j) 'optional: open facilities are first ones'
;
distance(i,j)..sum(c, sqr(dloc(i,c)-floc(j,c))) =l= sqr(maxDist)*assign(i,j) + farthest(i)*(1-assign(i,j));
assignDemand(i)..sum(j, assign(i,j)) =e= 1;
closed(i,j).. assign(i,j) =l= isOpen(j);
numFacilities.. nOpen =e= sum(j, isOpen(j));
order(j+1)$symm.. isOpen(j) =g= isOpen(j+1);
* facility locations should be inside the box formed by the demand points
floc.lo(j,c) = smin(b,corners(b,c));
floc.up(j,c) = smax(b,corners(b,c));
*
* solve
*
model m1 /all/;
solve m1 minimizing nOpen using miqcp;
abort$(m1.modelstat <> %modelStat.optimal% and m1.modelstat <> %modelStat.integerSolution%) "No solution";
*
* collect results
*
set f(j) 'open facilities';
f(j) = isOpen.l(j)>0.5;
display nOpen.l,j,f;
parameter res1(*) 'results model 1';
res1('facilities needed (min)') = round(nOpen.l);
res1('sum squared distances') = sum((i,j)$(assign.l(i,j)>0.5),sum(c, sqr(dloc(i,c)-floc.l(j,c))));
res1('max squared distance') = smax((i,j)$(assign.l(i,j)>0.5),sum(c, sqr(dloc(i,c)-floc.l(j,c))));
res1('sum distances') = sum((i,j)$(assign.l(i,j)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(j,c)))));
res1('max distance') = smax((i,j)$(assign.l(i,j)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(j,c)))));
res1('solver time') = m1.resusd;
res1('nodes') = m1.nodusd;
res1('binary variables') = m1.numdvar;
display res1;
set assign1(i,j) 'model1 results';
assign1(i,j) = assign.l(i,j)>0.5;
parameter floc1(j,c) 'model1 results';
floc1(f,c) = floc.l(f,c);
*-----------------------------------------------------------------------------------------
* model 2: find optimal assignment of customer to open facilities
* minimize sum of squared distances
*-----------------------------------------------------------------------------------------
positive variable d2(i,j) 'squared distance between customer and facility';
variable totdist2 'sum of squared distances';
equations
distance2(i,j) 'squared distance equation'
assignDemandf(i) 'assign customer to exactly one facility'
objective2 'minimize sum of squared distances'
orderx(j) 'order by x coordinate'
;
objective2.. totdist2 =e= sum((i,f),d2(i,f));
distance2(i,f).. d2(i,f) =g= sum(c, sqr(dloc(i,c)-floc(f,c))) - farthest(i)*(1-assign(i,f));
assignDemandf(i)..sum(f, assign(i,f)) =e= 1;
orderx(j+1)$(f(j) and symm).. floc(j,'x') =l= floc(j+1,'x');
d2.up(i,f) = sqr(maxDist);
model m2 /objective2,distance2,assignDemandf,orderx/;
solve m2 minimizing totdist2 using miqcp;
abort$(m2.modelstat <> %modelStat.optimal% and m2.modelstat <> %modelStat.integerSolution%) "No solution";
display totdist2.l, assign.l
parameter res2(*) 'results model 2';
res2('sum squared distances (min)') = sum((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res2('max squared distance') = smax((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res2('sum distances') = sum((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res2('max distance') = smax((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res2('solver time') = m2.resusd;
res2('nodes') = m2.nodusd;
res2('binary variables') = m2.numdvar;
display res2;
set assign2(i,j) 'model2 results';
assign2(i,f) = assign.l(i,f)>0.5;
parameter floc2(j,c) 'model2 results';
floc2(f,c) = floc.l(f,c);
*-----------------------------------------------------------------------------------------
* model 3: find optimal assignment of customer to open facilities
* minimize sum of distances
* MISOCP formulation
* this is too slow for larger data sets
*-----------------------------------------------------------------------------------------
$if %runmodel3%==0 $goto skipmodel3
positive variable
dall(i,j) 'distance between all customers and facilities'
d(i,j) 'distance between assigned customers and facilities or 0'
;
free variables
totdist 'sum of distances'
diff(i,j,c) 'facility - customer'
;
equations
objective 'minimize sum of distances'
socp(i,j) 'second order cone constraint'
ediff(i,j,c) 'difference coordinatewise'
calcd(i,j) 'big-M version of implication'
;
objective.. totdist =e= sum((i,f),d(i,f));
ediff(i,f,c).. diff(i,f,c) =e= dloc(i,c)-floc(f,c);
socp(i,f)..sqr(dall(i,f)) =g= sum(c,sqr(diff(i,f,c)));
calcd(i,f).. d(i,f) =g= dall(i,f) - sqrt(farthest(i))*(1-assign(i,f));
dall.up(i,f) = sqrt(farthest(i));
d.up(i,f) = sqrt(farthest(i));
model m3 /objective,ediff,socp,calcd,assignDemandf,orderx/;
m3.optfile=1;
solve m3 minimizing totdist using miqcp;
abort$(m3.modelstat <> %modelStat.optimal% and m3.modelstat <> %modelStat.integerSolution%) "No solution";
display totdist.l, assign.l
parameter res3(*) 'results model 3';
res3('sum squared distances') = sum((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res3('max squared distance') = smax((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res3('sum distances (min)') = sum((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res3('max distance') = smax((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res3('solver time') = m3.resusd;
res3('nodes') = m3.nodusd;
res3('binary variables') = m3.numdvar;
display res3;
set assign3(i,j) 'model3 results';
assign3(i,f) = assign.l(i,f)>0.5;
parameter floc3(j,c) 'model3 results';
floc3(f,c) = floc.l(f,c);
$onecho> cplex.opt
mipstart 1
mipstrategy 4
$offecho
$label skipmodel3
*-----------------------------------------------------------------------------------------
* Model 4: k-medoids model
*-----------------------------------------------------------------------------------------
alias (i,ii);
parameter dist(i,ii) 'any distance measure, here the euclidean distance';
dist(i,ii) = sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c))));
scalars
w1 'obj weight: distance'
w2 'obj weight: number of facilities'
;
binary variables
facSelect(i) 'select point i as facility'
assigni(i,ii) 'assign demand point i to facility ii'
;
positive variables
totDist 'obj1: sum of distances'
numFacs 'obj2: number of facilities'
;
variable z 'objective';
Equations
objMultiple 'weighted sum objective'
objDist 'obj1: sum of distances'
objNumFacs 'obj2: number of facilities'
eAssign(i) 'each customer must be assigned to one facility'
close(i,i) 'if point i is not a facility, then we can not serve customers from there'
;
set ok(i,ii) 'allowed assignments';
ok(i,ii) = sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))) <= maxDist;
* bi-objective
objMultiple.. z =e= w1*totDist+w2*numFacs;
objDist.. totDist =e= sum(ok(i,ii),dist(ok)*assigni(ok));
objNumFacs.. numFacs =e= sum(i,facSelect(i));
* constraints
eAssign(i)..sum(ok(i,ii),assigni(ok)) =e= 1;
close(ok(i,ii)).. assigni(ok) =l= facSelect(ii);
model m4 /objMultiple,objDist,objNumFacs,eAssign,close/;
parameter res4(*) 'results model 4';
* we solve in two phases:
* 1. minimize number of facilities needed
* 2. fix numFacs and minimize sum of distances
w1 = 0; w2 = 1;
solve m4 minimizing z using mip;
display numfacs.l;
res4('solver time (min numFacs)') = m4.resusd;
res4('nodes (min numFacs)') = m4.nodusd;
res4('binary variables (min numFacs)') = m4.numdvar;
res4('number of facilities') = round(numFacs.l);
numfacs.fx = round(numfacs.l);
w1 = 1; w2 = 0;
solve m4 minimizing z using mip;
res4('solver time (min totDist)') = m4.resusd;
res4('nodes (min totDist)') = m4.nodusd;
res4('binary variables (min totDist)') = m4.numdvar;
res4('sum squared distances') = sum(ok(i,ii)$(assigni.l(i,ii)>0.5),sum(c, sqr(dloc(i,c)-dloc(ii,c))));
res4('max squared distance') = smax(ok(i,ii)$(assigni.l(i,ii)>0.5),sum(c, sqr(dloc(i,c)-dloc(ii,c))));
res4('sum distances (min)') = sum(ok(i,ii)$(assigni.l(i,ii)>0.5),sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))));
res4('max distance') = smax(ok(i,ii)$(assigni.l(i,ii)>0.5),sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))));
display res4;
set assign4(i,i) 'model4 results';
assign4(ok) = assigni.l(ok)>0.5;
set facSelected(i) 'facilities selected';
facSelected(i) = facSelect.l(i) > 0.5;
display facSelected;
parameter ordFac(i) 'numbering for coloring';
ordFac(FacSelected) = facSelected.pos;
display ordFac;
*-----------------------------------------------------------------------------------------
* Model 5: trace trade-off between max distance and number of facilities
*-----------------------------------------------------------------------------------------
variable zmaxdist 'objective: minimize maxdist';
ok(i,ii) = yes;
equations
maxDistance(i,ii) 'assign(i,ii)=1 ==> dist(i,ii) <= maxdist'
;
scalar fxNumFacs 'fixed number of facilities';
maxDistance(i,ii).. assigni(i,ii)*dist(i,ii) =l= zmaxdist;
zmaxDist.lo = 0;
model m5 /maxDistance,objNumFacs,eAssign,close/;
set k /k1*k10/;
parameter m5results(k,*);
loop (k,
numFacs.fx = ord(k);
solve m5 minimizing zmaxdist using mip;
abort$(m5.modelstat <> %modelStat.optimal% and m5.modelstat <> %modelStat.integerSolution%) "No solution";
m5results(k,'numFacs') = ord(k);
m5results(k,'maxDist') = zmaxdist.l;
m5results(k,'time') = m5.resusd;
m5results(k,'nodes') = m5.nodusd;
);
display m5results;
*-----------------------------------------------------------------------------------------
* reporting and visualization (models 1, 2 and 4)
*-----------------------------------------------------------------------------------------
$if %runhtml%==0 $stop
$set htmlfile report.html
$set datafile data.js
$macro tablerow(txt,num) '<tr><td>txt</td><td align="right"><pre>',num,'</pre></td></tr>'/;
$macro tableheaderrow(txt1,txt2) '<tr><th>txt1</th><th>txt2</th></tr>'/;
$macro tablerow2(num1,num2) '<tr><td align="right"><pre>',num1,'</pre></td><td align="right"><pre>',num2,'</pre></td></tr>'/;
file fdata /%datafile%/;
put fdata;
* demand points
put"datatable=`"/;
put'<table>'/;
put tablerow(Demand points,card(i):0:0)
put tablerow(Max distance customer → facility,maxDist:7:3)
put'</table>'/;
put"`;"/;
put"points=["/;
loop(i,
put"{x:",dloc(i,'x'):6:4,",y:",dloc(i,'y'):6:4,"},"/;
);
put"];"/;
* model 1
put"m1table=`"/;
put'<table>'/;
put tablerow(Number facilities needed (min),res1('facilities needed (min)'):0:0)
put tablerow(Sum distances,res1('sum distances'):8:3)
put tablerow(Sum squared distances,res1('sum squared distances'):8:3)
put tablerow(Max distance,res1('max distance'):8:3)
put tablerow(Binary variables,res1('binary variables'):0:0);
put tablerow(Solver time,res1('solver time'):8:3)
put tablerow(Nodes,res1('nodes'):0:0)
put'</table>'/;
put"`;"/;
put"floc1=["/;
loop(f,
put"{x:",floc1(f,'x'):8:4,",y:",floc1(f,'y'):8:4,"},"/;
);
put"];"/;
put"assign1=["/;
loop(assign1(i,f),
put"{i:",ord(i):0:0,",f:",f.pos:0:0,"},"/;
);
put"];"/;
* model 2
put"m2table=`"/;
put'<table>'/;
put tablerow(Sum squared distances (min),res2('sum squared distances (min)'):8:3)
put tablerow(Sum distances,res2('sum distances'):8:3)
put tablerow(Max distance,res2('max distance'):8:3)
put tablerow(Binary variables,res2('binary variables'):0:0)
put tablerow(Solver time,res2('solver time'):8:3)
put tablerow(Nodes,res2('nodes'):0:0)
put'</table>'/;
put"`;"/;
put"floc2=["/;
loop(f,
put"{x:",floc2(f,'x'):6:4,",y:",floc2(f,'y'):6:4,"},"/;
);
put"];"/;
put"assign2=["/;
loop(assign2(i,f),
put"{i:",ord(i):0:0,",f:",f.pos:0:0,"},"/;
);
put"];"/;
* model 4
put"m4table=`"/;
put'<table>'/;
put tablerow(Number of facilities,res4('number of facilities'):0:0)
put tablerow(Sum squared distances,res4('sum squared distances'):8:3)
put tablerow(Sum distances (min),res4('sum distances (min)'):8:3)
put tablerow(Max distance,res4('max distance'):8:3)
put tablerow(Binary variables (min numFacs),res4('binary variables (min numFacs)'):0:0)
put tablerow(Solver time (min numFacs),res4('solver time (min numFacs)'):8:3)
put tablerow(Nodes (min numFacs),res4('nodes (min numFacs)'):0:0)
put tablerow(Binary variables (min totDist),res4('binary variables (min totDist)'):0:0)
put tablerow(Solver time (min totDist),res4('solver time (min totDist)'):8:3)
put tablerow(Nodes (min totDist),res4('nodes (min totDist)'):0:0)
put'</table>'/;
put"`;"/;
put"assign4=["/;
loop(assign4(i,ii),
put"{i:",ord(i):0:0,",ii:",ord(ii):0:0,",cl:",ordFac(ii):0:0,"},"/;
);
put"];"/;
* model 5
put"frontier=["/;
loop(k,
put"{maxdist:",m5results(k,'maxDist'):8:3,",numfacs:",m5results(k,'numFacs'):8:3,"},"/;
);
put"];"/;
put"m5table=`"/;
put'<table>'/;
put tableheaderrow(numFacs,maxDist)
loop(k,
put tablerow2(m5results(k,'numFacs'):8:3,m5results(k,'maxDist'):8:3)
);
put'</table>'/;
put"`;"/;
putclose;
$onecho> %htmlfile%
<html>
<script src="https://cdn.plot.ly/plotly-3.0.1.min.js" charset="utf-8"></script>
<script src="%datafile%" charset="utf-8"></script>
<style>
table,th, td {
border: 1px solid black;
border-collapse: collapse;
padding-left: 10px;
padding-right: 10px;
}
p { max-width:800px; }
</style>
<body>
<h1>Facility Location Model</h1>
<h2>Data: demand points</h2>
<p>The locations of the demand points are randomly generated, drawn from the uniform distribution.</p>
<div id="dataTable"></div>
<div id="myPlot1" style="width:100%;max-width:700px;height:700px"></div>
<p>
<h2>Model 1: results</h2>
<p>Model 1 finds the minimum number of facilities needed to serve all customers while
obeying the maximum distance constraint. To keep the model quadratic, the constraint is
imposed on the squared distance. Note that the solution
is <b>not</b> an assignment with shortest distances between customers and facilities.</p>
<div id="m1Table"></div>
<div id="myPlot2" style="width:100%;max-width:700px;height:700px"></div>
<h2>Model 2: results</h2>
<p>Model 2 finds the best locations of the facilities and the optimal assignment of
customers to the facilities. It uses the number of facilities found in model 1.</p>
<div id="m2Table"></div>
<div id="myPlot3" style="width:100%;max-width:700px;height:700px"></div>
<h2>Model 4: results</h2>
<p>Model 4 uses the demand points as candidate locations for the facilities. This
is an easy MIP. It is solved here in two stages: first find the optimal number of
facilities and then find the best locations.</p>
<div id="m4Table"></div>
<div id="myPlot4" style="width:100%;max-width:700px;height:700px"></div>
<h2>Model 5: results</h2>
<p>Model 5 traces the trade-off between <span style="font-family: courier;">maxdist</span> (the maximum distance
limit) and <span style="font-family: courier;">numfacs</span> (the number of facilities). The results are based on
the medoid-based model: the candidate locations for the facilities are the demand
points.</p>
<div id="m5Table"></div>
<div id="myPlot5" style="width:100%;max-width:700px;height:700px"></div>
<script>
colors = [
'#1f77b4', // muted blue
'#ff7f0e', // safety orange
'#2ca02c', // cooked asparagus green
'#d62728', // brick red
'#9467bd', // muted purple
'#8c564b', // chestnut brown
'#e377c2', // raspberry yogurt pink
'#7f7f7f', // middle gray
'#bcbd22', // curry yellow-green
'#17becf' // blue-teal
];
document.getElementById('dataTable').innerHTML = datatable;
document.getElementById('m1Table').innerHTML = m1table;
document.getElementById('m2Table').innerHTML = m2table;
document.getElementById('m4Table').innerHTML = m4table;
document.getElementById('m5Table').innerHTML = m5table;
// extract coordinates as arrays
xpoints = points.map(({x})=>x);
ypoints = points.map(({y})=>y);
xfloc1 = floc1.map(({x})=>x);
yfloc1 = floc1.map(({y})=>y);
xfloc2 = floc2.map(({x})=>x);
yfloc2 = floc2.map(({y})=>y);
numfacs = frontier.map(({numfacs})=>numfacs);
maxdist = frontier.map(({maxdist})=>maxdist);
trace1 = {
x: xpoints,
y: ypoints,
mode: 'markers',
type: 'scatter',
marker: { color: 'black' }
};
trace2 = {
x: xfloc1,
y: yfloc1,
mode: 'markers',
type: 'scatter',
marker: { color: colors },
};
trace3 = {
x: xfloc2,
y: yfloc2,
mode: 'markers',
type: 'scatter',
marker: { color: colors },
};
trace4 = {
x: numfacs,
y: maxdist,
};
assignments = [];
for (k=0; k < assign1.length; ++k) {
item = assign1[k];
i = item['i']-1;
f = item['f']-1;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xfloc1[f],
y1:yfloc1[f],
line: { color:colors[f % colors.length],width:2 }
}
assignments.push(asg);
}
var layout2 = {showlegend: false, shapes:assignments};
assignments2 = [];
for (k=0; k < assign2.length; ++k) {
item = assign2[k];
i = item['i']-1;
f = item['f']-1;
ff = f % colors.length;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xfloc2[f],
y1:yfloc2[f],
line: { color:colors[ff],width:2 }
}
assignments2.push(asg);
}
var layout3 = {showlegend: false, shapes:assignments2};
assignments4 = [];
for (k=0; k < assign4.length; ++k) {
item = assign4[k];
i = item['i']-1;
ii = item['ii']-1;
cl = item['cl']-1;
ff = cl % colors.length;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xpoints[ii],
y1:ypoints[ii],
line: { color:colors[ff],width:2 }
}
assignments4.push(asg);
}
var layout4 = {showlegend: false, shapes:assignments4};
var layout5 = {
xaxis : {title:{text:'number of facilities'}},
yaxis : {title:{text:'max distance'}},
}
Plotly.newPlot('myPlot1', [trace1]);
Plotly.newPlot('myPlot2', [trace1,trace2], layout2);
Plotly.newPlot('myPlot3', [trace1,trace3], layout3);
Plotly.newPlot('myPlot4', [trace1], layout4);
Plotly.newPlot('myPlot5', [trace4], layout5);
</script>
</body>
</html>
$offEcho
executetool 'win32.ShellExecute "%htmlfile%"';
Output
Conclusion
This looks like an easy problem, but alternative formulations make a difference.
- Symmetry-breaking constraints can really help.
- Minimizing the sum of squared distances is much easier than minimizing the sum of Euclidean distances (the latter, modeled as an MISOCP, performs very poorly).
- Easy linear MIP models can be formulated if we restrict the placement of the facilities to a set of candidate locations. A special case is: facilities can only be placed at demand points.
- Our problems superficially resemble clustering problems, but clustering algorithms typically don't handle our max distance side constraint.
- Reporting (including visualization) is very important: both for the model developer (e.g. for model debugging) and the client. For a client, the solution report is the GUI of the model: it is the only visible part. For a model developer, it is often an afterthought. That is not the best way to think about it.
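To make the third point concrete, the min-max model behind the Model 5 trade-off curve can be sketched as below. Only the first constraint is taken literally from the GAMS code (equation maxDistance). The other three are assumptions about what the equations eAssign, close and objNumFacs contain, and \(\color{darkblue}{K}\) is shorthand introduced here for the fixed number of facilities.

\[\begin{aligned}
\min\;& \color{darkred}{\mathit{zmaxdist}} \\
& \color{darkblue}{\mathit{dist}}_{i,j}\cdot \color{darkred}{\mathit{assign}}_{i,j} \le \color{darkred}{\mathit{zmaxdist}} && \forall i,j \\
& \sum_j \color{darkred}{\mathit{assign}}_{i,j} = 1 && \forall i \\
& \color{darkred}{\mathit{assign}}_{i,j} \le \color{darkred}{\mathit{facSelect}}_j && \forall i,j \\
& \sum_j \color{darkred}{\mathit{facSelect}}_j = \color{darkblue}{K} \\
& \color{darkred}{\mathit{assign}}_{i,j},\ \color{darkred}{\mathit{facSelect}}_j \in \{0,1\},\quad \color{darkred}{\mathit{zmaxdist}} \ge 0
\end{aligned}\]

Because \(\color{darkblue}{\mathit{dist}}_{i,j}\) is data, the first constraint is linear and needs no big-M constant. When \(\color{darkred}{\mathit{assign}}_{i,j}=1\) it reads \(\color{darkblue}{\mathit{dist}}_{i,j} \le \color{darkred}{\mathit{zmaxdist}}\), and when \(\color{darkred}{\mathit{assign}}_{i,j}=0\) it reduces to \(0 \le \color{darkred}{\mathit{zmaxdist}}\), which already holds because of the lower bound.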
References
- Solving a facility location problem as an MIQCP, Yet Another Math Programming Consultant
- \(k\)-means clustering, https://en.wikipedia.org/wiki/K-means_clustering
- \(k\)-medoids, https://en.wikipedia.org/wiki/K-medoids