Channel: Yet Another Math Programming Consultant

Revisiting a continuous facility location problem

Here I revisit a problem from [1]:

We have \(n\) demand points and their locations. How many facilities do we need to service these customers? And where do we place them? We have a restriction: there is a maximum distance between customer and facility.   

Data

We randomly generated 75 demand points inside the \([0,1]\times[0,1]\) square. The maximum allowed distance between a demand point and a facility is 0.25. 

Random Demand Points


Data set
----     63 PARAMETER maxDist              =        0.250  maximum distance allowed between facility
                                                           and demand point (for 1x1 map)

----     63 PARAMETER dloc  demand point locations

                   x           y

demand1        0.497       0.964
demand2        0.874       0.844
demand3        0.191       0.495
demand4        0.235       0.381
demand5        0.601       0.151
demand6        0.508       0.463
demand7        0.359       0.153
demand8        0.366       0.826
demand9        0.024       0.059
demand10       0.285       0.114
demand11       0.890       0.374
demand12       0.234       0.148
demand13       0.019       0.078
demand14       0.928       0.555
demand15       0.616       0.360
demand16       0.521       0.432
demand17       0.750       0.817
demand18       0.016       0.373
demand19       0.697       0.767
demand20       0.887       0.812
demand21       0.790       0.804
demand22       0.017       0.393
demand23       0.960       0.722
demand24       0.815       0.838
demand25       0.775       0.528
demand26       0.656       0.392
demand27       0.485       0.403
demand28       0.435       0.193
demand29       0.129       0.488
demand30       0.791       0.686
demand31       0.303 6.351700E-5
demand32       0.446       0.815
demand33       0.208       0.336
demand34       0.282       0.156
demand35       0.620       0.860
demand36       0.615       0.386
demand37       0.763       0.732
demand38       0.080       0.083
demand39       0.019       0.245
demand40       0.334       0.304
demand41       0.835       0.386
demand42       0.053       0.506
demand43       0.192       0.234
demand44       0.935       0.786
demand45       0.943       0.172
demand46       0.182       0.203
demand47       0.532       0.506
demand48       0.211       0.638
demand49       0.296       0.553
demand50       0.034       0.808
demand51       0.355       0.989
demand52       0.814       0.715
demand53       0.243       0.542
demand54       0.502       0.111
demand55       0.169       0.287
demand56       0.842       0.806
demand57       0.563       0.157
demand58       0.272       0.034
demand59       0.718       0.705
demand60       0.158       0.752
demand61       0.955       0.785
demand62       0.257       0.088
demand63       0.800       0.219
demand64       0.956       0.803
demand65       0.460       0.202
demand66       0.896       0.524
demand67       0.872       0.906
demand68       0.352       0.323
demand69       0.925       0.634
demand70       0.414       0.269
demand71       0.282       0.741
demand72       0.657       0.456
demand73       0.381       0.496
demand74       0.900       0.532
demand75       0.060       0.396
min            0.016 6.351700E-5
max            0.960       0.989


Model 1: minimize the number of facilities


In the first model, we try to determine how many facilities we need. This is our "sizing" model. We use this in the second model, where we try to find an optimal location for these facilities. (The size of the second model is heavily determined by the number of facilities).

A high-level model looks like:

Model 1: find number of facilities needed
   \[\begin{align} \min & \sum_j \color{darkred}{\mathit{isOpen}}_j \\ & \color{darkred}{\mathit{assign}}_{i,j}=1 \implies \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{j,c}\right)^2} \le \color{darkblue}{\mathit{maxDist}}  && \forall i,j \\ & \sum_j \color{darkred}{\mathit{assign}}_{i,j} = 1 && \forall i \\ &  \color{darkred}{\mathit{isOpen}}_j =0 \implies \color{darkred}{\mathit{assign}}_{i,j} =0 && \forall i,j \\ & \color{darkred}{\mathit{isOpen}}_j\in \{0,1\} \\ & \color{darkred}{\mathit{assign}}_{i,j} \in \{0,1\} \end{align}\]


Here \(i\) is the set of demand points, \(j\) is a set of potential facilities, and \(c=\{x,y\}\). The assignment constraint makes sure that a customer is assigned to exactly one facility. We must reformulate things a bit if we want to solve this with a standard convex MIQCP (Mixed-Integer Quadratically Constrained Program) solver. The reason is that solver developers have forgotten to implement indicator constraints (implications) for quadratic constraints.

Model 1: implementation
\[\begin{align} \min & \sum_j \color{darkred}{\mathit{isOpen}}_j \\ & \sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{j,c}\right)^2 \le \color{darkblue}{\mathit{maxDist}}^2 + \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,j}) && \forall i,j \\ & \sum_j \color{darkred}{\mathit{assign}}_{i,j} = 1 && \forall i \\ & \color{darkred}{\mathit{assign}}_{i,j} \le \color{darkred}{\mathit{isOpen}}_j && \forall i,j \\ & \color{darkred}{\mathit{isOpen}}_j\in \{0,1\} \\ & \color{darkred}{\mathit{assign}}_{i,j} \in \{0,1\} \end{align}\]

The reformulation is to rewrite our quadratic implication as a big-M constraint. Here, the constant \(\color{darkblue}M\) is large enough to make the constraint non-binding when \(\color{darkred}{\mathit{assign}}_{i,j}=0\). It is an interesting exercise to see how small we can make \(\color{darkblue}M\). We also converted the second, linear implication. That one is rather trivial.

The variable \(\color{darkred}{\mathit{floc}}_{j,c}\) (the location of facility \(j\)) is a free variable. We can restrict this a bit: \(\color{darkred}{\mathit{floc}}_{j,c} \in [0,1]\) where \(c \in \{x,y\}\). Or, even more precisely, we can do: \[\min_i \color{darkblue}{\mathit{dloc}}_{i,c} \le \color{darkred}{\mathit{floc}}_{j,c} \le \max_i \color{darkblue}{\mathit{dloc}}_{i,c} \] 
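To make the big-M discussion concrete, here is a small Python sketch that computes a per-point \(\color{darkblue}M_i\). It is not part of the original GAMS code, although it mirrors the `farthest(i)` calculation there. It assumes the facility locations are bounded by the box spanned by the demand points, as above. The squared distance is convex in \(\color{darkred}{\mathit{floc}}\), so its maximum over the box is attained at one of the four corners. Therefore \(M_i\) = (largest squared distance to a corner) \(-\ \color{darkblue}{\mathit{maxDist}}^2\) is the smallest constant that keeps the constraint non-binding for every location in the box. The function names and the three points are made up for illustration.

```python
import itertools

def farthest_sq(dloc):
    """For each demand point: the largest squared distance to any point in the
    bounding box spanned by all demand points (attained at a corner)."""
    xs = [x for x, _ in dloc]
    ys = [y for _, y in dloc]
    corners = list(itertools.product((min(xs), max(xs)), (min(ys), max(ys))))
    return [max((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in corners)
            for px, py in dloc]

def big_m(dloc, max_dist):
    """Per-point M_i for  sum_c (dloc - floc)^2 <= maxDist^2 + M_i * (1 - assign)."""
    return [f - max_dist ** 2 for f in farthest_sq(dloc)]

dloc = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]   # made-up points
print(farthest_sq(dloc))   # [2.0, 1.25, 1.25]
print(big_m(dloc, 0.25))   # [1.9375, 1.1875, 1.1875]
```

The GAMS constraint `distance(i,j)` writes the right-hand side as `sqr(maxDist)*assign(i,j) + farthest(i)*(1-assign(i,j))`, which is the same thing with \(M_i\) = `farthest(i)` \(-\) `sqr(maxDist)`.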

An optional constraint is \[\color{darkred}{\mathit{isOpen}}_j \ge \color{darkred}{\mathit{isOpen}}_{j+1}\] This gives nicer solutions: the open facilities are the first ones. More importantly, it is a symmetry breaker and can help performance. Does it help? An experiment showed the following performance:

version                          nodes   time (secs)
without ordering constraint     909037           747
with ordering constraint         14947             9

This is quite a difference. The results for Model 1 with the ordering constraint are:





So, we can cover each customer within the allowed distance using just 5 facilities. The model delivers a feasible configuration for this, but the locations of the facilities and the assignment of customers to facilities are not optimal in the sense of yielding the smallest possible distances between customers and facilities. We see this clearly in the purple and orange clusters. The only real thing this model delivers is: we need 5 facilities to meet the distance constraints.
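Since the only claim Model 1 makes is feasibility, it is easy to double-check a reported solution outside the solver. Below is a minimal Python sketch. It assumes the facility locations and the assignments have been exported as plain lists and dictionaries. The function name, argument layout, and toy data are hypothetical.

```python
import math

def check_assignment(dloc, floc, assign, max_dist, tol=1e-6):
    """Return (feasible, largest distance) for a given assignment.

    dloc:   list of (x, y) demand point locations
    floc:   dict mapping facility name -> (x, y) location
    assign: list with, for each demand point, the name of its facility
    A small tolerance absorbs solver round-off in the reported locations.
    """
    worst = max(math.dist(p, floc[f]) for p, f in zip(dloc, assign))
    return worst <= max_dist + tol, worst

# toy data: three customers, two facilities
dloc = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9)]
floc = {"facility1": (0.15, 0.1), "facility2": (0.8, 0.8)}
assign = ["facility1", "facility1", "facility2"]
print(check_assignment(dloc, floc, assign, 0.25))
```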

With the ordering constraint, the resulting set of open facilities is:

----    113 VARIABLE nOpen.L               =        5.000  number of open facilities

----    113 SET j  possible facilities

facility1 ,    facility2 ,    facility3 ,    facility4 ,    facility5 ,    facility6 ,    facility7 ,    facility8 ,    facility9 
facility10


----    113 SET f  open facilities

facility1,    facility2,    facility3,    facility4,    facility5

If we remove the ordering constraint, we may see solutions like:
 
----    113 SET f  open facilities

facility1 ,    facility5 ,    facility6 ,    facility9 ,    facility10
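A back-of-the-envelope count shows why the ordering constraint helps. Assume a solution with 5 distinct clusters, and ignore the arbitrary locations of closed facilities. Without the constraint, any ordered choice of 5 labels out of the 10 potential facilities describes the same physical solution. With the constraint, the open facilities must be facility1 to facility5, but the clusters can still be permuted among those labels. This is only an illustration, not a measurement of the solver's search tree.

```python
import math

n_potential = 10  # size of set j in Model 1
n_open = 5        # facilities needed

# no ordering: assign the 5 clusters to any 5 distinct labels out of 10
without_order = math.perm(n_potential, n_open)
# isOpen(j) >= isOpen(j+1): labels fixed to facility1..facility5,
# but the 5 clusters can still be permuted among them
with_order = math.factorial(n_open)

print(without_order, with_order, without_order // with_order)  # 30240 120 252
```

The remaining \(5!\) permutations are the kind of symmetry that the \(x\)-coordinate ordering used in Model 2 removes (as long as the \(x\)-coordinates differ).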


Model 2: minimize the sum of squared distances


After obtaining the optimal number of facilities in Model 1, we can try to find an optimal placement of these facilities, and at the same time, assign customers to their closest facility. 

Model 2: optimal placement of facilities
\[\begin{align} \min & \sum_{i,f|\color{darkred}{\mathit{assign}}_{i,f}=1} \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2} \\&\color{darkred}{\mathit{assign}}_{i,f}=1 \implies \sqrt{\sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2} \le\color{darkblue}{\mathit{maxDist}} && \forall i,f \\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\ &  \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \end{align}\]

Here \(i\) are the demand points, and \(f\) are the (needed) facilities. In the previous model, we used \(j\) as the set of potential facilities. Here we use \(f\), which is a subset of \(j\). In the previous model, we used 10 potential facilities (so \(\color{darkred}{\mathit{assign}}_{i,j}\) was \(75 \times 10\)). In this model we have \(\color{darkred}{\mathit{assign}}_{i,f}\) which is \(75 \times 5\).

In this second model, we would like to minimize the sum of the distances between customers and facilities. It is much more convenient to change this slightly to minimize the sum of squared distances, i.e., we drop the square root in the distance calculation. We did the same thing in the previous model, but there the results are identical: distances are nonnegative, so a distance is at most \(\color{darkblue}{\mathit{maxDist}}\) exactly when its square is at most \(\color{darkblue}{\mathit{maxDist}}^2\). Here, we really have a somewhat different model that will, in general, deliver slightly different solutions. 
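A one-dimensional toy example shows why the two objectives can pick different locations. The points are made up and are not taken from the data set above. For a single facility, the sum of squared distances is minimized at the mean of the points, while the sum of distances is minimized at a median. A plain grid search is used here only to keep the sketch simple.

```python
# three customers on a line; two of them coincide
pts = [0.0, 0.0, 3.0]

def sum_sq(x):
    """Sum of squared distances from all customers to a facility at x."""
    return sum((p - x) ** 2 for p in pts)

def sum_abs(x):
    """Sum of (unsquared) distances from all customers to a facility at x."""
    return sum(abs(p - x) for p in pts)

grid = [k / 1000 for k in range(3001)]   # candidate locations 0.000 .. 3.000
best_sq = min(grid, key=sum_sq)          # the mean
best_abs = min(grid, key=sum_abs)        # the median

print(best_sq, sum_sq(best_sq), sum_abs(best_sq))     # 1.0 6.0 4.0
print(best_abs, sum_sq(best_abs), sum_abs(best_abs))  # 0.0 9.0 3.0
```

Each location wins on its own objective and loses on the other. The same pattern appears in the \(n=25\) comparison of models 2 and 3.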

Here is how we can implement the model, using our new objective:

Model 2: implementation
\[\begin{align} \min & \sum_{i,f} \color{darkred}d_{i,f} \\ & \color{darkred}d_{i,f} \ge \sum_c\left(\color{darkblue}{\mathit{dloc}}_{i,c} - \color{darkred}{\mathit{floc}}_{f,c}\right)^2 - \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,f}) && \forall i,f \\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\ & \color{darkred}d_{i,f}\in [0,\color{darkblue}{\mathit{maxDist}}^2] \\ & \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \end{align}\]


We can add the following symmetry breaker: \[\color{darkred}{\mathit{floc}}_{f,x} \le \color{darkred}{\mathit{floc}}_{f+1,x}\] i.e., order the facilities by their \(x\)-coordinate. Does this help? A test run shows:

version                          nodes   time (secs)
without ordering constraint      94479            13
with ordering constraint          6950           2.2

Again, nothing to sneeze at. The results for our Model 2 (with ordering constraint) are:




Indeed, we see a better assignment. This is evidenced by a decrease in the sum of the squared distances between customers and facilities from 2.56 to 2.18.

When the ordering constraint is turned on, we see that the facilities are nicely ordered by their \(x\)-coordinate:

----    181 PARAMETER floc2  model2 results

                    x           y

facility1       0.185       0.209
facility2       0.282       0.839
facility3       0.299       0.458
facility4       0.706       0.248
facility5       0.837       0.736

Notes:
  • We have separated here the problem of finding the number of facilities (Model 1) and the optimal placement of them (Model 2). We can look at the combined problem as a multi-objective model. The simplest approach would be a weighted-objective implementation, with a relatively large weight assigned to the number of facilities. The advantage of this approach is that we only need to solve one model. However, this model is larger than Model 2, as we need to overestimate the number of facilities required (just as in Model 1).
  • The pictures prompt us to consider \(k\)-means clustering [2]. A fast and straightforward heuristic is Lloyd's algorithm. This problem is also about minimizing the sum of the squared distances. The problem is that I don't know how to incorporate the maximum distance constraint. We could think: well, by minimizing the sum of squares, we'll see an automatic reduction of the largest distance. If we run R's kmeans on our data as is, we see:
      clusters: 5 tot.withinss: 1.949382 maxdist: 0.3338258
      clusters: 6 tot.withinss: 1.528166 maxdist: 0.2735675
      clusters: 7 tot.withinss: 1.268441 maxdist: 0.2735675
      clusters: 8 tot.withinss: 1.057716 maxdist: 0.2498851

    We would need 8 clusters before the maximum distance drops below 0.25, compared to the 5 facilities found by Model 1. The resemblance between our problem and \(k\)-means is misleading. 
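For reference, here is a bare-bones version of Lloyd's algorithm in Python. It alternates an assignment step (each point goes to its nearest center) with an update step (each center moves to the centroid of its points). As discussed, there is no place for the maximum-distance constraint: the largest distance can only be measured afterwards. This sketch is not R's `kmeans`, which defaults to the Hartigan-Wong algorithm with randomly chosen starting centers, so it will not reproduce the numbers above. The deterministic farthest-first initialization is a choice made here to keep the example reproducible, and the toy points are made up.

```python
import math

def farthest_first(points, k):
    """Deterministic initial centers: start with the first point, then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def lloyd(points, k, iters=100):
    """Lloyd's algorithm for k-means on 2-D points; returns (centers, labels)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # assignment step: nearest center for every point
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # update step: move each center to the centroid of its points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels

def largest_distance(points, centers, labels):
    """The quantity k-means does not control: max point-to-center distance."""
    return max(math.dist(p, centers[lab]) for p, lab in zip(points, labels))

pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (1.0, 1.0), (1.0, 0.9), (0.9, 1.0)]
centers, labels = lloyd(pts, 2)
print(labels, largest_distance(pts, centers, labels))
```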

Model 3: minimize the sum of distances


In the previous section, we solved the "min sum of squared distances" problem. Of course, we can change just the objective function in model 2 to \[\min \sum_{i,f} \sqrt{\color{darkred}d_{i,f}}\] However, to find global solutions, we would need a global MINLP solver. There is a way to shoehorn this model into a convex solver, using a second-order cone (SOCP) constraint [2]. Such a constraint is usually specified as \[y^2 \ge x^TQx\] where \(y\) is a non-negative variable, and \(Q\) is a positive semi-definite matrix. 

It is not completely trivial to convert the previous model into a SOCP framework. Here is my attempt:

Model 3: MISOCP model implementation
\[\begin{align} \min & \sum_{i,f} \color{darkred}d_{i,f} \\ & \color{darkred}d_{i,f} \ge \color{darkred}\Delta_{i,f} - \color{darkblue}M(1-\color{darkred}{\mathit{assign}}_{i,f}) && \forall i,f \\& \color{darkred}\Delta^2_{i,f} \ge \sum_c \color{darkred}\delta^2_{i,f,c} && \forall i,f \\ & \color{darkred}\delta_{i,f,c} = \color{darkblue}{\mathit{dloc}}_{i,c}-\color{darkred}{\mathit{floc}}_{f,c} && \forall i,f,c\\ & \sum_f \color{darkred}{\mathit{assign}}_{i,f} = 1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,f} \in \{0,1\} \\ & \color{darkred}d_{i,f},\color{darkred}\Delta_{i,f} \ge 0  \end{align}\]

Obviously, this model has a boatload of extra variables and equations compared to our previous models. We can add some bounds, and of course, our ordering constraint. This model seems to perform much, much worse than model 2. We are talking about orders of magnitude slower. I did not have the patience to solve the problem using the \(n=75\) data set. 

To illustrate that model 2 and model 3 indeed yield different results, I compared them using a much smaller \(n=25\) data set.

 
Test with \(n=25\)                     Sum squared distances   Sum distances
model 2: min sum squared distances     0.700 (min)             3.864
model 3: min sum distances             0.838                   3.660 (min)

The performance of the MISOCP model is a disappointment. For now, the conclusion is: stick to minimization of the squared distances.

Model 4: a Medoids model


As an alternative to letting the model place facilities anywhere on the map, we can select a (possibly large) number of candidate points. This allows us to calculate distances (or squared distances) in advance. The resulting model is a linear MIP, which is easier (and more reliable) to solve.

A special case is when we choose the demand points as the candidate locations for the facilities. In clustering, this is called \(k\)-medoids [3]. Of course, we have the additional side constraint that no distance between demand point and facility can be more than \(\color{darkblue}{\mathit{maxDist}}\). A multi-objective version (min number of facilities, best facility location/best assignment) can look like:


Model 4: medoids model implementation
\[\begin{align} \min\> & \color{darkblue}w_1\cdot\color{darkred}{\mathit{totDist}} +\color{darkblue}w_2\cdot\color{darkred}{\mathit{numFacs}} \\ & \color{darkred}{\mathit{totDist}} =  \sum_{(i,i')\in S}\color{darkblue}{\mathit{dist}}_{i,i'} \cdot \color{darkred}{\mathit{assign}}_{i,i'} \\ & \color{darkred}{\mathit{numFacs}} = \sum_i \color{darkred}{\mathit{facSelect}}_i \\ & \sum_{i'|(i,i')\in S} \color{darkred}{\mathit{assign}}_{i,i'} =1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,i'} \le  \color{darkred}{\mathit{facSelect}}_{i'} && \forall (i,i')\in S  \\ & \color{darkred}{\mathit{assign}}_{i,i'}, \color{darkred}{\mathit{facSelect}}_i \in \{0,1\}  \end{align}\]

Here, \(S\) is the subset of allowed assignments \(i\rightarrow i'\) (i.e., with distance at most \(\color{darkblue}{\mathit{maxDist}}\)). The parameter \(\color{darkblue}{\mathit{dist}}_{i,i'}\) is the distance measure used (it can be the squared distance or the actual distance, or any other measure).
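Because the candidate locations are the demand points themselves, both \(S\) and \(\color{darkblue}{\mathit{dist}}_{i,i'}\) can be computed before the model is built. The GAMS code does this with the set `ok(i,ii)` and the parameter `dist(i,ii)`. Below is a small Python sketch of that preprocessing with made-up points. Every pair \((i,i)\) is in \(S\) with distance 0, so Model 4 is always feasible: in the worst case every demand point hosts its own facility.

```python
import math

def allowed_pairs(dloc, max_dist):
    """Return {(i, i2): distance} for all pairs where demand point i may be
    served by a facility located at demand point i2 (the set S of Model 4)."""
    S = {}
    for i, p in enumerate(dloc):
        for i2, q in enumerate(dloc):
            d = math.dist(p, q)   # Euclidean distance between the two points
            if d <= max_dist:
                S[(i, i2)] = d
    return S

dloc = [(0.0, 0.0), (0.2, 0.0), (0.5, 0.0)]
S = allowed_pairs(dloc, 0.25)
print(sorted(S))   # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
```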

We can run this model in one swoop (with \(\color{darkblue}w_2 \gg \color{darkblue}w_1\)), or in two steps:
  1. Solve with \(\color{darkblue}w_1=0,\color{darkblue}w_2=1\)
  2. Fix \(\color{darkred}{\mathit{numFacs}}\) and solve with  \(\color{darkblue}w_1=1,\color{darkblue}w_2=0\)
The results are as follows:


We have given up quite some freedom in where to place the facilities. As a result, we now need 7 facilities to meet the maximum distance constraint, where 5 were enough before. The two linear MIP models we solved here were very easy: zero nodes were needed in both cases.

There are good heuristics for the \(k\)-Medoid clustering problem (e.g. PAM: Partitioning Around Medoids). However, they don't support a max-distance restriction.
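For the first step of Model 4 alone (the minimum number of facilities), the problem is a set-covering problem. A classic greedy heuristic for set covering does respect the maximum distance: keep opening the candidate that serves the most still-unserved demand points. The Python sketch below is not used in this post. It only gives an upper bound on the number of facilities, whereas the MIP finds the true minimum, and it does nothing for the second, distance-minimizing step. The function name and the points are hypothetical.

```python
import math

def greedy_cover(dloc, max_dist):
    """Greedy set-cover heuristic: open the candidate (a demand point) that
    covers the most unserved demand points until everybody is served."""
    n = len(dloc)
    covers = [{i for i in range(n) if math.dist(dloc[i], dloc[c]) <= max_dist}
              for c in range(n)]
    unserved = set(range(n))
    chosen = []
    while unserved:
        # an unserved point always covers itself, so progress is guaranteed
        best = max(range(n), key=lambda c: len(covers[c] & unserved))
        chosen.append(best)
        unserved -= covers[best]
    return chosen

dloc = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (1.0, 1.0)]
print(greedy_cover(dloc, 0.25))   # [1, 3]
```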


Model 5: Trade-off: Max distance vs Number of facilities


If someone had asked me to work on a problem like this, I would not stop at just providing the precise solutions I was asked about. I would always try to take a step back and investigate the underlying mechanisms. In this case, for instance, we have a tension between the number of facilities we can open and the maximum distance constraint. It is interesting to trace out the trade-off between the maximum distance we can support and the number of facilities we need to build. There are two ways to research this:
  1. Vary the max distance and observe the resulting number of facilities needed, or
  2. vary the number of facilities and see what max distance we can support with that.
The max distance is a real number, so varying it can be a bit costly (depending on the resolution we choose, e.g. 0.1 units). It is easier to loop over the number of facilities, as this is an integer. The model we use here is:

Model 5: maxDist vs numFacs
\[\begin{align} \min \> & \color{darkred}{\mathit{maxDist}} \\ & \color{darkred}{\mathit{assign}}_{i,i'}\cdot \color{darkblue}{\mathit{dist}}_{i,i'} \le \color{darkred}{\mathit{maxDist}}&& \forall i,i'\\ & \color{darkblue}{\mathit{numFacs}} = \sum_i \color{darkred}{\mathit{facSelect}}_i \\ & \sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'} =1 && \forall i \\& \color{darkred}{\mathit{assign}}_{i,i'} \le  \color{darkred}{\mathit{facSelect}}_{i'} && \forall i,i'  \\ & \color{darkred}{\mathit{assign}}_{i,i'}, \color{darkred}{\mathit{facSelect}}_i \in \{0,1\}  \end{align}\]


We run this model for \(\color{darkblue}{\mathit{numFacs}}=1,\dots,10\). The result is:

----    344 PARAMETER m5results  

        numFacs     MaxDist        time       nodes

k1        1.000       0.622       0.969
k2        2.000       0.521       2.625
k3        3.000       0.375       4.407
k4        4.000       0.337       4.547
k5        5.000       0.291       4.563      76.000
k6        6.000       0.260       4.250
k7        7.000       0.245       3.985
k8        8.000       0.210       4.015
k9        9.000       0.208       3.657
k10      10.000       0.190       4.375
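Model 5 is a (discrete, or vertex) \(p\)-center problem. For tiny instances it can be cross-checked by brute force. Try every set of \(p\) demand points as facility locations, serve each customer from its nearest chosen point, and keep the set with the smallest maximum distance. The Python sketch below does exactly that on made-up points. It is only a sanity check: for \(n=75\) and \(p=5\) there are already \(\binom{75}{5} = 17{,}259{,}390\) subsets to enumerate.

```python
import itertools
import math

def p_center_bruteforce(dloc, p):
    """Exact solution of Model 5 by enumeration (tiny instances only).

    Returns (smallest achievable max distance, indices of chosen facilities).
    Each customer is served by its nearest chosen point, which is optimal once
    the facility set is fixed."""
    best_val, best_set = math.inf, None
    for cand in itertools.combinations(range(len(dloc)), p):
        val = max(min(math.dist(q, dloc[c]) for c in cand) for q in dloc)
        if val < best_val:
            best_val, best_set = val, cand
    return best_val, best_set

dloc = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (0.6, 0.0), (1.0, 0.0)]
for p in (1, 2, 3):
    print(p, p_center_bruteforce(dloc, p))
```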


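Note the comment by Rob Pratt: the first constraint can be replaced by the stronger \[\sum_{i'} \color{darkred}{\mathit{assign}}_{i,i'}\cdot \color{darkblue}{\mathit{dist}}_{i,i'} \le \color{darkred}{\mathit{maxDist}} \qquad \forall i\] Because each demand point is assigned to exactly one facility, the left-hand side is simply the distance to the assigned facility, so no integer solution is lost; in the LP relaxation, however, this aggregated form is much tighter. When we do this, all ten sub-problems are solved with zero nodes.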
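Here is a tiny numeric illustration, with made-up numbers, of why the aggregated constraint is stronger. Take one demand point with two candidate facilities at distances 0.1 and 0.4, and the fractional LP-relaxation value 0.5 for both assignment variables. The per-pair constraints only force \(\color{darkred}{\mathit{maxDist}} \ge 0.2\), while the aggregated constraint forces \(\color{darkred}{\mathit{maxDist}} \ge 0.25\). For integer assignments both give the same value.

```python
def per_pair_bound(assign, dist):
    """Lower bound on maxDist from assign(i,i')*dist(i,i') <= maxDist for all i'."""
    return max(a * d for a, d in zip(assign, dist))

def aggregated_bound(assign, dist):
    """Lower bound on maxDist from sum_i' assign(i,i')*dist(i,i') <= maxDist."""
    return sum(a * d for a, d in zip(assign, dist))

dist = [0.1, 0.4]  # distances from one demand point to two candidate facilities

fractional = [0.5, 0.5]  # allowed in the LP relaxation (sums to 1)
print(per_pair_bound(fractional, dist), aggregated_bound(fractional, dist))  # 0.2 0.25

integer = [0.0, 1.0]     # a genuine assignment
print(per_pair_bound(integer, dist), aggregated_bound(integer, dist))        # 0.4 0.4
```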

A picture of the results makes the trade-off between the two objectives clearer:



Indeed, we recognize that for \(\color{darkblue}{\mathit{maxDist}}=0.25\), the Medoid model needs 7 facilities. If we only want to pay for 6 facilities, we need to allow \(\color{darkblue}{\mathit{maxDist}}=0.26\). 

Note that the lines are a bit misleading here: only the discrete points are feasible. I keep the lines as they illuminate the shape of the curve. We see that the curve is steeper on the left than on the right, i.e., adding (or removing) a facility has a larger impact on the maximum distance when there are only a few facilities. This is obvious (the marginal impact of a facility is larger if there are fewer of them), but it is always good to make this explicit with a picture.
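The shape of the curve can also be read directly from the m5results table. A few lines of Python, using the MaxDist values from that table, compute how much each additional facility reduces the maximum distance. The reductions are not perfectly monotone (for example, the 8th facility helps more than the 7th). Still, the first few facilities clearly buy much more than the later ones.

```python
# MaxDist by number of facilities, copied from the m5results table
max_dist = {1: 0.622, 2: 0.521, 3: 0.375, 4: 0.337, 5: 0.291,
            6: 0.260, 7: 0.245, 8: 0.210, 9: 0.208, 10: 0.190}

# reduction in MaxDist obtained by adding the k-th facility
gain = {k: round(max_dist[k - 1] - max_dist[k], 3) for k in range(2, 11)}
print(gain)

first = sum(gain[k] for k in range(2, 5))    # facilities 2..4
rest = sum(gain[k] for k in range(5, 11))    # facilities 5..10
print(round(first, 3), round(rest, 3))       # 0.285 0.147
```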

I often find these types of pictures add value to the analyses. If you present this to a client, it can lead to good discussions. Clients typically know this behavior, but making it explicit is a big help. 

GAMS Model


The GAMS code has all the above models. By default, we skip model 3 (the MISOCP model) as it solves too slowly. By default, generating the HTML output is turned on.

$onText

Continuous Facility Location
Model 1:
Find number of facilities needed to service all customers with
constraint on the distance between facility and customer.
Model 2:
Given the number of facilities found in model 1, find an optimal
location of these facilities (by minimizing the sum of squared
distances) and the optimal assignment of customers to
facilities.
Model 3:
We can minimize the distances using a MISOCP (Mixed-Integer
Second Order Cone program). This takes forever for n=75
so we skip this here.
Model 4:
This is a Medoid version: the candidate locations are the
demand points. This is formulated as a multi-objective
model, but we can run the objectives (min facilities,
min sum distances) separately by changing the weights.
Model 5:
Produce efficient frontier between maxDist and numFacs
based on Medoid model 4.
Reporting is done using HTML + Plotly.


$offText


option reslim=1000;
option miqcp=cplex;
option seed=12345;

* third model (MISOCP) is very expensive
* it is better to skip this
* runmodel3=0 : skip model 3
* runmodel3<>0 : run model 3
$set runmodel3 0

* enable (1) or disable (0) symmetry breakers
$set symm 1

* set to 0 if no HTML report
$set runhtml 1

*-----------------------------------------------------------------------------------------
* data
*-----------------------------------------------------------------------------------------

Sets
dummy 'for ordering of displays' /numFacs,MaxDist,time/
i 'demand points' /demand1*demand75/
j 'possible facilities' /facility1*facility10/
c 'coordinates' /x,y/
;

Parameters
dloc(*,c) 'demand point locations'
maxDist 'maximum distance allowed between facility and demand point (for 1x1 map)' /0.25/
wh 'width and height of our region' /1/
symm "0:don't use 1:use symmetry breakers" /%symm%/
;

maxDist = maxDist*wh;

dloc(i,c) = uniform(0,wh);
dloc('min',c) = smin(i,dloc(i,c));
dloc('max',c) = smax(i,dloc(i,c));
display maxDist,dloc;

*-----------------------------------------------------------------------------------------
* model 1: find minimum number of facilities needed
* we put some effort into finding small big-M values
* this is a bit of overkill
*-----------------------------------------------------------------------------------------


*
* for proper big M calculation we need to know farthest possible distances
*
sets
LU 'lower or upper' /L,U/
b(LU,LU) 'box' /L.L, L.U, U.L, U.U/
;
alias (LU,LUx,LUy);
Parameters
corners(LU,LU,c) 'corners of box for facility locations'
farthest(i) 'max possible squared distance between demand point i and possible location of facility'
;
corners('L',LUy,'x') = dloc('min','x');
corners('U',LUy,'x') = dloc('max','x');
corners(LUx,'L','y') = dloc('min','y');
corners(LUx,'U','y') = dloc('max','y');
display corners;

farthest(i) = smax(b,sum(c,sqr(dloc(i,c)-corners(b,c))));
display farthest;


*
* model 1: MIQCP
*
variables
floc(j,c) 'facility locations'
isOpen(j) 'facility is being used'
assign(i,j) 'assign customers to facility'
nOpen 'number of open facilities'
;
binary variables isOpen,assign;

equations
distance(i,j) 'squared distance equation'
assignDemand(i) 'assign customer to exactly one facility'
closed(i,j) 'do not assign customers to closed facilities'
numFacilities 'number of open facilities'
order(j) 'optional: open facilities are first ones'
;

distance(i,j)..sum(c, sqr(dloc(i,c)-floc(j,c))) =l= sqr(maxDist)*assign(i,j) + farthest(i)*(1-assign(i,j));

assignDemand(i)..sum(j, assign(i,j)) =e= 1;

closed(i,j).. assign(i,j) =l= isOpen(j);

numFacilities.. nOpen =e= sum(j, isOpen(j));

order(j+1)$symm.. isOpen(j) =g= isOpen(j+1);

* facility locations should be inside the box formed by the demand points
floc.lo(j,c) = smin(b,corners(b,c));
floc.up(j,c) = smax(b,corners(b,c));

*
* solve
*
model m1 /all/;
solve m1 minimizing nOpen using miqcp;
abort$(m1.modelstat <> %modelStat.optimal% and m1.modelstat <> %modelStat.integerSolution%) "No solution";


*
* collect results
*
set f(j) 'open facilities';
f(j) = isOpen.l(j)>0.5;
display nOpen.l,j,f;

parameter res1(*) 'results model 1';
res1('facilities needed (min)') = round(nOpen.l);
res1('sum squared distances') = sum((i,j)$(assign.l(i,j)>0.5),sum(c, sqr(dloc(i,c)-floc.l(j,c))));
res1('max squared distance') = smax((i,j)$(assign.l(i,j)>0.5),sum(c, sqr(dloc(i,c)-floc.l(j,c))));
res1('sum distances') = sum((i,j)$(assign.l(i,j)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(j,c)))));
res1('max distance') = smax((i,j)$(assign.l(i,j)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(j,c)))));
res1('solver time') = m1.resusd;
res1('nodes') = m1.nodusd;
res1('binary variables') = m1.numdvar;
display res1;

set assign1(i,j) 'model1 results';
assign1(i,j) = assign.l(i,j)>0.5;

parameter floc1(j,c) 'model1 results';
floc1(f,c) = floc.l(f,c);


*-----------------------------------------------------------------------------------------
* model 2: find optimal assignment of customer to open facilities
* minimize sum of squared distances
*-----------------------------------------------------------------------------------------


positive variable d2(i,j) 'squared distance between customer and facility';

variable totdist2 'sum of squared distances';

equations
distance2(i,j) 'squared distance equation'
assignDemandf(i) 'assign customer to exactly one facility'
objective2 'minimize sum of squared distances'
orderx(j) 'order by x coordinate'
;

objective2.. totdist2 =e= sum((i,f),d2(i,f));

distance2(i,f).. d2(i,f) =g= sum(c, sqr(dloc(i,c)-floc(f,c))) - farthest(i)*(1-assign(i,f));

assignDemandf(i)..sum(f, assign(i,f)) =e= 1;

orderx(j+1)$(f(j) and symm).. floc(j,'x') =l= floc(j+1,'x');

d2.up(i,f) = sqr(maxDist);

model m2 /objective2,distance2,assignDemandf,orderx/;
solve m2 minimizing totdist2 using miqcp;
abort$(m2.modelstat <> %modelStat.optimal% and m2.modelstat <> %modelStat.integerSolution%) "No solution";

display totdist2.l, assign.l;

parameter res2(*) 'results model 2';
res2('sum squared distances (min)') = sum((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res2('max squared distance') = smax((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res2('sum distances') = sum((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res2('max distance') = smax((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res2('solver time') = m2.resusd;
res2('nodes') = m2.nodusd;
res2('binary variables') = m2.numdvar;
display res2;

set assign2(i,j) 'model2 results';
assign2(i,f) = assign.l(i,f)>0.5;

parameter floc2(j,c) 'model2 results';
floc2(f,c) = floc.l(f,c);


*-----------------------------------------------------------------------------------------
* model 3: find optimal assignment of customer to open facilities
* minimize sum of distances
* MISOCP formulation
* this is too slow for larger data sets
*-----------------------------------------------------------------------------------------

$if %runmodel3%==0 $goto skipmodel3

positive variable
dall(i,j) 'distance between all customers and facilities'
d(i,j) 'distance between assigned customers and facilities or 0'
;

free variables
totdist 'sum of distances'
diff(i,j,c) 'facility - customer'
;

equations
objective 'minimize sum of distances'
socp(i,j) 'second order cone constraint'
ediff(i,j,c) 'difference coordinatewise'
calcd(i,j) 'big-M version of implication'
;

objective.. totdist =e= sum((i,f),d(i,f));
ediff(i,f,c).. diff(i,f,c) =e= dloc(i,c)-floc(f,c);
socp(i,f)..sqr(dall(i,f)) =g= sum(c,sqr(diff(i,f,c)));
calcd(i,f).. d(i,f) =g= dall(i,f) - sqrt(farthest(i))*(1-assign(i,f));

dall.up(i,f) = sqrt(farthest(i));
d.up(i,f) = sqrt(farthest(i));

model m3 /objective,ediff,socp,calcd,assignDemandf,orderx/;
m3.optfile=1;
solve m3 minimizing totdist using miqcp;
abort$(m3.modelstat <> %modelStat.optimal% and m3.modelstat <> %modelStat.integerSolution%) "No solution";

display totdist.l, assign.l;

parameter res3(*) 'results model 3';
res3('sum squared distances') = sum((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res3('max squared distance') = smax((i,f)$(assign.l(i,f)>0.5),sum(c, sqr(dloc(i,c)-floc.l(f,c))));
res3('sum distances (min)') = sum((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res3('max distance') = smax((i,f)$(assign.l(i,f)>0.5),sqrt(sum(c, sqr(dloc(i,c)-floc.l(f,c)))));
res3('solver time') = m3.resusd;
res3('nodes') = m3.nodusd;
res3('binary variables') = m3.numdvar;
display res3;

set assign3(i,j) 'model3 results';
assign3(i,f) = assign.l(i,f)>0.5;

parameter floc3(j,c) 'model3 results';
floc3(f,c) = floc.l(f,c);

$onecho > cplex.opt
mipstart 1
mipstrategy 4
$offecho

$label skipmodel3


*-----------------------------------------------------------------------------------------
* Model 4: k-medoids model
*-----------------------------------------------------------------------------------------

alias (i,ii);
parameter dist(i,ii) 'any distance measure, here the euclidean distance';
dist(i,ii) = sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c))));

scalars
w1 'obj weight: distance'
w2 'obj weight: number of facilities'
;

binary variables
facSelect(i) 'select point i as facility'
assigni(i,ii) 'assign demand point i to facility ii'
;

positive variables
totDist 'obj1: sum of distances'
numFacs 'obj2: number of facilities'
;

variable z 'objective';

Equations
objMultiple 'weighted sum objective'
objDist 'obj1: sum of distances'
objNumFacs 'obj2: number of facilities'
eAssign(i) 'each customer must be assigned to one facility'
close(i,i) 'if point i is not a facility, then we cannot serve customers from there'
;

set ok(i,ii) 'allowed assignments';
ok(i,ii) = sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))) <= maxDist;

* bi-objective
objMultiple.. z =e= w1*totDist+w2*numFacs;
objDist.. totDist =e= sum(ok(i,ii),dist(ok)*assigni(ok));
objNumFacs.. numFacs =e= sum(i,facSelect(i));

* constraints
eAssign(i)..sum(ok(i,ii),assigni(ok)) =e= 1;
close(ok(i,ii)).. assigni(ok) =l= facSelect(ii);

model m4 /objMultiple,objDist,objNumFacs,eAssign,close/;

parameter res4(*) 'results model 4';

* we solve in two phases:
* 1. minimize number of facilities needed
* 2. fix numFacs and minimize sum of distances

w1 = 0; w2 = 1;
solve m4 minimizing z using mip;
display numfacs.l;
res4('solver time (min numFacs)') = m4.resusd;
res4('nodes (min numFacs)') = m4.nodusd;
res4('binary variables (min numFacs)') = m4.numdvar;
res4('number of facilities') = round(numFacs.l);

numfacs.fx = round(numfacs.l);
w1 = 1; w2 = 0;
solve m4 minimizing z using mip;
res4('solver time (min totDist)') = m4.resusd;
res4('nodes (min totDist)') = m4.nodusd;
res4('binary variables (min totDist)') = m4.numdvar;


res4('sum squared distances') = sum(ok(i,ii)$(assigni.l(i,ii)>0.5),sum(c, sqr(dloc(i,c)-dloc(ii,c))));
res4('max squared distance') = smax(ok(i,ii)$(assigni.l(i,ii)>0.5),sum(c, sqr(dloc(i,c)-dloc(ii,c))));
res4('sum distances (min)') = sum(ok(i,ii)$(assigni.l(i,ii)>0.5),sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))));
res4('max distance') = smax(ok(i,ii)$(assigni.l(i,ii)>0.5),sqrt(sum(c, sqr(dloc(i,c)-dloc(ii,c)))));
display res4;

set assign4(i,i) 'model4 results';
assign4(ok) = assigni.l(ok)>0.5;

set facSelected(i) 'facilities selected';
facSelected(i) = facSelect.l(i) > 0.5;
display facSelected;

parameter ordFac(i) 'numbering for coloring';
ordFac(FacSelected) = facSelected.pos;
display ordFac;

*-----------------------------------------------------------------------------------------
* Model 5: trace trade-off between max distance and number of facilities
*-----------------------------------------------------------------------------------------

variable zmaxdist 'objective: minimize maxdist';

ok(i,ii) = yes;

equations
maxDistance(i,ii) 'assign(i,ii)=1 ==> dist(i,ii) <= maxdist'
;
scalar fxNumFacs 'fixed number of facilities';

maxDistance(i,ii).. assigni(i,ii)*dist(i,ii) =l= zmaxdist;

zmaxDist.lo = 0;

model m5 /maxDistance,objNumFacs,eAssign,close/;

set k /k1*k10/;
parameter m5results(k,*);
loop (k,
numFacs.fx = ord(k);
solve m5 minimizing zmaxdist using mip;
abort$(m5.modelstat <> %modelStat.optimal% and m5.modelstat <> %modelStat.integerSolution%) "No solution";
m5results(k,'numFacs') = ord(k);
m5results(k,'maxDist') = zmaxdist.l;
m5results(k,'time') = m5.resusd;
m5results(k,'nodes') = m5.nodusd;
);
display m5results;


*-----------------------------------------------------------------------------------------
* reporting and visualization (models 1, 2, 4 and 5)
*-----------------------------------------------------------------------------------------

$if %runhtml%==0 $stop

$set htmlfile report.html
$set datafile data.js

$macro tablerow(txt,num) '<tr><td>txt</td><td align="right"><pre>',num,'</pre></td></tr>'/;
$macro tableheaderrow(txt1,txt2) '<tr><th>txt1</th><th>txt2</th></tr>'/;
$macro tablerow2(num1,num2) '<tr><td align="right"><pre>',num1,'</pre></td><td align="right"><pre>',num2,'</pre></td></tr>'/;

file fdata /%datafile%/;
put fdata;

* demand points
put"datatable=`"/;
put'<table>'/;
put tablerow(Demand points,card(i):0:0)
put tablerow(Max distance customer → facility,maxDist:7:3)
put'</table>'/;
put"`;"/;
put"points=["/;
loop(i,
put"{x:",dloc(i,'x'):6:4,",y:",dloc(i,'y'):6:4,"},"/;
);
put"];"/;

* model 1
put"m1table=`"/;
put'<table>'/;
put tablerow(Number facilities needed (min),res1('facilities needed (min)'):0:0)
put tablerow(Sum distances,res1('sum distances'):8:3)
put tablerow(Sum squared distances,res1('sum squared distances'):8:3)
put tablerow(Max distance,res1('max distance'):8:3)
put tablerow(Binary variables,res1('binary variables'):0:0)
put tablerow(Solver time,res1('solver time'):8:3)
put tablerow(Nodes,res1('nodes'):0:0)
put'</table>'/;
put"`;"/;
put"floc1=["/;
loop(f,
put"{x:",floc1(f,'x'):8:4,",y:",floc1(f,'y'):8:4,"},"/;
);
put"];"/;
put"assign1=["/;
loop(assign1(i,f),
put"{i:",ord(i):0:0,",f:",f.pos:0:0,"},"/;
);
put"];"/;

* model 2
put"m2table=`"/;
put'<table>'/;
put tablerow(Sum squared distances (min),res2('sum squared distances (min)'):8:3)
put tablerow(Sum distances,res2('sum distances'):8:3)
put tablerow(Max distance,res2('max distance'):8:3)
put tablerow(Binary variables,res2('binary variables'):0:0)
put tablerow(Solver time,res2('solver time'):8:3)
put tablerow(Nodes,res2('nodes'):0:0)
put'</table>'/;
put"`;"/;
put"floc2=["/;
loop(f,
put"{x:",floc2(f,'x'):6:4,",y:",floc2(f,'y'):6:4,"},"/;
);
put"];"/;
put"assign2=["/;
loop(assign2(i,f),
put"{i:",ord(i):0:0,",f:",f.pos:0:0,"},"/;
);
put"];"/;

* model 4
put"m4table=`"/;
put'<table>'/;
put tablerow(Number of facilities,res4('number of facilities'):0:0)
put tablerow(Sum squared distances,res4('sum squared distances'):8:3)
put tablerow(Sum distances (min),res4('sum distances (min)'):8:3)
put tablerow(Max distance,res4('max distance'):8:3)
put tablerow(Binary variables (min numFacs),res4('binary variables (min numFacs)'):0:0)
put tablerow(Solver time (min numFacs),res4('solver time (min numFacs)'):8:3)
put tablerow(Nodes (min numFacs),res4('nodes (min numFacs)'):0:0)
put tablerow(Binary variables (min totDist),res4('binary variables (min totDist)'):0:0)
put tablerow(Solver time (min totDist),res4('solver time (min totDist)'):8:3)
put tablerow(Nodes (min totDist),res4('nodes (min totDist)'):0:0)
put'</table>'/;
put"`;"/;
put"assign4=["/;
loop(assign4(i,ii),
put"{i:",ord(i):0:0,",ii:",ord(ii):0:0,",cl:",ordFac(ii):0:0,"},"/;
);
put"];"/;


* model 5
put"frontier=["/;
loop(k,
put"{maxdist:",m5results(k,'maxDist'):8:3,",numfacs:",m5results(k,'numFacs'):8:3,"},"/;
);
put"];"/;
put"m5table=`"/;
put'<table>'/;
put tableheaderrow(numFacs,maxDist)
loop(k,
put tablerow2(m5results(k,'numFacs'):8:3,m5results(k,'maxDist'):8:3)
);
put'</table>'/;
put"`;"/;
putclose;

$onecho > %htmlfile%
<html>
<script src="https://cdn.plot.ly/plotly-3.0.1.min.js" charset="utf-8"></script>
<script src="%datafile%" charset="utf-8"></script>
<style>
table,th, td {
border: 1px solid black;
border-collapse: collapse;
padding-left: 10px;
padding-right: 10px;
}
p { max-width:800px; }
</style>
<body>
<h1>Facility Location Model</h1>
<h2>Data: demand points</h2>
<p>The locations of the demand points are randomly drawn from a uniform distribution.</p>
<div id="dataTable"></div>
<div id="myPlot1" style="width:100%;max-width:700px;height:700px"></div>

<p>

<h2>Model 1: results</h2>

<p>Model 1 finds the minimum number of facilities needed to serve all customers while
obeying the maximum distance constraint. To keep the model quadratic, the limit is
imposed on the squared distance rather than on the distance itself. Note that the solution
is <b>not</b> an assignment with shortest distances between customers and facilities.</p>

<div id="m1Table"></div>
<div id="myPlot2" style="width:100%;max-width:700px;height:700px"></div>

<h2>Model 2: results</h2>

<p>Model 2 finds the best locations of the facilities and the optimal assignment of
customers to facilities by minimizing the sum of squared distances. It uses the number
of facilities found by model 1.</p>

<div id="m2Table"></div>
<div id="myPlot3" style="width:100%;max-width:700px;height:700px"></div>

<h2>Model 4: results</h2>

<p>Model 4 uses the demand points as candidate locations for the facilities. This
is an easy MIP. It is solved here in two stages: first minimize the number of
facilities, then fix that number and minimize the total distance.</p>

<div id="m4Table"></div>
<div id="myPlot4" style="width:100%;max-width:700px;height:700px"></div>

<h2>Model 5: results</h2>

<p>Model 5 traces the trade-off between <span style="font-family: courier;">maxdist</span> (the maximum distance
limit) and <span style="font-family: courier;">numfacs</span> (the number of facilities): for each number of
facilities it finds the smallest achievable maximum distance. The results are based on the
medoid-based model, in which the candidate locations for the facilities are the demand points.</p>

<div id="m5Table"></div>
<div id="myPlot5" style="width:100%;max-width:700px;height:700px"></div>

<script>

colors = [
'#1f77b4', // muted blue
'#ff7f0e', // safety orange
'#2ca02c', // cooked asparagus green
'#d62728', // brick red
'#9467bd', // muted purple
'#8c564b', // chestnut brown
'#e377c2', // raspberry yogurt pink
'#7f7f7f', // middle gray
'#bcbd22', // curry yellow-green
'#17becf' // blue-teal
];

document.getElementById('dataTable').innerHTML = datatable;
document.getElementById('m1Table').innerHTML = m1table;
document.getElementById('m2Table').innerHTML = m2table;
document.getElementById('m4Table').innerHTML = m4table;
document.getElementById('m5Table').innerHTML = m5table;

// extract coordinates as arrays
xpoints = points.map(({x})=>x);
ypoints = points.map(({y})=>y);
xfloc1 = floc1.map(({x})=>x);
yfloc1 = floc1.map(({y})=>y);
xfloc2 = floc2.map(({x})=>x);
yfloc2 = floc2.map(({y})=>y);
numfacs = frontier.map(({numfacs})=>numfacs);
maxdist = frontier.map(({maxdist})=>maxdist);


trace1 = {
x: xpoints,
y: ypoints,
mode: 'markers',
type: 'scatter',
marker: { color: 'black' }
};

trace2 = {
x: xfloc1,
y: yfloc1,
mode: 'markers',
type: 'scatter',
marker: { color: colors },
};

trace3 = {
x: xfloc2,
y: yfloc2,
mode: 'markers',
type: 'scatter',
marker: { color: colors },
};

trace4 = {
x: numfacs,
y: maxdist,
};


assignments = [];
for (k=0; k < assign1.length; ++k) {
item = assign1[k];
i = item['i']-1;
f = item['f']-1;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xfloc1[f],
y1:yfloc1[f],
line: { color:colors[f % colors.length],width:2 }
}
assignments.push(asg);
}
var layout2 = {showlegend: false, shapes:assignments};

assignments2 = [];
for (k=0; k < assign2.length; ++k) {
item = assign2[k];
i = item['i']-1;
f = item['f']-1;
ff = f % colors.length;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xfloc2[f],
y1:yfloc2[f],
line: { color:colors[ff],width:2 }
}
assignments2.push(asg);
}
var layout3 = {showlegend: false, shapes:assignments2};


assignments4 = [];
for (k=0; k < assign4.length; ++k) {
item = assign4[k];
i = item['i']-1;
ii = item['ii']-1;
cl = item['cl']-1;
ff = cl % colors.length;
asg = {
type:'line',
x0:xpoints[i],
y0:ypoints[i],
x1:xpoints[ii],
y1:ypoints[ii],
line: { color:colors[ff],width:2 }
}
assignments4.push(asg);
}
var layout4 = {showlegend: false, shapes:assignments4};


var layout5 = {
xaxis : {title:{text:'number of facilities'}},
yaxis : {title:{text:'max distance'}},
}

Plotly.newPlot('myPlot1', [trace1]);
Plotly.newPlot('myPlot2', [trace1,trace2], layout2);
Plotly.newPlot('myPlot3', [trace1,trace3], layout3);
Plotly.newPlot('myPlot4', [trace1], layout4);
Plotly.newPlot('myPlot5', [trace4], layout5);



</script>
</body>
</html>
$offEcho

executetool 'win32.ShellExecute "%htmlfile%"';
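Model 5 is, in effect, a discrete (vertex) p-center problem. The number of facilities is fixed at k, the facilities are restricted to demand points, and the model minimizes the largest customer-to-facility distance. For a tiny instance, the same trade-off curve can be reproduced without GAMS by enumerating every subset of k demand points. This is a brute-force sketch, not the MIP used above. The function names and the toy points are hypothetical.

```python
import itertools
import math


def max_assignment_distance(points, centers):
    """Largest distance from any point to its nearest selected center.

    `centers` is a tuple of indices into `points`; each customer is
    implicitly assigned to its closest selected facility.
    """
    return max(min(math.dist(p, points[c]) for c in centers) for p in points)


def frontier(points, kmax):
    """For k = 1..kmax, return (k, smallest achievable max distance).

    Facilities may only be placed at demand points. The optimum is found
    by enumerating all subsets of size k, so this only scales to a few points.
    """
    results = []
    for k in range(1, kmax + 1):
        best = min(
            max_assignment_distance(points, centers)
            for centers in itertools.combinations(range(len(points)), k)
        )
        results.append((k, best))
    return results


# hypothetical toy data: two small clusters in the unit square
pts = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.9)]
for k, d in frontier(pts, 3):
    print(f"numFacs={k}  maxDist={d:.3f}")
```

Like the GAMS loop over `k`, the curve is non-increasing. Adding a facility can never increase the optimal maximum distance.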

Output

Conclusion


This looks like an easy problem, but the choice of formulation makes a big difference. 

  • Symmetry-breaking constraints can really help.
  • Minimizing the sum of squared distances is much easier than minimizing the sum of Euclidean distances (the latter, modeled as an MISOCP, performs very poorly).
  • Easy linear MIP models can be formulated if we restrict the placement of the facilities to a finite set of candidate locations. A special case: facilities can only be placed at demand points.
  • Our problems superficially resemble clustering problems, but clustering algorithms typically don't handle our max distance side constraint. 
  • Reporting (including visualization) is very important: both for the model developer (e.g. for model debugging) and the client. For a client, the solution report is the GUI of the model: it is the only visible part. For a model developer, it is often an afterthought. That is not the best way to think about it. 
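The candidate-location point can be illustrated on a toy instance. When facilities may only sit at demand points, the two-stage approach of Model 4 reduces to two steps. The first minimizes the number of facilities, which is a set-covering problem. The second fixes that number and minimizes the total distance, choosing among the covers of that size. The Python sketch below performs both stages by brute-force enumeration instead of calling a MIP solver, so it only works for a handful of points. The function name and the toy data are hypothetical.

```python
import itertools
import math


def two_stage_medoid(points, max_dist):
    """Two-stage medoid model, solved by enumeration.

    Stage 1: smallest k such that some k demand points cover every
             demand point within max_dist.
    Stage 2: among covers of that size, the one with the smallest total
             distance (each customer is assigned to its nearest facility).
    Returns (k, selected indices, total distance).
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(1, n + 1):
        best = None
        for sel in itertools.combinations(range(n), k):
            nearest = [min(d[i][j] for j in sel) for i in range(n)]
            if max(nearest) > max_dist:
                continue  # some customer is too far from every selected facility
            total = sum(nearest)
            if best is None or total < best[1]:
                best = (sel, total)
        if best is not None:
            return k, best[0], best[1]
    return None  # only reached if max_dist < 0


# hypothetical toy data: a cluster of three points and a cluster of two
pts = [(0.1, 0.1), (0.2, 0.1), (0.3, 0.1), (0.8, 0.9), (0.9, 0.9)]
k, sel, total = two_stage_medoid(pts, 0.25)
print(k, sel, round(total, 3))
```

The sketch assigns each customer to its nearest selected facility. This is optimal for the stage-2 objective once the set of facilities is fixed. In the MIP, the assignment is a decision variable that the solver optimizes jointly with the facility selection.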


References


