Problem
The problem is easy to describe: given the coordinates of points \(p_i\), find the central point \(x\) that minimizes the sum of the distances from \(x\) to \(p_i\) [1].
Data
We just use some random data on the unit square \([0,1]\times[0,1]\):
The last entry labeled mean is just the average of the \(x\)- and \(y\)-coordinates.

----     11 PARAMETER p  points (randomly generated)

              x       y
point1    0.172   0.843
point2    0.550   0.301
point3    0.292   0.224
point4    0.350   0.856
point5    0.067   0.500
point6    0.998   0.579
point7    0.991   0.762
point8    0.131   0.640
point9    0.160   0.250
point10   0.669   0.435
point11   0.360   0.351
point12   0.131   0.150
point13   0.589   0.831
point14   0.231   0.666
point15   0.776   0.304
point16   0.110   0.502
point17   0.160   0.872
point18   0.265   0.286
point19   0.594   0.723
point20   0.628   0.464
point21   0.413   0.118
point22   0.314   0.047
point23   0.339   0.182
point24   0.646   0.561
point25   0.770   0.298
point26   0.661   0.756
point27   0.627   0.284
point28   0.086   0.103
point29   0.641   0.545
point30   0.032   0.792
point31   0.073   0.176
point32   0.526   0.750
point33   0.178   0.034
point34   0.585   0.621
point35   0.389   0.359
point36   0.243   0.246
point37   0.131   0.933
point38   0.380   0.783
point39   0.300   0.125
point40   0.749   0.069
point41   0.202   0.005
point42   0.270   0.500
point43   0.151   0.174
point44   0.331   0.317
point45   0.322   0.964
point46   0.994   0.370
point47   0.373   0.772
point48   0.397   0.913
point49   0.120   0.735
point50   0.055   0.576
point51   0.051   0.006
point52   0.401   0.520
point53   0.629   0.226
point54   0.396   0.276
point55   0.152   0.936
point56   0.423   0.135
point57   0.386   0.375
point58   0.268   0.948
point59   0.189   0.298
mean      0.380   0.464
NLP Model
A straightforward non-linear programming model can look like:
Unconstrained NLP Model |
---|
\[\min \sum_i \sqrt{\sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 }\] |
We use \(c \in \{x,y\}\), i.e. we have \(x\)- and \(y\)-coordinates. Note that \(x\) appears in two different roles: as an element of the set of coordinates (the \(x\)-coordinate), and as the decision variable \(x_c\).
We can use the mean as a very good starting point to help the NLP solver. I.e. \[x_c := \frac{\displaystyle\sum_i^n p_{i,c}}{n}\]
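As a minimal sketch (not the GAMS model used for the results in this post), the objective and the mean starting point could be written in plain Python as follows. The function names `sum_dist` and `mean_point`, the seed, and the random points are illustrative assumptions; they do not reproduce the data shown above.

```python
import math
import random


def sum_dist(x, points):
    """Sum of Euclidean distances from x = (x1, x2) to every point."""
    return sum(math.hypot(x[0] - px, x[1] - py) for px, py in points)


def mean_point(points):
    """Coordinate-wise mean, intended as the initial point for an NLP solver."""
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    return (mean_x, mean_y)


# Illustrative data only: 50 random points on the unit square (arbitrary seed).
rng = random.Random(123)
points = [(rng.random(), rng.random()) for _ in range(50)]

x0 = mean_point(points)
print("starting point:", x0)
print("objective at starting point:", sum_dist(x0, points))
```

The value printed for the starting point is an upper bound on the optimal objective, since the solver can only improve on it.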
The picture below shows why the mean is such a good starting point:
[Figure: Optimal center point is close to mean point]
The numeric values are here:
----     45 PARAMETER results  x(center) vs mean

               x           y      sumdist
mean  0.39042709  0.47299042  18.02414861
x     0.37048298  0.43857572  17.96891985
The sumdist column shows the objective values for these two points.
This is an easy NLP problem. Most NLP solvers need just a few iterations. With a system like GAMS or AMPL we get exact gradients automatically. That is much preferable to finite differences, which seem to be the prevalent method people use in an R or Python environment.
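For readers working in Python, the exact gradient of this objective is easy to write by hand: \(\partial f/\partial x_c = \sum_i (x_c - p_{i,c}) / \|x - p_i\|_2\). The sketch below compares it with a central finite-difference approximation. It is an illustration under stated assumptions, not the method used for the results above: the helper names `exact_grad` and `fd_grad` and the step size `h` are hypothetical, and only the first five points of the data table plus the tabulated mean are used.

```python
import math


def sum_dist(x, points):
    """Sum of Euclidean distances from x = (x1, x2) to every point."""
    return sum(math.hypot(x[0] - px, x[1] - py) for px, py in points)


def exact_grad(x, points):
    """Analytic gradient: sum_i (x - p_i) / ||x - p_i||.

    The gradient is undefined where x coincides with a data point;
    this sketch simply skips such terms.
    """
    gx = gy = 0.0
    for px, py in points:
        dist = math.hypot(x[0] - px, x[1] - py)
        if dist > 0.0:
            gx += (x[0] - px) / dist
            gy += (x[1] - py) / dist
    return (gx, gy)


def fd_grad(x, points, h=1e-6):
    """Central finite differences: two extra function evaluations per coordinate."""
    gx = (sum_dist((x[0] + h, x[1]), points)
          - sum_dist((x[0] - h, x[1]), points)) / (2 * h)
    gy = (sum_dist((x[0], x[1] + h), points)
          - sum_dist((x[0], x[1] - h), points)) / (2 * h)
    return (gx, gy)


# First five points from the data table, evaluated at the tabulated mean.
points = [(0.172, 0.843), (0.550, 0.301), (0.292, 0.224),
          (0.350, 0.856), (0.067, 0.500)]
x = (0.380, 0.464)

g_exact = exact_grad(x, points)
g_fd = fd_grad(x, points)
print("exact gradient:      ", g_exact)
print("finite differences:  ", g_fd)
```

A function like `exact_grad` (returning an array) could be passed as the `jac` argument of `scipy.optimize.minimize`, so the solver does not have to fall back on finite differences.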
Cone programming I
The above problem can also be written as a cone programming problem. This will allow us to use a different class of solvers to work on this problem. Here we use CVXPY [2] to express the model. The Python code can look like:
import cvxpy as cp
# p is assumed to be an N-by-2 NumPy array holding the points
x = cp.Variable(2) # center point
obj = cp.Minimize( cp.sum( [ cp.norm(x-p[i,:]) for i in range(N) ] ) )
prob = cp.Problem(obj)
objval = prob.solve(solver=cp.SCS, verbose=True)
This is very much a straight translation of our unconstrained NLP model. Although we only declared two \(x\) variables, behind the scenes the model is blown up to a rather large one. We can see from the log:
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 150
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 52, constraints m = 150
Cones: soc vars: 150, soc blks: 50
The generated SOCP (second order cone programming) model is larger, but also very sparse. The solver has no problem solving it very quickly. With SOCP solvers we usually don't worry about an initial point like we used in the NLP model.
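The sizes in the log are consistent with the standard epigraph reformulation, sketched here. This is an assumption inferred from the reported counts, not a description of CVXPY's internals, and the auxiliary variables \(t_i\) are hypothetical names. Each norm term gets its own variable \(t_i\) and one three-dimensional second-order cone:

\[\begin{align} \min & \sum_i t_i \\ & \|x - p_i\|_2 \le t_i && \forall i \end{align}\]

With the 50 cone blocks reported in the log, this gives \(2 + 50 = 52\) variables and \(50 \times 3 = 150\) cone rows. Each row contains a single variable (\(t_i\), \(x_1\) or \(x_2\)), which would also explain the 150 nonzero elements in \(A\).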
Cone programming II
If we don't use a modeling tool that can do these transformations automatically, we can use a DIY approach. Second-order cone constraints can be stated as: \[||A_i x+b_i||_2 \le c_i^Tx + d_i \>\>\forall i\] This would imply that we can write our model as:
SOCP Model Attempt |
---|
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 && \forall i \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}\end{align} \] |
Unfortunately this will yield the message: CPLEX Error 5002: 'e(point1)' is not convex. We can repair this as follows:
Repaired SOCP Model |
---|
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c \color{darkred}y_{i,c}^2 && \forall i \\ & \color{darkred} y_{i,c} = \color{darkred} x_c -\color{darkblue}p_{i,c} && \forall i,c \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}, \color{darkred} y_{i,c} \text{ free}\end{align} \] |
This now solves quickly. We can now understand that CVXPY did quite a few steps before passing the model to the solver. As argued in [3], it is much better if the modeling system takes care of these reformulations. Some of them are not immediately obvious, and hand-crafted reformulations can be error-prone.
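One way to see why the substitution helps is to expand the squares in the first attempt. What follows is a plausible reading of the error message, not a description of how CPLEX detects cones internally. For each \(i\), the original constraint written as an ordinary quadratic constraint is:

\[\sum_c x_c^2 - 2\sum_c p_{i,c}\, x_c + \sum_c p_{i,c}^2 - d_i^2 \le 0\]

The quadratic part has coefficients \(+1\) on \(x_c^2\) and \(-1\) on \(d_i^2\), so it is indefinite and the constraint is not a convex quadratic. It is only acceptable if the solver recognizes it as a second-order cone, and the linear and constant terms obscure that pattern. After introducing \(y_{i,c} = x_c - p_{i,c}\), the constraint becomes

\[\sum_c y_{i,c}^2 - d_i^2 \le 0, \quad d_i \ge 0\]

which is a pure sum of squares bounded by a squared nonnegative variable, i.e. exactly the cone form \(\|y_i\|_2 \le d_i\).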
Conclusion
The min sum distance problem has a simple NLP formulation which can be improved by using a good initial point. It can also be formulated as a SOCP problem. Using high-level modeling tools this is not difficult. Without automatic reformulations things become a bit less obvious.
References
1. The point that minimizes the sum of euclidean distances to a set of n points, https://stackoverflow.com/questions/57277247/the-point-that-minimizes-the-sum-of-euclidean-distances-to-a-set-of-n-points
2. CVXPY, https://www.cvxpy.org/
3. Victor Zverovich, Robert Fourer, Automatic Reformulation of Second-Order Cone Programming Problems, https://ampl.com/MEETINGS/TALKS/2015_01_Richmond_2E.2.pdf