Problem statement
Consider \(n=100\) points in a 12-dimensional space. Find a subset of \(m=8\) points that are as close to each other as possible.
Models
MIQP Model |
---|
\[\begin{align} \min & \sum_{i\lt j} \color{darkred}x_i \cdot \color{darkred}x_j \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \end{align}\] |
This is a simple model. The main wrinkle is that we want to count each distance only once. For this reason, we only consider distances with \(i \lt j\). We can also exploit this when calculating the distance matrix \( \color{darkblue}{\mathit{dist}}_{i,j}\), and make it a strictly upper-triangular matrix.
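To make the \(i \lt j\) convention concrete, here is a minimal Python sketch (not the GAMS code from the appendix). It stores distances only for pairs with \(i \lt j\) and evaluates the MIQP objective for a given 0/1 vector. The function names and the four sample points are made up for illustration.

```python
from itertools import combinations
from math import dist

def upper_triangular_distances(points):
    """Return {(i, j): Euclidean distance} for pairs with i < j only."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

def miqp_objective(x, d):
    """Sum of x[i] * x[j] * d[i, j] over i < j, for a 0/1 vector x."""
    return sum(x[i] * x[j] * dij for (i, j), dij in d.items())

# Hypothetical toy data: four points in the plane.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (10.0, 10.0)]
d = upper_triangular_distances(points)

x = [1, 1, 1, 0]              # select the first three points
total = miqp_objective(x, d)  # pairs (0,1), (0,2), (1,2): 5 + 3 + 4
print(total)
```

Because each pair appears once in `d`, every distance enters the sum exactly once.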
MIP Model |
---|
\[\begin{align} \min & \sum_{i\lt j} \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}\] |
The inequality implements the implication \[\color{darkred}x_i= 1 \textbf{ and } \color{darkred}x_j = 1 \Rightarrow \color{darkred}y_{i,j} = 1 \] The variables \(\color{darkred}y_{i,j}\) can be binary or can be relaxed to be continuous between 0 and 1.
Finally, we can consider a slightly different problem. Instead of minimizing the sum of the distances between the selected points, we can minimize the maximum distance within the group of selected points. The model can look like:
Minmax Model |
---|
\[\begin{align} \min\> & \color{darkred}z \\ & \color{darkred}z \ge \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} && \forall i \lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}\] |
We again use our linearization here.
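The linearization used in the MIP and Minmax models can be sanity-checked by enumeration. The sketch below assumes nonnegative distances, so that minimization pushes each \(y_{i,j}\) down to its lower bound \(\max(0, x_i + x_j - 1)\). For binary \(x\) this bound equals the product \(x_i \cdot x_j\). The helper names and the toy distance table are illustrative only.

```python
from itertools import product

def smallest_feasible_y(xi, xj):
    """Smallest y in [0, 1] that satisfies y >= xi + xj - 1."""
    return max(0, xi + xj - 1)

# For binary x_i, x_j the lower bound on y coincides with x_i * x_j.
linearization_ok = all(
    smallest_feasible_y(xi, xj) == xi * xj
    for xi, xj in product((0, 1), repeat=2)
)

def minmax_objective(x, d):
    """Smallest z with z >= y[i,j] * d[i,j], taking y at its lower bound."""
    return max(
        (smallest_feasible_y(x[i], x[j]) * dij for (i, j), dij in d.items()),
        default=0.0,
    )

# Hypothetical distances for four points, pairs with i < j only.
d = {(0, 1): 5.0, (0, 2): 3.0, (0, 3): 9.0,
     (1, 2): 4.0, (1, 3): 7.0, (2, 3): 8.0}

z = minmax_objective([1, 1, 1, 0], d)  # largest distance among pairs (0,1), (0,2), (1,2)
print(linearization_ok, z)
```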
A simple greedy heuristic:
1. Select the two points that are closest to each other.
2. Select a new unselected point that is closest to our already selected points.
3. Repeat step 2 until we have selected 8 points.
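The greedy steps above can be sketched in Python as follows. This is one possible reading, not the GAMS implementation: "closest to our already selected points" is taken to mean the smallest distance to any selected point, which is an assumption. The function name `greedy_closest` and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist

def greedy_closest(points, m):
    """Greedy selection of m point indices (requires 2 <= m <= len(points))."""
    n = len(points)
    # Step 1: start with the closest pair.
    i0, j0 = min(combinations(range(n), 2),
                 key=lambda p: dist(points[p[0]], points[p[1]]))
    selected = [i0, j0]
    # Steps 2-3: repeatedly add the outside point nearest to any selected point.
    while len(selected) < m:
        outside = [k for k in range(n) if k not in selected]
        best = min(outside,
                   key=lambda k: min(dist(points[k], points[s]) for s in selected))
        selected.append(best)
    return selected

# Hypothetical toy data: six points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.0), (10.0, 0.0), (10.2, 0.0), (20.0, 0.0)]
chosen = greedy_closest(points, 3)
print(sorted(chosen))
```

The toy run also shows why such a heuristic can be suboptimal: it commits to the closest pair (indices 3 and 4) even though the three points near the origin form a much tighter group of three.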
Small data set
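---- 10 PARAMETER coord  coordinates

| | x | y |
|---|---|---|
| point1 | 0.172 | 0.843 |
| point2 | 0.550 | 0.301 |
| point3 | 0.292 | 0.224 |
| point4 | 0.350 | 0.856 |
| point5 | 0.067 | 0.500 |
| point6 | 0.998 | 0.579 |
| point7 | 0.991 | 0.762 |
| point8 | 0.131 | 0.640 |
| point9 | 0.160 | 0.250 |
| point10 | 0.669 | 0.435 |
| point11 | 0.360 | 0.351 |
| point12 | 0.131 | 0.150 |
| point13 | 0.589 | 0.831 |
| point14 | 0.231 | 0.666 |
| point15 | 0.776 | 0.304 |
| point16 | 0.110 | 0.502 |
| point17 | 0.160 | 0.872 |
| point18 | 0.265 | 0.286 |
| point19 | 0.594 | 0.723 |
| point20 | 0.628 | 0.464 |
| point21 | 0.413 | 0.118 |
| point22 | 0.314 | 0.047 |
| point23 | 0.339 | 0.182 |
| point24 | 0.646 | 0.561 |
| point25 | 0.770 | 0.298 |
| point26 | 0.661 | 0.756 |
| point27 | 0.627 | 0.284 |
| point28 | 0.086 | 0.103 |
| point29 | 0.641 | 0.545 |
| point30 | 0.032 | 0.792 |
| point31 | 0.073 | 0.176 |
| point32 | 0.526 | 0.750 |
| point33 | 0.178 | 0.034 |
| point34 | 0.585 | 0.621 |
| point35 | 0.389 | 0.359 |
| point36 | 0.243 | 0.246 |
| point37 | 0.131 | 0.933 |
| point38 | 0.380 | 0.783 |
| point39 | 0.300 | 0.125 |
| point40 | 0.749 | 0.069 |
| point41 | 0.202 | 0.005 |
| point42 | 0.270 | 0.500 |
| point43 | 0.151 | 0.174 |
| point44 | 0.331 | 0.317 |
| point45 | 0.322 | 0.964 |
| point46 | 0.994 | 0.370 |
| point47 | 0.373 | 0.772 |
| point48 | 0.397 | 0.913 |
| point49 | 0.120 | 0.735 |
| point50 | 0.055 | 0.576 |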
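| | HEURISTIC | MIQP | MIP | MINMAX |
|---|---|---|---|---|
| point2 | 1.000 | | | |
| point3 | | 1.000 | 1.000 | 1.000 |
| point9 | | | | 1.000 |
| point10 | 1.000 | | | |
| point11 | | 1.000 | 1.000 | |
| point12 | | | | 1.000 |
| point15 | 1.000 | | | |
| point18 | | 1.000 | 1.000 | 1.000 |
| point20 | 1.000 | | | |
| point23 | | 1.000 | 1.000 | 1.000 |
| point24 | 1.000 | | | |
| point25 | 1.000 | | | |
| point27 | 1.000 | | | |
| point29 | 1.000 | | | |
| point35 | | 1.000 | 1.000 | |
| point36 | | 1.000 | 1.000 | 1.000 |
| point39 | | 1.000 | 1.000 | 1.000 |
| point43 | | | | 1.000 |
| point44 | | 1.000 | 1.000 | |
| status | | Optimal | Optimal | Optimal |
| obj | | 3.447 | 3.447 | 0.210 |
| sum | 4.997 | 3.447 | 3.447 | 3.522 |
| max | 0.291 | 0.250 | 0.250 | 0.210 |
| time | | 33.937 | 2.125 | 0.562 |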
Large data set
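---- 112 PARAMETER results

| point | 1.000 entries (one per selecting method among HEURISTIC, MIQP, MIP, MINMAX) |
|---|---|
| point5 | 1.000 1.000 |
| point8 | 1.000 1.000 |
| point17 | 1.000 1.000 |
| point19 | 1.000 1.000 |
| point24 | 1.000 |
| point35 | 1.000 1.000 |
| point38 | 1.000 1.000 |
| point39 | 1.000 |
| point42 | 1.000 1.000 |
| point43 | 1.000 |
| point45 | 1.000 1.000 1.000 |
| point51 | 1.000 |
| point56 | 1.000 |
| point76 | 1.000 |
| point81 | 1.000 1.000 |
| point89 | 1.000 1.000 |
| point94 | 1.000 1.000 1.000 1.000 |
| point97 | 1.000 |

| | HEURISTIC | MIQP | MIP | MINMAX |
|---|---|---|---|---|
| status | | TimeLimit | Optimal | Optimal |
| obj | | 26.230 | 26.230 | 1.147 |
| sum | 30.731 | 26.230 | 26.230 | 27.621 |
| max | 1.586 | 1.357 | 1.357 | 1.147 |
| time | | 1015.984 | 49.000 | 9.594 |
| gap% | | 23.379 | | |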
Conclusions
References
- Given a set of points or vectors, find the set of N points that are closest to each other, https://stackoverflow.com/questions/64542442/given-a-set-of-points-or-vectors-find-the-set-of-n-points-that-are-closest-to-e
Appendix: GAMS code
- There are not that many models where I can use xor operators. Here it is used in the Greedy Heuristic where we want to consider points \(i\) and \(j\) where one of them is in the cluster and one outside.
- Macros are used to prevent repetitive code in the reporting of results.
- Acronyms are used in reporting.
$ontext |