Quantcast
Channel: Yet Another Math Programming Consultant
Viewing all articles
Browse latest Browse all 809

Maximizing Standard Deviation

$
0
0
In [1] a question was posed: how to maximize the standard deviation of a vector \(\color{darkred}x_i\), subject to side constraints using an LP/MIP formulation?

The standard deviation is usually defined as: \[\mathbf{sd}(x) = \sqrt{\frac{1}{n-1}\sum_i \left(x_i-\mu\right)^2}\] where \[\mu = \frac{\sum_i x_i}{n}\] and \(n\) is the number of elements in \(x\).

The first thing to note is that we can simplify this a bit. The solution will not change, if we replace \[\min/\max\> \mathbf{sd}(x)\] by \[\min/\max \>\sum_i \left(x_i-\mu\right)^2\]

To understand a bit what a solution with a high standard deviation looks like, let's have a look at the following model: \[\begin{align}\max & \sum_i \left(\color{darkred}x_i-\color{darkred}\mu\right)^2 \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \color{darkred}x_i \ge \color{darkred}x_{i-1}\\ & \color{darkred}x_i \in [1,4]\end{align}\] The solution is:

----     27 VARIABLE x.L  

i1 1.000, i2 1.000, i3 1.000, i4 1.000, i5 4.000, i6 4.000, i7 4.000, i8 4.000


---- 27 VARIABLE z.L = 18.000objective
PARAMETER stdev = 1.604standard deviation


We see two clusters: one at 1.0 and one at 4.0. Obviously, if your goal is to evenly distribute values, maximizing the standard deviation is not the way to go.

Note that this is a non-convex quadratic programming problem. You need a global solver for this problem. Possible solvers are Cplex, Gurobi, and general-purpose MINLP solvers like Baron and Antigone.


Model 1: Nonconvex QP


The first formulation for the problem stated in [1] is:

Nonconvex QP Model
\[\begin{align}\max & \sum_i (\color{darkred}x_i-\color{darkred}\mu)^2 \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \end{align}\]

Note that we have a single linear side constraint in this model.

Model 2: SOS1 variables


Here we try to replace the quadratic terms with absolute values. This will allow us to use a linear MIP solver. We use a simple variable splitting approach: \[s_i^+ - s_i^- = \color{darkred}x_i-\color{darkred}\mu\] where \(s_i^+, s_i^- \ge 0\). As we are maximizing the absolute values we need to make sure not both \(s_i^+, s_i^-\) are positive, leading to an unbounded problem. One way to enforce this is to form \(n\) SOS1 sets with two members \(s_i^+, s_i^-\). In a SOS1 set, only one member can be nonzero. So we have:

SOS1 Approximation Model
\[\begin{align}\max & \sum_i \left(\color{darkred}s_i^+ + \color{darkred}s_i^- \right) \\ & \color{darkred}s_i^+ - \color{darkred}s_i^- = \color{darkred}x_i-\color{darkred}\mu \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \\ & \color{darkred}s_i^+, \color{darkred}s_i^- \ge 0 \\ & \{\color{darkred}s_i^+, \color{darkred}s_i^-\} \in {SOS1} \end{align}\]


As we will see later, there are problems with this formulation.

Model 3: Binary variables


Instead of using SOS1 sets to choose a single branch, we can also use binary variables. This is quite similar to the previous model. 

Binary Approximation Model
\[\begin{align}\max & \sum_i \left(\color{darkred}s_i^+ + \color{darkred}s_i^- \right) \\ & \color{darkred}s_i^+ - \color{darkred}s_i^- = \color{darkred}x_i-\color{darkred}\mu \\ & \color{darkred}s_i^+ \le \color{darkred}\delta_i \cdot (\color{darkblue}u-\color{darkblue}\ell)\\ & \color{darkred}s_i^- \le (1-\color{darkred}\delta_i) \cdot (\color{darkblue}u-\color{darkblue}\ell) \\ & \color{darkred}\mu = \frac{\sum_i \color{darkred}x_i}{\color{darkblue}n} \\ & \sum_i \color{darkblue}a_i\cdot\color{darkred}x_i = \color{darkblue}b \\ & \color{darkred}x_i \in [\color{darkblue}\ell,\color{darkblue}u] \\ & \color{darkred}s_i^+, \color{darkred}s_i^- \ge 0 \\ & \color{darkred}\delta_i \in \{0,1\} \end{align}\]


As all the variables \(\color{darkred}x_i\) live inside \([\color{darkblue}\ell,\color{darkblue}u]\), the mean will also be inside this interval. This means \(\color{darkblue}u-\color{darkblue}\ell\) is a valid upper bound on the variable \(\color{darkred}s_i^+, \color{darkred}s_i^-\).

Results


----    134 PARAMETER results

GlobalQP SOS1 SOS1/bnd Binary

i1
3.0003.0003.0003.000
i2
0.5000.5000.5000.500
i3
0.5000.5000.5000.500
i4
3.0003.0003.0003.000
i5
3.0003.0003.0003.000
i6
3.0003.0003.0003.000
i7
3.0003.0003.0003.000
i8
0.5000.5000.5000.500
i9
3.0003.0003.0003.000
i10
0.5000.5000.5000.500
i11
0.5000.5000.5000.500
i12
0.5000.5000.5000.500
i13
0.5000.5000.5000.500
i14
0.5000.5000.5000.500
i15
3.0003.0003.0003.000
i16
0.5000.5000.5000.500
i17
3.0003.0003.0003.000
i18
3.0003.0003.0003.000
i19
0.5000.5000.5000.500
i20
0.5000.5000.5000.500
i21
2.9872.9872.9872.987
i22
3.0003.0003.0003.000
i23
3.0003.0003.0003.000
i24
3.0003.0003.0003.000
i25
0.5000.5000.5000.500
i26
0.5000.5000.5000.500
i27
3.0003.0003.0003.000
i28
0.5000.5000.5000.500
i29
0.5000.5000.5000.500
i30
0.5000.5000.5000.500
status Optimal Optimal Optimal Optimal
obj
46.63337.32037.32037.320
mu
1.6661.6661.6661.666
stdev
1.2681.2681.2681.268
time
0.12512399.8120.7500.719
nodes
613.000 -2.14747E+914163.00014307.000
iterations
10691.000 -1.28591E+869873.00084000.000


We see some interesting things.

The SOS1 model is just taking forever. Largely because many of the nodes are unbounded. A small fragment of the log looks like:
 
Elapsed time = 35.09 sec. (12722.88 ticks, tree = 1.78 MB)
53807756966 cutoff 36.6404104825810.00%
57630697166 unbounded 36.640411394958
61517187286 unbounded 36.640412102053
65337527353 cutoff 36.6404124910500.00%
69172217456 cutoff 36.6404129552370.00%
72942337603 cutoff 36.6404141466450.00%
76791877705 cutoff 36.6404149309050.00%
80627907714 unbounded 36.640415158021
84438597813 unbounded 36.640415935203
88292487941 cutoff 36.6404169581870.00%

Note that these reported gaps of \(0\%\) are wrong. There is no valid upper bound, so there is no meaningful gap to report. A zero gap would mean: we are almost done. I can assure you this is not the case. I would classify this as a Clex bug. B.t.w., Gurobi is not doing any better on this model. 

The reported iteration and node counts for the SOS1 column are negative. This is typically the symptom of an integer overflow somewhere. That is likely a GAMS bug.  

We can remedy this disaster by adding a bound on the objective. I used: \[\color{darkred}z \le \color{darkblue}n\cdot(\color{darkblue}u-\color{darkblue}\ell)\] This model corresponds to the column SOS1/bnd. This obviously gives a huge performance improvement.

The linear approximations find the same optimal solution as the QP model. This is not guaranteed of course.

Conclusions


This is a small non-convex QP model that can be approximated by replacing a 2-norm penalty with a 1-norm. This exercise reveals a big problem in the SOS1 formulation: it just performs extremely poorly unless we add an upper bound to the objective. Along the way, we observed some bugs.

It looks like the most intuitive approach (a QP model) is also the best performer.

References


Appendix: GAMS model



*------------------------------------------------------------------
* data
*------------------------------------------------------------------

set i /i1*i30/;

scalars
   lo
/0.5 /
   up
/3 /
   range
;
range = up-lo;

parameter a(i) 'coefficients of constraint';
a(i) = uniform(0,2);


*------------------------------------------------------------------
* reporting macros
*------------------------------------------------------------------

parameter results(*,*);

acronym Optimal;

* macros for reporting
$macro stdev      sqrt(sum(i,sqr(x.l(i)-mu.l))/(card(i)-1))
$macro report(m,label)  \
    results(i,label) = x.l(i); \
    results(
'status',label)= m.solvestat; \
    results(
'status',label)$(m.solvestat=1) = Optimal; \
    results(
'obj',label) = z.l; \
    results(
'mu',label) = mu.l; \
    results(
'stdev',label) = stdev; \
    results(
'time',label) = m.resusd; \
    results(
'nodes',label) = m.nodusd; \
    results(
'iterations',label) = m.iterusd; \
   
display results;


*------------------------------------------------------------------
* non-convex QP model
*------------------------------------------------------------------


positivevariable x(i);
x.lo(i) = lo;
x.up(i) = up;

variable
  z
'obj'
  mu
'mean'
;

Equations
   e    
'side constraint'
   obj  
'objective'
   mean 
'average'
;

e..
sum(i,a(i)*x(i)) =e= card(i);
mean.. mu =e=
sum(i,x(i))/card(i);

obj.. z =e=
sum(i, sqr(x(i)-mu));

model m1 /all/;
option qcp=cplex,threads=16;
m1.optfile=1;
solve m1 maximizing z using qcp;


$onecho > cplex.opt
OptimalityTarget=3
$offecho


report(m1,
'GlobalQP')


*------------------------------------------------------------------
* SOS1 based model
*------------------------------------------------------------------

set pm /'+','-'/;
sos1variable s(i,pm);

equations
   split
   obj2
;

split(i).. s(i,
'+')-s(i,'-') =e= x(i)-mu;
obj2.. z =e=
sum((i,pm),s(i,pm));

model m2 /split,obj2,e,mean/;
solve m2 maximizing z using mip;

report(m2,
'SOS1')

*------------------------------------------------------------------
* SOS1 with bound on the objective variable
*------------------------------------------------------------------

z.up = range*
card(i);
solve m2 maximizing z using mip;

report(m2,
'SOS1/bnd')

z.up =
INF;

*------------------------------------------------------------------
* model using binary variables
*------------------------------------------------------------------

positivevariable s2(i,pm);
binaryvariable b(i);

Equations
   split2a
   split2b
   split2c
   obj3
   ;

split2a(i).. s2(i,
'+')-s2(i,'-') =e= x(i)-mu;
split2b(i).. s2(i,
'+') =l= b(i)*range;
split2c(i).. s2(i,
'-') =l= (1-b(i))*range;
obj3.. z =e=
sum((i,pm),s2(i,pm));

model m3 /split2a,split2b,split2c,obj3,e,mean/;
solve m3 maximizing z using mip;

report(m3,
'Binary')


Viewing all articles
Browse latest Browse all 809

Trending Articles