Yet Another Math Programming Consultant

Integer Programming and Least Median of Squares Regression


Least Median Regression (LMR) [1] is another attempt to make regression more robust, i.e. less sensitive to outliers.

\[\begin{align}\min_{\beta}\>&\operatorname*{median}_i r_i^2 \\
&y-X\beta = r\end{align}\]

From the summary of [1]:


Classical least squares regression consists of minimizing the sum of the squared residuals. Many authors have produced more robust versions of this estimator by replacing the square by something else, such as the absolute value. In this article a different approach is introduced in which the sum is replaced by the median of the squared residuals. The resulting estimator can resist the effect of nearly 50% of contamination in the data. In the special case of simple regression, it corresponds to finding the narrowest strip covering half of the observations. Generalizations are possible to multivariate location, orthogonal regression, and hypothesis testing in linear models.

For \(n\) even, the median is often defined as the average of the two middle observations. For this regression model, usually a slightly simplified definition is used: the median is the \(h\)-th ordered residual with \(h=\lfloor n/2\rfloor\) (i.e. \(n/2\) rounded down) or \(h=\lfloor (n+1)/2\rfloor\). For technical reasons, sometimes another value is suggested [2]:

\[h=n - \left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{p+1}{2}\right\rfloor \]

where \(p\) is the number of coefficients to estimate (number of \(\beta_j\)’s). Minimizing the median of the squared errors is the same as minimizing the median of the absolute values of the errors. This is because the ordering of squared errors is the same as the ordering of the absolute values. Hence the objective of our problem can be stated as:

\[\min\>|r|_{(h)} \]

i.e., minimize the \(h\)-th smallest absolute value. We used here the ordering \(|r|_{(1)}\le |r|_{(2)}\le \dots\le|r|_{(n)}\).
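As a quick illustration of the index \(h\) (using a hypothetical small example, not data from [2]), both definitions can be computed directly:

```python
# Two choices for the index h of the ordered residual to minimize:
# the simplified "median" index floor(n/2), and the value suggested
# in [2], n - floor(n/2) + floor((p+1)/2).

def h_simple(n):
    """Simplified median index: floor(n/2)."""
    return n // 2

def h_giloni_padberg(n, p):
    """Index suggested in [2]: n - floor(n/2) + floor((p+1)/2)."""
    return n - n // 2 + (p + 1) // 2

# hypothetical example: n = 20 observations, p = 2 coefficients
# (intercept + slope)
print(h_simple(20))             # 10
print(h_giloni_padberg(20, 2))  # 20 - 10 + 1 = 11
```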

Intermezzo: Minimizing k-th smallest

Minimizing the \(k\)-th smallest value \(x_{(k)}\) of a set of variables \(x_i\), \(i=1,\dots,n\), is not obvious. Minimizing the largest value \(x_{(n)}\) is easy: we can do

\[\begin{align}\min\>&z\\&z\ge x_i\end{align}\]

The trick to minimize \(x_{(k)}\) is to do:

\[\begin{align}\min\>&z\\&z\ge x_i-\delta_i M\\&\sum_i \delta_i = n-k\\&\delta_i \in \{0,1\}\end{align}\]
In effect we drop \(n-k\) values from consideration (in statistical terms this is called “trimming”). Because we are minimizing, the dropped values will automatically be the largest ones.
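We can sanity-check this trick on fixed data: enumerating all assignments of the binary \(\delta_i\) (i.e. all choices of which \(n-k\) values to drop) and taking the best feasible \(z\) should reproduce the \(k\)-th smallest value. A small brute-force sketch (the numbers are made up):

```python
from itertools import combinations

def min_kth_smallest_by_enumeration(x, k):
    """Minimize z subject to z >= x_i for all non-dropped i,
    where exactly n-k values are dropped (all delta assignments
    enumerated by choosing which k indices to keep)."""
    n = len(x)
    best = float("inf")
    for kept in combinations(range(n), k):   # complement of the dropped set
        z = max(x[i] for i in kept)          # smallest feasible z here
        best = min(best, z)
    return best

x = [7.0, 3.0, 9.0, 1.0, 5.0]
k = 3
# the optimum equals the k-th smallest element
assert min_kth_smallest_by_enumeration(x, k) == sorted(x)[k - 1]
print(min_kth_smallest_by_enumeration(x, k))  # 5.0
```

Of course the MIP formulation does this implicitly; the enumeration is only to confirm that the optimum is indeed \(x_{(k)}\).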

Now we know how to minimize the \(h\)-th smallest value, the only hurdle left is forming the absolute value. We follow here the sparse bounding formulation from [3,4].

\[\begin{align}\min\>&z\\&-z - M\delta_i \le r_i \le z + M\delta_i\\&\sum_i \delta_i = n-h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\}\end{align}\]
It is not so easy to find good values for these big-\(M\)’s (see [2] for an attempt). One way around this is to use indicator constraints:
\[\begin{align}\min\>&z\\&\delta_i=1 \Rightarrow  -z \le  r_i \le z\\&\sum_i \delta_i = h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\} \end{align}\]

Note that we have flipped the meaning of \(\delta_i\). If your solver does not allow for indicator constraints, we can also achieve the same thing by SOS1 sets:

\[\begin{align}\min\>&z\\
&-z-s_i\le r_i \le z + s_i\\
&s_i\cdot \delta_i = 0\\
&\sum_i \delta_i = h\\
&r_i = y_i-\sum_j X_{i,j} \beta_j\\
&\delta_i \in \{0,1\}
\end{align}\]

where \(s_i\) are additional slack variables. They can be free or non-negative variables. The complementarity condition \(s_i\cdot \delta_i = 0\) can be implemented by forming \(n\) SOS1 sets with two members: \((\delta_i,s_i)\). A similar but more complicated approach using variable splitting has been proposed by [5].
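To see the robustness of this objective in action without a MIP solver, here is a brute-force sketch for simple regression: a crude grid search over \((\beta_0,\beta_1)\) stands in for the MIP, minimizing the \(h\)-th smallest absolute residual. The data, grid, and function names are hypothetical, chosen only to illustrate that one gross outlier does not pull the fit away.

```python
def lms_objective(beta0, beta1, xs, ys, h):
    """h-th smallest absolute residual |r|_(h)."""
    r = sorted(abs(y - beta0 - beta1 * x) for x, y in zip(xs, ys))
    return r[h - 1]

# hypothetical data: y = 2x, with the last point a gross outlier
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]
n, p = len(xs), 2
h = n - n // 2 + (p + 1) // 2  # = 4, using the index from [2]

# crude grid search over (beta0, beta1) in lieu of the MIP
grid = [i / 10 for i in range(-50, 51)]
best = min(((b0, b1) for b0 in grid for b1 in grid),
           key=lambda b: lms_objective(b[0], b[1], xs, ys, h))
print(best)  # (0.0, 2.0): the outlier is effectively ignored
```

Ordinary least squares on the same data would tilt the line heavily toward the outlier; the LMS objective only needs \(h=4\) small residuals, so the contaminated observation is simply trimmed away. Grid search is exponential in the number of coefficients, which is exactly why the MIP formulations above are of interest.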

This regression problem turns out to be interesting from a modeling perspective. 

References
  1. Peter Rousseeuw, Least Median of Squares Regression, Journal of the American Statistical Association, 79 (1984), pp. 871-880.
  2. A. Giloni, M. Padberg, Least Trimmed Squares Regression, Least Median Squares Regression, and Mathematical Programming, Mathematical and Computer Modelling, 35 (2002), pp. 1043-1060.
  3. Linear programming and LAD regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
  4. Linear programming and Chebyshev regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
  5. Dimitris Bertsimas, Rahul Mazumder, Least Quantile Regression via Modern Optimization, The Annals of Statistics, 42 (2014), no. 6, pp. 2494-2525.
