Yet Another Math Programming Consultant

Integer Programming and Least Median of Squares Regression


Least Median Regression (LMR) [1] is another attempt to make regression more robust, i.e. less sensitive to outliers.

\[\begin{align}\min_{\beta}\>&\operatorname*{median}_i r_i^2 \\
&y-X\beta = r\end{align}\]

From the summary of [1]:


Classical least squares regression consists of minimizing the sum of the squared residuals. Many authors have produced more robust versions of this estimator by replacing the square by something else, such as the absolute value. In this article a different approach is introduced in which the sum is replaced by the median of the squared residuals. The resulting estimator can resist the effect of nearly 50% of contamination in the data. In the special case of simple regression, it corresponds to finding the narrowest strip covering half of the observations. Generalizations are possible to multivariate location, orthogonal regression, and hypothesis testing in linear models.

For \(n\) even, the median is often defined as the average of the two middle observations. For this regression model, usually a slightly simplified definition is used: the median is the \(h\)-th ordered residual with \(h=\lfloor n/2\rfloor\) (i.e. \(n/2\) rounded down) or \(h=\lfloor (n+1)/2\rfloor\). For technical reasons, sometimes another value is suggested [2]:

\[h=n - \left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{p+1}{2}\right\rfloor \]

where \(p\) is the number of coefficients to estimate (number of \(\beta_j\)’s). Minimizing the median of the squared errors is the same as minimizing the median of the absolute values of the errors. This is because the ordering of squared errors is the same as the ordering of the absolute values. Hence the objective of our problem can be stated as:

\[\min\>|r|_{(h)} \]

i.e., minimize the \(h\)-th smallest absolute value. We used here the ordering \(|r|_{(1)}\le |r|_{(2)}\le \dots\le|r|_{(n)}\).
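As a quick illustration of the index \(h\) (using a hypothetical small example, not data from [2]), both definitions can be computed directly:

```python
# Two choices for the index h of the ordered residual to minimize:
# the simplified "median" index floor(n/2), and the value suggested
# in [2], n - floor(n/2) + floor((p+1)/2).

def h_simple(n):
    """Simplified median index: floor(n/2)."""
    return n // 2

def h_giloni_padberg(n, p):
    """Index suggested in [2]: n - floor(n/2) + floor((p+1)/2)."""
    return n - n // 2 + (p + 1) // 2

# hypothetical example: n = 20 observations, p = 2 coefficients
# (intercept + slope)
print(h_simple(20))             # 10
print(h_giloni_padberg(20, 2))  # 20 - 10 + 1 = 11
```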

Intermezzo: Minimizing k-th smallest

Minimizing the \(k\)-th smallest value \(x_{(k)}\) of a set of variables \(x_i\), \(i=1,\dots,n\), is not obvious. Minimizing the largest value \(x_{(n)}\) is easy: we can do

\[\begin{align}\min\>&z\\&z\ge x_i\end{align}\]

The trick to minimize \(x_{(k)}\) is to do:

\[\begin{align}\min\>&z\\&z\ge x_i-\delta_i M\\&\sum_i \delta_i = n-k\\&\delta_i \in \{0,1\}\end{align}\]
In effect we drop \(n-k\) values from consideration (in statistical terms this is called “trimming”). Because we are minimizing, the dropped values will automatically be the largest ones.
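We can sanity-check this trick on fixed data: enumerating all assignments of the binary \(\delta_i\) (i.e. all choices of which \(n-k\) values to drop) and taking the best feasible \(z\) should reproduce the \(k\)-th smallest value. A small brute-force sketch (the numbers are made up):

```python
from itertools import combinations

def min_kth_smallest_by_enumeration(x, k):
    """Minimize z subject to z >= x_i for all non-dropped i,
    where exactly n-k values are dropped (all delta assignments
    enumerated by choosing which k indices to keep)."""
    n = len(x)
    best = float("inf")
    for kept in combinations(range(n), k):   # complement of the dropped set
        z = max(x[i] for i in kept)          # smallest feasible z here
        best = min(best, z)
    return best

x = [7.0, 3.0, 9.0, 1.0, 5.0]
k = 3
# the optimum equals the k-th smallest element
assert min_kth_smallest_by_enumeration(x, k) == sorted(x)[k - 1]
print(min_kth_smallest_by_enumeration(x, k))  # 5.0
```

Of course the MIP formulation does this implicitly; the enumeration is only to confirm that the optimum is indeed \(x_{(k)}\).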

Now we know how to minimize the \(h\)-th smallest value, the only hurdle left is forming the absolute value. We follow here the sparse bounding formulation from [3,4].

\[\begin{align}\min\>&z\\&-z - M\delta_i \le r_i \le z + M\delta_i\\&\sum_i \delta_i = n-h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\}\end{align}\]
It is not so easy to find good values for these big-\(M\)’s (see [2] for an attempt). One way around this is to use indicator constraints:
\[\begin{align}\min\>&z\\&\delta_i=1 \Rightarrow  -z \le  r_i \le z\\&\sum_i \delta_i = h\\&r_i = y_i-\sum_j X_{i,j} \beta_j\\&\delta_i \in \{0,1\} \end{align}\]

Note that we have flipped the meaning of \(\delta_i\). If your solver does not allow for indicator constraints, we can also achieve the same thing by SOS1 sets:

\[\begin{align}\min\>&z\\
&-z-s_i\le r_i \le z + s_i\\
&s_i\cdot \delta_i = 0\\
&\sum_i \delta_i = h\\
&r_i = y_i-\sum_j X_{i,j} \beta_j\\
&\delta_i \in \{0,1\}
\end{align}\]

where \(s_i\) are additional slack variables. They can be free or non-negative variables. The complementarity condition \(s_i\cdot \delta_i = 0\) can be implemented by forming \(n\) SOS1 sets with two members: \((\delta_i,s_i)\). A similar but more complicated approach using variable splitting has been proposed by [5].
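To see the robustness of this objective in action without a MIP solver, here is a brute-force sketch for simple regression: a crude grid search over \((\beta_0,\beta_1)\) stands in for the MIP, minimizing the \(h\)-th smallest absolute residual. The data, grid, and function names are hypothetical, chosen only to illustrate that one gross outlier does not pull the fit away.

```python
def lms_objective(beta0, beta1, xs, ys, h):
    """h-th smallest absolute residual |r|_(h)."""
    r = sorted(abs(y - beta0 - beta1 * x) for x, y in zip(xs, ys))
    return r[h - 1]

# hypothetical data: y = 2x, with the last point a gross outlier
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]
n, p = len(xs), 2
h = n - n // 2 + (p + 1) // 2  # = 4, using the index from [2]

# crude grid search over (beta0, beta1) in lieu of the MIP
grid = [i / 10 for i in range(-50, 51)]
best = min(((b0, b1) for b0 in grid for b1 in grid),
           key=lambda b: lms_objective(b[0], b[1], xs, ys, h))
print(best)  # (0.0, 2.0): the outlier is effectively ignored
```

Ordinary least squares on the same data would tilt the line heavily toward the outlier; the LMS objective only needs \(h=4\) small residuals, so the contaminated observation is simply trimmed away. Grid search is exponential in the number of coefficients, which is exactly why the MIP formulations above are of interest.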

This regression problem turns out to be interesting from a modeling perspective. 

References
  1. Peter Rousseeuw, Least Median of Squares Regression, Journal of the American Statistical Association, 79 (1984), pp. 871-880.
  2. A. Giloni, M. Padberg, Least Trimmed Squares Regression, Least Median Squares Regression, and Mathematical Programming, Mathematical and Computer Modelling, 35 (2002), pp. 1043-1060.
  3. Linear programming and LAD regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
  4. Linear programming and Chebyshev regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
  5. Dimitris Bertsimas, Rahul Mazumder, Least Quantile Regression via Modern Optimization, The Annals of Statistics, 42 (2014), no. 6, pp. 2494-2525.
