I believe any book on linear programming will mention LAD (Least Absolute Deviation) or \(\ell_1\) regression: minimize the sum of the absolute values of the residuals.
\[\begin{align}\min_{\beta}\>&\sum_i |r_i|\\&r_i = y_i - \sum_j X_{i,j}\beta_j\end{align}\]
Here \(\beta\) contains the coefficients to estimate, so these are the decision variables in the optimization model. \(X,y\) are data (watch out here: in many optimization models we denote decision variables by \(x\); here they are constants). \(r\) contains the residuals; it is an auxiliary variable that can be substituted out.
There are at least three common LP formulations for this (a small code sketch illustrating all three follows the list):
- Variable splitting:\[\begin{align}\min\>&\sum_i r^+_i + r^-_i\\&r^+_i - r^-_i = y_i - \sum_j X_{i,j}\beta_j\\&r^+_i, r^-_i\ge 0\end{align}\]In this model, one of the pair \((r^+_i,r^-_i)\) will automatically be zero in an optimal solution: both variables have a positive objective coefficient, so it never pays to make both positive. We don’t have to add an explicit complementarity condition \(r^+_i \cdot r^-_i = 0\). This is fortunate: we can keep the model linear.
- Bounding:\[\begin{align}\min\>&\sum_i r_i\\&-r_i \le y_i - \sum_j X_{i,j}\beta_j \le r_i\end{align}\]Here \(r_i\) can be left free or declared non-negative; it will come out non-negative automatically. Note that this sandwich is really two constraints per observation. This formulation is mentioned in [1].
- Dual:\[\begin{align}\max\>&\sum_i y_i d_i\\&\sum_i X_{i,j} d_i=0 \perp \beta_j \\&-1\le d_i \le 1\end{align}\]The optimal values for \(\beta\) can be recovered from the duals of the constraints \(X^Td=0\) (this is what the notation \(\perp\) means).
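To make the three variants concrete, here is a minimal sketch using scipy.optimize.linprog with the HiGHS backend (my own illustration; the timings below were done with Cplex). All names and the synthetic data are made up, and the sign flip when recovering \(\beta\) from the duals follows scipy's convention for marginals; other solvers may differ.

```python
# Sketch: three LP formulations of LAD regression via scipy/HiGHS.
import numpy as np
from scipy import sparse as sp
from scipy.optimize import linprog


def lad_split(X, y):
    """Variable splitting: min 1'r+ + 1'r-  s.t.  X b + r+ - r- = y."""
    m, n = X.shape
    I = sp.identity(m, format="csc")
    c = np.concatenate([np.zeros(n), np.ones(2 * m)])
    A_eq = sp.hstack([sp.csc_matrix(X), I, -I], format="csc")
    bounds = [(None, None)] * n + [(0, None)] * (2 * m)  # b free; r+, r- >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:n], res.fun


def lad_bound(X, y):
    """Bounding: min 1'r  s.t.  -r <= y - X b <= r (two inequality blocks)."""
    m, n = X.shape
    I = sp.identity(m, format="csc")
    Xs = sp.csc_matrix(X)
    c = np.concatenate([np.zeros(n), np.ones(m)])
    #  y - X b <= r   <=>  -X b - r <= -y
    # -y + X b <= r   <=>   X b - r <=  y
    A_ub = sp.vstack([sp.hstack([-Xs, -I]), sp.hstack([Xs, -I])], format="csc")
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * (n + m)  # r may be left free: it ends up >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.fun


def lad_dual(X, y):
    """Dual: max y'd  s.t.  X'd = 0, -1 <= d <= 1; b comes from the duals."""
    m, n = X.shape
    res = linprog(-y, A_eq=sp.csc_matrix(X.T), b_eq=np.zeros(n),
                  bounds=[(-1, 1)] * m, method="highs")
    # beta = duals of X'd = 0; the minus sign compensates for solving
    # max y'd as min -y'd (sign conventions differ between solvers).
    return -res.eqlin.marginals, -res.fun


# Quick cross-check on synthetic data: all three objectives should agree,
# and (barring degeneracy) so should the coefficient estimates.
rng = np.random.default_rng(0)
m, n = 200, 5
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + rng.laplace(size=m)
for f in (lad_split, lad_bound, lad_dual):
    beta, obj = f(X, y)
    print(f.__name__, round(obj, 4), np.round(beta, 3))
```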
One could argue that the last formulation is the best: with \(m\) observations and \(n\) coefficients (typically \(m \gg n\)), the dual has only \(m\) variables and \(n\) equations, versus \(n+2m\) variables for variable splitting and \(2m\) constraints for the bounding approach. In addition, the bounding formulation has all the data \(X_{i,j}\) twice in the model, leading to twice as many non-zero elements in the LP matrix.
Modern LP solvers are not particularly well suited for this type of model: they like very sparse LPs, while these models are very dense. Let’s try anyway with a large, artificial data set with \(m=5,000\) cases and \(n=100\) coefficients, so the data matrix \(X\) has 500,000 elements. Some timings with Cplex (default LP method) gave the following picture.
The dual formulation is indeed quite fast. Interestingly, the bounding model (the formulation one sees most often) is actually the slowest. Note that these results were obtained with default settings, which effectively means that Cplex selected the dual simplex method for all instances; the timings will change when the primal simplex method or the barrier algorithm is used.
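For anyone who wants to run a similar experiment, a rough harness along these lines will do, reusing the three functions from the sketch above. HiGHS stands in for Cplex here, so the absolute numbers (and possibly the ranking) will differ from those reported here.

```python
# Rough timing harness at the size used above (m=5000, n=100).
# Assumes lad_split, lad_bound, lad_dual from the earlier sketch are defined.
import time

import numpy as np

rng = np.random.default_rng(1)
m, n = 5000, 100
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + rng.laplace(size=m)

for f in (lad_split, lad_bound, lad_dual):
    t0 = time.perf_counter()
    f(X, y)
    print(f"{f.__name__}: {time.perf_counter() - t0:.2f}s")
```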
References
1. Least absolute deviations, https://en.wikipedia.org/wiki/Least_absolute_deviations
2. A. Giloni, M. Padberg, Alternative Methods of Linear Regression, Mathematical and Computer Modelling 35 (2002), pp. 361-374.