Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:
- Least absolute deviation (LAD) regression [1]
- Chebyshev regression [2]
- Quantile regression [3]
Dantzig Selector
\[\begin{align}\min \>& ||\color{darkred}\beta||_1 \\ \text{subject to } & ||\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)||_\infty \le \color{darkblue}\delta \end{align}\]
Here, \(||v||_1 = \sum_i |v_i| \) and \(||v||_\infty = \max_i |v_i|\). The resulting model is an LP.
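In R, these two norms are just sum(abs(v)) and max(abs(v)) for a numeric vector v:

v <- c(1, -3, 2)   # example vector
sum(abs(v))        # ||v||_1   = 6
max(abs(v))        # ||v||_inf = 3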
Note that if \(\color{darkblue}\delta=0\) we solve a standard OLS (Ordinary Least Squares) problem. The constraint becomes: \(\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)=0\) which corresponds to the so-called normal equations: \(\color{darkblue}X^T \color{darkblue}X\color{darkred}\beta=\color{darkblue}X^T \color{darkblue}y\). This system of linear equations, when solved for \(\color{darkred}\beta\), gives us a least squares solution.
The intuition here is that \(\color{darkblue}\delta\) controls the trade-off between a full OLS estimate and a sparser solution.
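As a quick sanity check of the \(\color{darkblue}\delta=0\) case in R (on made-up data, not the data set used below): solving the normal equations gives exactly the OLS coefficients.

set.seed(123)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical design matrix
y <- rnorm(10)                               # hypothetical response
beta_ne <- solve(t(X) %*% X, t(X) %*% y)     # normal equations X'X beta = X'y
beta_lm <- coef(lm(y ~ X - 1))               # OLS via lm(), no intercept
print(cbind(beta_ne, beta_lm))               # same values up to rounding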
Note that the more popular Lasso approach creates an unconstrained model using a penalty term:
Lasso
\[\begin{align}\min\>&||\color{darkblue}y-\color{darkblue}X\color{darkred}\beta||_2^2 + \color{darkblue}\lambda ||\color{darkred}\beta||_1 \end{align}\]
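In R, this Lasso problem is typically solved with the glmnet package rather than with an explicit optimization model. A minimal sketch (assuming glmnet is installed; x and y are again made-up data):

library(glmnet)
set.seed(123)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)  # hypothetical data
y <- rnorm(20)
fit <- glmnet(x, y, alpha = 1)                 # alpha = 1 selects the Lasso penalty
coef(fit, s = 0.1)                             # coefficients at lambda = 0.1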
To understand what the function dantzig.delta in the R package GDSARM is doing, I tried to replicate their example:
> library(GDSARM)
> data(dataHamadaWu)
> print(dataHamadaWu)
   V1 V2 V3 V4 V5 V6 V7    V8
1   1  1 -1  1  1  1 -1 6.058
2   1 -1  1  1  1 -1 -1 4.733
3  -1  1  1  1 -1 -1 -1 4.625
4   1  1  1 -1 -1 -1  1 5.899
5   1  1 -1 -1 -1  1 -1 7.000
6   1 -1 -1 -1  1 -1  1 5.752
7  -1 -1 -1  1 -1  1  1 5.682
8  -1 -1  1 -1  1  1 -1 6.607
9  -1  1 -1  1  1 -1  1 5.818
10  1 -1  1  1 -1  1  1 5.917
11 -1  1  1 -1  1  1  1 5.863
12 -1 -1 -1 -1 -1 -1 -1 4.809
> X = dataHamadaWu[,-8]
> Y = dataHamadaWu[,8]
> # scale and center X and y
> scaleX = base::scale(X, center=TRUE, scale=TRUE)
> scaleY = base::scale(Y, center=TRUE, scale=FALSE)
>
> maxDelta = max(abs(t(scaleX) %*% matrix(scaleY, ncol=1)))
> # Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta
> dantzig.delta(scaleX, scaleY, delta = seq(0, maxDelta, length.out=4))
                         V1        V2         V3         V4         V5        V6         V7
0                0.17016091 0.1534495 -0.1283823 -0.2695593 0.07824791 0.4779302 0.09565567
1.75241074956335 0.01085084 0.0000000  0.0000000 -0.1102492 0.00000000 0.3186201 0.00000000
3.5048214991267  0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.1593101 0.00000000
5.25723224869005 0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.0000000 0.00000000
>
When we increase delta, the estimates become sparser. At the largest value, delta = maxDelta, the all-zero solution is feasible (with \(\color{darkred}\beta=0\) the constraint reads \(||\color{darkblue}X^T\color{darkblue}y||_\infty \le \color{darkblue}\delta\), which holds by construction of maxDelta), so every estimate is zero. Note that this data set has \(n \gt p\), so it is not really the original target for this method.
It helps me better understand statistical techniques when I reimplement the underlying optimization problem, rather than just running a black-box function. The GAMS model (listed in the appendix) is not very complex. We have to linearize the absolute values in two places: in the objective and in the constraint. In this case, things are convex, so linearizing the absolute values is not very difficult.
The objective is \(\min \sum_j |\color{darkred}\beta_j|\), which can be modeled with extra variables \(\color{darkred}b_j\) as \[\begin{align}\min &\sum_j \color{darkred}b_j\\ & \color{darkred}b_j \ge -\color{darkred}\beta_j\\&\color{darkred}b_j\ge\color{darkred}\beta_j \\ & \color{darkred}b_j\ge 0\end{align}\]
The constraint \(||\color{darkred}v||_\infty \le \color{darkblue}\delta\), where \(\color{darkred}v=\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\), can be written as simple bounds: \[ -\color{darkblue}\delta \le\color{darkred}v_j \le \color{darkblue}\delta \>\> \forall j\]
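Putting the two pieces together (and introducing intermediate variables \(\color{darkred}r_i\) for the residuals and \(\color{darkred}v_j\) for \(\color{darkblue}X^T\color{darkred}r\), just like the GAMS model in the appendix does), the complete Dantzig selector LP reads:
\[\begin{align}\min\>&\sum_j \color{darkred}b_j\\ \text{subject to }& \color{darkred}r_i = \color{darkblue}y_i - \sum_j \color{darkblue}x_{i,j}\color{darkred}\beta_j\\ & \color{darkred}v_j = \sum_i \color{darkblue}x_{i,j}\color{darkred}r_i\\ & -\color{darkred}b_j \le \color{darkred}\beta_j \le \color{darkred}b_j\\ & -\color{darkblue}\delta \le \color{darkred}v_j \le \color{darkblue}\delta\\ & \color{darkred}b_j \ge 0\end{align}\]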
When I run my GAMS model, I can indeed reproduce the R results:
----    158 PARAMETER results  estimations

            delta          V1          V2          V3          V4          V5          V6          V7

OLS                     0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k1                      0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k2          1.752       0.011                              -0.110                   0.319
k3          3.505                                                                   0.159
k4          5.257
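The same LP can also be solved directly from R, without GAMS. Below is a rough sketch using the lpSolve package (my own formulation, not necessarily how GDSARM computes its results). It uses a slightly different but equivalent linearization of \(||\color{darkred}\beta||_1\): splitting \(\color{darkred}\beta\) into nonnegative parts \(\color{darkred}\beta^+ - \color{darkred}\beta^-\), which avoids free variables.

library(lpSolve)

dantzig_lp <- function(X, y, delta) {
  p   <- ncol(X)
  A   <- t(X) %*% X                   # X'X
  cxy <- drop(t(X) %*% y)             # X'y
  # decision variables: beta+ (first p) and beta- (last p), both >= 0
  obj <- rep(1, 2 * p)                # minimize sum(beta+ + beta-) = ||beta||_1
  con <- rbind(cbind(A, -A),          # X'X (beta+ - beta-) >= X'y - delta
               cbind(A, -A))          # X'X (beta+ - beta-) <= X'y + delta
  dir <- c(rep(">=", p), rep("<=", p))
  rhs <- c(cxy - delta, cxy + delta)
  sol <- lp("min", obj, con, dir, rhs)
  sol$solution[1:p] - sol$solution[(p + 1):(2 * p)]   # beta = beta+ - beta-
}

# e.g. dantzig_lp(scaleX, scaleY, 0) should reproduce the OLS row above,
# and larger delta values should give the sparser rows.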
References
1. Linear Programming and LAD Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
2. Linear Programming and Chebyshev Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
3. Median, quantiles and quantile regression as linear programming problems, https://yetanothermathprogrammingconsultant.blogspot.com/2021/06/median-quantiles-and-quantile.html
4. Emmanuel Candes and Terence Tao, "The Dantzig Selector: Statistical Estimation When p Is Much Larger than n," The Annals of Statistics, vol. 35, no. 6, 2007, pp. 2313-2351.
5. Robert Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, 1996, pp. 267-288.
Appendix: GAMS Model
$onText
The Dantzig Selector for sparse Regression.
Compare results of R package GDSARM with an LP model.
$offText
* for ordering of results
set dummy /delta, OLS/;

*------------------------------------------------
* data
*------------------------------------------------

Sets
   i  'rows'
   j0 'all columns'
;

Table data(i<,j0<)
       V1  V2  V3  V4  V5  V6  V7     V8
  1     1   1  -1   1   1   1  -1  6.058
  2     1  -1   1   1   1  -1  -1  4.733
  3    -1   1   1   1  -1  -1  -1  4.625
  4     1   1   1  -1  -1  -1   1  5.899
  5     1   1  -1  -1  -1   1  -1  7.000
  6     1  -1  -1  -1   1  -1   1  5.752
  7    -1  -1  -1   1  -1   1   1  5.682
  8    -1  -1   1  -1   1   1  -1  6.607
  9    -1   1  -1   1   1  -1   1  5.818
 10     1  -1   1   1  -1   1   1  5.917
 11    -1   1   1  -1   1   1   1  5.863
 12    -1  -1  -1  -1  -1  -1  -1  4.809
;
display i,j0,data;

*------------------------------------------------
* scale and center
*------------------------------------------------

set j(j0) 'independent variables: columns for X';
j(j0) = ord(j0)<card(j0);

parameters
   n                 'number of rows'
   mean(j0)
   sd(j0)            'standard deviation'
   scaled_data(i,j0)
;

n = card(i);
mean(j0) = sum(i,data(i,j0))/n;

* center
scaled_data(i,j0) = data(i,j0)-mean(j0);
* scale (only for X)
sd(j) = sqrt(sum(i,sqr(scaled_data(i,j)))/(n-1));
scaled_data(i,j) = scaled_data(i,j)/sd(j);
display scaled_data;

*------------------------------------------------
* extract X,y
*------------------------------------------------

parameters
   X(i,j0) 'independent variables'
   y(i)    'dependent variable'
;
X(i,j) = scaled_data(i,j);
y(i) = scaled_data(i,'v8');
display X,y;

*------------------------------------------------
* Tuning parameter
* maxdelta and delta
*------------------------------------------------

scalar MaxDelta;
MaxDelta = smax(j,abs(sum(i,X(i,j)*Y(i))));
display MaxDelta;

* create equally spaced values 0..MaxDelta
set k /k1*k4/;
parameter delta(k);
delta(k) = MaxDelta*(ord(k)-1) / (card(k)-1);
display delta;

*------------------------------------------------
* OLS regression
*------------------------------------------------

variable
   beta(j0) 'estimators'
   z        'objective'
   r(i)     'residuals'
;

equations
   fit(i) 'linear model'
   qobj   'quadratic objective'
;

qobj..   z =e= sum(i, sqr(r(i)));
fit(i).. r(i) =e= y(i) - sum(j,X(i,j)*beta(j));
model ols /qobj,fit/;
*------------------------------------------------
* Dantzig selector LP model
* we need to set bounds for
*    Xr(j) >= -delta
*    Xr(j) <= delta
* that is done in the solve loop
*------------------------------------------------
variable
   Xr(j0) 'X^T*r'
;
positive variable
   absb(j0) 'absolute values'
;

equations
   obj        'objective'
   bound1(j0) 'absolute value bound (obj)'
   bound2(j0) 'absolute value bound (obj)'
   fit(i)     'linear model'
   defXr(j0)  'evaluate X^T*r'
;

obj..       z =e= sum(j,absb(j));
bound1(j).. absb(j) =g= -beta(j);
bound2(j).. absb(j) =g= beta(j);
defXr(j)..  Xr(j) =e= sum(i,X(i,j)*r(i));
model ds /obj,fit,bound1,bound2,defXr/;
*------------------------------------------------
* run regressions
*------------------------------------------------
parameter results(*,*) 'estimations';
solve ols using qcp minimizing z;
results('OLS',j) = beta.l(j);

results(k,'delta') = delta(k);
loop(k,
   Xr.lo(j) = -delta(k);
   Xr.up(j) = delta(k);
   solve ds using lp minimizing z;
   results(k,j) = beta.l(j);
);
display results;