
LP in statistics: The Dantzig Selector


Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:

  • Least absolute deviation (LAD) regression [1]
  • Chebyshev regression [2]
  • Quantile regression [3]
Here is another regression example that uses linear programming. 

We want to estimate a sparse vector \(\color{darkred}\beta\) from the linear model \[\color{darkblue}y=\color{darkblue}X\color{darkred}\beta+\color{darkred}e\] where the number of observations \(n\) (rows in \(\color{darkblue}X\)) is (much) smaller than the number of coefficients \(p\) to estimate (columns in \(\color{darkblue}X\)) [4]: \(p \gg n\). This is an alternative to the well-known Lasso method [5].

The method, for a given threshold \(\color{darkblue}\delta\) (a tuning parameter), will calculate:

 
Dantzig Selector
\[\begin{align}\min \>& ||\color{darkred}\beta||_1 \\ \text{subject to } & ||\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)||_\infty \le \color{darkblue}\delta \end{align}\]

Here, \(||v||_1 = \sum_i |v_i| \) and \(||v||_\infty = \max_i |v_i|\). The resulting model is an LP.

Note that if \(\color{darkblue}\delta=0\) we solve a standard OLS (Ordinary Least Squares) problem. The constraint becomes: \(\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)=0\) which corresponds to the so-called normal equations: \(\color{darkblue}X^T \color{darkblue}X\color{darkred}\beta=\color{darkblue}X^T \color{darkblue}y\). This system of linear equations, when solved for \(\color{darkred}\beta\), gives us a least squares solution.
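
As a quick sanity check (my own sketch, not from the original post), solving the normal equations in R gives the same coefficients as lm():

# Sketch (not from the post): solving the normal equations X'X b = X'y
# reproduces the OLS coefficients computed by lm().
set.seed(1)
X <- matrix(rnorm(12 * 7), nrow = 12)            # hypothetical 12 x 7 design
y <- rnorm(12)
beta_ne <- solve(crossprod(X), crossprod(X, y))  # normal equations
beta_lm <- coef(lm(y ~ X - 1))                   # OLS without intercept
max(abs(beta_ne - beta_lm))                      # ~ 0 (up to rounding)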

The intuition here is that \(\color{darkblue}\delta\) controls the trade-off between the full OLS estimate and a sparser solution.

Note that the more popular Lasso approach creates an unconstrained model using a penalty term:

Lasso
\[\begin{align}\min\>&||\color{darkblue}y-\color{darkblue}X\color{darkred}\beta||_2^2 + \color{darkblue}\lambda ||\color{darkred}\beta||_1 \end{align}\]
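
For reference, here is a minimal Lasso sketch (my own; it assumes the glmnet package, which the post does not use, and note that glmnet puts a 1/(2n) factor on the loss, so its lambda is scaled differently):

# Sketch (assumes the glmnet package; not used in the original post).
# glmnet with alpha = 1 solves a scaled version of the Lasso problem above
# over a path of lambda values.
library(glmnet)
set.seed(2)
X <- matrix(rnorm(50 * 10), nrow = 50)       # hypothetical 50 x 10 design
y <- X[, 1] - 2 * X[, 3] + rnorm(50)
fit <- glmnet(X, y, alpha = 1)               # alpha = 1 => Lasso penalty
coef(fit, s = 0.5)                           # sparse coefficients at lambda = 0.5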

To understand what the function dantzig.delta in the R package GDSARM is doing, I tried to replicate their example:

> library(GDSARM)
> data(dataHamadaWu)
> print(dataHamadaWu)
   V1 V2 V3 V4 V5 V6 V7    V8
1   1  1 -1  1  1  1 -1 6.058
2   1 -1  1  1  1 -1 -1 4.733
3  -1  1  1  1 -1 -1 -1 4.625
4   1  1  1 -1 -1 -1  1 5.899
5   1  1 -1 -1 -1  1 -1 7.000
6   1 -1 -1 -1  1 -1  1 5.752
7  -1 -1 -1  1 -1  1  1 5.682
8  -1 -1  1 -1  1  1 -1 6.607
9  -1  1 -1  1  1 -1  1 5.818
10  1 -1  1  1 -1  1  1 5.917
11 -1  1  1 -1  1  1  1 5.863
12 -1 -1 -1 -1 -1 -1 -1 4.809
> X = dataHamadaWu[,-8]
> Y = dataHamadaWu[,8]
> # scale and center X and y
> scaleX = base::scale(X, center=TRUE, scale=TRUE)
> scaleY = base::scale(Y, center=TRUE, scale=FALSE)
>
> maxDelta = max(abs(t(scaleX) %*% matrix(scaleY, ncol=1)))
> # Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta
> dantzig.delta(scaleX, scaleY, delta = seq(0, maxDelta, length.out=4))
                         V1        V2         V3         V4         V5        V6         V7
0                0.17016091 0.1534495 -0.1283823 -0.2695593 0.07824791 0.4779302 0.09565567
1.75241074956335 0.01085084 0.0000000  0.0000000 -0.1102492 0.00000000 0.3186201 0.00000000
3.5048214991267  0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.1593101 0.00000000
5.25723224869005 0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.0000000 0.00000000
>

When we increase delta, the estimates become sparser. Note that this data set has \(n \gt p\), so it is not really the kind of problem this method was originally designed for.
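
A quick sketch of the intended \(p \gg n\) setting (my own; it assumes dantzig.delta accepts such data with the same interface as shown above) could look like:

# Sketch (assumption: dantzig.delta handles p >> n with the interface above).
library(GDSARM)
set.seed(3)
n <- 10; p <- 50
X <- matrix(rnorm(n * p), nrow = n)
beta_true <- c(3, -2, 1.5, rep(0, p - 3))     # only 3 nonzero coefficients
y <- X %*% beta_true + rnorm(n, sd = 0.1)
scaleX <- scale(X); scaleY <- scale(y, scale = FALSE)
maxDelta <- max(abs(t(scaleX) %*% scaleY))
dantzig.delta(scaleX, scaleY, delta = maxDelta / 4)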

It helps me better understand statistical techniques when I reimplement the underlying optimization problem, rather than just running a black-box function. The GAMS model (listed in the appendix) is not very complex. We have to linearize the absolute values in two places: in the objective and in the constraint. In this case, the problem is convex, so linearizing the absolute values is not difficult.

The objective is \(\min \sum_j |\color{darkred}\beta_j|\), which can be modeled as \[\begin{align}\min &\sum_j \color{darkred}b_j\\ & \color{darkred}b_j  \ge -\color{darkred}\beta_j\\&\color{darkred}b_j\ge\color{darkred}\beta_j  \\ & \color{darkred}b_j\ge 0\end{align}\]

The constraint \(||\color{darkred}v||_\infty \le \color{darkblue}\delta\), with \(\color{darkred}v=\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\), can be written as: \[ -\color{darkblue}\delta \le\color{darkred}v_j \le \color{darkblue}\delta \quad \forall j\]
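
To make the two linearizations concrete, here is a minimal R sketch (my own, not the GDSARM implementation) that solves the Dantzig selector LP with the lpSolve package. Instead of the bounding variables \(\color{darkred}b_j\) above it uses the equivalent variable-splitting trick \(\color{darkred}\beta=\color{darkred}\beta^+-\color{darkred}\beta^-\) with \(\color{darkred}\beta^+,\color{darkred}\beta^-\ge 0\), so the objective is \(\sum_j (\color{darkred}\beta^+_j+\color{darkred}\beta^-_j)\) and the \(\infty\)-norm constraint becomes the two-sided inequalities above.

# Sketch (not the GDSARM code): Dantzig selector as an LP with lpSolve.
# Variables: beta_plus and beta_minus, both >= 0 (lpSolve's default bounds),
# so |beta_j| = beta_plus_j + beta_minus_j at the optimum.
library(lpSolve)

dantzig_lp <- function(X, y, delta) {
  p   <- ncol(X)
  A   <- t(X) %*% X                     # X'X
  rhs <- as.vector(t(X) %*% y)          # X'y
  # X'y - delta <= X'X (beta_plus - beta_minus) <= X'y + delta
  const.mat <- rbind(cbind(A, -A), cbind(A, -A))
  const.dir <- c(rep("<=", p), rep(">=", p))
  const.rhs <- c(rhs + delta, rhs - delta)
  sol <- lp("min", rep(1, 2 * p), const.mat, const.dir, const.rhs)
  sol$solution[1:p] - sol$solution[(p + 1):(2 * p)]
}

# e.g. with the scaled data from the GDSARM example:
#   dantzig_lp(scaleX, as.vector(scaleY), delta = 1.752)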

When I run my GAMS model, I can indeed reproduce the R results:


----    158 PARAMETER results  estimations

           delta          V1          V2          V3          V4          V5          V6          V7

OLS                    0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k1                     0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k2         1.752       0.011                              -0.110                   0.319
k3         3.505                                                                   0.159
k4         5.257


References


  1. Linear Programming and LAD Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
  2. Linear Programming and Chebyshev Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
  3. Median, quantiles and quantile regression as linear programming problems, https://yetanothermathprogrammingconsultant.blogspot.com/2021/06/median-quantiles-and-quantile.html
  4. Emmanuel Candes and Terence Tao, “The Dantzig Selector: Statistical Estimation When p Is Much Larger than n.” The Annals of Statistics, vol. 35, no. 6, 2007, pp. 2313–51.
  5. Robert Tibshirani, “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, 1996, pp. 267–88.

Appendix: GAMS Model


$onText
  The Dantzig Selector for sparse regression.

  Compare the results of the R package GDSARM
  with an LP model.
$offText

* for ordering of results
set dummy /delta, OLS/;

*------------------------------------------------
* data
*------------------------------------------------

Sets
   i  'rows',
   j0 'all columns';

Table data(i<,j0<)
      V1 V2 V3 V4 V5 V6 V7    V8
 1     1  1 -1  1  1  1 -1 6.058
 2     1 -1  1  1  1 -1 -1 4.733
 3    -1  1  1  1 -1 -1 -1 4.625
 4     1  1  1 -1 -1 -1  1 5.899
 5     1  1 -1 -1 -1  1 -1 7.000
 6     1 -1 -1 -1  1 -1  1 5.752
 7    -1 -1 -1  1 -1  1  1 5.682
 8    -1 -1  1 -1  1  1 -1 6.607
 9    -1  1 -1  1  1 -1  1 5.818
10     1 -1  1  1 -1  1  1 5.917
11    -1  1  1 -1  1  1  1 5.863
12    -1 -1 -1 -1 -1 -1 -1 4.809
;
display i,j0,data;

*------------------------------------------------
* scale and center
*------------------------------------------------

set j(j0) 'independent variables: columns for X';
j(j0) = ord(j0)<card(j0);

parameters
   n      'number of rows'
   mean(j0)
   sd(j0) 'standard deviation'
   scaled_data(i,j0)
;

n = card(i);
mean(j0) = sum(i,data(i,j0))/n;

* center
scaled_data(i,j0) = data(i,j0)-mean(j0);
* scale (only for X)
sd(j) = sqrt(sum(i,sqr(scaled_data(i,j)))/(n-1));
scaled_data(i,j) = scaled_data(i,j)/sd(j);
display scaled_data;

*------------------------------------------------
* extract X,y
*------------------------------------------------

parameters
   X(i,j0) 'independent variables'
   y(i)    'dependent variable'
;
X(i,j) = scaled_data(i,j);
y(i) = scaled_data(i,'v8');
display X,y;

*------------------------------------------------
* tuning parameter: maxdelta and delta
*------------------------------------------------

scalar MaxDelta;
MaxDelta = smax(j,abs(sum(i,X(i,j)*y(i))));
display MaxDelta;

* create equally spaced values 0..MaxDelta
set k /k1*k4/;
parameter delta(k);
delta(k) = MaxDelta*(ord(k)-1)/(card(k)-1);
display delta;

*------------------------------------------------
* OLS regression
*------------------------------------------------

variable
  beta(j0) 'estimators'
  z        'objective'
  r(i)     'residuals'
;

equations
   fit(i)     'linear model'
   qobj       'quadratic objective'
;

qobj..      z =e= sum(i,sqr(r(i)));
fit(i)..    r(i) =e= y(i) - sum(j,X(i,j)*beta(j));

model ols /qobj,fit/;

*------------------------------------------------
* Dantzig selector LP model
* we need to set the bounds
*    Xr(j) >= -delta
*    Xr(j) <=  delta
* that is done in the solve loop
*------------------------------------------------

variable
  Xr(j0)   'X^T*r'
;
positive variable
  absb(j0) 'absolute values'
;

equations
   obj        'objective'
   bound1(j0) 'absolute value bound (obj)'
   bound2(j0) 'absolute value bound (obj)'
   fit(i)     'linear model'
   defXr(j0)  'evaluate X^T*r'
;

obj..       z =e= sum(j,absb(j));
bound1(j).. absb(j) =g= -beta(j);
bound2(j).. absb(j) =g= beta(j);
defXr(j)..  Xr(j) =e= sum(i,X(i,j)*r(i));

model ds /obj,fit,bound1,bound2,defXr/;

*------------------------------------------------
* run regressions
*------------------------------------------------

parameter results(*,*) 'estimations';

solve ols using qcp minimizing z;
results('OLS',j) = beta.l(j);

results(k,'delta') = delta(k);
loop(k,
   Xr.lo(j) = -delta(k);
   Xr.up(j) = delta(k);
   solve ds using lp minimizing z;
   results(k,j) = beta.l(j);
);
display results;


