Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:
- Least absolute deviation (LAD) regression [1]
- Chebyshev regression [2]
- Quantile regression [3]
Dantzig Selector
\[\begin{align}\min \>& ||\color{darkred}\beta||_1 \\ \text{subject to } & ||\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)||_\infty \le \color{darkblue}\delta \end{align}\]
Here, \(||v||_1 = \sum_i |v_i| \) and \(||v||_\infty = \max_i |v_i|\). The resulting model is an LP.
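In R, these two norms are just sum(abs(v)) and max(abs(v)) for a numeric vector v:

v <- c(1, -3, 2)   # example vector
sum(abs(v))        # ||v||_1   = 6
max(abs(v))        # ||v||_inf = 3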
Note that if \(\color{darkblue}\delta=0\) we solve a standard OLS (Ordinary Least Squares) problem. The constraint becomes: \(\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)=0\) which corresponds to the so-called normal equations: \(\color{darkblue}X^T \color{darkblue}X\color{darkred}\beta=\color{darkblue}X^T \color{darkblue}y\). This system of linear equations, when solved for \(\color{darkred}\beta\), gives us a least squares solution.
The intuition here is that \(\color{darkblue}\delta\) controls the trade-off between a full OLS estimate and a sparser solution.
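As a quick sanity check of the \(\color{darkblue}\delta=0\) case in R (on made-up data, not the data set used below): solving the normal equations gives exactly the OLS coefficients.

set.seed(123)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical design matrix
y <- rnorm(10)                               # hypothetical response
beta_ne <- solve(t(X) %*% X, t(X) %*% y)     # normal equations X'X beta = X'y
beta_lm <- coef(lm(y ~ X - 1))               # OLS via lm(), no intercept
print(cbind(beta_ne, beta_lm))               # same values up to rounding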
Note that the more popular Lasso approach creates an unconstrained model using a penalty term:
Lasso
\[\begin{align}\min\>&||\color{darkblue}y-\color{darkblue}X\color{darkred}\beta||_2^2 + \color{darkblue}\lambda ||\color{darkred}\beta||_1 \end{align}\]
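In R, this Lasso problem is typically solved with the glmnet package rather than with an explicit optimization model. A minimal sketch (assuming glmnet is installed; x and y are again made-up data):

library(glmnet)
set.seed(123)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)  # hypothetical data
y <- rnorm(20)
fit <- glmnet(x, y, alpha = 1)                 # alpha = 1 selects the Lasso penalty
coef(fit, s = 0.1)                             # coefficients at lambda = 0.1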
To understand what the function dantzig.delta in the R package GDSARM is doing, I tried to replicate their example:
> library(GDSARM)
> data(dataHamadaWu)
> print(dataHamadaWu)
   V1 V2 V3 V4 V5 V6 V7    V8
1   1  1 -1  1  1  1 -1 6.058
2   1 -1  1  1  1 -1 -1 4.733
3  -1  1  1  1 -1 -1 -1 4.625
4   1  1  1 -1 -1 -1  1 5.899
5   1  1 -1 -1 -1  1 -1 7.000
6   1 -1 -1 -1  1 -1  1 5.752
7  -1 -1 -1  1 -1  1  1 5.682
8  -1 -1  1 -1  1  1 -1 6.607
9  -1  1 -1  1  1 -1  1 5.818
10  1 -1  1  1 -1  1  1 5.917
11 -1  1  1 -1  1  1  1 5.863
12 -1 -1 -1 -1 -1 -1 -1 4.809
> X = dataHamadaWu[,-8]
> Y = dataHamadaWu[,8]
> # scale and center X and y
> scaleX = base::scale(X, center=TRUE, scale=TRUE)
> scaleY = base::scale(Y, center=TRUE, scale=FALSE)
>
> maxDelta = max(abs(t(scaleX) %*% matrix(scaleY, ncol=1)))
> # Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta
> dantzig.delta(scaleX, scaleY, delta = seq(0, maxDelta, length.out=4))
                         V1        V2         V3         V4         V5        V6         V7
0                0.17016091 0.1534495 -0.1283823 -0.2695593 0.07824791 0.4779302 0.09565567
1.75241074956335 0.01085084 0.0000000  0.0000000 -0.1102492 0.00000000 0.3186201 0.00000000
3.5048214991267  0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.1593101 0.00000000
5.25723224869005 0.00000000 0.0000000  0.0000000  0.0000000 0.00000000 0.0000000 0.00000000
>
When we increase delta, the estimates become sparser. At the largest value, delta = maxDelta, the all-zero solution is feasible (with \(\color{darkred}\beta=0\) the constraint reads \(||\color{darkblue}X^T\color{darkblue}y||_\infty \le \color{darkblue}\delta\), which holds by construction of maxDelta), so every estimate is zero. Note that this data set has \(n \gt p\), so it is not really the original target for this method.
It helps me better understand statistical techniques when I reimplement the underlying optimization problem, rather than just running a black-box function. The GAMS model (listed in the appendix) is not very complex. We have to linearize the absolute values in two places: in the objective and in the constraint. In this case, things are convex, so linearizing the absolute values is not very difficult.
The objective is \(\min \sum_j |\color{darkred}\beta_j|\), which can be modeled with extra variables \(\color{darkred}b_j\) as \[\begin{align}\min &\sum_j \color{darkred}b_j\\ & \color{darkred}b_j \ge -\color{darkred}\beta_j\\&\color{darkred}b_j\ge\color{darkred}\beta_j \\ & \color{darkred}b_j\ge 0\end{align}\]
The constraint \(||\color{darkred}v||_\infty \le \color{darkblue}\delta\), where \(\color{darkred}v=\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\), can be written as simple bounds: \[ -\color{darkblue}\delta \le\color{darkred}v_j \le \color{darkblue}\delta \>\> \forall j\]
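Putting the two pieces together (and introducing intermediate variables \(\color{darkred}r_i\) for the residuals and \(\color{darkred}v_j\) for \(\color{darkblue}X^T\color{darkred}r\), just like the GAMS model in the appendix does), the complete Dantzig selector LP reads:
\[\begin{align}\min\>&\sum_j \color{darkred}b_j\\ \text{subject to }& \color{darkred}r_i = \color{darkblue}y_i - \sum_j \color{darkblue}x_{i,j}\color{darkred}\beta_j\\ & \color{darkred}v_j = \sum_i \color{darkblue}x_{i,j}\color{darkred}r_i\\ & -\color{darkred}b_j \le \color{darkred}\beta_j \le \color{darkred}b_j\\ & -\color{darkblue}\delta \le \color{darkred}v_j \le \color{darkblue}\delta\\ & \color{darkred}b_j \ge 0\end{align}\]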
When I run my GAMS model, I can indeed reproduce the R results:
----    158 PARAMETER results  estimations

            delta          V1          V2          V3          V4          V5          V6          V7

OLS                     0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k1                      0.170       0.153      -0.128      -0.270       0.078       0.478       0.096
k2          1.752       0.011                              -0.110                   0.319
k3          3.505                                                                   0.159
k4          5.257
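The same LP can also be solved directly from R, without GAMS. Below is a rough sketch using the lpSolve package (my own formulation, not necessarily how GDSARM computes its results). It uses a slightly different but equivalent linearization of \(||\color{darkred}\beta||_1\): splitting \(\color{darkred}\beta\) into nonnegative parts \(\color{darkred}\beta^+ - \color{darkred}\beta^-\), which avoids free variables.

library(lpSolve)

dantzig_lp <- function(X, y, delta) {
  p   <- ncol(X)
  A   <- t(X) %*% X                   # X'X
  cxy <- drop(t(X) %*% y)             # X'y
  # decision variables: beta+ (first p) and beta- (last p), both >= 0
  obj <- rep(1, 2 * p)                # minimize sum(beta+ + beta-) = ||beta||_1
  con <- rbind(cbind(A, -A),          # X'X (beta+ - beta-) >= X'y - delta
               cbind(A, -A))          # X'X (beta+ - beta-) <= X'y + delta
  dir <- c(rep(">=", p), rep("<=", p))
  rhs <- c(cxy - delta, cxy + delta)
  sol <- lp("min", obj, con, dir, rhs)
  sol$solution[1:p] - sol$solution[(p + 1):(2 * p)]   # beta = beta+ - beta-
}

# e.g. dantzig_lp(scaleX, scaleY, 0) should reproduce the OLS row above,
# and larger delta values should give the sparser rows.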
References
1. Linear Programming and LAD Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
2. Linear Programming and Chebyshev Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
3. Median, quantiles and quantile regression as linear programming problems, https://yetanothermathprogrammingconsultant.blogspot.com/2021/06/median-quantiles-and-quantile.html
4. Emmanuel Candes and Terence Tao, "The Dantzig Selector: Statistical Estimation When p Is Much Larger than n," The Annals of Statistics, vol. 35, no. 6, 2007, pp. 2313-2351.
5. Robert Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, 1996, pp. 267-288.
Appendix: GAMS Model
$onText
The Dantzig Selector for sparse Regression.
Compare results of R package GDSARM with an LP model.
$offText
* for ordering of results
set dummy /delta, OLS/;

*------------------------------------------------
* data
*------------------------------------------------

Sets
   i  'rows'
   j0 'all columns'
;

Table data(i<,j0<)
       V1  V2  V3  V4  V5  V6  V7     V8
  1     1   1  -1   1   1   1  -1  6.058
  2     1  -1   1   1   1  -1  -1  4.733
  3    -1   1   1   1  -1  -1  -1  4.625
  4     1   1   1  -1  -1  -1   1  5.899
  5     1   1  -1  -1  -1   1  -1  7.000
  6     1  -1  -1  -1   1  -1   1  5.752
  7    -1  -1  -1   1  -1   1   1  5.682
  8    -1  -1   1  -1   1   1  -1  6.607
  9    -1   1  -1   1   1  -1   1  5.818
 10     1  -1   1   1  -1   1   1  5.917
 11    -1   1   1  -1   1   1   1  5.863
 12    -1  -1  -1  -1  -1  -1  -1  4.809
;
display i,j0,data;

*------------------------------------------------
* scale and center
*------------------------------------------------

set j(j0) 'independent variables: columns for X';
j(j0) = ord(j0)<card(j0);

parameters
   n                 'number of rows'
   mean(j0)
   sd(j0)            'standard deviation'
   scaled_data(i,j0)
;

n = card(i);
mean(j0) = sum(i,data(i,j0))/n;

* center
scaled_data(i,j0) = data(i,j0)-mean(j0);
* scale (only for X)
sd(j) = sqrt(sum(i,sqr(scaled_data(i,j)))/(n-1));
scaled_data(i,j) = scaled_data(i,j)/sd(j);
display scaled_data;

*------------------------------------------------
* extract X,y
*------------------------------------------------

parameters
   X(i,j0) 'independent variables'
   y(i)    'dependent variable'
;
X(i,j) = scaled_data(i,j);
y(i) = scaled_data(i,'v8');
display X,y;

*------------------------------------------------
* Tuning parameter
* maxdelta and delta
*------------------------------------------------

scalar MaxDelta;
MaxDelta = smax(j,abs(sum(i,X(i,j)*Y(i))));
display MaxDelta;

* create equally spaced values 0..MaxDelta
set k /k1*k4/;
parameter delta(k);
delta(k) = MaxDelta*(ord(k)-1) / (card(k)-1);
display delta;

*------------------------------------------------
* OLS regression
*------------------------------------------------

variable
   beta(j0) 'estimators'
   z        'objective'
   r(i)     'residuals'
;

equations
   fit(i) 'linear model'
   qobj   'quadratic objective'
;

qobj..   z =e= sum(i, sqr(r(i)));
fit(i).. r(i) =e= y(i) - sum(j,X(i,j)*beta(j));
model ols /qobj,fit/;
*------------------------------------------------
* Dantzig selector LP model
* we need to set bounds for
*    Xr(j) >= -delta
*    Xr(j) <= delta
* that is done in the solve loop
*------------------------------------------------
variable
   Xr(j0) 'X^T*r'
;
positive variable
   absb(j0) 'absolute values'
;

equations
   obj        'objective'
   bound1(j0) 'absolute value bound (obj)'
   bound2(j0) 'absolute value bound (obj)'
   fit(i)     'linear model'
   defXr(j0)  'evaluate X^T*r'
;

obj..       z =e= sum(j,absb(j));
bound1(j).. absb(j) =g= -beta(j);
bound2(j).. absb(j) =g= beta(j);
defXr(j)..  Xr(j) =e= sum(i,X(i,j)*r(i));
model ds /obj,fit,bound1,bound2,defXr/;
*------------------------------------------------
* run regressions
*------------------------------------------------
parameter results(*,*) 'estimations';
solve ols using qcp minimizing z;
results('OLS',j) = beta.l(j);

results(k,'delta') = delta(k);
loop(k,
   Xr.lo(j) = -delta(k);
   Xr.up(j) = delta(k);
   solve ds using lp minimizing z;
   results(k,j) = beta.l(j);
);
display results;