Total Least Squares (TLS) is an alternative to OLS (Ordinary Least Squares). It is a form of orthogonal regression and deals with the EIV (Errors-in-Variables) problem.
The standard OLS model is \[\color{darkblue}y = \color{darkblue}X\color{darkred}\beta + \color{darkred}\varepsilon\] where we minimize the sum-of-squares of the residuals \[\min ||\color{darkred}\varepsilon||_2^2\] We can interpret \(\color{darkred}\varepsilon\) as the error in \(\color{darkblue}y\).
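For comparison: OLS has the well-known closed-form solution \[\hat{\color{darkred}\beta} = \left(\color{darkblue}X^T\color{darkblue}X\right)^{-1}\color{darkblue}X^T\color{darkblue}y\] which follows from the normal equations. TLS does not lead to such a simple linear system.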
In TLS, we also allow for errors in \(\color{darkblue}X\). The model becomes \[\color{darkblue}y+\color{darkred}\varepsilon=(\color{darkblue}X+\color{darkred}E)\color{darkred}\beta\] Note that we made a sign change in \(\color{darkred}\varepsilon\). This is pure aesthetics: it makes the equation look more symmetric. The objective is specified as \[\min \> ||\left(\color{darkred}\varepsilon \> \color{darkred}E\right)||_F\] i.e. the Frobenius norm of the matrix formed by \(\color{darkred}\varepsilon\) and \(\color{darkred}E\). The Frobenius norm is just \[||A||_F=\sqrt{\sum_{i,j}a_{i,j}^2}\] We can drop the square root from the objective: the optimal solution remains the same, and we get rid of a non-linear function that causes trouble near zero (the gradient is not defined there). The remaining problem is a non-convex quadratic problem, which can be solved with global MINLP solvers such as Baron or with a global quadratic solver like Gurobi.
One property of TLS is that it measures orthogonal distances, instead of the vertical distances used in OLS:
Orthogonal distances for TLS (picture from [2])
Usually, linear TLS is solved using a Singular Value Decomposition (SVD) of the augmented data matrix.
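For reference, that recipe is quite compact (details in [2]). Compute the SVD \[\left(\color{darkblue}X \;\; \color{darkblue}y\right) = U\Sigma V^T\] and take the right singular vector \(v\) that belongs to the smallest singular value. Writing \(v=(v_1,\dots,v_n,v_{n+1})^T\), where \(n\) is the number of columns of \(\color{darkblue}X\), the TLS estimates are \[\hat{\color{darkred}\beta}_j = -\frac{v_j}{v_{n+1}}\] provided \(v_{n+1}\neq 0\). But here I want to try the following non-convex model instead: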
Non-convex Quadratic Model

\[\begin{align}\min&\sum_i \color{darkred}\varepsilon_i^2+ \sum_{i,j}\color{darkred}E_{i,j}^2\\ & \color{darkblue}y_i + \color{darkred}\varepsilon_i = \sum_j \left(\color{darkblue}X_{i,j}+\color{darkred}E_{i,j}\right) \color{darkred}\beta_j && \forall i\end{align}\]
Example
---- 51 PARAMETER data

              y          x1          x2

i1     5.658390    0.497808    3.435376
i2    10.206082    1.545745    9.413754
i3    11.735484    1.653134    9.476257
i4    10.994595    2.886785    9.694459
i5     8.178234    2.353039   10.770926
i6    10.586022    2.768120    5.760277
i7     4.481738    2.063961    2.004087
i8     7.495135    3.542960    3.440644
i9    10.959411    2.174741    6.331947
i10    4.809324    2.802416    1.620228
i11   13.290220    3.406511   10.506181
i12    9.774514    3.560376    5.563656
i13   12.084065    3.403917    8.248807
i14    7.992731    4.481498    2.519736
i15   12.111285    3.996363    7.249193
i16   14.487092    3.970683    9.410644
i17   11.062400    3.734251    7.364956
i18    7.592865    4.753497    4.695553
i19   11.244580    3.445085    6.214768
i20    7.837413    6.782433    6.367412
i21   12.199275    4.144486    7.570009
i22   13.872630    5.454476   10.463741
i23   10.461876    5.057793    4.034925
i24   14.713012    5.672384    7.914547
i25   10.751043    4.185621    7.648005
I tried the non-convex quadratic optimization model using GAMS with different global MINLP and QCP solvers. Most solvers found the optimal solution quite quickly but had enormous problems proving optimality.
---- 83 VARIABLE beta.L parameter to estimate
x1 1.282931, x2 0.844328
> tls(Y ~ X1 + X2, data = df)
Error in tls(Y ~ X1 + X2, data = df) :
Using total least squares requires model without intercept.
> tls(Y ~ X1 + X2 - 1, data = df)
$coefficient
       X1        X2 
1.2829314 0.8443277 

$confidence.interval
   2.5% lower bound 97.5% upper bound
X1        0.6508868          1.914976
X2        0.5248924          1.163763

$sd.est
       X1        X2 
0.3224776 0.1629802 
Good to see we agree on the solution.
- What if we want a constant term? This is often modeled by a data column with ones. It seems to me that there should be no error in this column.
- In OLS we may add dummy variables to indicate a structural break. This is encoded by a data column with 0's and 1's. Like the constant term, there is no measurement error involved here.
- These situations can be handled easily in the optimization model (see the fragment after this list). How to handle this with a singular value decomposition-based TLS algorithm is described in [3].
- This is a good benchmark model for non-convex MINLP/QCP solvers: small but very difficult.
- The optimization model confirms my understanding of the underlying mathematical model. Such a model is often more precise and succinct than a verbal description.
- The TLS model measures orthogonal distances. This makes the results dependent on the scale of the variables (columns). One could preprocess the data using a normalization step.
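To illustrate the first three points: in the optimization model we can simply fix the corresponding entries of \(\color{darkred}E\) to zero. A minimal GAMS fragment (the column name const is hypothetical; it would have to be added to the set j):

* hypothetical column of ones representing the constant term
x(i,'const') = 1;
* this column is exact data: fix its measurement errors to zero
E.fx(i,'const') = 0;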
References
1. Total least squares, https://en.wikipedia.org/wiki/Total_least_squares
2. Ivan Markovsky, Sabine Van Huffel, Overview of total least squares methods, Signal Processing, Volume 87, Issue 10, October 2007, Pages 2283-2302, https://eprints.soton.ac.uk/263855/1/tls_overview.pdf
3. P. de Groen, An Introduction to Total Least Squares, https://arxiv.org/pdf/math/9805076.pdf
Appendix 1: GAMS model
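A minimal GAMS implementation of the model stated above (a sketch: the solver selection at the end is an assumption; any of the global solvers mentioned earlier should work):

$ontext
   Total Least Squares as a non-convex quadratic model:

      min   sum(i) err(i)^2 + sum(i,j) E(i,j)^2
      s.t.  y(i) + err(i) = sum(j) (x(i,j)+E(i,j))*beta(j)
$offtext

set
   i 'cases'                 /i1*i25/
   j 'explanatory variables' /x1,x2/
;

table data(i,*)
              y          x1          x2
i1     5.658390   0.4978076    3.435376
i2    10.206082   1.5457447    9.413754
i3    11.735484   1.6531337    9.476257
i4    10.994595   2.8867848    9.694459
i5     8.178234   2.3530392   10.770926
i6    10.586022   2.7681198    5.760277
i7     4.481738   2.0639606    2.004087
i8     7.495135   3.5429598    3.440644
i9    10.959411   2.1747406    6.331947
i10    4.809324   2.8024155    1.620228
i11   13.290220   3.4065109   10.506181
i12    9.774514   3.5603761    5.563656
i13   12.084065   3.4039173    8.248807
i14    7.992731   4.4814979    2.519736
i15   12.111285   3.9963628    7.249193
i16   14.487092   3.9706833    9.410644
i17   11.062400   3.7342514    7.364956
i18    7.592865   4.7534969    4.695553
i19   11.244580   3.4450848    6.214768
i20    7.837413   6.7824328    6.367412
i21   12.199275   4.1444857    7.570009
i22   13.872630   5.4544764   10.463741
i23   10.461876   5.0577928    4.034925
i24   14.713012   5.6723841    7.914547
i25   10.751043   4.1856209    7.648005
;

parameter
   y(i)   'dependent variable'
   x(i,j) 'independent variables'
;
y(i) = data(i,'y');
x(i,j) = data(i,j);

variable
   beta(j) 'parameter to estimate'
   err(i)  'error in y'
   E(i,j)  'error in X'
   z       'objective'
;

equation
   obj    'sum of squared errors'
   fit(i) 'linear model with errors in y and X'
;

obj..     z =e= sum(i, sqr(err(i))) + sum((i,j), sqr(E(i,j)));
fit(i)..  y(i) + err(i) =e= sum(j, (x(i,j) + E(i,j))*beta(j));

model tls /all/;

* a global solver is needed to prove optimality, e.g. Baron or Gurobi
option qcp = baron;
solve tls using qcp minimizing z;

display beta.l;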
Appendix 2: R code
df <- read.table(text="
id Y X1 X2
i1 5.658390 0.4978076 3.435376
i2 10.206082 1.5457447 9.413754
i3 11.735484 1.6531337 9.476257
i4 10.994595 2.8867848 9.694459
i5 8.178234 2.3530392 10.770926
i6 10.586022 2.7681198 5.760277
i7 4.481738 2.0639606 2.004087
i8 7.495135 3.5429598 3.440644
i9 10.959411 2.1747406 6.331947
i10 4.809324 2.8024155 1.620228
i11 13.290220 3.4065109 10.506181
i12 9.774514 3.5603761 5.563656
i13 12.084065 3.4039173 8.248807
i14 7.992731 4.4814979 2.519736
i15 12.111285 3.9963628 7.249193
i16 14.487092 3.9706833 9.410644
i17 11.062400 3.7342514 7.364956
i18 7.592865 4.7534969 4.695553
i19 11.244580 3.4450848 6.214768
i20 7.837413 6.7824328 6.367412
i21 12.199275 4.1444857 7.570009
i22 13.872630 5.4544764 10.463741
i23 10.461876 5.0577928 4.034925
i24 14.713012 5.6723841 7.914547
i25 10.751043 4.1856209 7.648005",
header=T)
library(tls)
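# the "- 1" drops the intercept: tls() requires a model without a constant term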
tls(Y ~ X1 + X2 - 1, data = df)