Total Least Squares (TLS) is an alternative to OLS (Ordinary Least Squares). It is a form of orthogonal regression and deals with the EIV (Errors-in-Variables) problem.
The standard OLS model is \[\color{darkblue}y = \color{darkblue}X\color{darkred}\beta + \color{darkred}\varepsilon\] where we minimize the sum-of-squares of the residuals \[\min ||\color{darkred}\varepsilon||_2^2\] We can interpret \(\color{darkred}\varepsilon\) as the error in \(\color{darkblue}y\).
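For comparison: OLS has the well-known closed-form solution \[\hat{\color{darkred}\beta} = \left(\color{darkblue}X^T\color{darkblue}X\right)^{-1}\color{darkblue}X^T\color{darkblue}y\] which follows from the normal equations. TLS does not lead to such a simple linear system.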
In TLS, we also allow for errors in \(\color{darkblue}X\). The model becomes \[\color{darkblue}y+\color{darkred}\varepsilon=(\color{darkblue}X+\color{darkred}E)\color{darkred}\beta\] Note that we made a sign change in \(\color{darkred}\varepsilon\). This is pure aesthetics: it makes the equation look more symmetric. The objective is specified as \[\min \> ||\left(\color{darkred}\varepsilon \> \color{darkred}E\right)||_F\] i.e. the Frobenius norm of the matrix formed by \(\color{darkred}\varepsilon\) and \(\color{darkred}E\). The Frobenius norm is just \[||A||_F=\sqrt{\sum_{i,j}a_{i,j}^2}\] We can drop the square root from the objective: the optimal solution remains the same, and we get rid of a non-linear function that causes trouble near zero (the gradient is not defined there). The remaining problem is a non-convex quadratic problem, which can be solved with global MINLP solvers such as Baron or with a global quadratic solver like Gurobi.
One property of TLS is that it measures orthogonal distances, instead of the vertical distances used in OLS:
Orthogonal distances for TLS (picture from [2])
Usually, linear TLS is solved using a Singular Value Decomposition (SVD) of the augmented data matrix.
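For reference, that recipe is quite compact (details in [2]). Compute the SVD \[\left(\color{darkblue}X \;\; \color{darkblue}y\right) = U\Sigma V^T\] and take the right singular vector \(v\) that belongs to the smallest singular value. Writing \(v=(v_1,\dots,v_n,v_{n+1})^T\), where \(n\) is the number of columns of \(\color{darkblue}X\), the TLS estimates are \[\hat{\color{darkred}\beta}_j = -\frac{v_j}{v_{n+1}}\] provided \(v_{n+1}\neq 0\). But here I want to try the following non-convex model instead: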
Non-convex Quadratic Model

\[\begin{align}\min&\sum_i \color{darkred}\varepsilon_i^2+ \sum_{i,j}\color{darkred}E_{i,j}^2\\ & \color{darkblue}y_i + \color{darkred}\varepsilon_i = \sum_j \left(\color{darkblue}X_{i,j}+\color{darkred}E_{i,j}\right) \color{darkred}\beta_j && \forall i\end{align}\]
Example
---- 51 PARAMETER data

              y          x1          x2

i1     5.658390    0.497808    3.435376
i2    10.206082    1.545745    9.413754
i3    11.735484    1.653134    9.476257
i4    10.994595    2.886785    9.694459
i5     8.178234    2.353039   10.770926
i6    10.586022    2.768120    5.760277
i7     4.481738    2.063961    2.004087
i8     7.495135    3.542960    3.440644
i9    10.959411    2.174741    6.331947
i10    4.809324    2.802416    1.620228
i11   13.290220    3.406511   10.506181
i12    9.774514    3.560376    5.563656
i13   12.084065    3.403917    8.248807
i14    7.992731    4.481498    2.519736
i15   12.111285    3.996363    7.249193
i16   14.487092    3.970683    9.410644
i17   11.062400    3.734251    7.364956
i18    7.592865    4.753497    4.695553
i19   11.244580    3.445085    6.214768
i20    7.837413    6.782433    6.367412
i21   12.199275    4.144486    7.570009
i22   13.872630    5.454476   10.463741
i23   10.461876    5.057793    4.034925
i24   14.713012    5.672384    7.914547
i25   10.751043    4.185621    7.648005
I tried the non-convex quadratic optimization model using GAMS with different global MINLP and QCP solvers. Most solvers found the optimal solution quite quickly but had enormous problems proving optimality.
---- 83 VARIABLE beta.L parameter to estimate
x1 1.282931, x2 0.844328
> tls(Y ~ X1 + X2, data = df)
Error in tls(Y ~ X1 + X2, data = df) :
Using total least squares requires model without intercept.
> tls(Y ~ X1 + X2 - 1, data = df)
$coefficient
       X1        X2 
1.2829314 0.8443277 

$confidence.interval
   2.5% lower bound 97.5% upper bound
X1        0.6508868          1.914976
X2        0.5248924          1.163763

$sd.est
       X1        X2 
0.3224776 0.1629802 
Good to see we agree on the solution.
- What if we want a constant term? This is often modeled by a data column with ones. It seems to me that there should be no error in this column.
- In OLS we may add dummy variables to indicate a structural break. This is encoded by a data column with 0's and 1's. Like the constant term, there is no measurement error involved here.
- These situations can be handled easily in the optimization model (see the fragment after this list). How to handle this with a singular value decomposition-based TLS algorithm is described in [3].
- This is a good benchmark model for non-convex MINLP/QCP solvers: small but very difficult.
- The optimization model confirms my understanding of the underlying mathematical model. Such a model is often more precise and succinct than a verbal description.
- The TLS model measures orthogonal distances. This makes the results dependent on the scale of the variables (columns). One could preprocess the data using a normalization step.
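To illustrate the first three points: in the optimization model we can simply fix the corresponding entries of \(\color{darkred}E\) to zero. A minimal GAMS fragment (the column name const is hypothetical; it would have to be added to the set j):

* hypothetical column of ones representing the constant term
x(i,'const') = 1;
* this column is exact data: fix its measurement errors to zero
E.fx(i,'const') = 0;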
References
1. Total least squares, https://en.wikipedia.org/wiki/Total_least_squares
2. Ivan Markovsky, Sabine Van Huffel, Overview of total least squares methods, Signal Processing, Volume 87, Issue 10, October 2007, Pages 2283-2302, https://eprints.soton.ac.uk/263855/1/tls_overview.pdf
3. P. de Groen, An Introduction to Total Least Squares, https://arxiv.org/pdf/math/9805076.pdf
Appendix 1: GAMS model
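A minimal GAMS implementation of the model stated above (a sketch: the solver selection at the end is an assumption; any of the global solvers mentioned earlier should work):

$ontext
   Total Least Squares as a non-convex quadratic model:

      min   sum(i) err(i)^2 + sum(i,j) E(i,j)^2
      s.t.  y(i) + err(i) = sum(j) (x(i,j)+E(i,j))*beta(j)
$offtext

set
   i 'cases'                 /i1*i25/
   j 'explanatory variables' /x1,x2/
;

table data(i,*)
              y          x1          x2
i1     5.658390   0.4978076    3.435376
i2    10.206082   1.5457447    9.413754
i3    11.735484   1.6531337    9.476257
i4    10.994595   2.8867848    9.694459
i5     8.178234   2.3530392   10.770926
i6    10.586022   2.7681198    5.760277
i7     4.481738   2.0639606    2.004087
i8     7.495135   3.5429598    3.440644
i9    10.959411   2.1747406    6.331947
i10    4.809324   2.8024155    1.620228
i11   13.290220   3.4065109   10.506181
i12    9.774514   3.5603761    5.563656
i13   12.084065   3.4039173    8.248807
i14    7.992731   4.4814979    2.519736
i15   12.111285   3.9963628    7.249193
i16   14.487092   3.9706833    9.410644
i17   11.062400   3.7342514    7.364956
i18    7.592865   4.7534969    4.695553
i19   11.244580   3.4450848    6.214768
i20    7.837413   6.7824328    6.367412
i21   12.199275   4.1444857    7.570009
i22   13.872630   5.4544764   10.463741
i23   10.461876   5.0577928    4.034925
i24   14.713012   5.6723841    7.914547
i25   10.751043   4.1856209    7.648005
;

parameter
   y(i)   'dependent variable'
   x(i,j) 'independent variables'
;
y(i) = data(i,'y');
x(i,j) = data(i,j);

variable
   beta(j) 'parameter to estimate'
   err(i)  'error in y'
   E(i,j)  'error in X'
   z       'objective'
;

equation
   obj    'sum of squared errors'
   fit(i) 'linear model with errors in y and X'
;

obj..     z =e= sum(i, sqr(err(i))) + sum((i,j), sqr(E(i,j)));
fit(i)..  y(i) + err(i) =e= sum(j, (x(i,j) + E(i,j))*beta(j));

model tls /all/;

* a global solver is needed to prove optimality, e.g. Baron or Gurobi
option qcp = baron;
solve tls using qcp minimizing z;

display beta.l;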
Appendix 2: R code
df <- read.table(text="
id Y X1 X2
i1 5.658390 0.4978076 3.435376
i2 10.206082 1.5457447 9.413754
i3 11.735484 1.6531337 9.476257
i4 10.994595 2.8867848 9.694459
i5 8.178234 2.3530392 10.770926
i6 10.586022 2.7681198 5.760277
i7 4.481738 2.0639606 2.004087
i8 7.495135 3.5429598 3.440644
i9 10.959411 2.1747406 6.331947
i10 4.809324 2.8024155 1.620228
i11 13.290220 3.4065109 10.506181
i12 9.774514 3.5603761 5.563656
i13 12.084065 3.4039173 8.248807
i14 7.992731 4.4814979 2.519736
i15 12.111285 3.9963628 7.249193
i16 14.487092 3.9706833 9.410644
i17 11.062400 3.7342514 7.364956
i18 7.592865 4.7534969 4.695553
i19 11.244580 3.4450848 6.214768
i20 7.837413 6.7824328 6.367412
i21 12.199275 4.1444857 7.570009
i22 13.872630 5.4544764 10.463741
i23 10.461876 5.0577928 4.034925
i24 14.713012 5.6723841 7.914547
i25 10.751043 4.1856209 7.648005",
header=T)
library(tls)
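# the "- 1" drops the intercept: tls() requires a model without a constant term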
tls(Y ~ X1 + X2 - 1, data = df)