The Holt-Winters double exponential smoothing method is a well-known technique for smoothing data with a trend [1]. In [2], an Excel implementation is demonstrated. The formulas (from [2]) are: \[\begin{align} &u_1 = y_1 \\ & v_1 = 0 \\& u_t = \alpha y_t + (1-\alpha) (u_{t-1}+v_{t-1})&& t\gt 1 \\ &v_t = \beta (u_t - u_{t-1}) + (1-\beta)v_{t-1}&& t\gt 1 \\ & \hat{y}_{t+1} = u_{t}+v_{t} \end{align}\] The symbols are:
- \(y_t\) is the data,
- \(u_t\) and \(v_t\) are estimates of the smoothed value and the trend,
- \(\alpha\) and \(\beta\) are parameters. We have \(0 \le \alpha \le 1\) and \(0 \le \beta \le 1\).
- \(\hat{y}_t\) is the estimate for \(y\).
In [2] the MAE (Mean Absolute Error) is minimized. This is \[ \min \>\mathit{MAE}= \frac{\displaystyle\sum_{t\gt1} |\hat{y}_t - y_t|}{T-1} \] where \(T\) is the number of time periods. In practice, minimizing the Mean Squared Error is more popular. One reason for choosing a least-absolute-value criterion is that it provides robustness when outliers are present.
Data
I use the small data set from [2]:
---- 23 PARAMETER ydata
t1 3.000, t2 5.000, t3 9.000, t4 20.000, t5 12.000, t6 17.000, t7 22.000, t8 23.000
t9 51.000, t10 41.000, t11 56.000, t12 75.000, t13 60.000, t14 75.000, t15 88.000
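Given the data, the recursion and the MAE are easy to evaluate for fixed \(\alpha\) and \(\beta\). A minimal Python sketch (the function name is mine):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v = y[0], 0.0                 # u1 = y1, v1 = 0
    abs_err = 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])               # yhat_t = u_{t-1} + v_{t-1}
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)    # average over t = 2..T

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
mae = holt_mae(y, alpha=0.4, beta=0.7)
print(round(mae, 5))   # -> 7.40966
```

For \(\alpha=0.4\), \(\beta=0.7\) this reproduces the objective value 7.40966 that CONOPT reports below for the fixed problem.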
Model
Often, in statistical software, the MSE version of the problem is treated as a black-box function of just \(\alpha\) and \(\beta\) and solved with, say, an L-BFGS-B method (usually without analytic gradients, so the solver uses finite differences). Such an approach typically ignores that we can end up in a local minimum (although most routines allow us to specify a starting point for \(\alpha\) and \(\beta\)). Here I want to state an explicit non-convex quadratic model that we can solve with different types of solvers. This approach gives a much larger model, with variables \(\alpha\), \(\beta\), \(u\), \(v\), and \(\hat{y}\). In addition, when using an MAE objective, we need extra variables to linearize the absolute value.
| Non-convex Quadratic Model |
|---|
| \[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\] |
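For comparison with the black-box approach mentioned above, here is a sketch using SciPy's L-BFGS-B with finite-difference gradients on the (smooth) MSE objective; apart from the SciPy call, all names are my own choices:

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88], float)

def mse(params):
    """MSE of the one-step-ahead forecasts for given (alpha, beta)."""
    alpha, beta = params
    u, v, sq = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        sq += (u + v - y[t]) ** 2                  # forecast error at period t
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return sq / (len(y) - 1)

# Local solve from a user-supplied starting point; gradients by finite differences.
res = minimize(mse, x0=[0.4, 0.7], method="L-BFGS-B", bounds=[(0, 1), (0, 1)])
print(res.x, res.fun)
```

This is a local method: the returned point depends on the starting point, which is exactly the issue discussed above.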
The first thing to do is to see if we can reproduce the results for \(\alpha=0.4\) and \(\beta=0.7\) as they are reported in [2]. We do this by fixing the variables \(\alpha\) and \(\beta\) and throwing the model at my favorite NLP solver: CONOPT. We only need to fix these two variables, as all other levels are then pinned down by the recursion. This fixed problem solves very fast:
    Iter Phase Ninf  Infeasibility   RGmax    NSB   Step InItr MX OK
       0   0        7.8250000000E+02 (Input point)

                    Pre-triangular equations:   42
                    Post-triangular equations:  29

       1   0        0.0000000000E+00 (After pre-processing)
       2   0        0.0000000000E+00 (After scaling)

 ** Feasible solution. Value of objective =    7.40965962794

    Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
       4   3        7.4096596279E+00 0.0E+00     0

 ** Optimal solution. There are no superbasic variables.
Basically, CONOPT recovers the solution during preprocessing. A few more iterations are needed, probably to get the basis right. The objective value is the same as reported in [2]. With this little experiment we have learned a lot: most likely, our constraints and objective are formulated correctly.
Solution
We are now ready to try to solve the problem. When using \(\alpha=0.4\) and \(\beta=0.7\) as starting point, CONOPT quickly finds a very good solution. A solution report can look like:
----     75 PARAMETER results

              y           u           v        yhat         |e|

t1      3.00000     3.00000
t2      5.00000     3.42764     0.37003     3.00000     2.00000
t3      9.00000     4.91004     1.33254     3.79767     5.20233
t4     20.00000     9.18421     3.87786     6.24258    13.75742
t5     12.00000    12.83498     3.68136    13.06207     1.06207
t6     17.00000    16.61976     3.77085    16.51634     0.48366
t7     22.00000    20.73473     4.06861    20.39061     1.60939
t8     23.00000    24.41775     3.73497    24.80334     1.80334
t9     51.00000    33.03795     7.96205    28.15271    22.84729
t10    41.00000    41.00000     7.96205    41.00000
t11    56.00000    50.46691     9.26417    48.96205     7.03795
t12    75.00000    62.99591    12.08915    59.73109    15.26891
t13    60.00000    71.85955     9.29819    75.08506    15.08506
t14    75.00000    79.84108     8.15892    81.15774     6.15774
t15    88.00000    88.00000     8.15892    88.00000

----     76 VARIABLE alpha.L  =      0.21382
            VARIABLE beta.L   =      0.86528
            VARIABLE MAE.L    =      6.59394
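We can double-check these levels by running the recursion at the reported \(\alpha\) and \(\beta\). A quick Python check (the helper function is mine; since \(\alpha\) and \(\beta\) are rounded to five decimals, we only expect agreement to roughly that precision):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v, abs_err = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])               # yhat_t = u_{t-1} + v_{t-1}
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
mae = holt_mae(y, 0.21382, 0.86528)
print(round(mae, 5))   # close to the reported 6.59394
```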
When we compare this with the Excel solution from [2], we see the CONOPT solution is actually better.
![]() |
| Optimal Excel Solver Solution (from [2]) |
What we see is:
| Variable | Excel solution [2] | CONOPT solution |
|---|---|---|
| \(\alpha\) | 0.271817 | 0.21382 |
| \(\beta\) | 0.598161 | 0.86528 |
| MAE | 6.740208 | 6.59394 |
This is typical for non-convex problems: local NLP solvers may converge to a local optimum. One way to get a better handle on this is to use different starting points. Another way is to try a global solver. Using global solvers, we can show that CONOPT indeed found the global optimum.
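Short of a real global solver, a crude but deterministic sanity check is a grid search over \((\alpha,\beta)\); a sketch (the step size of 0.01 is my choice, and the helper function is mine):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v, abs_err = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
n = 100   # evaluate on a 0.01 grid: (n+1) x (n+1) points
best = min((holt_mae(y, i / n, j / n), i / n, j / n)
           for i in range(n + 1) for j in range(n + 1))
print(best)   # (best MAE on the grid, alpha, beta)
```

Of course a grid search is no proof of global optimality; it only tells us no grid point beats the candidate solution.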
A global approach
The first thing I did was to throw the above model, unmodified, at some global NLP solvers. For such a small model, the results were a big disappointment:
| Solver | Obj | Time | Gap | Note |
|---|---|---|---|---|
| Baron | 6.5939 | 3600 | 95% | Time limit |
| Couenne | 6.5939 | 3600 | 78% | Time limit |
| Antigone | 6.5939 | 184 | 0% | Optimal |
| Gurobi | 6.5946 | 218 | 0% | Optimal |
Note: The runs with solvers Baron/Couenne/Antigone were done on a somewhat slow laptop. Gurobi ran on a faster (virtual) machine. So timings are not directly comparable between these solvers but are just indicative.
The model can be reformulated to reduce the number of bilinear terms: introducing auxiliary variables \(w_t = u_t+v_t\) and \(x_t = u_t - u_{t-1} - v_{t-1}\) leaves only a single product of variables in each of the \(u_t\) and \(v_t\) equations.

| Non-convex Quadratic Model v2 |
|---|
| \[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\] |
This version of the model behaves better:
| Solver | Obj | Time | Gap | Note |
|---|---|---|---|---|
| Baron | 6.5939 | 60 | 0% | Optimal |
| Couenne | 6.5939 | 3600 | 49% | Time limit |
| Antigone | 6.5939 | 157 | 0% | Optimal |
| Gurobi | 6.5939 | 157 | 0% | Optimal |
This reformulation is a big help for Baron, and it also helps Gurobi deliver a slightly more accurate solution (6.5939 vs. 6.5946).
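The substitution can also be checked numerically: for any \((\alpha,\beta)\), the v2 recursion with \(w_t\) and \(x_t\) must produce the same sequences as the original formulation. A quick sketch (all function names are mine):

```python
def smooth_v1(y, alpha, beta):
    """Original recursion for u_t and v_t."""
    u, v = [y[0]], [0.0]
    for t in range(1, len(y)):
        u.append(alpha * y[t] + (1 - alpha) * (u[-1] + v[-1]))
        v.append(beta * (u[-1] - u[-2]) + (1 - beta) * v[-1])
    return u, v

def smooth_v2(y, alpha, beta):
    """Reformulated recursion using w_t = u_t + v_t and x_t = u_t - u_{t-1} - v_{t-1}."""
    u, v = [y[0]], [0.0]
    w = [u[0] + v[0]]
    for t in range(1, len(y)):
        u.append(alpha * y[t] + w[-1] - alpha * w[-1])
        x = u[-1] - u[-2] - v[-1]
        v.append(beta * x + v[-1])
        w.append(u[-1] + v[-1])
    return u, v

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
u1, v1 = smooth_v1(y, 0.3, 0.6)
u2, v2 = smooth_v2(y, 0.3, 0.6)
print(max(abs(a - b) for a, b in zip(u1 + v1, u2 + v2)))   # essentially zero
```

The two versions agree up to floating-point noise, which is what makes the reformulation a valid drop-in replacement for the global solvers.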
The GAMS source of the first model can be found in [4].
Conclusion
The problem of finding optimal parameters \(\alpha\) and \(\beta\) for the Holt-Winters double exponential smoothing method leads to a non-convex quadratic optimization problem. Local solvers may deliver solutions that are only locally optimal (and thus not necessarily globally optimal). When trying to solve the global problem, it may help to reformulate the model in order to minimize the number of quadratic terms.
References
1. Exponential Smoothing, https://en.wikipedia.org/wiki/Exponential_smoothing
2. Holt's Linear Trend, https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-linear-trend/
3. Handanhal V. Ravinder, Determining The Optimal Values Of Exponential Smoothing Constants – Does Solver Really Work?, American Journal Of Business Education, Volume 6, Number 3, 2013, https://files.eric.ed.gov/fulltext/EJ1054363.pdf. This paper shows some computational experience with Excel on these types of models.
4. How to setup a minimization problem in GAMS, https://stackoverflow.com/questions/63738161/how-to-setup-a-minimization-problem-in-gams
