The Holt-Winters double exponential smoothing method is a well-known technique for smoothing data with a trend [1]. In [2], an Excel implementation is demonstrated. The formulas (from [2]) are: \[\begin{align} &u_1 = y_1 \\ & v_1 = 0 \\& u_t = \alpha y_t + (1-\alpha) (u_{t-1}+v_{t-1})&& t\gt 1 \\ &v_t = \beta (u_t - u_{t-1}) + (1-\beta)v_{t-1}&& t\gt 1 \\ & \hat{y}_{t+1} = u_{t}+v_{t} \end{align}\] The symbols are:
- \(y_t\) is the data,
- \(u_t\) and \(v_t\) are estimates of the smoothed value and the trend,
- \(\alpha\) and \(\beta\) are parameters. We have \(0 \le \alpha \le 1\) and \(0 \le \beta \le 1\).
- \(\hat{y}_t\) is the estimate for \(y\).
In [2] the MAE (Mean Absolute Error) is minimized. This is \[ \min \>\mathit{MAE}= \frac{\displaystyle\sum_{t\gt1} |\hat{y}_t - y_t|}{T-1} \] where \(T\) is the number of time periods. In practice, minimizing the Mean Squared Error is more popular. One reason for choosing a least-absolute-value criterion is that it provides robustness when outliers are present.
Data
I use the small data set from [2]:
---- 23 PARAMETER ydata
t1 3.000, t2 5.000, t3 9.000, t4 20.000, t5 12.000, t6 17.000, t7 22.000, t8 23.000
t9 51.000, t10 41.000, t11 56.000, t12 75.000, t13 60.000, t14 75.000, t15 88.000
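Given the data, the recursion and the MAE are easy to evaluate for fixed \(\alpha\) and \(\beta\). A minimal Python sketch (the function name is mine):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v = y[0], 0.0                 # u1 = y1, v1 = 0
    abs_err = 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])               # yhat_t = u_{t-1} + v_{t-1}
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)    # average over t = 2..T

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
mae = holt_mae(y, alpha=0.4, beta=0.7)
print(round(mae, 5))   # -> 7.40966
```

For \(\alpha=0.4\), \(\beta=0.7\) this reproduces the objective value 7.40966 that CONOPT reports below for the fixed problem.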
Model
Often, in statistical software, the MSE version of the problem is treated as a black-box function of just \(\alpha\) and \(\beta\) and solved with, say, an L-BFGS-B method (usually without analytic gradients, so the solver uses finite differences). Such an approach typically ignores that we can end up in a local minimum (although most routines allow us to specify a starting point for \(\alpha\) and \(\beta\)). Here I want to state an explicit non-convex quadratic model that we can solve with different types of solvers. This approach gives a much larger model, with variables \(\alpha\), \(\beta\), \(u\), \(v\), and \(\hat{y}\). In addition, when using an MAE objective, we need extra variables to linearize the absolute value.
| Non-convex Quadratic Model |
|---|
| \[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\] |
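For comparison with the black-box approach mentioned above, here is a sketch using SciPy's L-BFGS-B with finite-difference gradients on the (smooth) MSE objective; apart from the SciPy call, all names are my own choices:

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88], float)

def mse(params):
    """MSE of the one-step-ahead forecasts for given (alpha, beta)."""
    alpha, beta = params
    u, v, sq = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        sq += (u + v - y[t]) ** 2                  # forecast error at period t
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return sq / (len(y) - 1)

# Local solve from a user-supplied starting point; gradients by finite differences.
res = minimize(mse, x0=[0.4, 0.7], method="L-BFGS-B", bounds=[(0, 1), (0, 1)])
print(res.x, res.fun)
```

This is a local method: the returned point depends on the starting point, which is exactly the issue discussed above.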
The first thing to do is to see if we can reproduce the results for \(\alpha=0.4\) and \(\beta=0.7\) as they are reported in [2]. We do this by fixing the variables \(\alpha\) and \(\beta\) and throwing the model at my favorite NLP solver: CONOPT. We only need to fix these two variables, as all other levels are then pinned down by the recursion. This fixed problem solves very fast:
    Iter Phase Ninf  Infeasibility   RGmax    NSB   Step InItr MX OK
       0   0        7.8250000000E+02 (Input point)

                    Pre-triangular equations:   42
                    Post-triangular equations:  29

       1   0        0.0000000000E+00 (After pre-processing)
       2   0        0.0000000000E+00 (After scaling)

 ** Feasible solution. Value of objective =    7.40965962794

    Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
       4   3        7.4096596279E+00 0.0E+00     0

 ** Optimal solution. There are no superbasic variables.
Basically, CONOPT recovers the solution during preprocessing. A few more iterations are needed, probably to get the basis right. The objective value is the same as reported in [2]. With this little experiment we have learned a lot: most likely, our constraints and objective are formulated correctly.
Solution
We are now ready to try to solve the problem. When using \(\alpha=0.4\) and \(\beta=0.7\) as starting point, CONOPT quickly finds a very good solution. A solution report can look like:
----     75 PARAMETER results

              y           u           v        yhat         |e|

t1      3.00000     3.00000
t2      5.00000     3.42764     0.37003     3.00000     2.00000
t3      9.00000     4.91004     1.33254     3.79767     5.20233
t4     20.00000     9.18421     3.87786     6.24258    13.75742
t5     12.00000    12.83498     3.68136    13.06207     1.06207
t6     17.00000    16.61976     3.77085    16.51634     0.48366
t7     22.00000    20.73473     4.06861    20.39061     1.60939
t8     23.00000    24.41775     3.73497    24.80334     1.80334
t9     51.00000    33.03795     7.96205    28.15271    22.84729
t10    41.00000    41.00000     7.96205    41.00000
t11    56.00000    50.46691     9.26417    48.96205     7.03795
t12    75.00000    62.99591    12.08915    59.73109    15.26891
t13    60.00000    71.85955     9.29819    75.08506    15.08506
t14    75.00000    79.84108     8.15892    81.15774     6.15774
t15    88.00000    88.00000     8.15892    88.00000

----     76 VARIABLE alpha.L  =      0.21382
            VARIABLE beta.L   =      0.86528
            VARIABLE MAE.L    =      6.59394
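We can double-check these levels by running the recursion at the reported \(\alpha\) and \(\beta\). A quick Python check (the helper function is mine; since \(\alpha\) and \(\beta\) are rounded to five decimals, we only expect agreement to roughly that precision):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v, abs_err = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])               # yhat_t = u_{t-1} + v_{t-1}
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
mae = holt_mae(y, 0.21382, 0.86528)
print(round(mae, 5))   # close to the reported 6.59394
```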
When we compare this with the Excel solution from [2], we see the CONOPT solution is actually better.
![]() |
| Optimal Excel Solver Solution (from [2]) |
What we see is:
| Variable | Excel solution [2] | CONOPT solution |
|---|---|---|
| \(\alpha\) | 0.271817 | 0.21382 |
| \(\beta\) | 0.598161 | 0.86528 |
| MAE | 6.740208 | 6.59394 |
This is typical for non-convex problems: local NLP solvers may converge to a local optimum. One way to get a better handle on this is to use different starting points. Another way is to try a global solver. Using global solvers, we can show that CONOPT indeed found the global optimum.
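Short of a real global solver, a crude but deterministic sanity check is a grid search over \((\alpha,\beta)\); a sketch (the step size of 0.01 is my choice, and the helper function is mine):

```python
def holt_mae(y, alpha, beta):
    """Double exponential smoothing for given alpha, beta; returns the MAE."""
    u, v, abs_err = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        abs_err += abs(u + v - y[t])
        u_new = alpha * y[t] + (1 - alpha) * (u + v)
        v = beta * (u_new - u) + (1 - beta) * v
        u = u_new
    return abs_err / (len(y) - 1)

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
n = 100   # evaluate on a 0.01 grid: (n+1) x (n+1) points
best = min((holt_mae(y, i / n, j / n), i / n, j / n)
           for i in range(n + 1) for j in range(n + 1))
print(best)   # (best MAE on the grid, alpha, beta)
```

Of course a grid search is no proof of global optimality; it only tells us no grid point beats the candidate solution.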
A global approach
The first thing I did was to throw the above model, unmodified, at some global NLP solvers. For such a small model, the results were a big disappointment:
| Solver | Obj | Time | Gap | Note |
|---|---|---|---|---|
| Baron | 6.5939 | 3600 | 95% | Time limit |
| Couenne | 6.5939 | 3600 | 78% | Time limit |
| Antigone | 6.5939 | 184 | 0% | Optimal |
| Gurobi | 6.5946 | 218 | 0% | Optimal |
Note: The runs with solvers Baron/Couenne/Antigone were done on a somewhat slow laptop. Gurobi ran on a faster (virtual) machine. So timings are not directly comparable between these solvers but are just indicative.
The model can be reformulated to reduce the number of bilinear terms: introducing auxiliary variables \(w_t = u_t+v_t\) and \(x_t = u_t - u_{t-1} - v_{t-1}\) leaves only a single product of variables in each of the \(u_t\) and \(v_t\) equations.

| Non-convex Quadratic Model v2 |
|---|
| \[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\] |
This version of the model behaves better:
| Solver | Obj | Time | Gap | Note |
|---|---|---|---|---|
| Baron | 6.5939 | 60 | 0% | Optimal |
| Couenne | 6.5939 | 3600 | 49% | Time limit |
| Antigone | 6.5939 | 157 | 0% | Optimal |
| Gurobi | 6.5939 | 157 | 0% | Optimal |
This reformulation is a big help for Baron, and it also helps Gurobi deliver a slightly more accurate solution (6.5939 vs. 6.5946).
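The substitution can also be checked numerically: for any \((\alpha,\beta)\), the v2 recursion with \(w_t\) and \(x_t\) must produce the same sequences as the original formulation. A quick sketch (all function names are mine):

```python
def smooth_v1(y, alpha, beta):
    """Original recursion for u_t and v_t."""
    u, v = [y[0]], [0.0]
    for t in range(1, len(y)):
        u.append(alpha * y[t] + (1 - alpha) * (u[-1] + v[-1]))
        v.append(beta * (u[-1] - u[-2]) + (1 - beta) * v[-1])
    return u, v

def smooth_v2(y, alpha, beta):
    """Reformulated recursion using w_t = u_t + v_t and x_t = u_t - u_{t-1} - v_{t-1}."""
    u, v = [y[0]], [0.0]
    w = [u[0] + v[0]]
    for t in range(1, len(y)):
        u.append(alpha * y[t] + w[-1] - alpha * w[-1])
        x = u[-1] - u[-2] - v[-1]
        v.append(beta * x + v[-1])
        w.append(u[-1] + v[-1])
    return u, v

y = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
u1, v1 = smooth_v1(y, 0.3, 0.6)
u2, v2 = smooth_v2(y, 0.3, 0.6)
print(max(abs(a - b) for a, b in zip(u1 + v1, u2 + v2)))   # essentially zero
```

The two versions agree up to floating-point noise, which is what makes the reformulation a valid drop-in replacement for the global solvers.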
The GAMS source of the first model can be found in [4].
Conclusion
The problem of finding optimal parameters \(\alpha\) and \(\beta\) for the Holt-Winters double exponential smoothing method leads to a non-convex quadratic optimization problem. Local solvers may deliver solutions that are only locally optimal (and thus not necessarily globally optimal). When trying to solve the global problem, it may help to reformulate the model in order to minimize the number of quadratic terms.
References
1. Exponential Smoothing, https://en.wikipedia.org/wiki/Exponential_smoothing
2. Holt's Linear Trend, https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-linear-trend/
3. Handanhal V. Ravinder, Determining The Optimal Values Of Exponential Smoothing Constants – Does Solver Really Work?, American Journal Of Business Education, Volume 6, Number 3, 2013, https://files.eric.ed.gov/fulltext/EJ1054363.pdf. This paper shows some computational experience with Excel on these types of models.
4. How to setup a minimization problem in GAMS, https://stackoverflow.com/questions/63738161/how-to-setup-a-minimization-problem-in-gams
