Channel: Yet Another Math Programming Consultant

Optimal double exponential smoothing


The Holt-Winters double exponential smoothing method is a well-known method for smoothing data with a trend [1]. In [2] an Excel implementation is demonstrated. The formulas (from [2]) are: \[\begin{align} &u_1 = y_1 \\ & v_1 = 0 \\& u_t = \alpha y_t + (1-\alpha) (u_{t-1}+v_{t-1})&& t\gt 1 \\ &v_t = \beta (u_t - u_{t-1}) + (1-\beta)v_{t-1}&& t\gt 1 \\ & \hat{y}_{t+1} = u_{t}+v_{t}  \end{align}\] The symbols are:

  • \(y_t\) is the data,
  • \(u_t\) and \(v_t\) are estimates of the smoothed value and the trend,
  • \(\alpha\) and \(\beta\) are parameters with \(0 \le \alpha \le 1\) and \(0 \le \beta \le 1\),
  • \(\hat{y}_t\) is the one-step-ahead estimate (forecast) for \(y_t\).

In [2] the MAE (Mean Absolute Error) is minimized: \[ \min \>\mathit{MAE}= \frac{\displaystyle\sum_{t\gt1} |\hat{y}_t - y_t|}{T-1} \] where \(T\) is the number of time periods. In practice, minimizing the Mean Squared Error (MSE) is more popular. One reason for choosing a least absolute value criterion is that it provides some robustness when outliers are present.


Data


I use the small data set from [2]:


----     23 PARAMETER ydata

t1 3.000, t2 5.000, t3 9.000, t4 20.000, t5 12.000, t6 17.000, t7 22.000, t8 23.000
t9 51.000, t10 41.000, t11 56.000, t12 75.000, t13 60.000, t14 75.000, t15 88.000
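
As a quick sanity check of the formulas, the recursions can be sketched in a few lines of Python; with \(\alpha=0.4\) and \(\beta=0.7\) (the values used in [2]) this reproduces the MAE reported there:

```python
def holt_mae(y, alpha, beta):
    """Holt's double exponential smoothing; returns the MAE of the
    one-step-ahead forecasts yhat[t+1] = u[t] + v[t]."""
    u, v = y[0], 0.0              # u1 = y1, v1 = 0
    total = 0.0
    for t in range(1, len(y)):
        yhat = u + v              # forecast for period t, made in t-1
        total += abs(yhat - y[t])
        u_prev = u
        u = alpha * y[t] + (1 - alpha) * yhat
        v = beta * (u - u_prev) + (1 - beta) * v
    return total / (len(y) - 1)  # average over T-1 forecasted periods

ydata = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
print(holt_mae(ydata, 0.4, 0.7))  # ≈ 7.40965962794
```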


Model


Often, in statistical software, the MSE version of the problem is modeled as a black-box function of just \(\alpha\) and \(\beta\) and solved using, say, an L-BFGS-B method (usually without analytic gradients, so the solver relies on finite differences). Such implementations typically ignore that we can end up in a local minimum (although most routines allow us to specify a starting point for \(\alpha\) and \(\beta\)). Here I want to state an explicit non-convex quadratic model that we can solve with different types of solvers. This approach gives us a much larger model, with variables \(\alpha\), \(\beta\), \(u\), \(v\), and \(\hat{y}\). In addition, when using an MAE objective, we need extra variables to linearize the absolute value.
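
This black-box view is easy to sketch: the MAE is just a function of \(\alpha\) and \(\beta\). In the sketch below I use a coarse grid search instead of L-BFGS-B to keep things dependency-free; this is not what statistical packages actually do, just an illustration of the black-box objective:

```python
def mae(alpha, beta, y):
    """Black-box objective: MAE of Holt one-step forecasts as a
    function of the two smoothing parameters."""
    u, v, total = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        yhat = u + v
        total += abs(yhat - y[t])
        u_prev = u
        u = alpha * y[t] + (1 - alpha) * yhat
        v = beta * (u - u_prev) + (1 - beta) * v
    return total / (len(y) - 1)

ydata = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
n = 100  # step 0.01 in each direction
best = min((mae(i / n, j / n, ydata), i / n, j / n)
           for i in range(n + 1) for j in range(n + 1))
print(best)   # (best MAE found on the grid, alpha, beta)
```

Note that a grid search is only a heuristic here; it cannot prove global optimality.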


Non-convex Quadratic Model
\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot  \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot  (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}-  \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ &  \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]


The first thing to do is to see if we can reproduce the results for \(\alpha=0.4\) and \(\beta=0.7\) as they are reported in [2]. We do this by fixing the variables \(\alpha\) and \(\beta\) and throwing the model at my favorite NLP solver: CONOPT. We only need to fix these two variables: all other levels are then pinned down. This fixed problem solves very fast:


   Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
      0     0      7.8250000000E+02 (Input point)

Pre-triangular equations:    42
Post-triangular equations:   29

      1     0      0.0000000000E+00 (After pre-processing)
      2     0      0.0000000000E+00 (After scaling)

** Feasible solution. Value of objective =    7.40965962794

   Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
     43            7.4096596279E+00 0.0E+00     0

** Optimal solution. There are no superbasic variables.


Basically, CONOPT recovers the optimal solution during preprocessing. A few more iterations are needed, probably to get the basis right. The objective value is the same as reported in [2]. With this little experiment, we have learned a lot. Most likely our constraints and objective are correctly formulated.


Solution


We are now ready to try to solve the problem. When using  \(\alpha=0.4\) and \(\beta=0.7\) as starting point, CONOPT quickly finds a very good solution. A solution report can look like:


----     75 PARAMETER results

              y          u          v       yhat        |e|

t1      3.00000    3.00000
t2      5.00000    3.42764    0.37003    3.00000    2.00000
t3      9.00000    4.91004    1.33254    3.79767    5.20233
t4     20.00000    9.18421    3.87786    6.24258   13.75742
t5     12.00000   12.83498    3.68136   13.06207    1.06207
t6     17.00000   16.61976    3.77085   16.51634    0.48366
t7     22.00000   20.73473    4.06861   20.39061    1.60939
t8     23.00000   24.41775    3.73497   24.80334    1.80334
t9     51.00000   33.03795    7.96205   28.15271   22.84729
t10    41.00000   41.00000    7.96205   41.00000
t11    56.00000   50.46691    9.26417   48.96205    7.03795
t12    75.00000   62.99591   12.08915   59.73109   15.26891
t13    60.00000   71.85955    9.29819   75.08506   15.08506
t14    75.00000   79.84108    8.15892   81.15774    6.15774
t15    88.00000   88.00000    8.15892   88.00000


----     76 VARIABLE alpha.L  =  0.21382
            VARIABLE beta.L   =  0.86528
            VARIABLE MAE.L    =  6.59394


When we compare this with the Excel solution from [2], we see the CONOPT solution is actually better.


Optimal Excel Solver Solution (from [2])

What we see is:


Variable      Excel solution [2]   CONOPT solution
\(\alpha\)    0.271817             0.21382
\(\beta\)     0.598161             0.86528
MAE           6.740208             6.59394

This is typical for non-convex problems: local NLP solvers may converge to a local optimum. One way to get a better handle on this is to use different starting points. Another way is to try a global solver. Using global solvers, we can show that CONOPT indeed found the global optimum.
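
The multistart idea can be sketched outside a modeling system as well. Below I use a simple compass (pattern) search as a stand-in for a local NLP solver (my own choice, not what CONOPT does), run from \((0.4, 0.7)\) plus a number of random starting points:

```python
import random

def mae(alpha, beta, y):
    """MAE of Holt one-step forecasts for given smoothing parameters."""
    u, v, total = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        yhat = u + v
        total += abs(yhat - y[t])
        u_prev = u
        u = alpha * y[t] + (1 - alpha) * yhat
        v = beta * (u - u_prev) + (1 - beta) * v
    return total / (len(y) - 1)

def compass_search(f, x0, step=0.25, tol=1e-5):
    """Derivative-free local search: try +/-step moves on each coordinate,
    keep improvements, halve the step when no move improves."""
    x, fx = list(x0), f(*x0)
    while step > tol:
        improved = False
        for i in (0, 1):
            for d in (step, -step):
                cand = x[:]
                cand[i] = min(1.0, max(0.0, cand[i] + d))  # stay in [0,1]
                fc = f(*cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2
    return fx, x

ydata = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
random.seed(1)
starts = [(0.4, 0.7)] + [(random.random(), random.random()) for _ in range(19)]
best = min(compass_search(lambda a, b: mae(a, b, ydata), s) for s in starts)
print(best)   # best (MAE, [alpha, beta]) over all starts
```

Like any multistart heuristic, this gives a good incumbent but no proof of global optimality.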


A global approach


The first thing I did was to throw the above model, unmodified, at some global NLP solvers. The results were a big disappointment for such a small model:



Solver     Obj       Time   Gap   Note
Baron      6.59393   600    95%   Time limit
Couenne    6.59393   600    78%   Time limit
Antigone   6.59391    84     0%   Optimal
Gurobi     6.5946    218     0%   Optimal

Note: The runs with solvers Baron/Couenne/Antigone were done on a somewhat slow laptop. Gurobi ran on a faster (virtual) machine. So timings are not directly comparable between these solvers but are just indicative. 

Looking again at our model, we can see that it is more non-linear than it needs to be. To fix this, we make the model even larger (more variables, more equations), but also less non-linear. The idea is to replace a term like \[(x+y)z=xz+yz\] which contains two bilinear terms, by \[\begin{align} & w=x+y \\ & wz \end{align}\] which contains just one. When we do this for our model, we get: 

Non-convex Quadratic Model v2
\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}-  \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ &  \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]
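
As a quick numerical check (my own sketch, not part of the original post), we can verify that the reformulated recursions are equivalent to the original ones:

```python
def mae_v1(alpha, beta, y):
    """Original recursion: u = a*y + (1-a)(u+v), v = b*(u-u_prev) + (1-b)*v."""
    u, v, total = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        yhat = u + v
        total += abs(yhat - y[t])
        u_prev = u
        u = alpha * y[t] + (1 - alpha) * yhat
        v = beta * (u - u_prev) + (1 - beta) * v
    return total / (len(y) - 1)

def mae_v2(alpha, beta, y):
    """Reformulated recursion with w = u + v and x_t = u_t - u_{t-1} - v_{t-1}."""
    u, v, total = y[0], 0.0, 0.0
    for t in range(1, len(y)):
        w = u + v                        # w_{t-1}
        total += abs(w - y[t])           # yhat_t = w_{t-1}
        u_new = alpha * y[t] + w - alpha * w
        x = u_new - u - v                # x_t
        v = beta * x + v                 # v_t
        u = u_new
    return total / (len(y) - 1)

ydata = [3, 5, 9, 20, 12, 17, 22, 23, 51, 41, 56, 75, 60, 75, 88]
print(mae_v1(0.3, 0.6, ydata), mae_v2(0.3, 0.6, ydata))  # agree up to roundoff
```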

This version of the model behaves better: 
 
Solver     Obj       Time   Gap   Note
Baron      6.5939     60     0%   Optimal
Couenne    6.59393   600    49%   Time limit
Antigone   6.59391    57     0%   Optimal
Gurobi     6.59391    57     0%   Optimal


This reformulation is a big help for Baron, and it also helps Gurobi deliver a more accurate solution.

The GAMS source of the first model can be found in [4].


Conclusion


The problem of finding optimal parameters \(\alpha\) and \(\beta\) for the Holt-Winters double exponential smoothing method leads to a non-convex quadratic optimization problem. Local solvers may deliver solutions that are only locally optimal (and thus not necessarily globally optimal). When trying to solve the global problem, it may help to reformulate the model in order to minimize the number of quadratic terms.


References


  1. Exponential Smoothing, https://en.wikipedia.org/wiki/Exponential_smoothing
  2. Holt's Linear Trend, https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-linear-trend/
  3. Handanhal V. Ravinder, Determining The Optimal Values Of Exponential Smoothing Constants – Does Solver Really Work?, American Journal Of Business Education, Volume 6, Number 3, 2013, https://files.eric.ed.gov/fulltext/EJ1054363.pdf. This paper shows some computational experience with Excel on these types of models.
  4. How to setup a minimization problem in GAMS, https://stackoverflow.com/questions/63738161/how-to-setup-a-minimization-problem-in-gams

