GARCH Estimation¶
This post details GARCH(1,1) model and its estimation manually in Python, compared to using libraries and in Stata. For GJRGARCH(1,1), see my documentation on frds.io.
GARCH(1,1) Model¶
The GARCH(1,1) (Generalized Autoregressive Conditional Heteroskedasticity) model is a commonly used model for capturing the timevarying volatility in financial time series data. The model can be defined as follows:
Return equation¶
There are many ways to specify return dynamics. Let's focus on the simplest one assuming a constant mean.
Here \( r_t \) represents the return at time \( t \), and \( \mu \) is the mean return.
Shock equation¶
In this equation, \( \epsilon_t \) is the shock term, \( \sigma_t \) is the conditional volatility, and \( z_t \) is a white noise error term with zero mean and unit variance (\( z_t \sim N(0,1) \)).
Note
We can also assume that the noise term follows a different distribution, such as Studentt, and modify the likelihood function below accordingly.
Volatility equation¶
Here \( \sigma_t^2 \) is the conditional variance at time \( t \), and \( \omega \), \( \alpha \), \( \beta \) are parameters to be estimated. This equation captures how volatility evolves over time.
The unconditional variance and persistence
The unconditional variance, often denoted as \(\text{Var}(\epsilon_t)\) or \(\sigma^2\), refers to the longrun average or steadystate variance of the return series. It is the variance one would expect the series to revert to over the long term, and it doesn't condition on any past information.
For a GARCH(1,1) model to be stationary, the persistence, sum of \(\alpha\) and \(\beta\), must be less than 1 (\(\alpha + \beta < 1\)). Given this condition, the unconditional variance \(\sigma^2\) can be computed as follows:
In this formulation, \(\omega\) is the constant or "base" level of volatility, while \(\alpha\) and \(\beta\) determine how shocks to returns and past volatility influence future volatility. The unconditional variance provides a longrun average level around which the conditional variance oscillates.
Loglikelihood function¶
The likelihood function for a GARCH(1,1) model is used for the estimation of parameters \(\mu\), \(\omega\), \(\alpha\), and \(\beta\). Given a time series of returns \( \{ r_1, r_2, \ldots, r_T \} \), the likelihood function \( L(\mu, \omega, \alpha, \beta) \) can be written as:
Taking the natural logarithm of \( L \), we obtain the loglikelihood function \( \ell(\mu, \omega, \alpha, \beta) \):
The parameters \(\mu, \omega, \alpha, \beta \) can then be estimated by maximizing this loglikelihood function.
Estimation¶
Now let's use an example to showcase the estimation of GARCH(1,1).
import pandas as pd
df = pd.read_stata("https://www.statapress.com/data/r18/stocks.dta", convert_dates=['date'])
df.set_index("date", inplace=True)
The dataframe df
is like below, with daily returns for stocks "toyota", "nissan" and "honda". Notice that there are gaps in the date
, but variable t
is without gaps.
>>> df.head()
t toyota nissan honda
date
20030102 1 0.015167 0.029470 0.031610
20030103 2 0.004820 0.008173 0.002679
20030106 3 0.019959 0.013064 0.001606
20030107 4 0.013323 0.007444 0.011318
20030108 5 0.027001 0.018857 0.016945
We focus on the returns of "toyota", and scale returns to percentage returns for better optimization results.
Estimation using arch
library¶
It's remarkably easy to estimate GARCH(1,1) with the arch
library.
from arch import arch_model
model = arch_model(returns, mean='Constant', vol='GARCH', p=1, q=1)
model.fit()
We have the following outputs.
Iteration: 1, Func. Count: 6, Neg. LLF: 353027828870.51434
Iteration: 2, Func. Count: 14, Neg. LLF: 5686315977.791412
Iteration: 3, Func. Count: 23, Neg. LLF: 288989718.5625958
Iteration: 4, Func. Count: 29, Neg. LLF: 3757.5687528616645
Iteration: 5, Func. Count: 35, Neg. LLF: 6165.184701734164
Iteration: 6, Func. Count: 41, Neg. LLF: 3751.933155975204
Iteration: 7, Func. Count: 47, Neg. LLF: 3748.8551639756433
Iteration: 8, Func. Count: 52, Neg. LLF: 3748.8245639433794
Iteration: 9, Func. Count: 57, Neg. LLF: 3748.8226971150325
Iteration: 10, Func. Count: 62, Neg. LLF: 3748.8215357459
Iteration: 11, Func. Count: 67, Neg. LLF: 3748.8215327016333
Iteration: 12, Func. Count: 71, Neg. LLF: 3748.8215327001894
Optimization terminated successfully (Exit mode 0)
Current function value: 3748.8215327016333
Iterations: 12
Function evaluations: 71
Gradient evaluations: 12
Constant Mean  GARCH Model Results
==============================================================================
Dep. Variable: y Rsquared: 0.000
Mean Model: Constant Mean Adj. Rsquared: 0.000
Vol Model: GARCH LogLikelihood: 3748.82
Distribution: Normal AIC: 7505.64
Method: Maximum Likelihood BIC: 7528.08
No. Observations: 2015
Date: Tue, Sep 26 2023 Df Residuals: 2014
Time: 21:36:03 Df Model: 1
Mean Model
=============================================================================
coef std err t P>t 95.0% Conf. Int.

mu 0.0396 3.054e02 1.297 0.195 [2.025e02,9.945e02]
Volatility Model
============================================================================
coef std err t P>t 95.0% Conf. Int.

omega 0.0279 1.374e02 2.030 4.237e02 [9.609e04,5.484e02]
alpha[1] 0.0694 1.422e02 4.884 1.039e06 [4.157e02,9.730e02]
beta[1] 0.9217 1.601e02 57.558 0.000 [ 0.890, 0.953]
============================================================================
Covariance estimator: robust
ARCHModelResult, id: 0x1cb7b8f4460
GARCH(1,1) in Stata
To estimate GARCH(1,1) in Stata, use the following code:
use https://www.statapress.com/data/r18/stocks, clear
replace toyota = toyota * 100
arch toyota, arch(1) garch(1) vce(robust)
Outputs are:
(setting optimization to BHHH)
Iteration 0: log pseudolikelihood = 4075.9402
Iteration 1: log pseudolikelihood = 3953.7987
Iteration 2: log pseudolikelihood = 3911.4597
Iteration 3: log pseudolikelihood = 3883.4575
Iteration 4: log pseudolikelihood = 3857.2121
(switching optimization to BFGS)
Iteration 5: log pseudolikelihood = 3837.462
Iteration 6: log pseudolikelihood = 3765.4267
Iteration 7: log pseudolikelihood = 3761.9975
Iteration 8: log pseudolikelihood = 3753.8487
Iteration 9: log pseudolikelihood = 3752.0844
Iteration 10: log pseudolikelihood = 3751.921
Iteration 11: log pseudolikelihood = 3751.3856
Iteration 12: log pseudolikelihood = 3749.877
Iteration 13: log pseudolikelihood = 3749.3108
Iteration 14: log pseudolikelihood = 3749.2614
(switching optimization to BHHH)
Iteration 15: log pseudolikelihood = 3749.249
Iteration 16: log pseudolikelihood = 3749.2487
Iteration 17: log pseudolikelihood = 3749.2487
ARCH family regression
Sample: 1 thru 2015 Number of obs = 2015
Wald chi2(.) = .
Log pseudolikelihood = 3749.249 Prob > chi2 = .

 Semirobust
toyota  Coefficient std. err. z P>z [95% conf. interval]
+
toyota 
_cons  .0403521 .0305032 1.32 0.186 .019433 .1001373
+
ARCH 
arch 
L1.  .0703684 .0142609 4.93 0.000 .0424176 .0983192

garch 
L1.  .9204512 .0161376 57.04 0.000 .8888221 .9520802

_cons  .0284799 .0139792 2.04 0.042 .0010812 .0558786

There are some differences in the estimates, potentially because of the optimization algorithms and/or initial parameters. Based on the loglikelihood, it seems arch
results are marginally better.
Manual estimation¶
Let's try do this manually to see how we can use maximum likelihood estimation (MLE) to estimate the parameters.
Note
This code below is not optimized. See the discussion later and source code of frds.algorithms.GARCHModel for a better version.
from scipy.optimize import minimize
import numpy as np
# LogLikelihood function for GARCH(1, 1)
def garch_log_likelihood(params, returns):
mu, omega, alpha, beta = params
T = len(returns)
sigma2 = np.empty(T)
if alpha + beta < 1:
sigma2[0] = omega / (1  alpha  beta)
else:
sigma2[0] = omega / 0.001
for t in range(1, T):
sigma2[t] = omega + alpha * (returns[t1]mu)**2 + beta * sigma2[t1]
sigma2[t] = sigma2[t] if sigma2[t] > 0 else 0.001
log_likelihood = 0.5 * np.sum(np.log(2*np.pi*sigma2) + ((returnsmu)**2 / sigma2))
return log_likelihood
# Persistence of GARCH(1,1) = alpha + beta has to be smaller than 1
def constraint_persistence(params):
_, _, alpha, beta = params
return 1.0  (alpha + beta)
# Initial parameter guess [mu, omega, alpha, beta]
initial_params = [returns.mean(), 0.05, 0.1, 0.88]
# Run the optimizer to maximize the loglikelihood (minimize its negative)
opt_results = minimize(garch_log_likelihood, initial_params, method="SLSQP",
args=(returns),
bounds=[
# No bounds for mu
(np.inf, np.inf),
# Lower bound for omega
# because sigma[0] may be equal to omega
# if alpha+beta both are zero
(0.001, np.inf),
# Bounds for alpha and beta
(0.0, 1.0),
(0.0, 1.0)],
constraints={'type': 'ineq', 'fun': constraint_persistence})
# Extract the estimated parameters
mu_hat, w_hat, alpha_hat, beta_hat = opt_results.x
print(f'Estimated parameters: \n \
mu={mu_hat}\n \
omega={w_hat}\n \
alpha={alpha_hat}\n \
beta={beta_hat}')
print(f"Persistence: alpha+beta={alpha_hat+beta_hat}")
print(f"Loglikelihood: {garch_log_likelihood(opt_results.x, returns)}")
The outputs are
Estimated parameters:
mu=0.040173898536452125
omega=0.02888320573536973
alpha=0.06838623740784693
beta=0.9214019468044863
Persistence: alpha+beta=0.9897881842123332
Loglikelihood: 3749.0392910447126
Caution
The code above serves only demonstration purpose.
It appears that we have quite similar coefficient estimates from the above manual work to the arch
results. However, we need to be super careful here because the choice of initial parameters has significant impact on the output.
Further, based on the loglikelihood, the manual estimation results are not as good as in arch
: the loglikelihood is somewhere between the Iteration 6 and 7 out of 12 in the arch
estimation.
I checked a bit the source code of arch
. The authors did a lot of work in selecting the optimal initial parameters and setting bounds on the conditional volalatilies. But here in the manual work, I relied on only rough guess in selecting initial values and did not attempt to compute bounds.
Note
I leave out the estimation of standard errors in this post.
Improvements¶
In this part, I explore how to improve the estimation and enhance its numerical stability. Insights are drawn from the arch
source code.
Initial value of conditional variance¶
Note that the conditional variance in a GARCH(1,1) model is:
We need a good starting value \(\sigma_0^2\) to begin with, which can be estimated via the backcasting technique. Once we have that \( \sigma^2_0 \) through backcasting, we can proceed to calculate the entire series of conditional variances using the standard GARCH recursion formula.
To backcast the initial variance, we can use the Exponential Weighted Moving Average (EWMA) method, setting \(\sigma^2_0\) to the EWMA of the sample variance of the first \(n \leq T\) returns:
where \( w_t \) are the exponentially decaying weights and \( r_t \) are residuals of returns, i.e., returns demeaned by sample average. This \(\sigma^2_0\) is then used to derive \(\sigma^2_1\) the starting value for the conditional variance series.
Initial value of \(\omega\)¶
The starting value of \(\omega\) is relatively straightforward. Notice that earlier we have jotted down the unconditional variance \(\sigma^2 = \frac{\omega}{1\alpha\beta}\). Therefore, given a level of persistence (\(\alpha+\beta\)), we can set the initial guess of \(\omega\) to be the sample variance times one minus persistence:
where we use the known sample variance of residuals \(\hat{\sigma}^2\) as a guess for the unconditional variance \(\sigma^2\). However, we still need to find good starting values for \(\alpha\) and \(\beta\).
Initial value of \(\alpha\) and \(\beta\)¶
Unfortunately, there is no better way to find good starting values for \(\alpha\) and \(\beta\) than a grid search. Luckily, this grid search can be relatively small.
First, we don't know ex ante the persistence level, so we need to vary the persistence level from some low values to some high values, e.g., from 0.1 to 0.98. Second, generally the \(\alpha\) parameter is not too big, for example, ranging from 0.01 to 0.2.
We can permute combinations of the persistence level and \(\alpha\), which naturally gives the corresponding \(\beta\) and hence \(\omega\). The "optimal" set of initial values of \(\omega, \alpha, \beta\) is the one that gives the highest loglikelihood.^{1}
Variance bounds¶
Another issue is that we want to ensure that in the estimation, condition variance does not blow up to infinity or becomes zero. Hence, we need to construct bounds for conditional variances during the GARCH(1,1) parameter estimation process.
To do this, we can calculate loose lower and upper bounds for each observation. Specifically, we can use sample variance of the residuals to compute global lower and upper bounds. We then use EWMA to compute the conditional variance for each time point. The EWMA variances are then adjusted to ensure they are within global bounds. Lastly, we scale the adjusted EWMA variances to form the variance bounds at each time.
During the estimation process, whenever we compute the conditional variances based on the prevailing model parameters, we ensure that they are adjusted to be reasonably within the bounds at each time.
Note
See frds.algorithms.GARCHModel for the full implementation of the above improvements.

The initial value of \(\mu\) is reasonably set to the sample mean return. ↩