GARCH Estimation
This post details the GARCH(1,1) model and its manual estimation in Python, compared against the arch library and Stata. For GJR-GARCH(1,1), see my documentation on frds.io.
GARCH(1,1) Model
The GARCH(1,1) (Generalized Autoregressive Conditional Heteroskedasticity) model is a commonly used model for capturing the time-varying volatility in financial time series data. The model can be defined as follows:
Return equation
There are many ways to specify return dynamics. Let’s focus on the simplest one assuming a constant mean.
$$r_t = \mu + \epsilon_t$$
Here $r_t$ is the return at time $t$, $\mu$ is the constant mean return, and $\epsilon_t$ is the shock (innovation) at time $t$.
Shock equation
$$\epsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1)$$
In this equation, $\sigma_t$ is the conditional volatility at time $t$ and $z_t$ is an i.i.d. standard normal noise term.
We can also assume that the noise term follows a different distribution, such as Student-t, and modify the likelihood function below accordingly.
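For example, if $z_t$ instead follows a standardized Student-t distribution with $\nu > 2$ degrees of freedom (standardized so that $z_t$ still has unit variance), its density is
$$f(z_t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi(\nu-2)}} \left(1 + \frac{z_t^2}{\nu-2}\right)^{-\frac{\nu+1}{2}}$$
and $\nu$ becomes an additional parameter to estimate.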
Volatility equation
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
Here $\omega > 0$, $\alpha \geq 0$, and $\beta \geq 0$ are parameters: $\omega$ sets the baseline variance level, $\alpha$ is the coefficient on the lagged squared shock, and $\beta$ is the coefficient on the lagged conditional variance.
The unconditional variance, often denoted as $\bar{\sigma}^2$, is the long-run average level of the conditional variance:
$$\bar{\sigma}^2 = \frac{\omega}{1 - \alpha - \beta}$$
For a GARCH(1,1) model to be stationary, the persistence, the sum of $\alpha$ and $\beta$, must be strictly less than 1.
In this formulation, a persistence close to 1 means that shocks to volatility die out slowly.
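As a quick numeric check (a minimal sketch; the parameter values are hypothetical, chosen to match the initial guesses used later in this post):

# Hypothetical GARCH(1,1) parameters
omega, alpha, beta = 0.05, 0.1, 0.88

persistence = alpha + beta                # 0.98 < 1, so the model is stationary
uncond_var = omega / (1 - alpha - beta)   # 0.05 / 0.02 = 2.5
print(f"persistence={persistence}, unconditional variance={uncond_var}")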
Log-likelihood function
The likelihood function for a GARCH(1,1) model is used for the estimation of the parameters $(\mu, \omega, \alpha, \beta)$. Given the normality assumption for $z_t$, the likelihood of observing the sample of returns $r_1, \ldots, r_T$ is
$$L(\mu, \omega, \alpha, \beta) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi \sigma_t^2}} \exp\left(-\frac{(r_t - \mu)^2}{2 \sigma_t^2}\right)$$
Taking the natural logarithm of $L$, we obtain the log-likelihood function:
$$\ln L = -\frac{1}{2} \sum_{t=1}^{T} \left[\ln(2\pi) + \ln \sigma_t^2 + \frac{(r_t - \mu)^2}{\sigma_t^2}\right]$$
The parameters $(\mu, \omega, \alpha, \beta)$ are estimated by maximizing this log-likelihood, or equivalently, by minimizing its negative.
Estimation
Now let’s use an example to showcase the estimation of GARCH(1,1).
import pandas as pd
df = pd.read_stata("https://www.stata-press.com/data/r18/stocks.dta", convert_dates=['date'])
df.set_index("date", inplace=True)
The dataframe df is like below, with daily returns for stocks “toyota”, “nissan” and “honda”. Notice that there are gaps in date (non-trading days), but the variable t is without gaps.
>>> df.head()
            t    toyota    nissan     honda
date
2003-01-02  1  0.015167  0.029470  0.031610
2003-01-03  2  0.004820  0.008173  0.002679
2003-01-06  3  0.019959  0.013064 -0.001606
2003-01-07  4 -0.013323 -0.007444 -0.011318
2003-01-08  5 -0.027001 -0.018857 -0.016945
We focus on the returns of “toyota” and scale them to percentage returns for better optimization results.
= df["toyota"].to_numpy() * 100 returns
Estimation using the arch library
It’s remarkably easy to estimate GARCH(1,1) with the arch library.
from arch import arch_model
model = arch_model(returns, mean='Constant', vol='GARCH', p=1, q=1)
model.fit()
We have the following outputs.
Iteration:      1,   Func. Count:      6,   Neg. LLF: 353027828870.51434
Iteration:      2,   Func. Count:     14,   Neg. LLF: 5686315977.791412
Iteration:      3,   Func. Count:     23,   Neg. LLF: 288989718.5625958
Iteration:      4,   Func. Count:     29,   Neg. LLF: 3757.5687528616645
Iteration:      5,   Func. Count:     35,   Neg. LLF: 6165.184701734164
Iteration:      6,   Func. Count:     41,   Neg. LLF: 3751.933155975204
Iteration:      7,   Func. Count:     47,   Neg. LLF: 3748.8551639756433
Iteration:      8,   Func. Count:     52,   Neg. LLF: 3748.8245639433794
Iteration:      9,   Func. Count:     57,   Neg. LLF: 3748.8226971150325
Iteration:     10,   Func. Count:     62,   Neg. LLF: 3748.8215357459
Iteration:     11,   Func. Count:     67,   Neg. LLF: 3748.8215327016333
Iteration:     12,   Func. Count:     71,   Neg. LLF: 3748.8215327001894
Optimization terminated successfully    (Exit mode 0)
            Current function value: 3748.8215327016333
            Iterations: 12
            Function evaluations: 71
            Gradient evaluations: 12
                     Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:               -3748.82
Distribution:                  Normal   AIC:                           7505.64
Method:            Maximum Likelihood   BIC:                           7528.08
                                        No. Observations:                 2015
Date:                Tue, Sep 26 2023   Df Residuals:                     2014
Time:                        21:36:03   Df Model:                            1
                                 Mean Model
=============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
-----------------------------------------------------------------------------
mu             0.0396  3.054e-02      1.297      0.195 [-2.025e-02,9.945e-02]
                              Volatility Model
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega          0.0279  1.374e-02      2.030  4.237e-02  [9.609e-04,5.484e-02]
alpha[1]       0.0694  1.422e-02      4.884  1.039e-06  [4.157e-02,9.730e-02]
beta[1]        0.9217  1.601e-02     57.558      0.000      [ 0.890,  0.953]
============================================================================

Covariance estimator: robust
ARCHModelResult, id: 0x1cb7b8f4460
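Note that the summary and the trailing ARCHModelResult line printed because the result of model.fit() was not assigned. In practice, assign it to work with the estimates programmatically. A minimal sketch using documented arch attributes:

res = model.fit(disp="off")             # suppress the iteration output
print(res.params)                       # mu, omega, alpha[1], beta[1]
print(res.loglikelihood)                # maximized log-likelihood
vol = res.conditional_volatility        # fitted conditional volatility series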
Estimation using Stata
To estimate GARCH(1,1) in Stata, use the following code:
use https://www.stata-press.com/data/r18/stocks, clear
replace toyota = toyota * 100
arch toyota, arch(1) garch(1) vce(robust)
Outputs are:
(setting optimization to BHHH)
Iteration 0:  log pseudolikelihood = -4075.9402
Iteration 1:  log pseudolikelihood = -3953.7987
Iteration 2:  log pseudolikelihood = -3911.4597
Iteration 3:  log pseudolikelihood = -3883.4575
Iteration 4:  log pseudolikelihood = -3857.2121
(switching optimization to BFGS)
Iteration 5:  log pseudolikelihood = -3837.462
Iteration 6:  log pseudolikelihood = -3765.4267
Iteration 7:  log pseudolikelihood = -3761.9975
Iteration 8:  log pseudolikelihood = -3753.8487
Iteration 9:  log pseudolikelihood = -3752.0844
Iteration 10: log pseudolikelihood = -3751.921
Iteration 11: log pseudolikelihood = -3751.3856
Iteration 12: log pseudolikelihood = -3749.877
Iteration 13: log pseudolikelihood = -3749.3108
Iteration 14: log pseudolikelihood = -3749.2614
(switching optimization to BHHH)
Iteration 15: log pseudolikelihood = -3749.249
Iteration 16: log pseudolikelihood = -3749.2487
Iteration 17: log pseudolikelihood = -3749.2487

ARCH family regression

Sample: 1 thru 2015                             Number of obs =  2015
                                                Wald chi2(.)  =     .
Log pseudolikelihood = -3749.249                Prob > chi2   =     .

------------------------------------------------------------------------------
             |             Semirobust
      toyota | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
toyota       |
       _cons |   .0403521   .0305032     1.32   0.186     -.019433    .1001373
-------------+----------------------------------------------------------------
ARCH         |
        arch |
         L1. |   .0703684   .0142609     4.93   0.000     .0424176    .0983192
             |
       garch |
         L1. |   .9204512   .0161376    57.04   0.000     .8888221    .9520802
             |
       _cons |   .0284799   .0139792     2.04   0.042     .0010812    .0558786
------------------------------------------------------------------------------
There are some differences in the estimates, potentially because of the optimization algorithms and/or initial parameters. Based on the log-likelihood, it seems the arch results are marginally better.
Manual estimation
Let’s try to do this manually to see how we can use maximum likelihood estimation (MLE) to estimate the parameters.
The code below is not optimized. See the discussion later and the source code of frds.algorithms.GARCHModel for a better version.
from scipy.optimize import minimize
import numpy as np

# Log-likelihood function for GARCH(1,1)
# Note: it returns the NEGATIVE log-likelihood, which we minimize
def garch_log_likelihood(params, returns):
    mu, omega, alpha, beta = params
    T = len(returns)
    sigma2 = np.empty(T)
    if alpha + beta < 1:
        sigma2[0] = omega / (1 - alpha - beta)
    else:
        sigma2[0] = omega / 0.001

    for t in range(1, T):
        sigma2[t] = omega + alpha * (returns[t-1] - mu)**2 + beta * sigma2[t-1]
        sigma2[t] = sigma2[t] if sigma2[t] > 0 else 0.001

    log_likelihood = 0.5 * np.sum(np.log(2*np.pi*sigma2) + ((returns - mu)**2 / sigma2))
    return log_likelihood

# Persistence of GARCH(1,1) = alpha + beta has to be smaller than 1
def constraint_persistence(params):
    _, _, alpha, beta = params
    return 1.0 - (alpha + beta)

# Initial parameter guess [mu, omega, alpha, beta]
initial_params = [returns.mean(), 0.05, 0.1, 0.88]

# Run the optimizer to maximize the log-likelihood (minimize its negative)
opt_results = minimize(garch_log_likelihood, initial_params, method="SLSQP",
                       args=(returns,),
                       bounds=[
                           # No bounds for mu
                           (-np.inf, np.inf),
                           # Lower bound for omega,
                           # because sigma2[0] may be equal to omega
                           # if alpha and beta are both zero
                           (0.001, np.inf),
                           # Bounds for alpha and beta
                           (0.0, 1.0),
                           (0.0, 1.0)],
                       constraints={'type': 'ineq', 'fun': constraint_persistence})

# Extract the estimated parameters
mu_hat, w_hat, alpha_hat, beta_hat = opt_results.x

print(f'Estimated parameters: \n \
    mu={mu_hat}\n \
    omega={w_hat}\n \
    alpha={alpha_hat}\n \
    beta={beta_hat}')
print(f"Persistence: alpha+beta={alpha_hat+beta_hat}")
print(f"Log-likelihood: {-garch_log_likelihood(opt_results.x, returns)}")
The outputs are
Estimated parameters: 
     mu=0.040173898536452125
     omega=0.02888320573536973
     alpha=0.06838623740784693
     beta=0.9214019468044863
Persistence: alpha+beta=0.9897881842123332
Log-likelihood: -3749.0392910447126
The code above serves demonstration purposes only.
It appears that the coefficient estimates from the manual work above are quite similar to the arch results. However, we need to be careful here, because the choice of initial parameters has a significant impact on the output. Further, based on the log-likelihood, the manual estimation results are not as good as those from arch: the log-likelihood falls somewhere between iterations 6 and 7 of the 12 in the arch estimation.
I took a look at the source code of arch. The authors did a lot of work in selecting good initial parameters and setting bounds on the conditional volatilities. In the manual work here, by contrast, I relied on only a rough guess for the initial values and did not attempt to compute bounds.
I leave out the estimation of standard errors in this post.
Improvements
In this part, I explore how to improve the estimation and enhance its numerical stability. Insights are drawn from the arch source code.
Initial value of conditional variance
Note that the conditional variance in a GARCH(1,1) model is:
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
We need a good starting value $\sigma_0^2$ to initiate this recursion, which can be obtained by backcasting.
To backcast the initial variance, we can use the Exponentially Weighted Moving Average (EWMA) method, setting
$$\sigma_0^2 = \sum_{i=0}^{\tau-1} w_i \epsilon_i^2, \quad w_i = \frac{\lambda^i}{\sum_{j=0}^{\tau-1} \lambda^j}$$
where $\lambda$ is a decay factor (e.g., 0.94) and $\tau$ is the length of the backcast window (e.g., the first 75 observations, or fewer if the sample is shorter).
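Below is a minimal sketch of such a backcast. The decay factor of 0.94 and the window of up to 75 observations mirror what the arch source appears to use, but treat them as tunable assumptions:

import numpy as np

def backcast(resids: np.ndarray, decay: float = 0.94, max_obs: int = 75) -> float:
    """EWMA backcast of the initial conditional variance."""
    tau = min(max_obs, resids.shape[0])
    w = decay ** np.arange(tau)          # exponentially decaying weights
    w = w / w.sum()                      # normalize to sum to one
    return float(w @ (resids[:tau] ** 2))

sigma2_0 = backcast(returns - returns.mean())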
Initial value of $\omega$
The starting value of $\omega$ can be computed as
$$\omega_0 = \hat{\sigma}^2 (1 - \alpha_0 - \beta_0)$$
where we use the known sample variance of residuals $\hat{\sigma}^2$ together with the candidate starting values $\alpha_0$ and $\beta_0$. This follows directly from the unconditional variance formula above.
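In code, this is a one-liner; alpha0 and beta0 here are hypothetical candidate values:

resids = returns - returns.mean()
alpha0, beta0 = 0.1, 0.88                      # hypothetical candidates
omega0 = resids.var() * (1 - alpha0 - beta0)   # implied omega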
Initial value of $\alpha$ and $\beta$
Unfortunately, there is no clean closed-form way to find good starting values for $\alpha$ and $\beta$, so we search over a small grid. First, we don’t know ex ante the persistence level, so we need to vary the persistence level from some low values to some high values, e.g., from 0.1 to 0.98. Second, generally $\alpha$ is relatively small, e.g., between 0.01 and 0.2.
We can permute combinations of the persistence level and $\alpha$, back out the implied $\beta$ and $\omega$ from each combination, and select the combination that yields the highest log-likelihood as the starting values, as sketched in the code below.1
1 The initial value of $\mu$ is simply the sample mean of returns, as in the manual estimation above.
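Here is a minimal sketch of the grid search, reusing garch_log_likelihood() from the manual estimation above. The grids are illustrative guesses, not the exact ones used in arch:

import itertools
import numpy as np

persistence_grid = [0.1, 0.3, 0.5, 0.7, 0.9, 0.98]
alpha_grid = [0.01, 0.05, 0.1, 0.2]

mu0 = returns.mean()
sample_var = (returns - mu0).var()

best_ll, best_params = -np.inf, None
for p, a in itertools.product(persistence_grid, alpha_grid):
    if a >= p:
        continue                       # implied beta would be negative
    b = p - a                          # beta implied by persistence and alpha
    w = sample_var * (1 - p)           # omega implied by unconditional variance
    # garch_log_likelihood() returns the negative log-likelihood
    ll = -garch_log_likelihood([mu0, w, a, b], returns)
    if ll > best_ll:
        best_ll, best_params = ll, [mu0, w, a, b]

initial_params = best_params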
Variance bounds
Another issue is that we want to ensure that, during the estimation, the conditional variance does not blow up to infinity or collapse to zero. Hence, we need to construct bounds for the conditional variances during the GARCH(1,1) parameter estimation process.
To do this, we can calculate loose lower and upper bounds for each observation. Specifically, we use the sample variance of the residuals to compute global lower and upper bounds. We then use EWMA to compute the conditional variance at each time point. These EWMA variances are adjusted to ensure they lie within the global bounds. Lastly, we scale the adjusted EWMA variances to form the variance bounds at each time.
During the estimation process, whenever we compute the conditional variances based on the prevailing model parameters, we ensure that they are adjusted to be reasonably within the bounds at each time.
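A minimal sketch of constructing such bounds, loosely following the steps above; the scale factor and decay are illustrative choices, not necessarily those used in arch:

import numpy as np

def variance_bounds(resids: np.ndarray, decay: float = 0.94, scale: float = 1e6):
    """Loose per-observation lower/upper bounds for the conditional variance."""
    nobs = resids.shape[0]
    sample_var = resids.var()
    # Global bounds based on the sample variance of residuals
    global_lower, global_upper = sample_var / scale, sample_var * scale
    # EWMA conditional variance at each time point
    ewma = np.empty(nobs)
    ewma[0] = sample_var
    for t in range(1, nobs):
        ewma[t] = decay * ewma[t - 1] + (1 - decay) * resids[t - 1] ** 2
    # Keep the EWMA variances within the global bounds, then scale
    ewma = np.clip(ewma, global_lower, global_upper)
    return np.column_stack((ewma / scale, ewma * scale))

bounds = variance_bounds(returns - returns.mean())
# During likelihood evaluation, clip each conditional variance into its bounds:
# sigma2[t] = np.clip(sigma2[t], bounds[t, 0], bounds[t, 1])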
See frds.algorithms.GARCHModel for the full implementation of the above improvements.
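For completeness, a usage sketch of that implementation; the interface below is based on the frds documentation, so double-check the exact result fields there:

from frds.algorithms import GARCHModel

model = GARCHModel(returns)   # percentage returns, as before
res = model.fit()
print(res)                    # estimated mu, omega, alpha, beta and log-likelihood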