Skip to content

Variance Ratio Test - Lo and MacKinlay (1988)

A simple test for the random walk hypothesis of prices and efficient market.

Definition

Let's assume:

  • a price series \{p_t\}=\{p_0,p_1,p_2,...,p_T\} and
  • a log return series \{x_t\}=\{x_1, x_2, ..., x_T\} where x_t=\ln \frac{p_t}{p_{t-1}}

Variance Ratio (VR)

The variance ratio of k-period return is defined as:

\textit{V}(k)=\frac{\textit{Var}(x_t+x_{t-1}+...+x_{t-k+1})/k}{\textit{Var}(x_t)}

The estimator of \textit{V}(k) proposed in Lo and MacKinlay (1988) is

\textit{VR}(k)=\frac{\hat\sigma^2(k)}{\hat\sigma^2(1)}

where \hat\sigma^2(1) is the unbiased estimator of the one-period return variance, using the one-period returns \{x_t\}, and is defined as

\hat\sigma^2(1)=\frac{1}{T-1} \sum_{t-1}^T (x_t - \hat\mu)^2

and \hat\sigma^2(k) is the estimator of k-period return variance using k-period returns. Lo and MacKinlay (1988) defined it, due to limited sample size and the desire to improve the power of the test, as

\hat\sigma^2(k)=\frac{1}{m} \sum_{t-1}^T \left(\ln\frac{P_t}{P_{t-k}} - k\hat\mu \right)^2

where m=k(T-k+1)(1-k/T) is chosen such that \hat\sigma^2(k) is an unbiased estimator of the k-period return variance when \sigma^2_t is constant over time.

Variance Ratio Test Statistics

Lo and MacKinlay (1988) proposed that under the null hypothesis of V(k)=1, the test statistic is given by

Z(k)=\frac{\textit{VR}(k)-1}{\sqrt{\phi(k)}}

which follows the standard normal distribution asymptotically.

Homoscedasticity

Under the assumption of homoscedasticity, the asymptotic variance \phi is given by

\phi(k)=\frac{2(2k-1)(k-1)}{3kT}

Heteroscedasticity

Under the assumption of heteroscedasticity, the asymptotic variance \phi is given by

\phi(k)=\sum_{j=1}^{k-1} \left[\frac{2(k-j)}{k} \right]^2\delta(j)
\delta(j)=\frac{\sum_{t=j+1}^T (x_t - \hat\mu)^2(x_{t-j} - \hat\mu)^2}{\left[\sum_{t=1}^T (x_t - \hat\mu)^2\right]^2}

Erratum

Note that there's a missing T in the numerator of \delta(j) above. It is actually missing the 1988 RFS paper and the 1998 JE'mtric paper, but has been corrected in the 1990 RFS Issue 1: https://doi.org/10.1093/rfs/3.1.ii. The corrected version reads:

\delta(j)=\frac{T\sum_{t=j+1}^T (x_t - \hat\mu)^2(x_{t-j} - \hat\mu)^2}{\left[\sum_{t=1}^T (x_t - \hat\mu)^2\right]^2}

To correct it in the example code below, change the highlighted line 51 to:

1
delta_arr = T * b_arr / np.square(np.sum(sqr_demeaned_x))

I thank Simon Jurkatis for letting me know about the erratum.

Source Code

MIT MIT

This example Python code has been optimized for speed but serves only demonstration purpose. It may contain errors.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# LoMacKinlay.py
import numpy as np
from numba import jit

name = 'LoMacKinlay1988'
description = 'Variance ratio and test statistics as in Lo and MacKinlay (1988)'
vars_needed = ['Price']


@jit(nopython=True, nogil=True, cache=True)
def _estimate(log_prices, k, const_arr):
    # Log returns = [x2, x3, x4, ..., xT], where x(i)=ln[p(i)/p(i-1)]
    rets = np.diff(log_prices)
    # T is the length of return series
    T = len(rets)
    # mu is the mean log return
    mu = np.mean(rets)
    # sqr_demeaned_x is the array of squared demeaned log returns
    sqr_demeaned_x = np.square(rets - mu)
    # Var(1)
    # Didn't use np.var(rets, ddof=1) because
    # sqr_demeaned_x is calculated already and will be used many times.
    var_1 = np.sum(sqr_demeaned_x) / (T-1)
    # Var(k)
    # Variance of log returns where x(i) = ln[p(i)/p(i-k)]
    # Before np.roll() - array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    # After np.roll(,shift=2) - array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
    # Discard the first k elements.
    rets_k = (log_prices - np.roll(log_prices, k))[k:]
    m = k * (T - k + 1) * (1 - k / T)
    var_k = 1/m * np.sum(np.square(rets_k - k * mu))

    # Variance Ratio
    vr = var_k / var_1

    # a_arr is an array of { (2*(k-j)/k)^2 } for j=1,2,...,k-1, fixed for a given k:
    #   When k=5, a_arr = array([2.56, 1.44, 0.64, 0.16]).
    #   When k=8, a_arr = array([3.0625, 2.25, 1.5625, 1., 0.5625, 0.25, 0.0625])
    # Without JIT it's defined as:
    #   a_arr = np.square(np.arange(k-1, 0, step=-1, dtype=np.int) * 2 / k)
    # But np.array creation is not allowed in nopython mode.
    # So const_arr=np.arange(k-1, 0, step=-1, dtype=np.int) is created outside.
    a_arr = np.square(const_arr * 2 / k)

    # b_arr is part of the delta_arr.
    b_arr = np.empty(k-1, dtype=np.float64)
    for j in range(1, k):
        b_arr[j-1] = np.sum((sqr_demeaned_x *
                             np.roll(sqr_demeaned_x, j))[j+1:])

    delta_arr = b_arr / np.square(np.sum(sqr_demeaned_x))

    # Both arrarys are of length (k-1)
    assert len(delta_arr) == len(a_arr) == k-1

    phi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)
    phi2 = np.sum(a_arr * delta_arr)

    # VR test statistics under two assumptions
    vr_stat_homoscedasticity = (vr - 1) / np.sqrt(phi1)
    vr_stat_heteroscedasticity = (vr - 1) / np.sqrt(phi2)

    return vr, vr_stat_homoscedasticity, vr_stat_heteroscedasticity


def estimate(data):
    "A fast estimation of Variance Ratio test statistics as in Lo and MacKinlay (1988)"
    # Prices array = [p1, p2, p3, p4, ..., pT]
    prices = data['Price'].to_numpy(dtype=np.float64)
    result = []
    # Estimate many lags.
    for k in [2, 4, 6, 8, 10, 15, 20, 30, 40, 50, 100, 200, 500, 1000]:
        # Compute a constant array as np.array creation is not allowed in nopython mode.
        const_arr = np.arange(k-1, 0, step=-1, dtype=np.int)
        vr, stat1, stat2 = _estimate(np.log(prices), k, const_arr)
        result.append({
            f'Variance Ratio (k={k})': vr,
            f'Variance Ratio Test Statistic (k={k}) Homoscedasticity Assumption': stat1,
            f'Variance Ratio Test Statistic (k={k}) Heteroscedasticity Assumption': stat2
        })
    return result

As an example, let's create 1 million prices from random walk and estimate the variance ratio and two test statistics at various lags.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
if __name__ == "__main__":

    import pandas as pd
    from pprint import pprint
    np.random.seed(1)
    # Generate random steps with mean=0 and standard deviation=1
    steps = np.random.normal(0, 1, size=1000000)
    # Set first element to 0 so that the first price will be the starting stock price
    steps[0] = 0
    # Simulate stock prices, P with a large starting price
    P = 10000 + np.cumsum(steps)
    # Test
    data = pd.DataFrame(P, columns=['Price'])
    result = estimate(data)
    pprint(result)

In just a few seconds, the output is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
[{'Variance Ratio (k=2)': 1.0003293867428105,
  'Variance Ratio Test Statistic (k=2) Heteroscedasticity Assumption': 0.3290463403922243,
  'Variance Ratio Test Statistic (k=2) Homoscedasticity Assumption': 0.32938657811705435},
 {'Variance Ratio (k=4)': 1.0007984480057006,
  'Variance Ratio Test Statistic (k=4) Heteroscedasticity Assumption': 0.4259533413884602,
  'Variance Ratio Test Statistic (k=4) Homoscedasticity Assumption': 0.4267881978178301},
 {'Variance Ratio (k=6)': 0.9999130202975425,
  'Variance Ratio Test Statistic (k=6) Heteroscedasticity Assumption': -0.035117568315004344,
  'Variance Ratio Test Statistic (k=6) Homoscedasticity Assumption': -0.03518500446785826},
 {'Variance Ratio (k=8)': 1.0001094011344318,
  'Variance Ratio Test Statistic (k=8) Heteroscedasticity Assumption': 0.036922688136577515,
  'Variance Ratio Test Statistic (k=8) Homoscedasticity Assumption': 0.03698431520269611},
 {'Variance Ratio (k=10)': 1.000702410129927,
  'Variance Ratio Test Statistic (k=10) Heteroscedasticity Assumption': 0.20772743120012313,
  'Variance Ratio Test Statistic (k=10) Homoscedasticity Assumption': 0.20803582207641647},
 {'Variance Ratio (k=15)': 1.0022173139633856,
  'Variance Ratio Test Statistic (k=15) Heteroscedasticity Assumption': 0.5213067838911684,
  'Variance Ratio Test Statistic (k=15) Homoscedasticity Assumption': 0.5219816274021579},
 {'Variance Ratio (k=20)': 1.0038048661705044,
  'Variance Ratio Test Statistic (k=20) Heteroscedasticity Assumption': 0.7646395131154204,
  'Variance Ratio Test Statistic (k=20) Homoscedasticity Assumption': 0.7655801985571125},
 {'Variance Ratio (k=30)': 1.0054447472916035,
  'Variance Ratio Test Statistic (k=30) Heteroscedasticity Assumption': 0.8819250061384853,
  'Variance Ratio Test Statistic (k=30) Homoscedasticity Assumption': 0.8829960534692654},
 {'Variance Ratio (k=40)': 1.0073830253022766,
  'Variance Ratio Test Statistic (k=40) Heteroscedasticity Assumption': 1.0290213306735625,
  'Variance Ratio Test Statistic (k=40) Homoscedasticity Assumption': 1.0303005120740392},
 {'Variance Ratio (k=50)': 1.0086502431826903,
  'Variance Ratio Test Statistic (k=50) Heteroscedasticity Assumption': 1.0741837462564026,
  'Variance Ratio Test Statistic (k=50) Homoscedasticity Assumption': 1.0755809312730416},
 {'Variance Ratio (k=100)': 1.0153961901671604,
  'Variance Ratio Test Statistic (k=100) Heteroscedasticity Assumption': 1.3415119471043384,
  'Variance Ratio Test Statistic (k=100) Homoscedasticity Assumption': 1.3434284573260773},
 {'Variance Ratio (k=200)': 1.0157046541161026,
  'Variance Ratio Test Statistic (k=200) Heteroscedasticity Assumption': 0.9639233626580027,
  'Variance Ratio Test Statistic (k=200) Homoscedasticity Assumption': 0.9653299929052963},
 {'Variance Ratio (k=500)': 1.0182166207668526,
  'Variance Ratio Test Statistic (k=500) Heteroscedasticity Assumption': 0.7055681216511915,
  'Variance Ratio Test Statistic (k=500) Homoscedasticity Assumption': 0.7065863036900429},
 {'Variance Ratio (k=1000)': 1.0187822241562863,
  'Variance Ratio Test Statistic (k=1000) Heteroscedasticity Assumption': 0.5140698821944161,
  'Variance Ratio Test Statistic (k=1000) Homoscedasticity Assumption': 0.5147582201029065}]

It's easy to see that at all lags tested, we cannot reject the null hypothesis that this price series follows a random walk.

For comparison purpose, below is an implementation in pure Python. It is more readable but is significantly slower.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
def estimate_python(data, k=5):
    "A slow pure python implementation"
    prices = data['Price'].to_numpy(dtype=np.float64)
    log_prices = np.log(prices)
    rets = np.diff(log_prices)
    T = len(rets)
    mu = np.mean(rets)
    var_1 = np.var(rets, ddof=1, dtype=np.float64)
    rets_k = (log_prices - np.roll(log_prices, k))[k:]
    m = k * (T - k + 1) * (1 - k / T)
    var_k = 1/m * np.sum(np.square(rets_k - k * mu))

    # Variance Ratio
    vr = var_k / var_1
    # Phi1
    phi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)
    # Phi2

    def delta(j):
        res = 0
        for t in range(j+1, T+1):
            t -= 1  # array index is t-1 for t-th element
            res += np.square((rets[t]-mu)*(rets[t-j]-mu))
        return res / ((T-1) * var_1)**2

    phi2 = 0
    for j in range(1, k):
        phi2 += (2*(k-j)/k)**2 * delta(j)

    return vr, (vr - 1) / np.sqrt(phi1), (vr - 1) / np.sqrt(phi2)

Last update: March 3, 2021

Comments