Mingze Gao

Compute Jackknife Coefficient Estimates in SAS


In certain scenarios, we want to estimate a model's parameters for each observation with that observation itself excluded from the sample. This can be achieved by re-estimating the model on every leave-one-out sample, but doing so is very inefficient. If we estimate the model only once on the full sample, however, each coefficient estimate is contaminated by the very observation we want to exclude. Thankfully, the Jackknife method corrects for this and produces the Jackknifed coefficient estimates for each observation from a single full-sample regression.

Variable Definition

Let's start with some variable definitions to help with the explanation.

| Notation | Description |
| --- | --- |
| $b(i)$ | the parameter estimates after deleting the $i$th observation |
| $s^2(i)$ | the variance estimate after deleting the $i$th observation |
| $X(i)$ | the $X$ matrix without the $i$th observation |
| $\hat{y}(i)$ | the $i$th value predicted without using the $i$th observation |
| $r_i = y_i - \hat{y}_i$ | the $i$th residual |
| $h_i = x_i(X'X)^{-1}x_i'$ | the $i$th diagonal of the projection matrix for the predictor space, also called the hat matrix |
| $RStudent = \frac{r_i}{s(i)\sqrt{1-h_i}}$ | the studentized residual |
| $(X'X)_{jj}$ | the $(j,j)$th element of $(X'X)^{-1}$ |
| $DFBeta_j = \frac{b_j - b_{(i)j}}{s(i)\sqrt{(X'X)_{jj}}}$ | the scaled measure of the change in the $j$th parameter estimate calculated by deleting the $i$th observation |


Our goal is to compute the coefficient estimates with the $i$th observation excluded from the sample, i.e. $b(i)$, the Jackknifed coefficient estimates.


From the table above, we can see that the $j$th Jackknifed coefficient estimate $b_{(i)j}$ without using the $i$th observation is:

$$b_{(i)j} = b_j - DFBeta_j \times s(i)\sqrt{(X'X)_{jj}}$$

Since the definition of $RStudent$ implies $s(i) = \frac{r_i}{RStudent \times \sqrt{1-h_i}}$, this can be rewritten using only the reported statistics:

$$b_{(i)j} = b_j - DFBeta_j \times \frac{r_i}{RStudent \times \sqrt{1-h_i}} \sqrt{(X'X)_{jj}}$$
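As a quick sanity check of the identity above (in Python with numpy rather than SAS, since the algebra is language-agnostic; all variable names here are illustrative, not SAS output names), we can reconstruct the leave-one-out coefficients from full-sample statistics and compare them against a brute-force refit:

```python
import numpy as np

# Simulate a no-intercept OLS model, as in the accrual models later in the post.
rng = np.random.default_rng(42)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # full-sample estimates b_j
r = y - X @ b                                  # residuals r_i
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # hat-matrix diagonal h_i

i = 0                                          # observation to delete
s_i = np.sqrt((r @ r - r[i] ** 2 / (1 - h[i])) / (n - p - 1))   # s(i)
rstudent = r[i] / (s_i * np.sqrt(1 - h[i]))                     # RStudent
# scaled DFBeta from full-sample quantities only
dfbeta = (XtX_inv @ X[i]) * r[i] / (1 - h[i]) / (s_i * np.sqrt(np.diag(XtX_inv)))

# reconstruct b_(i)j via the identity above
b_loo = b - dfbeta * r[i] / (rstudent * np.sqrt(1 - h[i])) * np.sqrt(np.diag(XtX_inv))

# brute-force check: refit with observation i deleted
b_refit = np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
print(np.allclose(b_loo, b_refit))  # True
```

The two vectors agree to machine precision, which is exactly why no refitting is needed: everything on the right-hand side comes from the single full-sample regression.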

The good thing is that PROC REG produces the coefficient estimates $b_j$ for $j=1,2,\ldots,K$, where $K$ is the number of coefficients, and the INFLUENCE and I options produce the remaining statistics, just enough to compute $b(i)$:

| Variable | Option in PROC REG or MODEL statement | Name in output |
| --- | --- | --- |
| $b_j$ | OUTEST= option in PROC REG statement | `<jthVariable>` |
| $r_i$ | OutputStatistics= from INFLUENCE option in MODEL statement | Residual |
| $RStudent$ | OutputStatistics= from INFLUENCE option in MODEL statement | RStudent |
| $h_i$ | OutputStatistics= from INFLUENCE option in MODEL statement | HatDiagonal |
| $DFBeta_j$ | OutputStatistics= from INFLUENCE option in MODEL statement | DFB_`<jthVariable>` |
| $(X'X)_{jj}$ | InvXPX= from I option in MODEL statement | `<jthVariable>` |


Discretionary accruals

Suppose we want to calculate the firm-level discretionary accruals for each year using the Jones (1991) model and the Kothari et al. (2005) model. For a firm $i$, we first estimate the model for the industry-year excluding firm $i$, then use the coefficient estimates to generate the predicted accruals for firm $i$. The firm's discretionary accruals are the actual accruals minus the predicted accruals.
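To make the logic concrete before the SAS code, here is a minimal sketch in Python (illustrative only; the function name and interface are invented, not part of the SAS workflow). It computes every firm's leave-one-out prediction in one industry-year from a single full-sample fit, using the jackknife correction instead of refitting:

```python
import numpy as np

def discretionary_accruals(X, y):
    """Actual minus leave-one-out predicted accruals for one industry-year.

    Rows of X are firm-year regressors (no intercept, as in the Jones model);
    y is total accruals. Sketch only, not the SAS implementation.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                          # full-sample coefficients
    r = y - X @ b                                  # residuals
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # hat-matrix diagonal
    # jackknifed coefficients b_(i), one row per firm, with no refitting
    B = b[None, :] - (X @ XtX_inv) * (r / (1 - h))[:, None]
    return y - np.sum(X * B, axis=1)               # actual minus predicted
```

Row $i$ of `B` equals the coefficients from a regression that drops firm $i$, so the returned value for each firm matches what repeated leave-one-out estimation would give.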

Below is an example PROC REG that produces three datasets named work.params, work.outstats and work.xpxinv, which contain sufficient statistics to compute the Jackknifed estimates and thus the predicted accruals.

```sas
ods listing close;
proc reg data=work.funda edf outest=work.params;
  /* industry-year regression */
  by fyear sic2;
  /* id is necessary for matching Jackknifed coefficients to firm-year */
  id key;
  /* Jones Model */
  Jones: model tac = inv_at_l drev ppe / noint influence i;
  /* Kothari Model with ROA */
  Kothari: model tac = inv_at_l drevadj ppe roa / noint influence i;
  ods output OutputStatistics=work.outstats InvXPX=work.xpxinv;
run;
ods listing;
```

Full SAS program for estimating 5 different measures of discretionary accruals: