Merge Compustat and CRSP
Using the CRSP/Compustat Merged Database (CCM) to extract data is one of the fundamental steps in most finance studies. Here I document several SAS programs for annual, quarterly and monthly data, inspired by and adapted from several examples from the WRDS.1
GVKEY-PERMNO link table
First, we need to create a
GVKEY-PERMNO link table.
%let beg_yr = 2000; %let end_yr = 2003; proc sql; create table lnk as select * from crsp.ccmxpf_lnkhist where /* See below for a description of the link types */ linktype in ("LU", "LC") and /* Extend the period to deal with fiscal year issues */ /* Note that the ".B" and ".E" missing value codes represent the */ /* earliest possible beginning date and latest possible end date */ /* of the Link Date range, respectively. */ (&end_yr+1 >= year(linkdt) or linkdt = .B) and (&beg_yr-1 <= year(linkenddt) or linkenddt = .E) /* primary link assigned by Compustat or CRSP */ and linkprim in ("P", "C") order by gvkey, linkdt; quit;
|LC||Link research complete (after extensive research by CRSP). Standard connection between databases.|
|LU||Link is unresearched by CRSP. It is established by comparing the Compustat and historical CRSP CUSIPs. LU represents the most popular link type.|
|LS||Link valid for this security only.2|
|LX||Link to a security that trades on foreign exchange not included in CRSP data.|
|LD||Duplicate link to a security. Two GVKEYs map to a single |
|LN||Primary link exists but Compustat does not have prices.3|
|NR||No link available; confirmed by research.|
|NU||No link available; not yet confirmed.|
According to WRDS's support page:
- Primary link types (
LS) account for 41% of the links in CCM.
- Secondary link types (
LN) account for only 2%.
- Non-matching link types (
NU) account for the rest 57%, which is expected because of the different coverage of the two databases.
LU should be sufficient.
Compustat Annual and CRSP
proc sql; create table mydata as select * from lnk, comp.funda(keep=gvkey fyear datadate indfmt datafmt popsrc consol sale) as cst where indfmt= 'INDL' and datafmt='STD' and popsrc='D' and consol='C' and lnk.gvkey = cst.gvkey and (&beg_yr <= fyear <= &end_yr) and (linkdt <= cst.datadate or linkdt = .B) and (cst.datadate <= linkenddt or linkenddt = .E); quit;
Compustat Quarterly and CRSP
proc sql; create table mydata as select * from lnk, comp.fundq (keep=gvkey fyearq datadate indfmt datafmt popsrc consol saley saleq) as cst where indfmt= 'INDL' and datafmt='STD' and popsrc='D' and consol='C' and lnk.gvkey = cst.gvkey and (&beg_yr <= fyearq <= &end_yr) and (linkdt <= cst.datadate or linkdt = .B) and (cst.datadate <= linkenddt or linkenddt = .E); quit;
Compustat Monthly and CRSP
To be done.
WRDS Overview of CRSP/COMPUSTAT Merged, Use CRSP-Compustat Merged Table to Add Permno to Compustat Data, and Merging CRSP and Compustat Data. ↩
PERMNOswith the same
PERMCOwill link to other
LSlinks mainly relate to ETFs where a single CRSP
PERMCOlinks to multiple Compustat
GVKEYs. In Compustat, even though they may belong to the same investment company (e.g. ISHARES), ETFs are presented with different
GVKEYsand CRSP flags this situation. ↩
Prices are used to check the accuracy of the link. For linktype LN there is no price information available even on a quarterly or annual basis. The user will have to decide whether or not to include these links. ↩