Mingze Gao

Firm Historical Headquarter State from SEC 10K/Q Filings


Why the need to use SEC filings?

In the Compustat database, a firm's headquarter state (like its other identifying information) in comp.company is only the current record. Once a firm relocates (or changes its state of incorporation, address, etc.), the new value replaces the old one for all historical observations, so the historical state information is lost.
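To make the problem concrete, here is a minimal sketch with made-up data. The `panel` and `company` frames, the gvkey, and the states are all hypothetical; `company` only imitates the one-row-per-firm shape of comp.company.

```python
import pandas as pd

# Hypothetical firm-year panel: one firm observed in three fiscal years.
panel = pd.DataFrame({"gvkey": ["001234"] * 3, "fyear": [2000, 2001, 2002]})

# Hypothetical company-level table shaped like comp.company: one row per
# firm, holding only the latest state on record (here, after a move to TX).
company = pd.DataFrame({"gvkey": ["001234"], "state": ["TX"]})

# Merging on gvkey alone stamps the current state on every historical year.
merged = panel.merge(company, on="gvkey", how="left")

# Made-up "true" states: the firm was in CA until it moved in 2002.
merged["true_state"] = ["CA", "CA", "TX"]
merged["misclassified"] = merged["state"] != merged["true_state"]
print(merged)
```

In this toy example the two pre-relocation years are assigned to the wrong state, which is exactly the kind of error that historical filings help avoid.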

To resolve this issue, an effective approach is to use the firm's historical SEC filings. You can follow my previous post Textual Analysis on SEC filings to extract the header information, which includes a wide range of metadata. Alternatively, the University of Notre Dame's Software Repository for Accounting and Finance provides an augmented 10-X header dataset.

Do I have to use SEC filings?

I'll skip the parsing procedure for now. The most important point is that using the historical SEC filings, you can ensure that you truly are using the historical headquarter state in your empirical estimation. Based on the augmented 10-X header dataset, I find that around 2-3% of Compustat firms changed their headquarter state (as indicated by their business address) each year.

Firms Changed State | Total Firms | % Firms Changed State

Moreover, 2,947 of the 17,221 firms in the merged sample, or about 17%, changed their headquarter state at least once. This is by no means a small number that can be ignored. So, whenever possible, you should try to use the historical information from the metadata of past SEC filings.
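A rough sketch of how such yearly change counts could be tabulated from header data is given here. It is not necessarily the procedure behind the numbers above. It assumes a frame with `cik`, a datetime `file_date` and `ba_state`, takes the last filing in each calendar year as that year's state, and compares it with the firm's previous observed year. The function name and the sample rows are hypothetical.

```python
import pandas as pd


def yearly_state_changes(filings: pd.DataFrame) -> pd.DataFrame:
    """Per year, count firms whose state differs from their previous observed year."""
    df = filings.dropna(subset=["ba_state"]).copy()
    df["year"] = df["file_date"].dt.year
    # Use the last filing of each firm-year as that year's state.
    last = (
        df.sort_values(["cik", "file_date"])
        .groupby(["cik", "year"], as_index=False)
        .last()
    )
    last["prev_state"] = last.groupby("cik")["ba_state"].shift()
    last["changed"] = last["prev_state"].notna() & (
        last["ba_state"] != last["prev_state"]
    )
    out = last.groupby("year").agg(
        firms_changed=("changed", "sum"),
        total_firms=("cik", "nunique"),
    )
    out["pct_changed"] = 100 * out["firms_changed"] / out["total_firms"]
    return out


# Made-up filings: firm 1 moves from CA to TX in 2001, firm 2 stays in NY.
filings = pd.DataFrame(
    {
        "cik": ["1", "1", "1", "2", "2"],
        "file_date": pd.to_datetime(
            ["1999-03-01", "2000-03-01", "2001-03-01", "2000-05-01", "2001-05-01"]
        ),
        "ba_state": ["CA", "CA", "TX", "NY", "NY"],
    }
)
stats = yearly_state_changes(filings)
print(stats)
```

Note that a firm's first observed year can never count as a change under this definition, so the earliest year in any sample will show zero changes by construction.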

How to get the actual historical firm HQ state using SEC filings?

1969 - 2003

I start with the firm historical HQ state data provided by Bai, Fairhurst and Serfling (2020 RFS). This dataset covers historical HQ locations from 1969 to 2003. It is based on SEC filings after 1994 and on locations the authors hand-collected from the Moody’s Manuals (later Mergent Manuals) and Dun & Bradstreet’s Million Dollar Directory (later bought by Mergent). [1]

1994 - 2018

To extend the dataset, I download the augmented 10-X header dataset and use the following Python script to extract the business address (state) filed.

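import pandas as pd

filepath = "~/Downloads/LM_EDGAR_10X_Header_1994_2018.csv"

if __name__ == "__main__":

    df = pd.read_csv(
        filepath,
        usecols=["cik", "file_date", "ba_state"],
        dtype={"cik": str},
        # `convert_dates` below needs a datetime column
        parse_dates=["file_date"],
    )
    # Some `ba_state` codes are lowercase
    df["ba_state"] = df["ba_state"].str.upper()
    # Some `ba_state` codes are missing or not alphabetic (not valid US states)
    df = df.dropna(subset=["ba_state"])
    df = df[df["ba_state"].str.isalpha()]
    # Save as a Stata file, storing `file_date` as a Stata daily date
    df.to_stata(
        "historical_state.dta",
        write_index=False,
        convert_dates={"file_date": "td"},
    )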

The result is a Stata file, historical_state.dta, with three columns: cik, file_date and ba_state.


1969 - 2018 merged

Finally, to merge the two datasets, I import them into WRDS Cloud and run the following SAS script:

libname hs "~/historical_state";

/* Historical HQ state (1994 to 2018) from augmented 10-X header dataset */
proc import datafile="~/historical_state/historical_state_1994_2018.dta"
	out=historical_state dbms=stata replace;
run;
/* Historical HQ state (1969 to 2003) from Bai, Fairhurst and Serfling (2020 RFS) */
proc import datafile="~/historical_state/hist_headquarters_Bai_et_al.dta"
	out=hist_headquarters_Bai_et_al dbms=stata replace;
run;

/* Build the post-1994 dataset using SEC filings */
proc sql;
create table funda as
select gvkey, cik, datadate, fyear from comp.funda
where indfmt= 'INDL' and datafmt='STD' and popsrc='D' and consol='C'
and year(datadate) between 1994 and 2018
/* "For firms that change fiscal year within a calendar year,
	we take the last reported date when extracting financial data.
	This leaves us with one set of observations for each firm (gvkey) in each year."
	-- Pflueger, Siriwardane and Sunderam (2020 QJE) */
group by gvkey, fyear having datadate=max(datadate);

create table firm_historical_state as
select a.*, b.ba_state as state_sec label="State from SEC filings"
from funda as a left join historical_state as b
on a.cik=b.cik and year(a.datadate)=year(b.file_date) and b.file_date<=a.datadate
group by a.gvkey, a.datadate
/* use the SEC filing closest to, and on or before, the Compustat datadate */
having b.file_date=max(b.file_date);

create table historical_state_1994_2018 as
select a.*, b.state as state_comp label="State from Compustat"
from firm_historical_state as a left join comp.company as b
on a.gvkey=b.gvkey
order by a.gvkey, a.datadate;

/* Sanity check: no duplicated gvkey-datadate */
proc sort data=historical_state_1994_2018 nodupkey; by gvkey datadate; run;

proc sql;
create table hist_headquarters_Bai_et_al as
select put(gvkeyn, z6.) as gvkey, fyear, state
from hist_headquarters_Bai_et_al;

/* Stack together the two datasets */
data states;
set hist_headquarters_Bai_et_al
	historical_state_1994_2018(where=(fyear>2003) keep=gvkey fyear state:);

proc sql;
create table hs.corrected_hist_state_1969_2018 as
select *, coalesce(state, state_sec, state_comp) as corrected_state
from states where not missing(calculated corrected_state)
order by gvkey, fyear;

/* Sanity check: no duplicated gvkey-fyear */
proc sort data=hs.corrected_hist_state_1969_2018 nodupkey; by gvkey fyear; run;
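For readers who prefer pandas, the two key steps of the SAS script can be sketched as follows. This is only an approximation under stated assumptions, not a drop-in replacement. `match_latest_filing` and `coalesce_state` are hypothetical helper names, the input frames are assumed to carry the same columns as in the SAS code with `datadate` and `file_date` as datetimes, and the sample rows are made up.

```python
import pandas as pd


def match_latest_filing(funda: pd.DataFrame, filings: pd.DataFrame) -> pd.DataFrame:
    """Attach the state from the latest filing on or before datadate, same calendar year."""
    # merge_asof requires both frames to be sorted by their time keys.
    left = funda.sort_values("datadate")
    right = filings.sort_values("file_date")
    out = pd.merge_asof(
        left,
        right,
        left_on="datadate",
        right_on="file_date",
        by="cik",
        direction="backward",
    )
    # Keep the match only if the filing falls in the same calendar year.
    same_year = out["file_date"].dt.year == out["datadate"].dt.year
    out["state_sec"] = out["ba_state"].where(same_year)
    return (
        out.drop(columns=["file_date", "ba_state"])
        .sort_values(["gvkey", "datadate"])
        .reset_index(drop=True)
    )


def coalesce_state(df: pd.DataFrame) -> pd.Series:
    """First non-missing of state, state_sec, state_comp, in that order."""
    return df["state"].combine_first(df["state_sec"]).combine_first(df["state_comp"])


# Made-up inputs for illustration.
funda = pd.DataFrame(
    {
        "gvkey": ["001234", "001234", "005678"],
        "cik": ["0000001", "0000001", "0000002"],
        "datadate": pd.to_datetime(["2005-12-31", "2006-12-31", "2005-06-30"]),
    }
)
filings = pd.DataFrame(
    {
        "cik": ["0000001", "0000001", "0000001", "0000002", "0000002"],
        "file_date": pd.to_datetime(
            ["2005-03-15", "2005-11-01", "2007-01-10", "2004-09-01", "2005-09-01"]
        ),
        "ba_state": ["CA", "TX", "NY", "MA", "MA"],
    }
)
matched = match_latest_filing(funda, filings)

stacked = pd.DataFrame(
    {
        "state": ["CA", None, None],
        "state_sec": [None, "TX", None],
        "state_comp": ["NY", "NY", "NY"],
    }
)
corrected = coalesce_state(stacked)
```

In the toy data only the first row finds a same-year filing on or before its datadate; the other two rows stay missing and would fall back to the Compustat state in the coalesce step.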

Data available for download

Download the data I compiled here: corrected_hist_state_1969_2018.dta.zip (1MB).
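A minimal sketch of attaching the corrected state to your own firm-year panel is shown here. The small `states` frame stands in for the unzipped file, which could be loaded with `pd.read_stata`; the column names follow the SAS script above, and all values, as well as the `panel` frame, are made up. It also assumes `gvkey` is stored as a zero-padded string in both tables.

```python
import pandas as pd

# Stand-in for pd.read_stata("corrected_hist_state_1969_2018.dta"); made-up values.
states = pd.DataFrame(
    {
        "gvkey": ["001234", "001234", "005678"],
        "fyear": [2002, 2003, 2003],
        "corrected_state": ["CA", "TX", "NY"],
    }
)

# Hypothetical firm-year panel, for example an extract of comp.funda.
panel = pd.DataFrame(
    {
        "gvkey": ["001234", "001234", "005678", "009999"],
        "fyear": [2002, 2003, 2003, 2003],
    }
)

# Left join on firm and fiscal year; validate guards against duplicated
# gvkey-fyear keys in the state file.
with_state = panel.merge(
    states[["gvkey", "fyear", "corrected_state"]],
    on=["gvkey", "fyear"],
    how="left",
    validate="m:1",
)
print(with_state)
```

Firm-years that are not covered by the state file keep a missing `corrected_state`, so it is worth checking the match rate before running any estimation.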


If you use the code/data above, please consider citing the following article for which it was written/constructed.

Gao, M., Leung, H. and Qiu, B. (2021). Organization Capital and Executive Performance Incentives, Journal of Banking & Finance, 123, 106017.

Also, if you have any suggestions to further improve the identification of historical firm HQ locations, please do drop a comment. We will all benefit from it.


  1. The authors note that "for our final sample of 115,432 firm-year observations, we find that over the 1969 to 2003 period, 9,847 (87.50%) never relocate, 1,211 (10.76%) relocate once, 178 (1.58%) relocate twice, and 18 (0.16%) relocate three times."