Research Notes¶

December 21, 2023
in Research Notes
2 min read

Identify Retail Investors

Retail investors and their trading behaviour attract many research interests. One strand of literature uses proprietary datasets to identify retail investors. The other uses algorithms. A recent JF paper Boehmer et al. (2021) proposes a simple one based only on the trade price, which also signs the trade direction effectively. Even more interestingly, I just read a follow-up work forthcoming on JF by Barber et al. (2023). The authors placed 85,000 retail trades themselves to validate the Boehmer et al. (2021) algorithm.

October 15, 2023
in Teaching Notes, Research Notes
11 min read

Translog Cost Function Estimation

This post focuses on the translog cost function. I discuss the linear homogeneity constraint, the technique to impose the constraint, and its estimation via

Ordinary Least Square (OLS)
Stochastic Frontier Analysis (SFA)

Code examples are provided, too.

October 13, 2023
in Teaching Notes, Research Notes
10 min read

Translog Production and Cost Functions

In this post, I'll carefully explain the derivation of cost function from a CES production function, as well as the derivation of translog (transcendental logarithmic) production and cost functions.

flowchart TB
    subgraph Production
    A[Production Function] -. approximation .-> D(Translog Production Function)
    end
    subgraph Cost
    B[Cost Function] -. approximation .-> C(Translog Cost Function)
    end
    A == Conversion via Duality ==> B

Before I start, the graph above illustrate the relations. Specifically, we can derive the cost function from a CES production function via the duality theorem. Translog production and translog cost functions are approximations to the production and corresponding cost function, respectively, via Taylor expansion.

September 8, 2023
in Research Notes, Programming
3 min read

Download SEC Filings from EDGAR

This post documents how to download SEC filings from EDGAR using edgar-analyzer, a Python program I wrote. It features:

3 commands only to download any type of filings for any period of time
auto throttling of download speed to adhere to the SEC policy of fair use

August 17, 2023
in Research Notes
13 min read

Difference-in-Differences Estimation

Empirical researchers have been using difference-in-differences (DiD) estimation to identify an event's Average Treatment effect on the Treated entities (ATT). This post is my understanding and a non-technical note of the DiD approach as it evolves over the past years, especially on the problems and solutions when multiple treatment events are staggered.

August 15, 2023
in Research Notes
2 min read

CRSP Missing Codes

A note on the missing codes in CRSP.

August 8, 2023
in Research Notes
4 min read

FRED - Federal Reserve Economic Data

Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the St. Louis Federal Reserve's Federal Research Economic Data. This post briefly explains this great feature.

April 9, 2023
in Research Notes
4 min read

Correlated Random Effects

Can we estimate the coefficient of gender while controlling for individual fixed effects? This sounds impossible as an individual's gender typically does not vary and hence would be absorbed by individual fixed effects. However, Correlated Random Effects (CRE) may actually help.

At last year's FMA Annual Meeting, I learned this CRE estimation technique when discussing a paper titled "Gender Gap in Returns to Publications" by Piotr Spiewanowski, Ivan Stetsyuk and Oleksandr Talavera. Let me recollect my memory and summarize the technique in this post.

April 5, 2023
in Research Notes
8 min read

Probability of Informed Trading (PIN)

pin-binance

November 30, 2022
in Research Notes
4 min read

Estimate Merton Distance-to-Default

Merton (1974) Distance to Default (DD) model is useful in forecasting defaults. This post documents a few ways to empirically estimate Merton DD (and default probability) as in Bharath and Shumway (2008 RFS).

March 6, 2022
in Research Notes
8 min read

Adding Another Factor to Principal-Agent Model

In a traditional principal-agent model, firm output is a function of the agent's effort and the principal observes only the output not agent's effort. The principal carefully designs the agent's compensation package, especially the sensitivity of the agent's pay to firm output, to maximize the firm value. Now, what if we add another factor to the relationship between firm output and agent's effort? How would the optimal pay sensitivity change?

January 9, 2022
in Research Notes
5 min read

Estimate Organization Capital

As in Eisfeldt and Papanikolaou (2013), we obtain firm-year accounting data from the Compustat and compute the stock of organization capital for firms using the perpetual inventory method that recursively calculates the stock of OC by accumulating the deflated value of SG&A expenses.

May 15, 2021
in Research Notes
4 min read

Download M&A Deals from SDC Platinum

Thomson One Banker SDC Platinum database provides comprehensive M&A transaction data from early 1980s, and is perhaps the most widely used M&A database in the world.

This post documents the steps of downloading M&A deals from the SDC Platinum database. Specifically, I show how to download the complete M&A data where:

both the acquiror and the target are US firms,
the acquiror is a public firm or a private firm,
the target is a public firm, a private firm, or a subsidiary,
the deal value is at least $1m, and
the form of the deal is a acquisition, a merger or an acquisition of majority interest.

February 11, 2021
in Research Notes
5 min read

Specification Curve Analysis

Motivation

More often than not, empirical researchers need to argue that their chosen model specification reigns. If not, they need to run a battery of tests on alternative specifications and report them. The problem is, researchers can fit a few tables each with a few models in the paper at best, and it's extremely hard for readers to know whether the reported results are being cherry-picked.

So, why not run all possible model specifications and find a concise way to report them all?

July 28, 2020
in Research Notes
6 min read

Firm Historical Headquarter State from SEC 10K/Q Filings

Why the need to use SEC filings?

In the Compustat database, a firm's headquarter state (and other identification) is in fact the current record stored in comp.company. This means once a firm relocates (or updates its incorporate state, address, etc.), all historical observations will be updated and not recording historical state information anymore.

To resolve this issue, an effective way is to use the firm's historical SEC filings. You can follow my previous post Textual Analysis on SEC filings to extract the header information, which includes a wide range of meta data. Alternatively, the University of Notre Dame's Software Repository for Accounting and Finance provides an augmented 10-X header dataset.

2023 March Update

In this update I use 1,491,368 8-K filings of U.S. firms from 2004 to Dec 2022 and extract their HQ state and zipcode. hist_state_zipcode_from_8k_2004_2022.csv.zip

June 10, 2020
in Research Notes
3 min read

Compute Jackknife Coefficient Estimates in SAS

In certain scenarios, we want to estimate a model's parameters on the sample for each observation with itself excluded. This can be achieved by estimating the model repeatedly on the leave-one-out samples but is very inefficient. If we estimate the model on the full sample, however, the coefficient estimates will certainly be biased. Thankfully, we have the Jackknife method to correct for the bias, which produces the Jackknifed coefficient estimates for each observation.

May 27, 2020
in Research Notes
9 min read

Textual Analysis on SEC Filings

Nowadays top journals favour more granular studies. Sometimes it's useful to dig into the raw SEC filings and perform textual analysis. This note documents how I download all historical SEC filings via EDGAR and conduct some textual analyses.

May 25, 2020
in Research Notes
2 min read

Bank Holing Company Financials from FR Y-9C

A SAS macro used to extract BHC data.

May 25, 2020
in Research Notes
5 min read

Merge Compustat and CRSP

Using the CRSP/Compustat Merged Database (CCM) to extract data is one of the fundamental steps in most finance studies. Here I document several SAS programs for annual, quarterly and monthly data, inspired by and adapted from several examples from the WRDS.¹

May 24, 2020
in Research Notes
5 min read

Decomposing Herfindahl–Hirschman (HHI) Index

Herfindahl–Hirschman (HHI) Index is a well-known market concentration measure determined by two factors:

the size distribution (variance) of firms, and
the number of firms.

Intuitively, having a hundred similar-sized gas stations in town means a far less concentrated environment than just one or two available, and when the number of firms is constant, their size distribution or variance determines the magnitude of market concentration.

Since these two properties jointly determine the HHI measure of concentration, naturally we want a decomposition of HHI that can reflects these two dimensions respectively. This is particularly useful when two distinct markets have the same level of HHI measure, but the concentration may result from different sources. Note that here these two markets do not necessarily have to be industry A versus industry B, but can be the same industry niche in two geographical areas, for example.

Thus, we can think of HHI as the sum of the actual market state's deviation from 1) all firms having the same size, and the deviation from 2) a fully competitive environment with infinite number of firms in the market. Some simple math can solve our problem.

May 22, 2020
in Research Notes
7 min read

Compute Weekly Return from Daily CRSP Data

Computing the weekly returns from the CRSP daily stock data is a common task but may be tricky sometimes. Let's discuss a few different ways to get it done incorrectly and correctly.

TL;DR Take me to the final solution!

Surely -> The solution

May 22, 2020
in Research Notes
46 min read

Generate Fama-French Industry Classification From SIC

This Stata program creates the Fama-French industry classification from SIC code.

May 22, 2020
in Research Notes
1 min read

Identify Chinese State-Owned Enterprise using CSMAR

Many research papers on Chinese firms include a control variable that indicates if the firm is a state-owned enterprise (SOE). This is important as SOEs and non-SOEs differ in many aspects and may have structural differences. This post documents the way to construct this indicator variable from the CSMAR databases.

May 22, 2020
in Research Notes
3 min read

Use SAS Macros on WRDS

The Wharton Research Data Services (WRDS) provides quite a handful of SAS macros that can be used directly. This article explains how to use those handy macros on WRDS when you use remote submission to run your code on the WRDS cloud. Lastly, it explains how to load and use third-party SAS macros from a URL.

May 22, 2020
in Research Notes
9 min read

What it takes to be a CEO? A fun survey of literature

Taking up the position of CEO means more than pressure from the board and investors. You’ll also face heavy scrutiny from academia. Whether or not a firm’s hiring and compensation committees use them as a reference, here are some of the findings that you may want to be aware of.