Mingze Gao

Probability of Informed Trading (PIN)


In the market microstructure literature, Easley et al. (1996) proposed a trading model that decomposes the bid-ask spread. The model's most notable contribution is the "Probability of Informed Trading," or PIN, which measures the informational component of the spread. As the name suggests, under ideal conditions PIN reflects the probability of informed trading in a market with a market maker. In this article, I walk through the modeling process in Easley et al. (1996) and discuss how to handle the objective function in maximum likelihood estimation to avoid overflow errors during computation.


Model

Assume that the buy and sell orders of informed and uninformed traders follow independent Poisson processes, and the following tree diagram describes the entire trading process:

[Figure: Trading Process (theoretical model of PIN)]

Next, assume that the market maker is Bayesian: he updates his beliefs about the state of the market, in particular whether there is new information that day, by observing trades and trading rates. Suppose each trading day is independent, and let $P(t)=(P_n(t), P_b(t), P_g(t))$ be the market maker's beliefs at time $t$, where $n$ denotes no new information, $b$ denotes bad (bearish) news, and $g$ denotes good (bullish) news. Before any trade is observed, the prior is $P(0)=(1-\alpha, \alpha\delta, \alpha(1-\delta))$. Here $\alpha$ is the probability that an information event occurs on a given day, $\delta$ is the probability that the news is bad given that an event occurs, $\varepsilon$ is the arrival rate of uninformed buy orders (and, equally, of uninformed sell orders), and $\mu$ is the arrival rate of informed orders.

Let $S_t$ be the event of a sell order arriving at time $t$, and $B_t$ the event of a buy order arriving at time $t$. Also, let $P(t|S_t)$ be the market maker's updated beliefs after observing a sell order at time $t$, given the existing information. Then, by Bayes' theorem, the posterior probability of no new information given a sell order at time $t$, $P_n(t|S_t)$, is:

\begin{equation} P_n(t|S_t)=\frac{P_n(t)\varepsilon}{\varepsilon+P_b(t)\mu}\end{equation}

Similarly, the posterior probability of bad news given a sell order at time $t$, $P_b(t|S_t)$, is:

\begin{equation} P_b(t|S_t)=\frac{P_b(t)(\varepsilon+\mu)}{\varepsilon+P_b(t)\mu}\end{equation}

And the posterior probability of good news given a sell order at time $t$, $P_g(t|S_t)$, is:

\begin{equation} P_g(t|S_t)=\frac{P_g(t)\varepsilon}{\varepsilon+P_b(t)\mu} \end{equation}
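The three posterior formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `update_on_sell` and the parameter values are assumptions chosen for the example, not taken from the paper.

```python
def update_on_sell(p_n, p_b, p_g, eps, mu):
    """Posterior beliefs (no news, bad news, good news) after one sell order.

    Implements the three Bayes updates above; assumes p_n + p_b + p_g == 1.
    """
    # Expected sell arrival rate under current beliefs: eps + p_b * mu
    denom = eps + p_b * mu
    return (
        p_n * eps / denom,         # no news: only uninformed sellers
        p_b * (eps + mu) / denom,  # bad news: uninformed + informed sellers
        p_g * eps / denom,         # good news: only uninformed sellers
    )


# Illustrative (made-up) parameter values
alpha, delta, eps, mu = 0.4, 0.5, 1.0, 2.0
prior = (1 - alpha, alpha * delta, alpha * (1 - delta))
posterior = update_on_sell(*prior, eps, mu)
print("prior    :", prior)
print("posterior:", posterior)
```

After a sell order, the weight on bad news rises while the weights on no news and good news fall, and the three posteriors still sum to one.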

Thus, the zero-expected-profit bid price at time $t$ on day $i$ is the conditional expectation of the asset value given the trading history and a sell order arriving at that time:

\begin{equation} b(t)=\frac{P_n(t)\varepsilon V^*_i+P_b(t)(\varepsilon+\mu)\underline{V}_i+P_g(t)\varepsilon\overline{V}_i}{\varepsilon+P_b(t)\mu} \end{equation}

Here, $V_i$ is the value of the asset at the end of day $i$. It equals $\overline{V}_i$ when there is good news, $\underline{V}_i$ when there is bad news, and $V^*_i$ when there is no news, with $\underline{V}_i < V^*_i < \overline{V}_i$.

Similarly, the ask price is the conditional expectation of the asset value given a buy order arriving at time $t$:

\begin{equation} a(t)=\frac{P_n(t)\varepsilon V^*_i+P_b(t)\varepsilon\underline{V}_i+P_g(t)(\varepsilon+\mu)\overline{V}_i}{\varepsilon+P_g(t)\mu}\end{equation}

Let's relate these bid and ask prices to the expected asset value at time $t$. Since the conditional expectation of the asset value at this time is:

\begin{equation} E[V_i|t]=P_n(t)V^*_i+P_b(t)\underline{V}_i+P_g(t)\overline{V}_i\end{equation}

we can write the above $b(t)$ and $a(t)$ as:

\begin{equation} b(t)=E[V_i|t]-\frac{\mu P_b(t)}{\varepsilon+\mu P_b(t)}(E[V_i|t]-\underline{V}_i)\end{equation}
\begin{equation} a(t)=E[V_i|t]+\frac{\mu P_g(t)}{\varepsilon+\mu P_g(t)}(\overline{V}_i-E[V_i|t])\end{equation}

Thus, the bid-ask spread $a(t)-b(t)$ is:

\begin{equation} a(t)-b(t)=\frac{\mu P_g(t)}{\varepsilon+\mu P_g(t)}(\overline{V}_i-E[V_i|t])+\frac{\mu P_b(t)}{\varepsilon+\mu P_b(t)}(E[V_i|t]-\underline{V}_i)\end{equation}
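As a quick numerical check of the algebra, the sketch below computes the bid and ask directly from their definitions and compares the resulting spread with the decomposed form. The function names and the asset values (90, 100, 110) are hypothetical, chosen only for illustration.

```python
def quotes(p_n, p_b, p_g, eps, mu, v_low, v_star, v_high):
    """Bid and ask as conditional expectations of the asset value."""
    bid = (p_n * eps * v_star + p_b * (eps + mu) * v_low
           + p_g * eps * v_high) / (eps + p_b * mu)
    ask = (p_n * eps * v_star + p_b * eps * v_low
           + p_g * (eps + mu) * v_high) / (eps + p_g * mu)
    return bid, ask


def spread_decomposed(p_n, p_b, p_g, eps, mu, v_low, v_star, v_high):
    """Spread written as informed-trade probabilities times expected losses."""
    ev = p_n * v_star + p_b * v_low + p_g * v_high  # E[V_i | t]
    buy_side = mu * p_g / (eps + mu * p_g) * (v_high - ev)
    sell_side = mu * p_b / (eps + mu * p_b) * (ev - v_low)
    return buy_side + sell_side


# Illustrative (made-up) beliefs, rates, and asset values
args = (0.6, 0.2, 0.2, 1.0, 2.0, 90.0, 100.0, 110.0)
bid, ask = quotes(*args)
print("bid:", bid, "ask:", ask)
print("spread (direct)    :", ask - bid)
print("spread (decomposed):", spread_decomposed(*args))
```

The two spread values agree up to floating-point rounding, which is what the rewriting of $b(t)$ and $a(t)$ around $E[V_i|t]$ implies.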

This shows that the bid-ask spread at time $t$ is:

(the probability that a buy order is an informed trade) $\times$ (the expected loss to an informed buyer) + (the probability that a sell order is an informed trade) $\times$ (the expected loss to an informed seller)

Therefore, the probability that any trade at time $t$ is based on asymmetric information from informed traders is the expected arrival rate of informed orders, $\mu P_g(t)+\mu P_b(t)$, divided by the expected arrival rate of all orders, $2\varepsilon+\mu P_g(t)+\mu P_b(t)$. Using $P_g(t)+P_b(t)=1-P_n(t)$:

\begin{equation} PIN(t)=\frac{\mu P_g(t)+\mu P_b(t)}{2\varepsilon+\mu P_g(t)+\mu P_b(t)}=\frac{\mu(1-P_n(t))}{\mu(1-P_n(t))+2\varepsilon}\end{equation}

If no information event occurs ($P_n(t)=1$) or there are no informed trades ($\mu=0$), both $PIN$ and the bid-ask spread are zero. If good and bad news are equally likely, i.e., $\delta=1/2$, then at the prior beliefs $(1-\alpha, \alpha\delta, \alpha(1-\delta))$ we have $P_b(t)=P_g(t)=\alpha/2$, and the bid-ask spread simplifies to:

\begin{equation} a(t)-b(t)=\frac{\alpha\mu}{\alpha\mu+2\varepsilon}[\overline{V}_i-\underline{V}_i]\end{equation}

And, since $1-P_n(t)=\alpha$ at the prior, the $PIN$ measure simplifies to:

\begin{equation} PIN(t)=\frac{\alpha\mu}{\alpha\mu+2\varepsilon}\end{equation}
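For concreteness, here is a tiny sketch of the simplified PIN formula. The function name `pin` and the parameter values are illustrative assumptions, not estimates from any data.

```python
def pin(alpha, mu, eps):
    """PIN = alpha*mu / (alpha*mu + 2*eps), evaluated at the prior beliefs."""
    return alpha * mu / (alpha * mu + 2 * eps)


# Illustrative values: information events on 40% of days, informed orders
# arriving at twice the rate of uninformed buys (or sells).
print(pin(alpha=0.4, mu=2.0, eps=1.0))

# With no information events or no informed traders, PIN is zero.
print(pin(alpha=0.0, mu=2.0, eps=1.0), pin(alpha=0.4, mu=0.0, eps=1.0))
```

PIN rises with $\alpha$ and $\mu$ and falls with $\varepsilon$: more frequent or more intense informed trading raises it, while more uninformed order flow dilutes it.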

Model Estimation

With the model established, let's turn to parameter estimation. The parameters we need to estimate, $\theta=(\alpha, \delta, \varepsilon, \mu)$, are difficult to estimate because they cannot be observed directly: we observe only the arrival of buy and sell orders. In this model, each day's buy and sell orders are assumed to follow one of three Poisson processes. Although we don't know which process generated a given day, the overall idea is: more buy orders suggest good news, more sell orders suggest bad news, and overall buying and selling decrease when there is no new information. With this idea in mind, we can estimate $\theta$ by maximum likelihood.

First, according to the trading model shown in the diagram, suppose there is bad news on a given day. Then the arrival rate of sell orders is $(\mu+\varepsilon)$, since both informed and uninformed traders sell, and the arrival rate of buy orders is $\varepsilon$, since only uninformed traders continue to buy. Therefore, the probability of observing a sequence of trades with $B$ buy orders and $S$ sell orders over a period of time is:

\begin{equation} e^{-\varepsilon} \frac{\varepsilon^B}{B!} e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^S}{S!}\end{equation}

If there is good news on a given day, informed traders buy, so the probability of observing a sequence of trades with $B$ buy orders and $S$ sell orders over a period of time is:

\begin{equation} e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^B}{B!} e^{-\varepsilon} \frac{\varepsilon^S}{S!}\end{equation}

If there is no new information on a given day, only uninformed traders trade, so the probability of observing a sequence of trades with $B$ buy orders and $S$ sell orders over a period of time is:

\begin{equation} e^{-\varepsilon} \frac{\varepsilon^B}{B!} e^{-\varepsilon} \frac{\varepsilon^S}{S!}\end{equation}

So, the probability of observing a total of $B$ buy orders and $S$ sell orders on a trading day is the weighted average of the above three possibilities, where the weights are the probabilities of each possibility. Therefore, we can write out the likelihood function:

\begin{align} L((B, S)| \theta)= &(1-\alpha)e^{-\varepsilon} \frac{\varepsilon^B}{B!} e^{-\varepsilon} \frac{\varepsilon^S}{S!} \\ &+ \alpha\delta e^{-\varepsilon} \frac{\varepsilon^B}{B!} e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^S}{S!} \\ &+ \alpha(1-\delta) e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^B}{B!} e^{-\varepsilon} \frac{\varepsilon^S}{S!} \end{align}
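Written out literally, the single-day likelihood is a three-component mixture of Poisson probabilities. The sketch below is a direct transcription for small order counts only; the helper names `poisson_pmf` and `day_likelihood` and the parameter values are assumptions made for illustration.

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for a Poisson variable with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def day_likelihood(B, S, alpha, delta, eps, mu):
    """Likelihood of B buys and S sells on one day (direct form)."""
    no_news = poisson_pmf(B, eps) * poisson_pmf(S, eps)
    bad_news = poisson_pmf(B, eps) * poisson_pmf(S, eps + mu)
    good_news = poisson_pmf(B, eps + mu) * poisson_pmf(S, eps)
    return ((1 - alpha) * no_news
            + alpha * delta * bad_news
            + alpha * (1 - delta) * good_news)


# Illustrative (made-up) parameters with small arrival rates
theta = dict(alpha=0.4, delta=0.3, eps=2.0, mu=3.0)
print(day_likelihood(7, 3, **theta))

# Sanity check: the probabilities over all (B, S) pairs should sum to ~1.
total = sum(day_likelihood(b, s, **theta) for b in range(60) for s in range(60))
print(total)
```

This form is fine for a handful of trades per day, but it evaluates powers and factorials directly, so it is not suitable for realistic order counts.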

Hence, assuming trading days are independent, the objective function for maximum likelihood estimation over $N$ days of data $D=\{(B_i, S_i)\}_{i=1}^{N}$ is:

\begin{equation} L(D|\theta)=\prod_{i=1}^{N}L((B_i, S_i)|\theta) \end{equation}

Bottom Line

The problem seems to end here. With the objective function in hand, it looks as if all that remains is to program it and pay attention to the parameter bounds. However, the real challenge comes next: if you write the objective function exactly like this and run it, you will almost certainly hit an overflow error, because the function is full of powers and factorials. Even if the time interval is chosen to be very small, some highly liquid assets will still have hundreds of transactions within a few seconds. Therefore, $B!$, $S!$, and $(\mu+\varepsilon)^B$ can each crash your program. Further processing of the objective function is therefore essential.
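The failure is easy to reproduce. The sketch below evaluates one factor of the likelihood, $(\mu+\varepsilon)^B/B!$, directly and on the log scale. The function names and numbers are hypothetical, and the use of `math.lgamma` for $\ln(B!)$ is a standard alternative shown only for comparison, not part of the derivation in this article.

```python
import math


def naive_term(count, rate):
    """rate**count / count!, evaluated directly as in the raw likelihood."""
    return rate ** count / math.factorial(count)


def log_term(count, rate):
    """The same quantity on the log scale: count*ln(rate) - ln(count!)."""
    return count * math.log(rate) - math.lgamma(count + 1)


# 500 buy orders at a combined arrival rate of 100 (made-up numbers)
try:
    naive_term(500, 100.0)
except OverflowError as exc:
    print("direct evaluation overflowed:", exc)

print("log-scale value:", log_term(500, 100.0))
```

In Python, raising a float to a large power raises `OverflowError`, as does converting a very large factorial to a float; other languages may silently return infinity instead, which then propagates through the optimizer.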

Looking at equation (16), the three terms in the likelihood function share a common factor, $e^{-2\varepsilon}(\mu+\varepsilon)^{B+S}/(B!S!)$. After factoring it out, we can also substitute $x\equiv \frac{\varepsilon}{\mu+\varepsilon}\in [0, 1]$. Taking the logarithm, the transformed likelihood function becomes:

\begin{align} l((B, S)| \theta)&=\ln(L((B, S)| \theta))\\ &=-2\varepsilon+(B+S)\ln(\mu+\varepsilon) \\ &\quad+\ln((1-\alpha)x^{B+S}+\alpha\delta e^{-\mu}x^B + \alpha(1-\delta)e^{-\mu}x^S) \\ &\quad-\ln(B!S!) \end{align}

Now, since the last term $\ln(B!S!)$ does not depend on the parameters, it has no effect on the estimates and can be safely dropped. The remaining part avoids overflow. Personally, I think the brilliant move here is the introduction of $x\equiv \frac{\varepsilon}{\mu+\varepsilon}\in [0, 1]$, which prevents the overflow caused by raising $(\mu+\varepsilon)>1$ to a large power.
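A minimal sketch of the factorized log-likelihood follows, with the constant $\ln(B!S!)$ dropped. One caveat: for very active days $x^{B+S}$ can underflow to zero, so this sketch evaluates the logarithm of the three-term sum with a log-sum-exp step. That step is an extra safeguard and an assumption of this example, not part of the derivation above. The function names and sample data are hypothetical, and the code assumes $\varepsilon>0$, $\mu\ge 0$, and $\alpha,\delta\in[0,1]$.

```python
import math


def day_loglik(B, S, alpha, delta, eps, mu):
    """Factorized log-likelihood of one day, omitting -ln(B! S!)."""
    log_x = math.log(eps) - math.log(mu + eps)  # ln x, with x = eps/(mu+eps)
    # Logs of the three terms inside ln(...); zero-weight terms are skipped.
    terms = []
    if alpha < 1:
        terms.append(math.log(1 - alpha) + (B + S) * log_x)
    if alpha > 0 and delta > 0:
        terms.append(math.log(alpha * delta) - mu + B * log_x)
    if alpha > 0 and delta < 1:
        terms.append(math.log(alpha * (1 - delta)) - mu + S * log_x)
    # log-sum-exp: subtract the largest term before exponentiating
    m = max(terms)
    log_mix = m + math.log(sum(math.exp(t - m) for t in terms))
    return -2 * eps + (B + S) * math.log(mu + eps) + log_mix


def total_loglik(theta, days):
    """Sum of daily log-likelihoods; days is a list of (B, S) pairs."""
    alpha, delta, eps, mu = theta
    return sum(day_loglik(B, S, alpha, delta, eps, mu) for B, S in days)


# Made-up daily order counts, large enough to break the direct form
days = [(520, 480), (910, 450), (470, 880), (5000, 4000)]
print(total_loglik((0.4, 0.3, 40.0, 60.0), days))
```

The negative of `total_loglik` can be passed to a bounded numerical optimizer to obtain estimates of $\theta$, from which PIN follows as $\alpha\mu/(\alpha\mu+2\varepsilon)$.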

