
Competitive Adjustment to New Information at High-Frequency: The 2016 Election Results







MATT BRIGIDA

Associate Professor of Finance (SUNY Polytechnic Institute)

The Data

E-Mini S&P 500 Futures

  • Trading hours: Sunday--Friday, 6:00 p.m.--5:00 p.m. Eastern Time (ET).
  • Contract value is $50 times the futures price.
  • Cash delivery with expirations every 3 months.
  • Traded on the Chicago Mercantile Exchange (CME) (pit and electronic (Globex)).
  • Stock market price discovery occurs in this market.

Data are Market Depth Data for E-Mini S&P 500 futures (Globex) purchased directly from the CME.

  • Market Depth Data contains all market messages (trade/limit order updates) to and from the CME, time-stamped to the nanosecond. It also tags the aggressor side of each trade.
  • Can recreate the ES orderbook up to 10 levels deep.
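As a toy illustration of how the book can be rebuilt by replaying messages, consider the following minimal Python sketch. The message fields and actions here are hypothetical and far simpler than the actual CME Market Depth schema:

```python
# Minimal sketch of rebuilding a price-level order book from update messages.
# The message format is hypothetical, not the actual CME Market Depth schema.

def apply_message(book, msg):
    """Apply one update message to one side of the book.

    book: {'bid': {price: qty}, 'ask': {price: qty}}
    msg:  dict with 'side', 'action' ('new'/'change'/'delete'), 'price', 'qty'
    """
    side = book[msg['side']]
    if msg['action'] == 'delete':
        side.pop(msg['price'], None)
    else:  # 'new' and 'change' both set the level's resting quantity
        side[msg['price']] = msg['qty']
    return book

def top_of_book(book):
    """Return (best bid, best ask) prices, or None for an empty side."""
    best_bid = max(book['bid']) if book['bid'] else None
    best_ask = min(book['ask']) if book['ask'] else None
    return best_bid, best_ask

book = {'bid': {}, 'ask': {}}
msgs = [
    {'side': 'bid', 'action': 'new',    'price': 2139.25, 'qty': 100},
    {'side': 'ask', 'action': 'new',    'price': 2139.50, 'qty': 80},
    {'side': 'bid', 'action': 'change', 'price': 2139.25, 'qty': 60},
]
for m in msgs:
    apply_message(book, m)

print(top_of_book(book))  # (2139.25, 2139.5)
```

A real reconstruction would track all 10 levels per side and sequence messages by their nanosecond timestamps.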

Why CME Data?

  • All trades and quotes take place in this one central book
  • No delay in orders due to location
  • Heavily traded

Clock vs Market Time

In the following plots we'll take a look at the data. When viewing them, it is important to note that the plots are in market time rather than clock time.

A Look at the Data

Review

Private Information

  • Trades are the method by which private information is incorporated into prices.
  • Market participants estimate the amount of private information in each trade (as opposed to trades made on noise or for liquidity reasons).
  • The greater the estimated amount of private information, the greater the price impact of the trade.

Public Information

  • Price adjustments to public information can be made by changes in the bid and offer prices.
  • Trades may affect the ultimate reaction, and speed with which it occurs.

Kyle's Model

Kyle (1985) formally models the trading strategy of a trader with private information, who attempts to trade in a way to maximize profits from this information.

  • Within the model a parameter, $\lambda$, is determined which measures market depth.
  • Specifically, $\frac{1}{\lambda}$ measures the order flow (in $) necessary to increase/decrease the price by $1.

Estimation

To estimate $\lambda$ we use:

$\lambda = \frac{|\Delta(price)|}{volume\ in\ \$}$

Rearranging this formula gives:

$|\Delta(price)| = \lambda(volume\ in\ \$)$

We can thus estimate $\lambda$ via the following regression equation:

$|\Delta(price)_t| = \alpha + \lambda(volume\ in\ \$)_t + e_t$
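As a sketch, this regression can be run on simulated data. Everything below is made up for illustration (the true $\lambda$, the volume range, and the noise level are arbitrary); the point is only the OLS step:

```python
import numpy as np

# Sketch: estimate Kyle's lambda by regressing |price change| on dollar volume.
# The data are simulated; the "true" lambda of 2e-6 is an arbitrary choice.
rng = np.random.default_rng(0)
true_lambda = 2e-6
dollar_volume = rng.uniform(1e5, 1e6, size=5000)
abs_price_change = true_lambda * dollar_volume + rng.normal(0, 0.05, size=5000)

# OLS with an intercept: |delta price|_t = alpha + lambda * volume_t + e_t
X = np.column_stack([np.ones_like(dollar_volume), dollar_volume])
alpha_hat, lambda_hat = np.linalg.lstsq(X, abs_price_change, rcond=None)[0]

print(round(lambda_hat / true_lambda, 2))  # close to 1
```

With real data, $|\Delta(price)_t|$ and dollar volume would be computed over fixed intervals from the trade messages.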

A Brief HFT Strategy Overview

First, what makes a market good?

  • Liquidity
  • Price Discovery

Broad Types of HFT

  • Liquidity Providing
  • Opportunistic

Liquidity Providing

HFT serve a market-making function in modern markets, a function which has allowed the creation of new exchanges. HFT that serve as market makers may not be subject to all the regulations that bind official market makers.

  • There is evidence the majority of HFT serve as liquidity providers.

Opportunistic

Opportunistic HFT often employ momentum trading strategies, or trade on low-latency reaction to events (structural and latency arbitrage; order detection). They still rarely cross the spread.

  • These strategies tend to reduce liquidity.
  • Disintermediate between market participants.
  • Harmful to markets.
  • Likely the minority of HFT.

Reactive Liquidity

A Look at Algorithmic Reaction to Trades

Is liquidity really there? In modern markets this cannot be adequately answered even ex-post given the speed with which high-frequency traders (HFT) can remove liquidity in response to a market event.

  • Would that same liquidity have been there if you flashed an order or if someone had executed a trade?
  • O'Hara (2015): as many as 98% of all orders are subsequently cancelled rather than executed as trades.

Trade criteria:

  • The trade had to be for 5 or fewer contracts.
  • No trades in the following 100 ms.

Method

We calculate the change in liquidity in the 100 ms after each trade, and regress this on three measures of pre-trade market activity. The three pre-trade measures are:

  • The average bid/ask spread.
  • The number of trades.
  • The number of book changes.
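A minimal sketch of this regression step on simulated inputs (the variable names and the data-generating process below are mine, chosen only to illustrate the mechanics, not the paper's actual data or estimates):

```python
import numpy as np

# Sketch: regress the post-trade (100 ms) liquidity change on three
# pre-trade activity measures. All data here are simulated for illustration.
rng = np.random.default_rng(1)
n = 2000
avg_spread = rng.uniform(0.25, 0.50, n)   # average bid/ask spread pre-trade
n_trades = rng.poisson(5, n)              # number of trades pre-trade
n_book_changes = rng.poisson(50, n)       # number of book changes pre-trade

# Hypothetical data-generating process for the post-trade liquidity change
liq_change = (-2.0 * avg_spread + 0.1 * n_trades - 0.05 * n_book_changes
              + rng.normal(0, 0.5, n))

# OLS with an intercept; coefs[1:] are the three slope estimates
X = np.column_stack([np.ones(n), avg_spread, n_trades, n_book_changes])
coefs = np.linalg.lstsq(X, liq_change, rcond=None)[0]
print(np.round(coefs[1:], 2))
```

In the paper this regression is run separately by trade direction and book side (e.g., buy trades, offer side).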

Example Results: Buy Trade, Offer Side

Thinking With Data

Algo Classification

Hasbrouck (2016) (High-Frequency Quoting) used variance ratio tests to detect the presence of algorithmic trading. The method was ex-post.

  • The analysis found evidence of very fast (independent) changes in the bid/offer.
  • Changing liquidity is a greater cost for slower traders.
  • Can we modify this method to identify the activity on-the-fly?
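For intuition, a simplified variance ratio on midquote changes looks like the following. This is my simplification of the general idea, not Hasbrouck's (2016) exact procedure:

```python
import numpy as np

# Sketch of a variance ratio on midquote changes: under a random walk,
# the variance of k-step changes is k times the 1-step variance, so VR ~ 1.
# Fast transient quote volatility ("flickering") pushes short-horizon VR
# below 1. Simplified illustration, not Hasbrouck's exact procedure.

def variance_ratio(mid, k):
    """Variance of k-step midquote changes over k times the 1-step variance."""
    one_step = np.diff(mid)
    k_step = mid[k:] - mid[:-k]
    return k_step.var() / (k * one_step.var())

rng = np.random.default_rng(2)
random_walk = np.cumsum(rng.normal(0, 1, 100_000))
noisy = random_walk + rng.normal(0, 5, 100_000)  # transient quote flickering

print(round(variance_ratio(random_walk, 10), 2))  # near 1
print(round(variance_ratio(noisy, 10), 2))        # well below 1
```

An on-the-fly version would compute this statistic on a rolling window of recent quote changes.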

If we can identify periods with particularly volatile liquidity, we can create trading mechanisms whereby slower traders' liquidity-taking orders are canceled or delayed until liquidity is less volatile.

High-latency (retail) traders would not notice the delay. It would save each retail trader a fraction of a cent on each share, but across all retail trades this would be considerable savings.

Because of the low-latency nature of the data, these are necessarily automated rules (estimate liquidity volatility over the next few milliseconds and potentially delay given the results).

  • Identification would have to be quick (forward pass through models in a few milliseconds).
  • A similar idea is being implemented by the IEX. Their signal identifies an imminent change in the bid/offer midpoint by monitoring changes in the best bid/offer prices across stock exchanges.

From the paper

"Now that we have a basic version of a crumbling quote signal to work with, we can consider how to deploy it on our market. Historically, we have embedded it inside our discretionary peg (dpeg) order. This is a dark order type that rests on the book at the near side, but can exercise “discretion” to trade at the midpoint of the NBBO when the crumbling quote signal is not on. In this way, discretionary peg orders behave less aggressively while the signal is on, seeking to avoid bad trades at the midpoint just before the NBBO moves in their favor."...

"Just this month, IEX gained approval from the SEC to release a new version of our primary peg order type which will also incorporate the signal. This order type will rest one tick outside the NBBO and exercise discretion to trade at the NBBO while the signal is off."

Markov-Regime Switching Regression Approach

Real-Time Classification

I attempt to identify the presence of algorithms by including their presence as a latent variable in a Markov-switching regression---an unsupervised learning approach.

Varying Liquidity

I measure liquidity on each side of the book as the amount of ES that can be bought within one point of the present bid-offer midpoint.

  • 1 point is 4 ticks (so at most the inside quote and 3 additional levels of the book).
  • I control for changes in the bid-offer midpoint.
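A sketch of computing this measure from a book snapshot (the snapshot, prices, and field layout below are hypothetical, chosen only to illustrate the definition):

```python
# Sketch: liquidity = contracts available within one point (4 ticks of 0.25)
# of the bid/offer midpoint, on one side of the book. Snapshot is hypothetical.

def liquidity_within_one_point(levels, midpoint, side):
    """Sum the resting quantity at levels within 1.00 point of the midpoint.

    levels: list of (price, qty) tuples for one side of the book
    side:   'ask' counts levels at most 1 point above the midpoint,
            'bid' counts levels at most 1 point below it
    """
    if side == 'ask':
        return sum(q for p, q in levels if midpoint < p <= midpoint + 1.0)
    return sum(q for p, q in levels if midpoint - 1.0 <= p < midpoint)

ask_levels = [(2139.50, 80), (2139.75, 120), (2140.00, 150), (2140.75, 90)]
bid_levels = [(2139.25, 100), (2139.00, 60), (2138.25, 40)]
mid = (2139.25 + 2139.50) / 2  # bid-offer midpoint = 2139.375

print(liquidity_within_one_point(ask_levels, mid, 'ask'))  # 80+120+150 = 350
print(liquidity_within_one_point(bid_levels, mid, 'bid'))  # 100+60 = 160
```

Note the deepest level on each side falls more than one point from the midpoint and is excluded.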

Model


\[ Liq_t = \begin{cases} \alpha_1 + \beta_1 Liq_{t-1} + \epsilon_1, \ \ \ \epsilon_1 \sim N(0, \sigma_1) \\ \alpha_2 + \beta_2 Liq_{t-1} + \epsilon_2, \ \ \ \epsilon_2 \sim N(0, \sigma_2) \\ \end{cases} \]

\[ P(s_t = j \mid s_{t-1} = i) = p_{ij} \ \ \text{for} \ \ i,j \in \{1,2\} \ \text{and} \ \sum_{j=1}^{2} p_{ij} = 1 \]
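For intuition, the two-state process can be simulated directly. The parameter values below are arbitrary illustration values, not the estimates reported later:

```python
import numpy as np

# Sketch: simulate a 2-state Markov-switching AR(1) for liquidity.
# All parameter values are arbitrary, chosen only for illustration.
rng = np.random.default_rng(3)
alpha = [0.0, -0.8]           # state intercepts
beta = [0.99, 0.5]            # state AR coefficients
sigma = [0.01, 0.5]           # state innovation standard deviations
P = np.array([[0.99, 0.01],   # P[i, j] = P(s_t = j | s_{t-1} = i)
              [0.05, 0.95]])

n = 10_000
states = np.zeros(n, dtype=int)
liq = np.zeros(n)
eps = np.zeros(n)
for t in range(1, n):
    states[t] = rng.choice(2, p=P[states[t - 1]])  # draw the next regime
    eps[t] = rng.normal(0, sigma[states[t]])
    liq[t] = alpha[states[t]] + beta[states[t]] * liq[t - 1] + eps[t]

# Innovations in the volatile regime dwarf those in the stable regime
print(eps[states == 1].std() / eps[states == 0].std())
```

Estimation reverses this: given only `liq`, the Hamilton filter recovers the state probabilities and parameters.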

Model Estimation

The model is estimated via the Hamilton Filter.

  • Original code was in plain R, though it took too long to estimate.
  • Rewritten in C++, making use of Rcpp.

Example Code


#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

const double PII = 3.141592653589793;

// Negative log-likelihood of a 2-state Markov-switching regression,
// computed with the Hamilton filter.
// [[Rcpp::export]]
double lik(NumericVector theta, NumericVector lnoil_, NumericVector lnng_, NumericVector change_bam_) {

  arma::colvec lnoil = lnoil_;
  arma::colvec lnng = lnng_;
  arma::colvec change_bam = change_bam_;

  // Regression coefficients in each state
  double alpha1 = theta[0];
  double alpha2 = theta[1];
  double alpha3 = theta[2];
  double alpha4 = theta[3];
  double alpha5 = theta[4];
  double alpha6 = theta[5];
  // Transition probabilities, mapped into (0, 1) by the logistic function
  double p11 = 1 / (1 + exp(-1 * theta[6]));
  double p22 = 1 / (1 + exp(-1 * theta[7]));
  double alpha7 = theta[8];
  double alpha8 = theta[9];

  // State-conditional normal densities of each observation
  arma::mat dist_1 = (1 / (alpha5 * sqrt(2 * PII))) * exp((-pow((lnng - alpha1 - alpha3 * lnoil - alpha7 * change_bam), 2)) / (2 * pow(alpha5, 2)));
  arma::mat dist_2 = (1 / (alpha6 * sqrt(2 * PII))) * exp((-pow((lnng - alpha2 - alpha4 * lnoil - alpha8 * change_bam), 2)) / (2 * pow(alpha6, 2)));

  arma::mat dist = arma::join_rows(dist_1, dist_2);

  // Transition probability matrix
  arma::mat P;
  P << p11 << 1 - p22 << arma::endr
    << 1 - p11 << p22 << arma::endr;

  int length_lnoil = lnoil.n_elem;

  arma::mat xi_a(length_lnoil, 2);  // filtered state probabilities
  arma::mat xi_b(length_lnoil, 2);  // predicted state probabilities
  arma::vec model_lik(length_lnoil);

  // Initialize the filter from the first observation
  arma::rowvec first_xi_a_numerator;
  first_xi_a_numerator << p11 * dist.row(0)[0] << p22 * dist.row(0)[1];
  double first_xi_a_denominator = p11 * dist.row(0)[0] + p22 * dist.row(0)[1];

  xi_a.row(0)[0] = first_xi_a_numerator[0] / first_xi_a_denominator;
  xi_a.row(0)[1] = first_xi_a_numerator[1] / first_xi_a_denominator;

  // Hamilton filter recursion (% is Armadillo's elementwise multiplication)
  for (int i = 1; i < length_lnoil; i++) {
    xi_b.row(i) = (P * xi_a.row(i - 1).t()).t();
    xi_a.row(i) = (xi_b.row(i) % dist.row(i)) / (xi_b.row(i)[0] * dist.row(i)[0] + xi_b.row(i)[1] * dist.row(i)[1]);
    model_lik.at(i) = xi_b.row(i)[0] * dist.row(i)[0] + xi_b.row(i)[1] * dist.row(i)[1];
  }

  // Negative log-likelihood (for use with a minimizer such as optim)
  double logl = arma::accu(log(model_lik.subvec(1, length_lnoil - 1)));
  return -1 * logl;

}
      

Results: 2 State Model

\[ Liq_t = \begin{cases} -0.83 + 0.49 Liq_{t-1} + \epsilon_1, \ \ \ \epsilon_1 \sim N(0, 0.47) \\ 0 + 1 Liq_{t-1} + \epsilon_2, \ \ \ \epsilon_2 \sim N(0, 0.002) \\ \end{cases} \]

Results

  • The model is only picking up states of changing liquidity and stable liquidity.
  • Ex-post this makes sense.
  • This motivates a 3 state model:
    1. stable liquidity
    2. normal changing liquidity
    3. changing liquidity due to HFT

Results: 3 State Model

\[ Liq_t = \begin{cases} -0.00 + 1 Liq_{t-1} - 0.12 \Delta BAM + \epsilon_1 \\ -0.01 + 0.32 Liq_{t-1} + 0.004 \Delta BAM + \epsilon_2 \\ -0.09 + 0.22 Liq_{t-1} + 1.02 \Delta BAM + \epsilon_3 \\ \end{cases} \]
\[ \text{where} \begin{cases} \epsilon_1 \sim N(0, 0.004) \\ \epsilon_2 \sim N(0, 0.40) \\ \epsilon_3 \sim N(0, 0.2929) \\ \end{cases} \]

4 States?

AI/ML Approaches

LSTM Time Series Prediction

Signal Based on Prediction Error

We can use a prediction error above a particular threshold as our algo indicator.

  • Downside: there is no obvious threshold other than a confidence interval.
  • A link must be established between the prediction error and algo activity.

Neural Net Graph (from Tensorboard)

LSTM Vs MS Switching Results


### let's look at the correlation between signals ------

pred_error_signal <- 1 * (abs(pred_error) > .1)
state_prob_signal <- 1 * ((1 - state_prob) > .5)

cor.test(pred_error_signal, state_prob_signal)
## 	Pearson's product-moment correlation

## data:  pred_error_signal and state_prob_signal
## t = 110.56, df = 99999, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3245093 0.3355549
## sample estimates:
##       cor
## 0.3300434

Assuming a 20 millisecond firing window, our signal will be on for about 1.25 hours per day, which cumulatively is about 5% of the trading day.

Questions/Comments?