Research Data Services (RDS)

A database of market microstructure measures.

Proof-of-concept joint work with Prof. Joakim Westerholm and Dr. Henry Leung.

Our objective is to construct and maintain a database of market microstructure measures based on high-frequency tick data sourced from Refinitiv Thomson Reuters Tick History.


Example Usage

I provide an easy-to-use SQL interface for researchers to retrieve the data. Below is an example usage in SAS.

Currently, you need to be inside the USYD network to use RDS. You can either use a PC inside the Business School or connect through the VPN.

/* Assign lib reference to connect to the server. */
libname rds mysql 
	server="" database=rds 
	user=actualusername password=actualpassword;

If you encounter this error:

ERROR: The SAS/ACCESS Interface to MYSQL cannot be loaded. The libmysql code appendage could not be loaded.

Solution is here

Let’s now use RDS to retrieve a collection of estimated measures.

/* Some example usage. */
/* More measures and complete documentation to come. */
proc sql;
/* Retrieve the full table of estimated measures. */
create table measures as select * from rds.measures 
	order by local_date asc, RIC asc;

/* Retrieve all Bid-Ask Spread estimates. */
create table baspread as select * from rds.bidaskspread;

/* Retrieve all Effective Spread estimates for RIC=AAL.OQ. */
create table espread as select * from rds.effectivespread 
	where RIC='AAL.OQ';

/* Retrieve all Realized Spread estimates from 2019-01-01 to 2019-01-15. */
create table rspread as select * from rds.realizedspread 
	where local_date between "01Jan2019"d and "15Jan2019"d;

/* Retrieve all LinSangerBooth1995 estimates of adverse selection. */
create table lsb1995 as select * from rds.measures
	where measure="LinSangerBooth1995";
quit;


Let’s plot a time series to check that it works.

/* Simple timeseries plot */
title "Timeseries Plot of Realized Spread For USB.N and AIG.N";
proc sgplot data=measures;
	where measure="RealizedSpread" & (RIC="USB.N" | RIC="AIG.N");
	series x=local_date y=estimate /markers group=RIC;
	refline 0/axis=y;
run;

The output is:

Example Timeseries Plot

Technology Stack

I wrote this system in Python and C/C++. It runs on a workstation with an 8-core, 16-thread CPU, 64 GB of RAM and M.2 SSDs.


Behind the scenes, the program classifies trade directions on the fly using the Lee and Ready (1991) algorithm, and estimates several measures for each security on each day.

Results are stored in a MySQL database inside the university network, but may be stored and served on AWS in the future.
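As a rough illustration of the trade classification step described above, the following Python sketch applies the Lee and Ready (1991) rule. A trade above the prevailing quote midpoint is treated as buyer-initiated and a trade below it as seller-initiated. A trade at the midpoint is signed with a tick test against the last different trade price. This is not the RDS source code. The function name `lee_ready`, the `(price, bid, ask)` input layout, and the assumption that each trade has already been matched to its prevailing quote are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def lee_ready(trades: Sequence[Tuple[float, float, float]]) -> List[int]:
    """Sign trades as buyer-initiated (+1), seller-initiated (-1) or unknown (0).

    Each trade is a (price, bid, ask) tuple. The bid and ask are assumed to be
    the quote prevailing at the time of the trade (an assumption of this sketch;
    matching trades to quotes is not shown here).
    """
    directions: List[int] = []
    last_price: Optional[float] = None  # previous trade price
    last_tick = 0  # sign of the most recent non-zero price change

    for price, bid, ask in trades:
        # Keep the tick-test state up to date: remember the direction of the
        # last price change, carrying it forward over zero-tick trades.
        if last_price is not None and price != last_price:
            last_tick = 1 if price > last_price else -1
        last_price = price

        mid = (bid + ask) / 2.0
        if price > mid:
            directions.append(1)          # quote rule: above the midpoint
        elif price < mid:
            directions.append(-1)         # quote rule: below the midpoint
        else:
            directions.append(last_tick)  # tick test for midpoint trades
    return directions


if __name__ == "__main__":
    # Hypothetical trades, chosen so that midpoints are exact in floating point.
    trades = [
        (10.50, 10.00, 10.50),  # at the ask
        (10.00, 10.00, 10.50),  # at the bid
        (10.25, 10.00, 10.50),  # at the midpoint, after an uptick
        (10.25, 10.00, 10.50),  # at the midpoint, zero tick
        (10.25, 10.25, 10.75),  # quote moved up, trade now below the midpoint
    ]
    print(lee_ready(trades))  # [1, -1, 1, 1, -1]
```

A midpoint trade with no earlier price change to compare against is left as 0 (unclassified) in this sketch. A production implementation would also need to decide how to align trade and quote timestamps, which is outside the scope of this example.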

Development Plan

We plan to continue the development of this project and:

  • cover more measures, securities and extend the data period;
  • provide an easy-to-use web interface in addition to the SQL interface;
  • provide a REST API for more efficient and professional usage.


This project may contain errors. Users are advised to double-check the data quality before use. We accept no responsibility for any damage and/or loss incurred as a result of using any data provided on this site. We may provide the source code for selected measures and encourage users to check it for correctness and accuracy.

If there is any bug and/or error, please contact me at