Developing AWS Lambda Functions for Analyzing Market Data

Brian McCallion

[I wrote this in 2015 as we worked with a hedge fund to leverage Lambda and Kinesis to calculate real-time VWAP for a set of one thousand ticker symbols. Since then, this space has exploded with opportunity!]

As an exercise to jump-start our work with market data, we've set aside a number of considerations and are simply talking through the code. We should just be coding, but a little planning isn't bad either. Nissim Karpenstein contributed to this post. Nissim is Bronze Drum's expert on risk, pricing, and financial products; prior to partnering with Bronze Drum he was Technology Director at Hudson Bay Capital, and he has spent the past 20 years as a financial markets applications engineer in New York City.

Since publishing this post, and successfully transcompiling portions of QuantLib from C++ to ASM.js, we've worked with a number of firms on derivatives pricing in the Cloud, as well as real-time applications in social media and finance. What strikes me today, in 2017, is that it's still early, yet I can already see firms in financial services separating from the pack, largely by adopting cloud-scale analytics.

If you've had a lot of advice around the status quo, and you'd like to speak with people who can't seem to help but think differently, contact us; we'd like to share our point of view. Since I wrote this article in 2015, a great deal has changed:

  • Lambda functions now support multiple languages, including Python, Java, and C#.
  • In 2016 the Amazon Kinesis team introduced Kinesis Analytics. Kinesis Analytics greatly simplifies the work of analyzing real-time data and takes it to a place Nissim and I were working towards in 2015: real-time pattern analysis and anomaly detection. One of our customers streams thirteen million feeds into Kinesis (not all market data) and spends in excess of $8K/day on Kinesis, yet generates far more in revenue.
  • In 2016 AWS announced the F1 instance type, which enables customers to launch and program Xilinx FPGAs and lets ISVs offer FPGA-backed AMIs in the AWS Marketplace.
[Here’s the original post]

Let’s assume there are two types of functions we can write for AWS Lambda:

  1. Functions that take input and return output.
  2. Functions that operate on data in a data store and modify it or enhance it.

The QuantLib options pricing function is in category 1. It takes put/call, underlying price, strike price, volatility, and rate as input, and it outputs price, delta, gamma, theta, and vega for the option. To call this function we need the stock price, the volatility, and the interest rate, all of which come from market data feeds.
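To make that input/output shape concrete, here is a minimal, self-contained Black-Scholes pricer in Python as a stand-in for the QuantLib call. It is a sketch rather than the QuantLib API: the function name, the dictionary output, and the added time-to-expiry parameter (which any pricer also needs) are choices made for illustration only.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def price_option(put_call, spot, strike, vol, rate, expiry_years):
    """Return price and greeks for a European option (Black-Scholes)."""
    sqrt_t = math.sqrt(expiry_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * expiry_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * expiry_years)
    if put_call == "call":
        price = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
        delta = norm_cdf(d1)
        theta = (-spot * norm_pdf(d1) * vol / (2 * sqrt_t)
                 - rate * strike * discount * norm_cdf(d2))
    else:  # put
        price = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
        delta = norm_cdf(d1) - 1.0
        theta = (-spot * norm_pdf(d1) * vol / (2 * sqrt_t)
                 + rate * strike * discount * norm_cdf(-d2))
    gamma = norm_pdf(d1) / (spot * vol * sqrt_t)
    vega = spot * norm_pdf(d1) * sqrt_t
    return {"price": price, "delta": delta, "gamma": gamma,
            "theta": theta, "vega": vega}

# Example: a call with spot 100, strike 105, 20% vol, 1% rate, 3 months to expiry
print(price_option("call", 100.0, 105.0, 0.20, 0.01, 0.25))
```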

  1. If we are going to calculate the bid/ask spread, we just need a simulated market data feed. Each record in that feed has a bid and an ask, and the spread is just the difference.
  2. If we want to enhance a tick history with VWAP, we can keep running totals of volume and volume-weighted price and calculate the ratio as VWAP = sum(price × volume) / sum(volume).
  3. If we use this data structure for equity market data: http://en.wikipedia.org/wiki/Market_data

we could enhance that structure by keeping running totals: agg_volume = previous tick's agg_volume + volume; agg_vol_wtd_price = previous tick's agg_vol_wtd_price + (volume × last); and VWAP = agg_vol_wtd_price / agg_volume.

To do this we'll need to get the previous tick, which has already run through the function. If there's no previous tick, we can initialize agg_volume = volume and agg_vol_wtd_price = volume × last.
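Here is a small sketch of that running-total update in Python. The field names (symbol, last, volume) are assumed to follow the equity market data structure linked above; the function name and the dictionary layout of the enriched tick are illustrative choices, not a fixed schema.

```python
def enrich_tick(tick, previous_tick=None):
    """Add running totals and VWAP to a tick, given the prior enriched tick (if any)."""
    if previous_tick is None:
        # First tick of the day: initialize the running totals from this tick alone.
        agg_volume = tick["volume"]
        agg_vol_wtd_price = tick["volume"] * tick["last"]
    else:
        agg_volume = previous_tick["agg_volume"] + tick["volume"]
        agg_vol_wtd_price = previous_tick["agg_vol_wtd_price"] + tick["volume"] * tick["last"]
    enriched = dict(tick)
    enriched["agg_volume"] = agg_volume
    enriched["agg_vol_wtd_price"] = agg_vol_wtd_price
    enriched["vwap"] = agg_vol_wtd_price / agg_volume
    return enriched

# Example: two ticks for the same symbol
t1 = enrich_tick({"symbol": "ABC", "last": 10.00, "volume": 100})
t2 = enrich_tick({"symbol": "ABC", "last": 10.10, "volume": 300}, previous_tick=t1)
print(t2["vwap"])  # (100*10.00 + 300*10.10) / 400 = 10.075
```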

AWS Lambda Functions for Analyzing Market Data

  1. How do we pull the previous tick from our database?
  2. How do we know that it’s been run through the function already?
  3. How do we know that we are processing the first tick of the day?

Directions and Thoughts

1. We can get the previous tick from Kinesis by requesting an earlier record. Is this the easiest way? Does it perform?
2. Kinesis keeps a separate "pointer" or "head" value for each "application" reading the Kinesis stream.
3. Kinesis streams retain records for 24 hours.
4. Lambda functions can be triggered by "events" such as the arrival of data, so each day the first data that arrives would trigger a Lambda function.


To maintain the running totals, the function reads new ticks from the source stream, reads the prior tick's running totals from the target (enriched) stream, adds the new values to the running totals, and writes a new record to the enriched stream.
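A minimal sketch of that Lambda handler follows, using boto3. The target stream name ("ticks-enriched") is assumed, and for brevity the prior running totals are kept in a module-level cache; that cache only survives while the Lambda container stays warm, so a real deployment would read the prior enriched tick back from the target stream or a durable store instead.

```python
import base64
import json
import boto3

kinesis = boto3.client("kinesis")
ENRICHED_STREAM = "ticks-enriched"   # assumed name of the target stream

# Running totals per symbol. NOTE: in-memory only; see the caveat above.
running_totals = {}

def handler(event, context):
    """Triggered by a Kinesis event source mapping on the raw tick stream."""
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        tick = json.loads(base64.b64decode(record["kinesis"]["data"]))
        symbol = tick["symbol"]

        prior = running_totals.get(symbol, {"agg_volume": 0.0, "agg_vol_wtd_price": 0.0})
        agg_volume = prior["agg_volume"] + tick["volume"]
        agg_vol_wtd_price = prior["agg_vol_wtd_price"] + tick["volume"] * tick["last"]
        running_totals[symbol] = {"agg_volume": agg_volume,
                                  "agg_vol_wtd_price": agg_vol_wtd_price}

        enriched = dict(tick, agg_volume=agg_volume,
                        agg_vol_wtd_price=agg_vol_wtd_price,
                        vwap=agg_vol_wtd_price / agg_volume)

        # Partition by symbol so all ticks for one symbol land on the same shard, in order.
        kinesis.put_record(
            StreamName=ENRICHED_STREAM,
            Data=json.dumps(enriched).encode("utf-8"),
            PartitionKey=symbol,
        )
```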

As another option, the first function can move backwards and forwards within a stream across up to 24 hours of records. Each separate application reading a Kinesis stream advances its own pointer as it processes records, but it can also go back and read records from any point in the last 24 hours of the stream, like rewinding a video.
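Here's a rough sketch of what that rewind looks like with boto3, assuming a single-shard stream named "ticks-enriched"; a multi-shard stream would need to walk every shard, and NextShardIterator would be used to keep paging forward.

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM = "ticks-enriched"  # assumed stream name

def read_from_start(stream_name, limit=100):
    """Read records from the oldest point still retained in the stream
    (TRIM_HORIZON), i.e. 'rewind' up to 24 hours, independently of where
    any other application's pointer sits."""
    shard_id = kinesis.describe_stream(StreamName=stream_name)[
        "StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # oldest retained record
    )["ShardIterator"]
    response = kinesis.get_records(ShardIterator=iterator, Limit=limit)
    return response["Records"]

for record in read_from_start(STREAM):
    print(record["SequenceNumber"], record["Data"])
```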

Further Reading

Amazon Kinesis Analytics – Process Streaming Data in Real Time with SQL