Skip to content

0xDirmes/fordham-technical

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 

Repository files navigation

Simulation Research Driver Challenge

Summary

Design and implement a file driver that manages the sequential processing of market data. The system should intelligently find, iterate and process market updates in sequential order based on their arrival time.

Especial consideration should be placed on:

  • The memory usage of the program. The files are too large to be loaded in memory at once, so a low-memory solution is required.
  • The data comes as a .parquet file, so the processing of these files should take into account the decoding, of the file, as well as the transformation into the desired type. The desired type is left as an exercise to the implementer.

There will be 3 different types of market data to process.

Tick data

This update is generated when there's any update in price. The update reflects the state of the top of the book the moment of the change in price. The relevant data to process for this assignment is:

Name ColumnName Description
ts exchangeTimestamp timestamp of the market update (At the exchange)
rts receiptTimestamp timestamp of arrival of the market update (At your server)
bid_price bid bid price of the market update
ask_price ask ask price of the market update
bid_size bidVolume bid size of the market update
ask_size askVolume ask size of the market update

Trade data

This update is generated when there's a trade executed. The update contains information about the executed trade including price and quantity. The relevant data to process for this assignment is:

Name ColumnName Description
ts exchangeTimestamp timestamp of the trade execution (At the exchange)
rts receiptTimestamp timestamp of arrival of the trade update (At your server)
price price executed trade price
size volume executed trade quantity
side isBuySide side of the aggressor (buy/sell)

Note: isBuySide should be transformed to 1 if the agressor is buy, -1 otherwise.

Order Book data

This update provides a snapshot of the full order book state at a point in time. The update contains all price levels with their respective sizes on both bid and ask sides. The relevant data to extract and process for this assignment is:

Name ColumnName Description
ts exchangeTimestamp timestamp of the order book state (At the exchange).
rts receiptTimestamp timestamp of arrival of the snapshot (At your server)
bids bids array of [price, size] pairs for bid side
asks asks array of [price, size] pairs for ask side

Note: This update comes as two separate order book updates. You can assume the updates are symmetrical and the ts and rts fields will be the same on both sides of the book.

System Behavior

Your solution should demonstrate how you approach the following scenarios:

  • Handling the sequential processing of files, taking into account:
    • There are several files per day for each datatype.
    • Files will be ordered meaning 2024-07-07.binance.BTCUSDT.00.parquet will precede 2024-07-07.binance.BTCUSDT.01.parquet
  • You'll have a class, called Strategy that with the following methods.
    • process_data()
      • process_tick_update(symbol_name, update)
      • process_trade_update(symbol_name, update)
      • process_book_snapshot_update(symbol_name, update)
  • The updates should: be read -> parsed and transformed -> fed sequentially to the strategy object.

What we're looking for

  • Efficient file reading and parsing of large parquet files
  • A clean implementation focusing on the core market data processing functionality
  • Good memory management and data streaming practices
  • Readable and well-structured code that separates concerns between data ingestion and processing
  • Thoughtful handling of data ordering, sequencing and timestamps

What we're not looking for

  • Don't use libraries such as pandas or numpy
  • Don't load more than a single update in memory per dataType at a time.
  • This is a focused problem. If your solution is overly complex, take a step back

Suggested architecture

Note: While this represents a suggested structure as a starting point, you are encouraged to adapt and modify the implementation according to your needs and design choices.

class DataReader:
    def __init__(self, file_paths: Iterable[Path]) -> None:

    def load_next_file(self) -> None:
        """loads the following file into the FileDriver"""

    def get_next_update(self) -> None:
        """Returns the next raw update from the file. If we get a StopIteration,
        we reached the end of the file and should close it, and if there's another one, load it"""

    def parse(self) -> None:
        pass

    def update(self) -> None:
        pass


class TickDataReader(DataReader):
    def __init__(self, file_paths: Iterable[Path]) -> None:

    def parse(self) -> dict:

    def update(self, update: dict, strategy: Strategy) -> None:
        strategy.tick_update(self.symbol_name, update)

(...)
class Driver:
    def __init__(self, market_data: list[DataReader], strategy: Strategy) -> None:

    def remove_file(self, flat_file: DataReader) -> None:

    def start(self, start_timestamp: int) -> None:

    def run(self) -> None:

Extension Options

If time allows, consider implementing one of these extensions:

  1. Implement a configuration that allows our strategy to use timestamp instead of receipt timestamp, with configurable fixed delay to test the importance of our roundtrip latency in production.
  2. Suggest a implementation to increase/decrease the roundtrip latency accounting for "exchange clogging", which should add latency if the timedelta between updates is low, and subtract latency if high. An exchange should be able to process ~5000 updates per second, and start experiencing some (capped) throttling after that.
  3. Add support for book_updates files and rate_update files, available on the same location.

Instructions

Aim to spend about 1 hour on the implementation.

The code should be comparable to code you'd put in front of others for code review and put in production. It should address production concerns, but the number of concerns it addresses may be limited given the time constraint. Include what you can. If you're short on time, aim to make something unpolished that works rather than something polished but incomplete.

Include a README that explains:

  • Your assumptions and the reasoning behind them
  • Design decisions, especially for ambiguous aspects of the challenge
  • How to run your solution
  • What you would improve with more time
  • Any challenges you faced

When you are done, make a PR to the repo with your proposed solution. Include your name on the PR.

Use the tools and language you are most proficient with to complete the solution.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors