Investment companies and brokerage houses typically store a tremendous amount of market data. Various database systems exist that compile historical stock market transactions and aid in searching through them. These systems permit a financial analyst, for example, to request the value of a particular stock on a particular date, or to request information related to sales on a particular exchange.
Traditionally, after the markets have closed for the day, the daily trading data is fed into databases administered by the various brokerage houses and investment companies. This market data regarding the transactions is often referred to as “tick data.” The data in the database can then be used for analysis and calculations regarding the transactions that occurred throughout the day or any other desired time period. Because of the very large amounts of data accumulated on a daily basis, efficient storage and retrieval techniques are critical to these types of database systems.
Current systems for storing and retrieving this tick data, such as time series databases, relational databases, and specialized in-memory databases, have their drawbacks. For example, some in-memory database systems require large amounts of Dynamic Random Access Memory (DRAM) in order to provide fast access to the data. Acquiring and maintaining the required memory space is often very costly, or impractical due to technological limitations. For example, storing tick data from an Options Price Reporting Authority (OPRA) feed using in-memory databases is not possible due to technological limitations and the data volume. As the amount of data regarding the daily market transactions continues to increase, the required storage space (and its cost) continues to increase as well. Additionally, database systems such as relational databases or time-series databases, while not using high-cost DRAM for storage, do not provide sufficiently fast data retrieval. These types of systems may also be unable to handle large volumes of new record insertions (inserted as rows) fast enough. With the increasing amount of data stored on a daily basis, the ability to quickly retrieve the requested data in such systems may decrease, thereby decreasing the functionality and usefulness of such data storage systems.
Many of the existing specialized in-memory databases for storing market data rely on creating various types of data arrays for each transaction. For instance, multiple fields, each with data regarding a given financial instrument, may be stored in memory as an array. Each field in the array is designated for a particular type of data, such as trade price, quantity, or a time stamp. Record-based array implementations are typically inflexible with regard to adding more fields (i.e., columns) to an existing database. In vector-based in-memory database systems, each column is stored in its own individual array (i.e., a vector) and each array is stored in a separate file. Accordingly, a database table consisting of 60 columns would require over 60 loosely coupled files with this type of implementation. In various in-memory database systems, a row or record may have dozens, or even hundreds, of different columns (fields) to hold the various types of data that may be available for each transaction. If data associated with a particular column is not available or not applicable for a particular transaction, however, a null value is typically placed in the column. In these systems, for any given transaction, a multitude of columns may hold null values. The entire array, including the null columns, is stored in memory. Thus, even though numerous columns with null values do not contain any “useful” data, the columns still consume memory space, which consumes resources and adds to data retrieval times. Current systems, such as time series databases, relational databases, or in-memory databases, require tick data to be normalized in this fashion.
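The memory overhead of null-filled columns described above can be illustrated with a minimal sketch. It is not taken from any particular database product. The 60-column width, the use of 8-byte doubles for every column, the NaN null marker, and the names `NUM_COLUMNS`, `NULL`, and `pack_fixed_record` are all assumptions chosen only for illustration.

```python
import struct

NUM_COLUMNS = 60        # assumed table width, for illustration only
NULL = float("nan")     # assumed null marker for a numeric column


def pack_fixed_record(values):
    """Pack one record as NUM_COLUMNS 8-byte doubles.

    `values` maps a column index to its value. Any column that is
    missing from the mapping is filled with the NULL marker, so the
    packed record always has the same size.
    """
    row = [values.get(i, NULL) for i in range(NUM_COLUMNS)]
    return struct.pack(f"<{NUM_COLUMNS}d", *row)


# A hypothetical trade that populates only three of the 60 columns:
# column 0 = price, column 1 = quantity, column 2 = time stamp.
trade = {0: 101.25, 1: 500.0, 2: 1_700_000_000.0}

record = pack_fixed_record(trade)

# Bytes that carry actual data versus bytes spent on null columns.
useful_bytes = len(trade) * struct.calcsize("<d")
null_bytes = len(record) - useful_bytes

print(f"record size: {len(record)} bytes")
print(f"useful data: {useful_bytes} bytes")
print(f"null filler: {null_bytes} bytes")
```

Under these assumptions, every record occupies the full 60-column width regardless of how many columns actually hold data, so a sparsely populated transaction spends most of its footprint on null markers.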