Time series data are sequences of timestamped records occurring in one or more streams, usually continuous, that represent some type of activity made up of discrete events. Examples include information processing logs, market transactions, audio, video, and sensor data from real-time monitors (such as supply chains, military operation networks, or security systems). The ability to index, search, and present relevant retrieval results is important to understanding and working with systems that emit large quantities of time series data.
Searching time series data typically involves restricting search results efficiently to specified time windows and to other time-based metadata such as frequency, distribution of inter-arrival time, and total number of occurrences or class of result. For real-time streaming and monitoring applications (client applications), data retrieval by timestamp is a common use case. Time series data searches, however, can return a very large number of data items. To display the retrieved time series data to a user in a meaningful manner, client applications typically need the data to be sorted by time. Indexing and sorting time series data is further complicated because the data can be collected from multiple, different sources asynchronously and out of order. Data from one source may be seconds old, while data from another source may be interleaved with data from other sources or may be days, weeks, or months older. Moreover, the clocks of the data sources may not be in sync with each other, requiring timestamps to be adjusted after indexing. Thus, most client applications need time series data to be merged and sorted by time, but they lack sufficient resources to perform such complex sorting, even with optimized algorithms.
It is therefore desirable to have methods and procedures that perform optimized merge-sorting of time-ordered data within the storage layer itself.
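To make the merge-sorting problem concrete, the following is a minimal Python sketch of a k-way merge over several streams that are each already in time order, with a per-source clock correction applied before merging. It is an illustration only, not the method of this disclosure: the names `merge_streams` and `clock_offsets`, the record layout, and the sample data are all hypothetical, and the sketch assumes each source's skew can be modeled as a constant offset in seconds.

```python
import heapq
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: (timestamp_seconds, source_name, payload)
Record = Tuple[float, str, str]


def _adjusted(source: str,
              records: Iterable[Tuple[float, str]],
              offset: float) -> Iterator[Record]:
    """Yield records from one source with its clock offset applied.

    A constant offset preserves the order within a source, so a stream
    that was sorted by time stays sorted after adjustment.
    """
    for timestamp, payload in records:
        yield (timestamp + offset, source, payload)


def merge_streams(streams: Dict[str, Iterable[Tuple[float, str]]],
                  clock_offsets: Optional[Dict[str, float]] = None
                  ) -> Iterator[Record]:
    """Lazily merge per-source, time-ordered streams into one ordered stream.

    Assumption: every input stream is individually sorted by timestamp.
    heapq.merge keeps only one pending record per stream in memory.
    """
    offsets = clock_offsets or {}
    adjusted = [
        _adjusted(name, records, offsets.get(name, 0.0))
        for name, records in streams.items()
    ]
    # Compare on the adjusted timestamp only.
    return heapq.merge(*adjusted, key=lambda record: record[0])


# Illustrative data: the sensor clock is assumed to run 2 seconds behind.
sample_streams = {
    "app_log": [(100.0, "start"), (105.0, "stop")],
    "sensor": [(98.5, "t=20C"), (101.5, "t=21C")],
}
merged = list(merge_streams(sample_streams, clock_offsets={"sensor": 2.0}))
for record in merged:
    print(record)
```

Because the merge is lazy, a consumer can stop reading once it has enough results for a display window. Performing this kind of merge near the data, rather than in each client, is the motivation stated above; how a storage layer would actually implement it is outside the scope of this sketch.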