Querying data stored on storage disks may be computationally heavy and time consuming. Query operations include scanning data on storage disks by reading out units of data, and performing further operations such as joining, aggregating, or filtering the scanned data based on predicates specified in the query. The scanning operation itself is generally time consuming because access times for disk storage are much slower than for operational volatile memory (such as RAM). For example, depending on the storage disk technology, a read operation from disk storage may take up to 100 times longer than a read operation from operational memory.
Despite the high cost of each read operation, some of the data read may be thrown away at a later stage of query processing. For example, a query may specify joins or predicates that effectively select a subset of the query's target data set. However, to determine that subset, the full target data set is usually scanned and then filtered based on the computed or specified criteria. Thus, many rows of the target data set may be scanned, incurring the cost of read operations from disk storage, only to be wastefully discarded for not matching the filtering criteria of the query.
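The scan-then-filter behavior described above can be sketched as follows. This is a hypothetical illustration, not an implementation from the source; the table contents and the `region` predicate are invented for the example.

```python
# Illustrative sketch: a full table scan pays the read cost for every
# row, but the predicate discards most rows only after they are read.

# Hypothetical table: 1,000 rows, of which 1 in 10 matches the predicate.
table = [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(1000)]

rows_read = 0
matches = []
for row in table:               # full scan: every row is read from storage
    rows_read += 1
    if row["region"] == "EU":   # predicate applied only after the read
        matches.append(row)

# 1,000 rows read, only 100 kept: 90% of the read cost was wasted
# on rows that did not match the filtering criteria.
```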
One traditional solution to speed up read operations is to cache data from storage disks in operational memory (e.g., low-latency volatile memory such as RAM) when it is first read. Many techniques have been developed to optimize caching to maximize “cache hits” and minimize the need to access data on storage disks. However, in large data management systems such as a database management system (DBMS), cache misses are practically unavoidable because of the sheer volume of data traffic. A DBMS that processes thousands of queries a second produces many cache misses as data is constantly swapped in and out of the cache.
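A minimal sketch may make the cache-miss problem concrete. The least-recently-used (LRU) policy, cache capacity, and access pattern below are assumptions chosen for illustration; they are not taken from the source, which does not name a specific caching technique.

```python
from collections import OrderedDict

# Hypothetical LRU cache sketch: when the working set exceeds the cache
# capacity, requests keep missing and fall through to slow disk reads.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_disk):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)     # mark as most recently used
            return self.data[key]
        self.misses += 1                    # cache miss: pay the disk-read cost
        value = load_from_disk(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry
        return value

cache = LRUCache(capacity=10)
# A working set of 100 blocks cycled repeatedly: LRU evicts each block
# just before it is requested again, so every request misses despite
# the cache being in place.
for _ in range(3):
    for block in range(100):
        cache.get(block, lambda k: f"block-{k}")
```

The sequential-cycling pattern is a worst case for LRU; it illustrates why, at the scale and churn of a busy DBMS, even a well-tuned cache cannot eliminate disk reads.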
Furthermore, using a large amount of operational memory for a data cache decreases the performance of the system. The system uses operational memory for computational and data processing operations; as a larger and larger portion of that memory is occupied by the data cache, a smaller and smaller portion remains available for those operations. Thus, the operations take longer to execute, degrading the performance of the whole system.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.