Given that volatile memory (also sometimes referred to as “main memory”) is becoming cheaper and larger, more data can be cached from disk storage into volatile memory. Such caching makes the data accessible faster and allows the application that uses the data to perform its work more quickly.
However, a number of challenges remain in making data accessible in volatile memory. First, the amount of data that is typically used by applications has also increased significantly. In particular, completely caching larger amounts of data (colloquially referred to as “big data”) in volatile memory would require an exorbitant amount of volatile memory. Thus, regardless of the size of volatile memory, there may still exist data (and in some cases a significant portion of data) that cannot be concurrently cached in the volatile memory. Such data would need to be accessed from disk storage and loaded into the cache on an as-needed basis (replacing other data in the cache).
When a database system needs to perform operations on non-cached data, the data first needs to be read from the disk storage into the volatile memory of the database system. Once the data is loaded into volatile memory, the database system can perform the operations on the data. However, reading data from the disk storage generally incurs a significant performance penalty compared to obtaining data that already resides in volatile memory. Thus, when a database system needs to perform operations on non-cached data, the database system gains little performance benefit from having a large amount of volatile memory.
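The penalty described above can be made concrete with a standard weighted-average model of access time. The following is a minimal sketch only: the function name and the latency figures are illustrative assumptions and are not taken from any particular database system.

```python
def average_access_time(hit_ratio, memory_latency, disk_latency):
    """Expected latency of one access, given the fraction served from memory.

    hit_ratio is the fraction of accesses found in volatile memory; the
    remaining fraction is assumed to pay the full disk latency.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_latency + (1.0 - hit_ratio) * disk_latency


# Assumed placeholder latencies in nanoseconds (hypothetical, for illustration).
MEMORY_NS = 100
DISK_NS = 10_000_000

all_cached = average_access_time(1.0, MEMORY_NS, DISK_NS)
mostly_cached = average_access_time(0.99, MEMORY_NS, DISK_NS)
```

Under these assumed figures, even a small fraction of non-cached accesses dominates the average, which is why operations on non-cached data see little benefit from a large volatile memory.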
One approach to fitting more data into volatile memory is to compress the data before storing the data in the volatile memory. Once compressed, the data occupies less space in the volatile memory. However, not all data can be significantly compressed. Furthermore, if the compressed data is frequently accessed for operations, the data needs to be frequently decompressed before it can be used. Such frequent decompression consumes compute resources that could otherwise have been used for data operations, slowing the data operations and, consequently, the applications that requested the data operations. Accordingly, indiscriminate compression of data cached in volatile memory has a significant drawback.
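The two limitations noted above, that some data barely compresses and that compressed data must be decompressed on each use, can be sketched as follows. This is an illustrative example only: it uses the general-purpose zlib compressor from the Python standard library as a stand-in, and the sample payloads and the helper name are assumptions rather than anything prescribed by the approach described here.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Highly repetitive data, such as a column that repeats the same value.
repetitive = b"ACTIVE," * 10_000
# Random bytes stand in for data with little redundancy to exploit.
random_like = os.urandom(70_000)

repetitive_ratio = compression_ratio(repetitive)
random_ratio = compression_ratio(random_like)

# Every operation on compressed data first pays for a decompression step.
compressed = zlib.compress(repetitive)
restored = zlib.decompress(compressed)
```

In this sketch the repetitive payload shrinks to a small fraction of its size, while the random payload stays roughly as large as the original, yet both would incur the decompression step on every access.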
Furthermore, no matter at what level of compression data is copied into volatile memory, at some point the database system would still run out of space in volatile memory to store more data. Thus, when the volatile memory is at full capacity and the database system needs to perform an operation on data that is stored solely on disk storage, some of the data already in the volatile memory would need to be replaced to make room for the data from the disk storage. The more frequent such replacements are, the more compute resources are wasted on shuffling data in and out of the volatile memory. Thus, minimizing the frequency of data replacement would contribute to efficient performance of the database system.
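The cost of replacement can be illustrated by counting how often a full cache must discard existing data to admit data from disk. The sketch below assumes a least-recently-used replacement policy purely for illustration; the text above does not name a policy, and the class name, the `load_from_disk` callback, and the access patterns are hypothetical.

```python
from collections import OrderedDict


class CountingLRUCache:
    """Fixed-capacity cache that counts how many entries it has replaced."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # hypothetical stand-in for a disk read
        self.entries = OrderedDict()          # ordered oldest to most recently used
        self.replacements = 0

    def get(self, key):
        if key in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: the data must be read from disk storage.
        value = self.load_from_disk(key)
        if len(self.entries) >= self.capacity:
            # Cache is full: replace the least recently used entry.
            self.entries.popitem(last=False)
            self.replacements += 1
        self.entries[key] = value
        return value


def count_replacements(capacity, accesses):
    """Replay an access pattern and report how many replacements occurred."""
    cache = CountingLRUCache(capacity, load_from_disk=lambda k: f"block-{k}")
    for key in accesses:
        cache.get(key)
    return cache.replacements
```

For example, with room for two items, the pattern 1, 2, 1, 2, 1, 2 causes no replacements, whereas the pattern 1, 2, 3, 1, 2, 3 causes a replacement on every access after the cache fills. The same amount of volatile memory can therefore yield very different amounts of wasted work depending on which data is kept.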
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.