A cache algorithm, also referred to as a replacement policy, comprises instructions for managing data objects stored within a memory cache. When the memory cache is full, the cache algorithm selects which data items are evicted from the cache to create space for incoming data items. For example, the Least Recently Used (LRU) cache algorithm tracks when each item of data in the cache was last used. The data item that was least recently used is evicted first from the cache to be replaced by the incoming data.
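The LRU behavior described above can be illustrated with a minimal sketch. The class name `LRUCache`, its `get` and `put` methods, and the choice to count capacity in items rather than bytes are assumptions made for illustration only; they are not taken from any particular cache implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU cache; capacity is counted in items (an assumption)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are kept in usage order: least recently used first.
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used item if full."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used item
cache.put("c", 3)  # the cache is full, so "b" is evicted
```

In this usage, "b" is the item evicted when "c" arrives, because "a" was used more recently than "b".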
The LRU algorithm is a simple and effective way of managing cached items, and it involves relatively low overhead to implement and maintain. However, the LRU algorithm may perform sub-optimally when scanning large objects, such as in data warehousing workloads, where the working set is often larger than the cache. For example, if an object larger than the cache is cached as it is scanned, the latter part of the scan pushes the earlier part of the scan out of the cache. Accordingly, a scan can self-thrash, which negates the benefit of caching and also pollutes the cache by pushing other useful data out of it. In another scenario, when multiple processes are scanning different objects for caching, the combined size of the objects may be larger than the size of the cache, even if each individual object by itself fits into the cache. Thus, the concurrently scanning processes thrash each other's cache content.
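Both thrashing scenarios can be reproduced with a small simulation. The following is a sketch under stated assumptions: the helper `count_hits` is hypothetical, capacity is measured in blocks, each scan is repeated twice so that a second pass could in principle benefit from the cache, and the concurrent scans are modeled as a strict interleaving of block accesses.

```python
from collections import OrderedDict


def count_hits(capacity, accesses):
    """Replay block accesses through an LRU cache and return the hit count."""
    cache = OrderedDict()  # least recently used block first
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


# Self-thrashing: a 5-block object scanned twice through a 4-block cache.
big_scan = list(range(5)) * 2
self_thrash_hits = count_hits(4, big_scan)

# The same two scans through a 5-block cache, where the object fits.
fits_hits = count_hits(5, big_scan)

# Concurrent scans: two 3-block objects, each of which fits alone in a
# 4-block cache, accessed in interleaved order and then scanned again.
scan_a = [("A", i) for i in range(3)]
scan_b = [("B", i) for i in range(3)]
interleaved = [blk for pair in zip(scan_a, scan_b) for blk in pair] * 2
concurrent_hits = count_hits(4, interleaved)
alone_hits = count_hits(4, scan_a * 2)
```

Under these assumptions, the second pass over the 5-block object gets no hits from the 4-block cache, because each block is evicted before it is read again. The interleaved scans likewise get no hits, even though either object scanned alone would hit on every block of its second pass.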
One approach to minimizing self-thrashing is to exclude large objects from caching and, instead, read them directly from disk without storing them in a cache. Although this approach can improve performance by reducing self-thrashing, large objects do not get the benefit of caching, even when system memory might be available. In addition, when there are multiple concurrent scans of objects under the “large object” threshold, thrashing can still occur because the scans share the cache, which reduces the effective cache size available to each scan.
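The bypass approach and its main drawback can be sketched as follows. The class `BypassingLRUCache`, the `large_object_threshold` parameter, and the `hits` and `disk_reads` counters are hypothetical names introduced for illustration, and sizes are measured in blocks as a simplifying assumption.

```python
from collections import OrderedDict


class BypassingLRUCache:
    """Illustrative LRU block cache that does not cache large objects."""

    def __init__(self, capacity, large_object_threshold):
        self.capacity = capacity
        self.large_object_threshold = large_object_threshold
        self._blocks = OrderedDict()  # least recently used block first
        self.hits = 0
        self.disk_reads = 0

    def _read_block(self, block, use_cache):
        if block in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(block)
            return
        self.disk_reads += 1  # miss: the block is read from disk
        if use_cache:
            self._blocks[block] = True
            if len(self._blocks) > self.capacity:
                self._blocks.popitem(last=False)

    def scan(self, name, num_blocks):
        """Scan an object; objects above the threshold bypass the cache."""
        use_cache = num_blocks <= self.large_object_threshold
        for i in range(num_blocks):
            self._read_block((name, i), use_cache)


cache = BypassingLRUCache(capacity=4, large_object_threshold=3)
cache.scan("small", 3)   # cached: 3 disk reads
cache.scan("large", 10)  # bypasses the cache: 10 disk reads
cache.scan("small", 3)   # still resident: 3 hits
cache.scan("large", 10)  # 10 more disk reads, no caching benefit
```

In this sketch the large scan no longer evicts the small object, but every block of the large object is read from disk on every scan, regardless of how much memory might otherwise be available.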
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.