The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Different types of data operations may exhibit different data access patterns. For instance, some types of database operations may be classified as involving either Online Transaction Processing (“OLTP”) or Data Warehousing. A typical OLTP operation accesses only a handful of records. For example, an OLTP operation may be to “retrieve the current order for this customer.” By contrast, a typical Data Warehousing operation involves analyzing thousands or millions of rows. For example, a Data Warehousing operation may be “Find the total sales for all customers last month.” Database systems typically employ different techniques to efficiently perform these different types of operations. For example, for OLTP operations, the database system builds indexes of certain table columns. The exact rows needed for an OLTP operation may be located by scanning the index, and only minimal data blocks are retrieved in a seemingly random access pattern. For Data Warehousing operations, on the other hand, a more comprehensive scan of a table is needed. Thus, large ranges of data blocks are retrieved in a more sequential access pattern.
An example scan operation is a table scan. A table scan is an operation performed on a database in which each row of a table is read, and the columns encountered are checked for the validity of a condition. A table scan may be performed in a variety of circumstances, such as when a column of a table that is involved in a database operation does not contain an index that can be used for checking the validity of a condition. Table scans are often sequential, in that the rows may be read in a sequential (serial) order. A table scan often requires a relatively heavy amount of input/output (“I/O”) read and/or write operations at the data storage system upon which the underlying data for the table is stored, thus resulting in multiple seek operations as well as costly disk-to-memory transfers. Many other scan operations also exist. For explanatory purposes, this application may at times describe techniques in terms of “table scan operations” or “table scans.” However, unless otherwise noted, the techniques are also more generally applicable to any type of scan operation.
Caching scanned data can be a big challenge due to the size of scanned data sets. Conventional database caching algorithms are designed to cache frequently accessed random data blocks, and this approach is often sufficient for OLTP data. However, scanned data sets are often very large relative to OLTP data. After a scan operation is performed on a data set, the entire data set may not be referred to again for quite some time, if at all, and/or another set of data might need to be scanned. If multiple scans of data are occurring concurrently, the scans may end up replacing each other's data.
In database environments that involve both OLTP data and Data Warehousing data, the caching of scanned data presents further obstacles. Moreover, scanned data may quickly push OLTP data out of a cache, even though the OLTP data is often more useful to cache. It is thus a challenge to cache both kinds of data at the same time. Database administrator therefore typically fine tune the caching policy of their database systems to either an OLTP data access pattern, or Data Warehouse data access pattern. In some instances, an administrator may explicitly mark which data units to cache and/or which data units not to cache.