Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as rows, each having fields of information. As an example, a database of employees may have a row for each employee comprising different columns or fields such as name, department, phone number, and salary. Rows are organized in a table, a two-dimensional structure in which each row represents one entry, such as one employee, and each column represents one field.
To accelerate access to the data of the database table, rows can be indexed and the index entered into a database index. For example, one index can indicate where to find the rows containing a particular name, another index can indicate where to find the rows containing specific departments.
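As an illustration of the indexing described above, the following minimal sketch builds one index per field over a small in-memory table. The sample rows, the function name `build_index`, and the use of a dictionary in place of an on-disk index structure are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical employee table: each row is (name, department, phone number).
rows = [
    ("Alice", "Sales", "555-0101"),
    ("Bob", "Engineering", "555-0102"),
    ("Carol", "Sales", "555-0103"),
]

def build_index(table, column):
    """Map each value found in `column` to the positions of the rows containing it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return dict(index)

# One index per field of interest, as in the name/department example.
name_index = build_index(rows, 0)
department_index = build_index(rows, 1)

# The index tells us where the matching rows are, so the table is not scanned.
sales_rows = [rows[p] for p in department_index.get("Sales", [])]
```

Here `department_index` indicates where to find every row for a given department, and `name_index` does the same for names.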
Especially in relational database systems, the indexes are arranged as trees so that a query can locate the matching entries in less time.
Rows and indexes are physically stored on storage media such as, for example, tapes or disks. When performing any operation on the database such as, for example, processing a query, inserting a new row, indexing a new row, etc., the corresponding data is retrieved from the storage medium into a cache or buffer unit where a database management system can perform the operation on the data while the data is in the cache.
Database operations such as queries, updates, and deletes involve scanning index entries and data rows of the database tables to determine which of the entries and data rows constitute a result set of the operation. The scanning occurs within the database management system, typically in the database cache. The index entries and data rows are brought into the cache from external storage, typically disk or tape subsystems. Because reading from the external storage is much slower than the other processes involved in executing database operations, the overall performance is in most cases heavily influenced by the number of reads from the external storage.
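The relationship between the cache and the number of external storage reads can be sketched as follows. This is a simplified model, not the behavior of any particular database management system: the class name `BlockCache`, the dictionary standing in for external storage, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict

class BlockCache:
    """Toy database cache that counts reads from (simulated) external storage."""

    def __init__(self, storage, capacity):
        self.storage = storage        # assumed: dict of block number -> contents
        self.capacity = capacity      # maximum number of blocks held in the cache
        self.blocks = OrderedDict()   # cached blocks, least recently used first
        self.external_reads = 0

    def get(self, block_no):
        if block_no in self.blocks:
            # Cache hit: no external read, just mark the block as recently used.
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        # Cache miss: the block must be fetched from external storage.
        self.external_reads += 1
        data = self.storage[block_no]
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block (an assumed policy).
            self.blocks.popitem(last=False)
        return data

storage = {n: f"block-{n}" for n in range(10)}
cache = BlockCache(storage, capacity=2)
for block_no in [0, 1, 0, 2, 1]:
    cache.get(block_no)
```

In this access sequence only the second request for block 0 is served from the cache; every other request costs an external read, which is the cost that dominates overall performance.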
Existing database management systems employ various techniques for reducing the number of external storage reads. These techniques comprise, for example, using very large database caches, optimizing the data residency times, or reading data in advance whenever a sequential pattern of accesses is expected or detected.
The sequential pattern of accesses occurs when the data that are to be read into the database cache reside in storage medium blocks that are close to each other. Two blocks are considered close to each other when they can be read in a single storage read operation.
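The notion of closeness defined above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a single storage read operation returns a fixed number of adjacent blocks starting at the first block requested, and that the requested blocks may be fetched in ascending order; the function name and the value of `blocks_per_read` are hypothetical.

```python
def count_read_operations(block_numbers, blocks_per_read):
    """Count the storage reads needed to fetch the given blocks.

    Assumes one read returns `blocks_per_read` adjacent blocks beginning
    at the first block that is still missing.
    """
    reads = 0
    window_end = None  # last block number covered by the most recent read
    for block in sorted(set(block_numbers)):
        if window_end is None or block > window_end:
            reads += 1
            window_end = block + blocks_per_read - 1
    return reads

# Blocks that are close to each other are covered by a single read.
sequential = count_read_operations([100, 101, 102, 103], blocks_per_read=4)

# Scattered blocks each require a read of their own.
scattered = count_read_operations([7, 250, 1031, 4400], blocks_per_read=4)
```

Under these assumptions the four adjacent blocks cost one read operation, whereas the four scattered blocks cost four.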
Reading the data in this fashion is so efficient that virtually all commercial database management systems provide a means of storing table data rows in the sequence, or close to the sequence, of one of the indexes of the table. This process is often called “clustering rows” according to a selected index. Some database management systems attempt to maintain the clustering while inserting new rows and provide for periodic reorganization when there is not enough free space within the existing data to accommodate random inserts. Other database management systems rely on periodic reorganization alone, which speeds up insert processing but slows down queries until the data is reorganized.
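The effect of clustering on the number of blocks a query must touch can be illustrated with a toy layout. In this sketch the sample rows, the constant `ROWS_PER_BLOCK`, and the function `blocks_touched` are assumptions for illustration; clustering is modeled simply as storing the rows sorted by the selected field.

```python
ROWS_PER_BLOCK = 3  # hypothetical number of rows stored in one storage block

def blocks_touched(stored_rows, department):
    """Return the set of block numbers holding rows for `department`."""
    return {
        position // ROWS_PER_BLOCK
        for position, row in enumerate(stored_rows)
        if row[1] == department
    }

# Rows stored in arrival order: departments are interleaved across blocks.
unclustered = [
    ("Alice", "Sales"), ("Bob", "Engineering"),
    ("Carol", "Sales"), ("Dave", "Engineering"),
    ("Eve", "Sales"), ("Frank", "Engineering"),
]

# Rows clustered according to the department index: same department, adjacent rows.
clustered = sorted(unclustered, key=lambda row: row[1])

unclustered_blocks = blocks_touched(unclustered, "Sales")
clustered_blocks = blocks_touched(clustered, "Sales")
```

With the rows in arrival order, the Sales rows are spread over two blocks; after clustering by department they occupy a single block, so a query on that department needs fewer blocks brought into the cache.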
Although this technology has proven to be useful, it would be desirable to present additional improvements. Many queries do not have a sequential access pattern. Accessing data through any index other than the clustering index is typically random and only occasionally results in some degree of “sequentiality”. Furthermore, the queries that can take advantage of sequential access depend heavily on regular data reorganizations. Reorganization is an obtrusive operation that is expensive in terms of resource usage and introduces a degree of contention into the system. Reorganization also typically requires the attention of a database administrator, which is difficult to provide in environments comprising many data tables.
To increase the efficiency of data read operations, “priming” of the cache is often used as a key concept in cache management. Most database management systems use some kind of priming, for example, sequential prefetch. What is therefore needed is a system, a computer program product, and an associated method for just-in-time priming of a database cache. The need for such a solution has heretofore remained unsatisfied.