1. Field of the Invention
The present invention relates to database systems comprising at least one disk storage and/or at least one tape storage, a Central Processing Unit (“CPU”) and an assigned Random Access Memory (“RAM”). The invention particularly relates to data caching in such database systems, and more particularly to optimization of sizes of cache fragments of a cache memory of a database system in order to optimize the data throughput in the database system. Thereby data throughput means the number of transactions per time unit. Transactions are sequences of Structured Query Language (“SQL”) queries for modification or readout of data of the database system.
2. Description of the Related Art
To improve exposure to database systems, particularly to accelerate processing time of such database systems, it is known to cache data of the disk or tape storage or to cache prepared data for SQL query processing in the RAM. In the following, the part of the RAM used to cache data is called cache memory. Queries and modifications via Structured Query Language (“SQL”) are carried out using the cached data. Thereby a problem arises that it is very time consuming to cache data that is used seldom or never. Furthermore it is costly to provide cache memory for all data of the database. Currently setting up data caching in a database cache memory is extremely difficult.
From U.S. Pat. No. 5,822,749 it is known to use an optimizer for formulating an optimal buffer cache strategy for a given query. A buffer cache is a cache memory to be used to load and cache data from the disk or tape storage of a database system. Thereby the optimizer communicates with a buffer manager before a query plan is formulated. The optimizer queries the buffer manager for the purpose of determining whether the object of interest exists in its own cache fragment. If the object exists in its own cache fragment, the optimizer inquires as to how much of the cache fragment the object requires, together with the optimal Input/Output (“I/O”) size for the cache fragment. Based on this information, the optimizer formulates the query plan with hints, which are ultimately passed to the buffer manager. Based on those hints received from the optimizer, the buffer manager can fine tune I/O, i.e. the cache management for the query. The drawback of this solution is, that the size of the cache memory and also its fragmentation, i.e. the number of cache fragments and also the size of the individual cache fragments has to be adjusted manually. Due to this, during usage of the database, the size of the cache memory and its fragmentation is fixed. The optimizer together with the buffer manager only decides in which order objects, or, if a cache fragment is smaller than an object, parts of objects are cached and how long they remain cached until they are deleted by reusing a particular cache fragment by caching another object.
From U.S. Pat. No. 5,680,573 it is known to cache objects of a database system, wherein different, fixed cache fragments are used to cache objects that are selected randomly and to cache objects that are selected non-randomly. The cache fragments used to cache the objects that are selected non-randomly are larger than the cache fragments used to cache the objects that are selected randomly. Thereby no optimization of the sizes of the cache fragments and/or of the cache itself takes place during operation of the database system.
From Japan Patent No. 10 320 212 A2, a method for cache optimization based on a cache miss ratio prediction is known. Thereby a cache simulation object program is used to simulate the cache miss ratio of a database. During operation of the database the size of the cache fragments is fixed according to the prediction gained by the simulation.
Furthermore it is known to group single objects or groups of objects to so called working sets. Such working sets are also known for queries or groups of queries.
From Japan Patent No. 7 271 674 A2 and from U.S. Pat. No. 6,129,458 methods for cache optimization are known, wherein if a working set that comprises a plurality of objects is larger than a cache fragment to be used to cache the working set, the working set is subdivided into working sub sets that are cached in different cache fragments.
According to the state of the art, the cache memory used to cache data of a database system has a fixed size. Such a cache memory can be fragmented into a plurality of cache fragments, each one to be used to cache a single or a group of objects of the database. Changing sizes of the cache memory and/or of the cache fragments can result in tremendous changes in system performance.
According to the state of the art, during operation of the database those cache fragments have either a fixed size or are resized dynamically based on very limited algorithms, which try to improve the system throughput by changing the cache fragments in a kind of trial-and-error principle. A known method to do so is to simply reduce the size of the best performing cache fragment and to increase the size of the worst performing cache fragment in regular intervals. Thereby only two cache fragments are changed per transaction, wherein most of the cache fragments remain unchanged. Furthermore the changes take place independent from previous analysis cycles, meaning that in a following cycle, the changes might be counterproductive. Due to this, the known method is not applicable to improve performance of database systems.
Today, figuring out an optimal configuration requires either re-simulating the whole workload for different cache fragment sizes and choosing the optimal configuration, or estimating the impact of fragment size changes. The first option normally takes hours of processing on a workstation, which makes it impossible to be used for a desirable automatic continuous background optimization. The later option depends on multiple factors and must be done for all objects and statements in the database, like e.g. tables, Materialized Query Tables (“MQTs”), indexes and the like, which makes it normally impossible for continuous manual optimization. Even if a Data Base Administrator (“DBA”) has figured out a good configuration, the database workload changes very often, which then results in poor system throughput.