Today, the amount of data processed by database systems is growing at an accelerating pace. At the same time, users expect ever faster results on this growing amount of data.
An increase in data volume means an increase in storage, which drives up both storage costs and operational costs due to higher power consumption; today, electricity is the largest cost item in a data center. Electricity costs rise with the speed of the hard disks used, and faster hard disks are also more expensive to purchase. Companies may therefore struggle with the rising costs of their data centers.
However, enterprises recognize that, even as the data volume grows, not all data needs to be accessed at the same speed. Data may therefore be classified according to an “age” based on its access demands:
Type 1: A portion of the data may be needed often and must be accessed quickly to satisfy performance demands. This is typically frequently accessed data, and it is often new data.
Type 2: Another portion of the data may be needed less frequently, and access to it need not be as fast. This portion is typically accessed rarely and is often older data.
Companies may thus wish to optimize their storage costs while still meeting performance demands, by placing the first type of data on the fastest, most expensive disks, which consume more electricity and therefore incur higher operational costs, and placing the second type of data on slower hard disks, which are less expensive to purchase and operate. A storage architecture with several layers of storage capacity that differ in performance characteristics and operational costs is known as hierarchical storage management (HSM).
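The classification into Type 1 and Type 2 data and its mapping onto HSM tiers can be illustrated with a minimal Python sketch. It assumes, hypothetically, that every read and write of a page is reported together with a timestamp, and that a page counts as Type 1 if it was accessed at least `hot_threshold` times within the last `window_seconds`. `AccessTracker`, `Tier`, `hot_threshold` and `window_seconds` are illustrative names chosen for this sketch, not part of any database product.

```python
from collections import defaultdict, deque
from collections.abc import Iterable
from enum import Enum


class Tier(Enum):
    FAST = "fast"  # Type 1: frequently accessed, typically new data
    SLOW = "slow"  # Type 2: rarely accessed, typically older data


class AccessTracker:
    """Hypothetical tracker that counts page accesses in a sliding time window."""

    def __init__(self, window_seconds: float, hot_threshold: int) -> None:
        self.window_seconds = window_seconds
        self.hot_threshold = hot_threshold
        # page id -> timestamps of its accesses, oldest first
        # (assumes accesses are recorded in non-decreasing time order)
        self._accesses = defaultdict(deque)

    def record_access(self, page_id: int, timestamp: float) -> None:
        # Reads and writes are recorded alike; tracking only modifications
        # would miss pages that are read often but rarely changed.
        self._accesses[page_id].append(timestamp)

    def access_count(self, page_id: int, now: float) -> int:
        times = self._accesses.get(page_id)
        if times is None:
            return 0  # page never accessed
        # Discard accesses that have fallen out of the window.
        while times and times[0] < now - self.window_seconds:
            times.popleft()
        return len(times)

    def place(self, page_ids: Iterable[int], now: float) -> dict[int, Tier]:
        # Hot pages go to the fast tier, all others to the slow tier.
        return {
            pid: Tier.FAST if self.access_count(pid, now) >= self.hot_threshold else Tier.SLOW
            for pid in page_ids
        }


tracker = AccessTracker(window_seconds=60.0, hot_threshold=3)
for t in (10.0, 20.0, 30.0):
    tracker.record_access(1, t)  # page 1: three recent accesses
tracker.record_access(2, 5.0)  # page 2: a single access
placement = tracker.place([1, 2, 3], now=50.0)
later = tracker.place([1], now=85.0)  # only the access at t=30 is still in the window
```

Because accesses are counted over a sliding window, a page whose accesses stop moves from the fast tier to the slow tier over time, which is a simple form of temporal aging.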
Furthermore, many types of data instances age over time. For example, an order that has been created may go through process steps such as creation, packing, shipment, invoice shipment, payment receipt and, finally, closing. After that, the order information may only be needed for analytical purposes, such as computing the revenue for the last quarter or the last year. The data of the order thus ages from fresh creation, through frequent access while the order is being fulfilled, to only rare access for analytical purposes, until it may ultimately no longer be needed at all.
Unfortunately, today's commercial databases do not support automatically identifying the age of data, which may be derived from access patterns, and placing the data on appropriate storage media accordingly.
In particular, some commercial databases track, to a certain degree, the last modification of data at the record, page, or extent level. Read access, however, is not tracked at all. As a result, it cannot currently be determined whether a given portion of data is frequently accessed by read operations rather than by data modification operations. All data therefore has to remain on fast hard disks, or solid-state disks, even if it is only read or not accessed at all. Temporal aging of data is not supported.
Finally, data must comply with retention policies imposed by legal regulations, and any autonomic solution for temporal aging must take this requirement into account as well.
Document US20080154994A1 discloses a data management method for implementing or otherwise managing aged index data for a database. The data is categorized based on business logic, such as whether a business process is open or closed.
Document US20090210445A1 discloses a method for optimizing data access in a record-oriented relational database containing data sets with attributes. Attributes are assigned to higher or lower priority classes depending on their access frequency, which is determined by counting the accesses to a given data set over a period of time.
Thus, there may be a need for an improved method and engine for handling storage pages in a database, in particular so as to improve the overall performance of a database system.