The present invention generally relates to the processing of data records, and more particularly, to organizing data records in a relational database.
Many data management systems have been developed to effectively access and organize data records. Among these data management systems, relational databases are one of the dominant data management systems in enterprise markets. With an ever increasing amount of data to be stored and accessed, performance requirements become higher and higher on relational databases.
For example, a large bank that provides bank cards to its customers would typically have one or more computer application programs for supporting bank card servicing. Because numerous customers use bank cards, the bank's relational database may have to process millions of online transactions each day, for example, to query and update account information. Upon analysis of these transactions, the performance bottleneck of a relational database may be a synchronized input/output (I/O) operation when account information is loaded into a memory (also called buffer pool). One way to reduce the impact of this type of performance bottleneck is improve the hit rate of the buffer pool so as to enhance the efficiency of data access.
In addition, data is typically organized into data pages and though random data access often only needs to read or write one data record, the relational database has to load the entire data page into the buffer pool. Thus, it is possible that large portions of memory are assigned to non-frequently-accessed data records or data records that will not be accessed, thus resulting in a waste of space in the buffer pool, and therefore a lower buffer pool hit rate.
In order to improve the hit rate of the buffer pool, several approaches have been used.
A first approach is to use a data pre-fetch mechanism which pre-loads required data records into the buffer pool. This approach requires knowledge of a next access pattern in advance and in an application such as the bank example above, a next data access behavior is not predictable. Thus, the use of a data pre-fetch mechanism typically does not result in an increase in performance for applications that have unpredictable data access behaviors, or that perform random data accesses.
Another approach is to increase the buffer pool size or to optimize the buffer pool. By increasing the size of the buffer pool in which data records are loaded, more data records may be kept in the memory. However, this solution has a high cost, and it is generally impractical given typical memory constraints to obtain an amount of memory in the buffer pool that is identical to or close to the size of a data table. The buffer pool may be optimized, for example, by automatically setting a buffer pool size based on a historical buffer pool hit rate trend (see, for example, U.S. Patent Pub. No. 2002/0046204, entitled “Heuristic Automated Method for Ideal Buffer Pool Tuning in a Computer Database” by Hayes; and U.S. Pat. No. 7,512,591 entitled “System and Method to Improve Processing Time of Database by Cache Optimization” by Bildhaeuser et al.) and compressing the data records in the buffer pool to reduce memory utilization. One drawback to these optimization solutions is that they do not directly address the nature of random data access, i.e., the poor predictability.
A further approach to improving the hit rate of the buffer pool is to change the data structure. For example, some applications archive infrequently used data records into an archive table. A drawback to this solution is that it requires significant changes to the application. Meanwhile, batch processes, such as generating a report, may become more complex because data is distributed into multiple tables which makes application logic more complex to control.
A further approach to improving the hit rate of the buffer pool is to organize data records in a manner similar to that used by a file system or an operating system. For example, frequently used data is stored in a fast storage device. A drawback to this approach is that data organization in a relational database is stored in the order of a clustering index and organizing the data in a manner similar to that used by a file system or an operating system affects both the sequential access and insert performance of the data records in the database.