(1) Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to the field of improving record throughput within a relational database management system (RDBMS).
(2) Prior Art
Computer-implemented relational database management systems (RDBMS) are well known in the art. As an RDBMS is used, transactions are processed that alter and/or add to the existing data maintained by data structures within the RDBMS. For instance, as telephone orders are being processed by a merchant using an RDBMS for inventory management and billing, each order of goods or services can be represented by a discrete transaction. Typically, each transaction within the RDBMS is composed of different information records. Data records are used to represent the data that is being added and/or modified within the RDBMS. A roll back record is a type of record that is used to indicate that previously recorded updates within another transaction have been undone by the transaction associated with the roll back record. Lastly, commit records are used to indicate that a particular transaction has evolved into a state whereby it is durably recorded into the RDBMS. With reference to timing, as a transaction is being processed, its associated data records are produced and transmitted. Following the data records, the associated commit record or roll back record is then transmitted.
In many RDBMS, the above-mentioned information records are received from various processes and stored in a computer readable memory buffer. Subsequently, the information records are recorded into an after image journal (AIJ) file residing typically in a non-volatile magnetic or optical recording medium ("disk"). The information in the AIJ file is then used to represent that the data structures of the RDBMS have been updated. A transaction is "durably recorded" into the RDBMS when its associated commit record (and data records) are stored into the AIJ file on disk. Before a transaction is "durable," its associated commit record resides only in the memory buffer within the RDBMS, and the transaction is not recoverable should the RDBMS temporarily shut down or temporarily malfunction in operation. Storage of the record information takes place using a computer driven input/output (I/O) operation which obtains the information from the memory buffer and records (commits) it persistently into the AIJ file on disk. Therefore, commit processing refers to the task of recording transaction information from the memory buffer to the AIJ file on disk to durably record the transaction.
Because the latency of a synchronous I/O operation to the disk is typically rather long (e.g., 10-30 ms, or more) in relation to most other RDBMS activities, a standard database approach for improving the efficiency of simultaneous transaction commit processing is known as the "group commit" operation. Under this approach, the data records and commit records for multiple transactions committing at approximately the same time are "grouped" together and processed as a batch by a single I/O issuing process (the group commitor process). Performing all of the transaction commits as a batch operation results in a significant reduction of I/O operations to the AIJ file by increasing the number of transactions per I/O.
In the prior art, database products determine which transactions are committing at "approximately" the same time through the use of a fixed interval timer. The group commitor process of the prior art sets the fixed interval timer as a mechanism of waiting for other transactions to commit (e.g., store their commit records in the memory buffer). Then, when the timer expires, all transactions that stored a commit record in the memory buffer at or before the fixed timer's expiration are written to the AIJ file on disk and become durable.
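By way of illustration only, the fixed-interval grouping described above can be sketched as follows. The function name, the use of integer millisecond timestamps, and the assumption that no I/O is issued for a timer interval containing no commit records are hypothetical and are not taken from any particular prior art product.

```python
from collections import defaultdict


def group_by_fixed_timer(commit_times_ms, timer_interval_ms):
    """Group commit-record arrival times by the timer expiry that flushes them.

    Assumes integer millisecond timestamps measured from the start of the
    first timer interval. Returns a sorted list of
    (flush_time_ms, [arrival_times]) tuples, one tuple per I/O operation.
    """
    batches = defaultdict(list)
    for t in commit_times_ms:
        # A commit record stored at or before an expiry is written at that
        # expiry; a record stored at time 0 waits for the first expiry.
        expiry_index = max(1, (t + timer_interval_ms - 1) // timer_interval_ms)
        batches[expiry_index * timer_interval_ms].append(t)
    return sorted(batches.items())


if __name__ == "__main__":
    # Hypothetical arrival times; each printed tuple is one I/O to the AIJ file.
    for flush_time, group in group_by_fixed_timer([5, 12, 18, 31, 40], 20):
        print(flush_time, group)
```

In this sketch the size of each commit group is determined solely by where the arrivals happen to fall relative to the timer, which is the property the following paragraphs identify as problematic.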
The use of fixed interval timers to perform group commit processing is problematic for various reasons. First, in cases when the workload is heavy (e.g., many commit records are being stored in the memory buffer), there is a risk that the fixed timer interval can be set either too long or too short. For instance, if the timer interval is set too short, as shown in FIG. 1A, then too many I/Os are issued thereby significantly reducing the throughput of the AIJ device, e.g., the disk. To illustrate this case, FIG. 1A shows a timing diagram with four full timer intervals 10a, 10b, 10c, and 10d (where time periods are not shown to scale). Commit records being stored in the memory buffer are shown as down arrows 5a, 5b, 5c, 5d, and 5e. Also shown are four I/O intervals 16a, 16b, 16c and 16d wherein an exemplary duration of each I/O interval is 20 milliseconds (ms) in length 12. Since the timer interval 14 is set too short, four I/Os at 20 ms each are required to store 5 commits to the AIJ file. Under this approach, on average 1.25 transactions are performed per 20 ms or roughly 63 per second, which is far too inefficient for practical use. In this configuration of the prior art, the AIJ device becomes a throughput bottleneck for the database system.
In cases when the workload is heavy, the timer interval can also be set too long, as shown in FIG. 1B (where time periods are not shown to scale). In this instance, commit records received just after an I/O are forced to wait in the memory buffer, thereby preventing their associated transactions from completing. Server processes (or threads) that are data dependent on the data associated with these commit records are forced to wait over the interval period until the next I/O. While these processes wait ("stalled"), they consume system resources and, more importantly, they are prevented from performing database modifications, thereby limiting the throughput capability of the database system.
Furthermore, a transaction is not able to complete until it is durably written into the AIJ file. Therefore, as record 15a is pending in the memory queue until the next I/O cycle, its associated transaction is delayed from completing; the same is true for record 20a and its associated transaction.
In particular, FIG. 1B illustrates two timer intervals 15 and 20 which define the size of their respective commit groups. Each interval commits seven transactions (15a-15g of interval 15 and 20a-20g of interval 20), so the overall data throughput is 350 transactions per second, assuming a 20 ms I/O interval 25a, 25b. Although the data throughput is greater than in the case above (FIG. 1A), processes that are dependent on the data associated with commit record 15a are forced to wait almost the entire timer interval 15 before this transaction is actually written to the AIJ file by I/O operation 25a. The same is true with respect to commit record 20a and timer interval 20. While the data dependent processes are delayed, they consume valuable computer resources and reduce data throughput. Furthermore, as stated above, transactions associated with the records in the memory queue are not completed until their records are durably written into the AIJ file. This further reduces throughput.
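The throughput figures quoted for FIG. 1A and FIG. 1B can be reproduced with the short calculation below. It is a sketch only: the function name is hypothetical, and the calculation counts only the time spent in I/O operations, which is an assumption inferred from the way the figures are stated above.

```python
def transactions_per_second(commit_count, io_count, io_duration_ms):
    """Average commits written per second of I/O time."""
    total_io_ms = io_count * io_duration_ms
    return commit_count * 1000 / total_io_ms


if __name__ == "__main__":
    # FIG. 1A: five commits spread over four 20 ms I/O operations.
    print(transactions_per_second(5, 4, 20))
    # FIG. 1B: fourteen commits spread over two 20 ms I/O operations.
    print(transactions_per_second(14, 2, 20))
```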
Second, in cases when the workload is light, it is appreciated that the use of a fixed timer mechanism to control group commit size forces transactions to wait for the timer to expire, even though no other transactions are available to join their commit groups. This case is shown in FIG. 1C having one commit record 30a received in timer interval 30 and another commit record 31a received in timer interval 31. Although no other commit records are being received, commit record 30a is forced to wait almost the entire duration of timer interval 30 before being written into the AIJ file by I/O operation 35a. The same is true for commit record 31a with respect to timer interval 31. In this latter case, not only are certain data dependent processes (that need the data from the transaction associated with commit record 30a) forced to wait until the timer expires, but the overall AIJ throughput is also decreased using this prior art approach because two I/O operations 35a and 35b are used when one I/O operation would have been sufficient. In short, the prior art use of fixed duration timers is too inflexible for a dynamically changing transaction workload in view of the above instances.
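The waiting penalty in the light workload case can be illustrated with the following sketch. FIG. 1C gives no numeric values, so the 100 ms timer interval and the arrival times used here are purely hypothetical, as is the function name.

```python
def wait_for_timer_ms(arrival_ms, timer_interval_ms):
    """Time a commit record sits in the memory buffer before the timer expires.

    Assumes integer millisecond timestamps measured from the start of the
    first timer interval.
    """
    elapsed_in_interval = arrival_ms % timer_interval_ms
    if elapsed_in_interval == 0 and arrival_ms > 0:
        # Stored exactly at an expiry: written by that expiry's I/O.
        return 0
    return timer_interval_ms - elapsed_in_interval


if __name__ == "__main__":
    # A lone commit arriving shortly after each interval starts still waits
    # for nearly the whole interval, even though nothing else joins its group.
    for arrival in (2, 103):
        print(arrival, wait_for_timer_ms(arrival, 100))
```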
Accordingly, what is needed is a computer implemented system for issuing I/O processes to write transaction records to an AIJ file stored in a durable recording medium that operates efficiently in heavy workload conditions as well as light workload conditions to avoid the problems discussed above. The present invention provides such an advantageous system.