1. Field of the Invention
Embodiments of the invention generally relate to improving the efficiency of database operations on a distributed database. More specifically, embodiments of the invention relate to using journaling in a multi-node environment to improve performance of a distributed database.
2. Description of the Related Art
A database management system (DBMS) is a system configured to create, query, and manage databases, which in turn may include tables, rows, columns, and defined relationships between these elements. One feature commonly provided by database systems is referred to as a “journal” or “log.” As is known, a journal is a file used to store changes to the database. The journal provides a record of transactions that operate on the database, such as additions, updates, and deletions to the information contained in the database. Each transaction performed may generate a corresponding set of entries in the journal. One known use of journaling in a database application environment is commitment control. In database terminology, the journal is used to ensure that transactions are “atomic,” which means that either every step of a given transaction is successfully performed or no step is performed. The journal provides a record of changes that can be rolled back when a given transaction does not complete successfully. That is, the journal allows a database engine to undo changes made during runtime when a transaction fails to complete. The database engine can reconstruct the database state prior to the transaction using the information from the journal entries related to the transaction.
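The rollback behavior described above can be illustrated with a minimal sketch. It assumes a toy in-memory key-value store rather than a relational engine, and every name in it (`JournaledStore`, `write`, `delete`, `rollback`, `commit`) is hypothetical and not taken from any particular DBMS. Each change appends an entry holding the prior value, and a failed transaction is undone by replaying its entries in reverse order.

```python
class JournaledStore:
    """Toy key-value store with an undo journal (illustrative only)."""

    _MISSING = object()  # sentinel: the key did not exist before the change

    def __init__(self):
        self.data = {}
        self.journal = []  # entries of the form (txn_id, key, previous_value)

    def write(self, txn_id, key, value):
        # Record the prior value before applying the change.
        self.journal.append((txn_id, key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def delete(self, txn_id, key):
        self.journal.append((txn_id, key, self.data.get(key, self._MISSING)))
        self.data.pop(key, None)

    def rollback(self, txn_id):
        # Undo this transaction's changes, newest entry first.
        for entry_txn, key, previous in reversed(self.journal):
            if entry_txn != txn_id:
                continue
            if previous is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = previous
        self._discard(txn_id)

    def commit(self, txn_id):
        # Simplifying assumption: undo entries are dropped once committed.
        self._discard(txn_id)

    def _discard(self, txn_id):
        self.journal = [e for e in self.journal if e[0] != txn_id]


store = JournaledStore()
store.write("t1", "a", 1)
store.commit("t1")
store.write("t2", "a", 2)
store.write("t2", "b", 3)
store.delete("t2", "a")
store.rollback("t2")  # the store returns to its state before t2 began
```

Dropping entries at commit is a simplification made for this sketch; a journal may retain committed entries for other purposes.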
However, journals consume system resources, which can adversely impact the performance of the DBMS as well as applications that share resources with the DBMS. For example, as transactions operate on the database, the journal must be updated with entries that include the changes made in these transactions. Further, journals require space in memory or on disk. Thus, maintaining a journal requires both memory and processing resources, which can reduce overall system performance. Typically, however, the drain on resources is more than offset by the transaction integrity and the ability to undo changes that a journal provides. Further, in many conventional database management systems, the journal simply consumes relatively inexpensive space on disk.
In large distributed systems, however, a large number of individual processing nodes may each provide a limited amount of memory used to store a portion of a database. Because any memory used for overhead reduces the volume of data that may be stored on a node in an in-memory database, it is important to maximize the amount of available memory and to minimize the overhead of supporting structures such as journals. An inefficient method for managing the memory space of the journals is detrimental to database efficiency in general and to an in-memory database in particular. Currently, relational database management systems create a new journal file when the old journal file is full or, in database terminology, cause a journal switch when the current journal file reaches a threshold size. This approach is inadequate in a massively parallel database environment because disk access is relatively expensive. Similarly, simply forgoing a journal is unreasonable, because each node storing a portion of the database may benefit from having a record of the transactions that affect the database portion stored on that node.
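The conventional journal switch mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a description of any specific product: in-memory lists stand in for journal files, and the class name `SwitchingJournal` and its `threshold_bytes` parameter are hypothetical. When the current journal reaches the threshold size, it is closed and a new, empty journal takes its place.

```python
class SwitchingJournal:
    """Illustrative journal that switches at a threshold size."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.current = []       # entries in the active journal
        self.current_size = 0   # total bytes in the active journal
        self.closed = []        # previously filled journals

    def append(self, entry: bytes):
        self.current.append(entry)
        self.current_size += len(entry)
        # Journal switch: the active journal has reached the threshold.
        if self.current_size >= self.threshold_bytes:
            self.closed.append(self.current)
            self.current = []
            self.current_size = 0


journal = SwitchingJournal(threshold_bytes=10)
for entry in (b"aaaa", b"bbbb", b"cccc", b"dd"):
    journal.append(entry)
# The third entry pushes the active journal past 10 bytes and triggers
# a switch; the fourth entry lands in the new journal.
```

In this sketch the closed journals simply accumulate, which makes the cost visible: each switch leaves another filled journal that must be kept somewhere, whether in scarce node memory or on comparatively slow disk.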