The invention relates to database management systems. In particular, the invention relates to an in-memory database management system (DBMS) and checkpointing an in-memory database using direct memory references between indexes and logically clustered tuples.
A database management system (DBMS) is an application for storing large volumes of data and allowing multiple users to access and manipulate the data in an efficient and controlled fashion. A database is traditionally considered to be a large collection of mainly disk-resident shared data, managed and accessed by the DBMS.
Another type of database management system is an in-memory database (IMDB) management system, also called a main memory database system. This type of database management system uses random access memory (RAM) as its main working memory for storing data and disk storage for backing up the data held in memory. Compared to disk-based database management systems, in-memory database systems offer superior performance through shorter access times. Storage disks are block-oriented, meaning that reading or writing a relatively large amount of data has the same high cost as reading or writing a single byte.
In-memory database systems use a technique called ‘checkpointing’ in order to reduce the recovery time of the database after a failure.
The purpose of checkpointing is to provide a snapshot of the data within the database. A checkpoint, in general, is any identifier or other reference that identifies the state of the database at a point in time. Modifications to database pages are performed in memory and are not necessarily written to disk after every update. Therefore, the database system must periodically perform a checkpoint to write these updates, which are held in memory, to the storage disk. Writing these updates to the storage disk creates a point in time from which the database system can apply the changes contained in a transaction log during recovery after an unexpected shutdown or crash of the database system. If a checkpoint is interrupted and a recovery is required, then the database system must start recovery from a previous successful checkpoint.
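The interplay between a checkpoint and the transaction log described above can be illustrated with a minimal Python sketch. It is a toy model only: the names `LogRecord`, `take_checkpoint`, and `recover`, and the use of a log sequence number (`lsn`) to mark the checkpoint position, are assumptions made for illustration and are not taken from any particular database system.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int      # log sequence number (hypothetical field name)
    key: str
    value: int


def take_checkpoint(memory: dict, last_lsn: int) -> dict:
    """Copy the in-memory state to a 'disk' image tagged with the last applied LSN."""
    return {"lsn": last_lsn, "data": dict(memory)}


def recover(checkpoint: dict, log: list) -> dict:
    """Rebuild the state from the checkpoint plus the log records written after it."""
    state = dict(checkpoint["data"])
    for rec in log:
        if rec.lsn > checkpoint["lsn"]:
            state[rec.key] = rec.value
    return state


# Two updates are applied in memory and logged, then a checkpoint is taken.
log = [LogRecord(1, "a", 10), LogRecord(2, "b", 20)]
memory = {}
for rec in log:
    memory[rec.key] = rec.value
ckpt = take_checkpoint(memory, last_lsn=2)

# A later update exists only in memory and in the log when the crash occurs.
log.append(LogRecord(3, "a", 11))
memory["a"] = 11

recovered = recover(ckpt, log)
```

In this sketch, recovery only has to replay the single log record written after the checkpoint, which is why a recent checkpoint shortens recovery time.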
Checkpointing can be either transaction-consistent or non-transaction-consistent (also called fuzzy checkpointing). Transaction-consistent checkpointing produces a persistent database image that is sufficient to recover the database to the state that was externally perceived at the moment the checkpointing started. Non-transaction-consistent checkpointing results in a persistent database image that is, on its own, insufficient to recover the database state. To perform database recovery when using non-transaction-consistent checkpointing, additional information is needed, typically contained in transaction logs.
A transaction-consistent checkpoint represents a consistent database, which does not necessarily include all the latest committed transactions, but in which all modifications made by transactions that were committed at the time checkpoint creation started are fully present. A non-transaction-consistent checkpoint is not necessarily a consistent database, and cannot be recovered to one without all the log records generated for the open transactions included in the checkpoint.
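The difference between the two kinds of checkpoint can be sketched in Python as follows. This is a simplified model under stated assumptions: each write is tagged with a transaction identifier, no open transaction overwrites a key written by a committed one, and the names `Write`, `fuzzy_image`, and `consistent_image` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Write:
    txn: int      # identifier of the transaction that made the write
    key: str
    value: int


def fuzzy_image(writes):
    """Capture every write present in memory, whether committed or not."""
    image = {}
    for w in writes:
        image[w.key] = w.value
    return image


def consistent_image(writes, committed_at_start):
    """Capture only writes of transactions committed when the checkpoint began."""
    image = {}
    for w in writes:
        if w.txn in committed_at_start:
            image[w.key] = w.value
    return image


writes = [Write(1, "x", 1), Write(2, "y", 2), Write(1, "z", 3)]
committed = {1}   # transaction 2 is still open when the checkpoint starts

fuzzy = fuzzy_image(writes)
consistent = consistent_image(writes, committed)
```

Here the fuzzy image contains the uncommitted write to `y`, so log records for transaction 2 would be needed to bring it to a consistent state, whereas the consistent image can be used as it stands.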
Depending on the type of database management system implemented, a checkpoint may incorporate indexes, storage pages (user data), or both indexes and storage pages. If no indexes are incorporated into the checkpoint, the indexes must be re-created when the database is restored from the checkpoint image.
Storage pages are collections of database rows called ‘tuples’. Tuples are ordered by the primary key, grouped into logical storage pages, and pointed to by index entries using direct pointers. During a database checkpoint, storage pages that include one or more modified tuples are copied to a ‘checkpoint buffer’, whose size is a multiple of the page size, for disk write operations.
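The grouping of tuples into pages and the copying of modified pages through a checkpoint buffer can be sketched as below. The page capacity, the buffer size, the `dirty` flag, and the function names are illustrative assumptions; a Python list stands in for the disk.

```python
from dataclasses import dataclass, field

PAGE_CAPACITY = 2    # tuples per logical page (illustrative value)
BUFFER_PAGES = 2     # the checkpoint buffer holds a whole number of pages


@dataclass
class Page:
    tuples: list = field(default_factory=list)   # (primary_key, payload) pairs
    dirty: bool = False                          # True if any tuple was modified


def build_pages(rows):
    """Order tuples by primary key and group them into logical pages."""
    ordered = sorted(rows)
    return [Page(ordered[i:i + PAGE_CAPACITY])
            for i in range(0, len(ordered), PAGE_CAPACITY)]


def checkpoint(pages, disk):
    """Copy modified pages into the buffer and flush it whenever it is full."""
    buffer = []
    for page in pages:
        if page.dirty:
            buffer.append(list(page.tuples))
            page.dirty = False
            if len(buffer) == BUFFER_PAGES:
                disk.extend(buffer)
                buffer.clear()
    disk.extend(buffer)   # flush whatever remains in a partially filled buffer


pages = build_pages([(3, "c"), (1, "a"), (4, "d"), (2, "b"), (5, "e")])
pages[0].dirty = True     # the page holding keys 1 and 2 was modified
pages[2].dirty = True     # the page holding key 5 was modified
disk = []
checkpoint(pages, disk)
```

Only the two modified pages reach the disk; the unmodified page holding keys 3 and 4 is skipped.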
If indexes were included in the checkpoint, the direct pointers (direct memory references) would become invalid if they were copied to a checkpoint image as such, because they would point to memory segments in volatile memory, i.e. RAM, which are lost when a database process terminates.
In order to maintain the validity of the pointers of the index structures in a checkpoint image, the pointers would have to be updated to refer to the corresponding memory segments in the checkpoint image instead of referring to those in volatile memory, and this would have to be done before checkpointing the index itself. This would require updating both internal pointers of indexes, and pointers from every index referring to every tuple included in the checkpoint image.
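The pointer update described above can be sketched in Python. Plain integers stand in for volatile memory addresses, a list stands in for the checkpoint image, and the names `write_tuples`, `rewrite_index`, and `relocation` are hypothetical; the internal pointers between index nodes are left out for brevity.

```python
# Tuples live at volatile "addresses" (plain ints standing in for RAM pointers).
memory = {0x1000: ("alice", 30), 0x2000: ("bob", 25), 0x3000: ("carol", 35)}

# Two indexes, each mapping a key to the direct pointer (address) of a tuple.
index_by_name = {"alice": 0x1000, "bob": 0x2000, "carol": 0x3000}
index_by_age = {25: 0x2000, 30: 0x1000, 35: 0x3000}


def write_tuples(memory):
    """Lay the tuples out in the image and record where each address ended up."""
    image, relocation = [], {}
    for address, tup in sorted(memory.items()):
        relocation[address] = len(image)   # offset of the tuple within the image
        image.append(tup)
    return image, relocation


def rewrite_index(index, relocation):
    """Translate every direct pointer in one index to an image offset."""
    return {key: relocation[address] for key, address in index.items()}


image, relocation = write_tuples(memory)
saved_by_name = rewrite_index(index_by_name, relocation)
saved_by_age = rewrite_index(index_by_age, relocation)

# Every index entry had to be touched: tuples multiplied by indexes.
pointer_updates = len(index_by_name) + len(index_by_age)
```

With three tuples and two indexes, six pointers are rewritten, which illustrates how the work grows with the number of indexes referring to each tuple.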
Therefore, checkpointing indexes in databases which use direct tuple pointers is a very expensive operation because tuples are often referred to by multiple indexes.
Consequently, many prior art in-memory database management systems are forced to use indirect pointers between index entries and tuples.
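One way to picture indirect pointers is shown in the following Python sketch, which assumes that index entries store stable row identifiers that are resolved through a translation table. The names `row_table` and `lookup` are hypothetical, and `pickle` is used only to simulate writing the structures to a checkpoint image and reading them back.

```python
import pickle

# Index entries hold stable row identifiers instead of memory addresses.
index_by_name = {"alice": 1, "bob": 2}

# A translation table maps each row identifier to the tuple it denotes.
row_table = {1: ("alice", 30), 2: ("bob", 25)}


def lookup(index, key, rows):
    """Resolve a key through the extra level of indirection."""
    row_id = index[key]
    return rows[row_id]


# Simulate saving and restoring: the row identifiers stay valid unchanged.
saved = pickle.dumps((index_by_name, row_table))
restored_index, restored_rows = pickle.loads(saved)
result = lookup(restored_index, "bob", restored_rows)
```

The index survives the round trip without any pointer rewriting, at the price of one additional table lookup on every access through the index.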
Another solution is not to include indexes in the checkpoint image at all, but to re-create the indexes as part of the restore process by extracting key values from the tuples as they are read from the checkpoint image and inserting all the key values into the corresponding indexes.
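Re-creating the indexes during restore can be sketched as follows. The function `restore` and the `key_extractors` mapping are assumptions made for illustration; list positions stand in for the new in-memory locations of the restored tuples, and plain dictionaries stand in for index structures.

```python
def restore(image, key_extractors):
    """Read tuples from the image and rebuild every index from scratch."""
    tuples = []
    indexes = {name: {} for name in key_extractors}
    for tup in image:
        tuples.append(tup)
        position = len(tuples) - 1            # new location of the tuple
        for name, extract in key_extractors.items():
            indexes[name][extract(tup)] = position
    return tuples, indexes


# The checkpoint image holds tuples only, with no index data.
image = [("alice", 30), ("bob", 25)]
tuples, indexes = restore(
    image,
    {"name": lambda t: t[0], "age": lambda t: t[1]},
)
```

Every tuple is visited once per index during restore, so the saving at checkpoint time is paid for with a longer restore.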
U.S. Pat. No. 7,587,429 discloses a page consistent checkpointing method for a computer system, and involves altering data objects of pages in primary storage, identifying pending data objects, and altering pending data objects after writing data objects into secondary storage. However, this disclosure does not address the problems discussed above.