Apparatus for persistently storing data such as a database management system (DBMS) can be a facility for storing large volumes of data and allowing multiple users to access and manipulate the data in an efficient and controlled fashion. Databases are traditionally considered as a large collection of (mainly disk-resident) shared data, managed and accessed by the DBMS.
Thus a database management system (DBMS) can, for example, be an entity, which comprises one or more databases and/or data management systems, whereby the system is responsible for reading the data structures contained in the database and/or data management systems and for changing these data structures.
A database can be a searchable and updateable information storage structure, which comprises one or more data objects, and the use of which is controlled by the DBMS. The structure may for example comprise index trees and data pages that may be accessed through the index trees.
A data object or data item may, on one hand, be an information structure, which can comprise other data objects or such data objects, which can be construed as atomary data objects. For instance, in a relational database data objects represent tables comprising rows. The rows comprise columns, which are typically atomary data objects. A tuple can mean the data item that may contain other objects as elements, e.g. a tuple may be one row containing single customer's data in a “customer” table of a database.
A database operation can be an event, during which data objects of the database are read from the database, during which data objects are modified, during which data objects are removed from the database, and/or during which data objects are added to the database. A set of database operations acting on the data objects is called a transaction. The transaction may comprise one or multiple operations. The transaction can also comprise other transactions.
A page in database environment can, for example mean a collection of data objects. A page may contain zero, one or multiple data objects. At maximum, the page may contain all data objects of the storage.
A database table can be a collection of zero or more data objects referred to as table rows, each having one or multiple data items referred to as column values. A checkpoint is a process where altered pages are written from one storage unit, such as RAM to another storage unit such as Disk. Typically, the end-result of a checkpoint is a snapshot of a database on the disk.
A transaction log can be a sequential storage of transaction data committed in the database. The transaction log is typically used for recovering committed data after re-starting a database management system after abnormal shutdown.
Two or more datababase management systems can be arranged to exchange data with each other. One such mechanism is data synchronization, where data is transferred from one database management system to at least one other database management system to make the data of the databases at least partially similar upon completion of the data transfer.
Database management systems should be optimized for two different aspects of performance. On one hand, the DBMS should be able to provide a relatively fast access to the data of the database. On the other hand, the server should be able to provide relatively high performance for writing data to a persistent storage. In addition to performance optimisation, the DBMS should possibly be able to recover data from different kinds of error situations, such as abnormal shutdown of the DBMS.
Such a known distributed database management system is shown in FIG. 1, to provide one known solution for this optimization challenge of the DBMS 20, where upon data synchronization, data from a first DBMS 10 is transferred to a second DBMS through a network connection and the second DBMS writtes the data into two different storage files in the second DBMS: database file(s) 22 and transaction log file(s) 23. To provide relatively quick access to data, the data is typically organized in the database files 22 in a tree-structure or some other, preferably searchable data structure. The most recently accessed and/or modified parts (database pages) of the database are kept in the cache memory 21 of the DBMS 20. When data item is inserted into the database, its location in the storage structure is determined e.g. by the key value of the data item. For this reason, data written to a database in a single transaction may reside in any number of different locations inside the database storage structure. On the other hand, a database log file, i.e. the transaction log file 23, contains the data of the committed transactions in a sequential order.
FIG. 1b shows another known distributed database management system. In this example, the data synchronization between the DBMSs is implemented in a separate application software process 16 that maintains a network connection to both the first and the second DBMS. To synchronize data between the DBMSs, the application reads data from one DBMS and writes it to the second DBMS.
FIG. 2 shows a transaction 100 containing data items 101-105 read from the first DBMS needed to synchronize some data between the first and second DBMS, a database storage structure 200 containing database pages 201-207 and transaction log file 300 containing pages 301-307.
FIG. 3 shows a known situation where during data synchronization the transaction 100 from the first DMBS is committed and thus made persistent on the storage device. The data elements 101-105 of the transaction should be written to the persistent storage media such as hard disk as efficiently and quickly as possible. In known cases, this means that the data must be written sequentially into the pages 301-307 of the transaction log file 300 as data values 101″-105″. The data is also written into the cached pages 202-207 of database storage structure 200 as data values values 101′-105′ but those pages may not be written to the database files 22 of a storage device until later. In this known system, high degree of efficiency is achieved because disk storage devices are much more efficient when writing data into sequential sectors on the disk instead of writing data into more random locations on the disk, e.g. to a tree-structure. The cached data of the database storage structure 200 is made persistent later at a checkpoint. A checkpoint can mean a routine in the DBMS software that writes all “dirty” pages, i.e. pages of the database storage structure that have been modified in-memory, to the persistent storage. Checkpoints are basically the occasions when data is written from the cache memory of the DBMS to the database files. However, some dirty database pages may be written to the disk also outside of a checkpoint when cache becomes too full of these dirty pages.
The FIG. 4 explains the data storing during data synchronization, e.g. a write transaction execution, method 400 known from the prior art. Some data read from the first DBMS 10 is received at the second DBMS and this begins a new write transaction in the step 401. The second DBMS receives from the first DBMS 10 or from a data synchronization application that reads data from the first DBMS a request to add a row to the database or delete a row from the database in the step 402. The server adds the data of the row to the transaction log in the step 403. If the operation was the last one of the transaction in the conditional step 404, the transaction is committed which means that the data of the new data of the transaction log is written to the transaction log file 23 in the step 405. At some point, the server may determine that a checkpoint is needed in the conditional step 406. The checkpoint process writes all the dirty pages of the cached part of the database storage structure to the database file(s) in the step 407. The server is now ready to continue executing transactions in the step 408, or terminate the transaction processing related to data synchronization in the step 409.
Transaction log files can be used for recovery from abnormal shutdowns. If a DBMS process is terminated abruptly, not all data of the database may be found in the database file(s). The DBMS guarantees that the database file(s) contain(s) all those transactions that were committed in the database when the most recent checkpoint was done. However, the DBMS also guarantees that the data of the transactions committed to the database since the checkpoint is stored persistently in the transaction log files. To recover from an abnormal shutdown of the database, the DBMS applies to the database all those committed transactions that are newer than the most recent checkpoint from the transaction log files. This is called a roll-forward recovery process and it brings the database back to the state where it was before the abnormal shutdown. FIG. 5 illustrates the roll-forward recovery process 500 as known in prior art. The process reads 501 a transaction's data from the transaction log, applies 502 the data into the database storage structure, checks 503 if there are additional transactions in the transaction log and completes 504 the recovery process when all transactions have been processed.
The down side of the known dual storage method and system for persistently storing data during data synchronization is that it may cause some overhead especially if the volume of the data of a transaction is large. In traditional DBMS synchronization systems, data volume of a single transaction has usually been in a range of few hundred bytes or at maximum few kilo-bytes. That small volume of data is written immediately to the transaction log file to guarantee persistence. The database files may be updated much later, along with data accumulated from other committed transactions. This way, server may optimize its performance by postponing the update of the database files to a point of time when the operation can be done more efficiently as there is more data that can be written to the database files at the same time.
Thus in the known data synchronization method and system, the data is written to the storage files of the second DBMS twice or more. This overhead has been justifiable by the performance gains achieved by the known method and system. However, the emergence of larger volume data as a data type, e.g. media data, managed by a DBMS and synchronized between DBMSs has made the overhead problem more significant in the data synchronization process. Transaction log is still needed for data recovery purposes but the performance penalty of using the logging when synchronizing transactions containing large volumes of data has grown high. Writing small volume of data twice to the disk storage does not cause a significant overhead especially because the second write may write data of multiple transactions to the disk at the same time. On the other hand, writing several megabytes or even gigabytes of data in one transaction causes a significant overhead, as the storage device must write a large amount of data twice to the disk concurrently. This happens because the large volume of data of the transaction does not fit in the cache memory of the DBMS and the data must thus be written immediately to the database files to prevent the cache from becoming full. This very often means that maximum synchronization performance of the DBMS that manages larger data volumes is severely degraded.
As it is important to guarantee persistence of the data of all committed transactions while synchronizing data between DBMSs, it is important to improve the data storage method and,system so that the performance overhead of using the DBMS for synchronizing transactions containing larger volumes of data stays at a tolerable level.