The present invention relates to a method and system for minimizing synchronization efforts of parallel database systems.
Parallel database systems must internally synchronize the read and write accesses that different database servers make to the same data, so that all servers always work on the same data. Clients expect the parallel database system to provide a single truth of the data, as if it were a single database server.
One way of achieving this single truth is disclosed in US 20080046400A1 titled “Apparatus and method of optimizing database clustering with zero transaction loss”, incorporated herein by reference. The document discloses a database cluster system that uses multiple stand-alone database servers with independent data sets to deliver higher processing speed and higher service availability at the same time with zero transaction losses. The independent data sets are used to mimic a single database server, which requires constant synchronization of the database servers. This synchronization consumes additional processing resources.
Other parallel database systems are made up of a number of database servers, each having access to the same data source. In these parallel database systems every piece of data, such as a table, exists exactly once and is stored on the data source. Clients access the data on the data source via one of the database servers. The servers use the data source to read, store and overwrite information necessary for their operation. Problems occur when two clients try to access the same data on the same data source at the same time. If, for example, two clients update information in the same table at the same time, one of those updates could be lost. Another problem may occur when one client reads a record via one server while another client updates the same record through another server. The client reading the record might then not see the latest version of the record.
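The lost-update problem described above can be illustrated with a minimal sketch. The class names `DataSource` and `DatabaseServer`, the record key `balance` and the numeric values are hypothetical and chosen only for illustration; the interleaving of the two servers is written out sequentially so that the outcome is deterministic.

```python
class DataSource:
    """Hypothetical shared data source: every record exists exactly once."""

    def __init__(self):
        self.records = {"balance": 100}


class DatabaseServer:
    """Hypothetical database server without any synchronization."""

    def __init__(self, source):
        self.source = source

    def read(self, key):
        return self.source.records[key]

    def write(self, key, value):
        self.source.records[key] = value


source = DataSource()
server_a = DatabaseServer(source)
server_b = DatabaseServer(source)

# Both servers read the same record before either one writes it back.
seen_by_a = server_a.read("balance")
seen_by_b = server_b.read("balance")

# Each server applies the update of its own client to the value it read.
server_a.write("balance", seen_by_a + 10)
server_b.write("balance", seen_by_b + 20)

# The write of server_b overwrites the write of server_a.
final_value = source.records["balance"]
print(final_value)  # 120, although both updates together would give 130
```

In this sketch the update made through `server_a` is lost because `server_b` writes back a value computed from a record it read before the first write took place.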
Therefore, if two or more database servers of a parallel system have access to the same data source, contention arises. The database servers have to take measures to ensure that they do not destroy each other's data changes. One way to address the issue is to use a set of rules governing the operation of the servers. For example, a server that is accessing a certain set of data may place a global lock on that set of data, and all database servers respect this lock. While the global lock is held, no other server can access said set of data; it has to wait until the lock is released. Moreover, if one server changed the data while another server held a copy of this data in its own in-memory buffer pool, the latter server needs to refresh its copy to reflect the change. Only then can it access and read the data. This lock method ensures that no data will be lost and that the accessible data is always up to date.
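The lock method with buffer refresh can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of any cited system: the global lock is modelled by a single in-process `threading.Lock`, stale buffer entries are detected with a per-record version counter (an assumption, since the text does not say how a server learns that its buffered copy is outdated), and the names `SharedDataSource`, `Server`, `update` and `read` are hypothetical.

```python
import threading


class SharedDataSource:
    """Hypothetical data source shared by all servers."""

    def __init__(self, records):
        self.global_lock = threading.Lock()  # stands in for the global lock
        # key -> (version, value); the version counter is an assumption
        self.records = {key: (1, value) for key, value in records.items()}


class Server:
    """Hypothetical database server with its own in-memory buffer pool."""

    def __init__(self, source):
        self.source = source
        self.buffer_pool = {}  # key -> (version, value)

    def update(self, key, change):
        # The whole read-modify-write happens while the global lock is held.
        with self.source.global_lock:
            version, value = self.source.records[key]
            entry = (version + 1, change(value))
            self.source.records[key] = entry
            self.buffer_pool[key] = entry

    def read(self, key):
        with self.source.global_lock:
            current = self.source.records[key]
            cached = self.buffer_pool.get(key)
            # Refresh the buffered copy if another server changed the record.
            if cached is None or cached[0] != current[0]:
                self.buffer_pool[key] = current
            return self.buffer_pool[key][1]


source = SharedDataSource({"balance": 100})
server_a = Server(source)
server_b = Server(source)

first_read = server_b.read("balance")            # server_b buffers the record
server_a.update("balance", lambda v: v + 10)     # server_a changes it
server_b.update("balance", lambda v: v + 20)     # no update is lost
second_read = server_a.read("balance")           # server_a refreshes its copy
print(first_read, second_read)
```

Every access in this sketch takes the global lock, which keeps the data consistent but also means that each server must wait whenever another server holds the lock.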
A drawback of measures such as the lock method is an inevitable time delay for the waiting server. While the required data is locked, the server cannot access it, and the client that requested the data is put on hold. The time one database server waits to access a record currently held by another database server is thus a direct result of contention. A further problem is the communication required between the servers concerning the accessibility of the data set.
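The waiting described above can be made visible with a small sketch. It again models the global lock with an in-process `threading.Lock`, which is an assumption made for illustration; the variable names are hypothetical. A non-blocking acquire is used so that the example shows the refused access without actually stalling.

```python
import threading

global_lock = threading.Lock()  # stands in for the global lock on a data set

# Server A places the data set on global lock.
global_lock.acquire()

# Server B tries to access the same data set while the lock is held.
# With a blocking acquire it would have to wait here; the non-blocking
# variant simply reports that access is currently not possible.
b_first_attempt = global_lock.acquire(blocking=False)

# Server A finishes its work and releases the global lock.
global_lock.release()

# Only now can server B obtain the lock and access the data set.
b_second_attempt = global_lock.acquire(blocking=False)
global_lock.release()

print(b_first_attempt, b_second_attempt)  # False True
```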
A further example of implementing a database system with one data source is disclosed in U.S. Pat. No. 7,051,065 B2, titled “Method and system for performing fault-tolerant online validation of service requests”, incorporated herein by reference. Disclosed therein is a method and distributed computing system for validation of service requests. The method includes determining for first and second processes that requests for service have not been previously validated. This document primarily deals with a solution to the online validation problem, which aims at ensuring that a given service request is processed by a single server entity only.
The drawback of the current state of the art is that the synchronization of the database servers requires too many additional processing resources and can result in increased response times as perceived by the client. The amount of communication required between the servers and the data source concerning the accessibility of data is excessive.