1. Field of the Invention
The present invention relates generally to distributed databases and, more particularly, to synchronizing database information over a distributed communications network.
2. Related Art
As the use of computer networks grows, the use of distributed databases for storing data has become commonplace. Distributed databases are databases wherein one or more portions of the database are divided and/or replicated (copied) to different computer systems. These portions of the database are commonly referred to in the art as “partitions.” The act of segmenting a database into partitions is commonly referred to as “partitioning.” Partitions are generally distributed throughout a communication network among separate distributed systems to allow access to the database in a more efficient manner than can be achieved with a centralized database. Partitions may be made more easily accessible to users that access information within the particular partition to increase overall database performance. For example, partitions may be located geographically and/or logically closer to database users that use a particular partition.
Also, a distributed database system may include replicas, or copies of a database or database partition, which are located on different systems. The term “database replica” or “replica” will be used herein to refer generally to such replicas regardless of whether the replica contains an entire database or a partition thereof. The set of all database replicas for a partition is referred to as a replica set. By having multiple copies of the database, a database can be recovered if one system (and one copy) experiences problems. Also, distributed databases allow the data to be managed by multiple servers, thus increasing database throughput. However, there are drawbacks to conventional techniques for maintaining distributed databases. One problem is synchronizing database information among the replicas of the distributed database or its partitions. If a database replica is unreachable, there will be no convergence in the database data among the replicas.
Convergence is generally experienced when each of the database replicas contains the same information. As users add information to the individual replicas, database convergence is critical; a user must know that the data upon which they are relying is current and agrees with the data of the other database replicas. In some distributed database systems, there is an N×N connectivity problem that impedes database convergence. In an N×N system, each system must be able to contact all other systems directly. For instance, each system having a database replica that can achieve a local database change must be able to contact all other replicas to provide those replicas with that change.
In a large database system, there are several reasons why connectivity between database servers cannot be guaranteed. There may be economic limitations to complete connectivity. Transmission of database information over expensive communications links may be cost-prohibitive. Further, there may be transport limitations. In particular, it may not be possible for all database servers to communicate in the same communication protocol across the network. In some situations, two or more computer systems may not be able to communicate at all. There may also be security limitations. With the advent of firewalls and secure networks, allowing such systems to communicate may not be desired. Lastly, some systems may be unavailable because of a network failure or disconnection from the communication network, or because they are not operable or powered on at the time when synchronization is needed.
In addition to distributing database information, synchronization operations may need to be performed to complete certain operations, such as a partitioning operation. In one such partitioning operation, one system, referred to as the “master” or primary system, is designated as the master for a database partition and is responsible for controlling the partitioning operation. The replica held by the master is termed the “master replica.” The master system generally stores the master replica in a memory system located on the master.
Typically, a database administrator will control the master to create or modify a partition. To propagate changes to the partition, the master typically requires contact with all systems that will participate in a partitioning operation before the partitioning operation can be completed. If one or more systems are not reachable by the master, convergence cannot be achieved. Also, in some database systems it may not be acceptable to relay changes at one point in time to one subset of systems and to relay the changes at a second point in time to a second subset of systems.
Recently, computer systems that provide directory services have become a common way for providing resource information to users. Typically, directory services are databases of resource information that can be accessed by users through a communications network. Novell Directory Services (NDS), for example, is a global directory service solution that provides globally distributed information regarding various network resources to various network systems (Novell Directory Services is a registered trademark of Novell, Incorporated). Such resources can include objects such as systems, users, applications, servers, etc., that users may access through the NDS directory service.
Because the NDS database is used to access all resources on the network, the entire network would be disabled if the database itself were stored on only one server (with all other servers accessing the database on that server) and that server were to become unavailable. To avoid single point failures, distributed NDS databases are typically implemented. In this distributed database, replicas of the database are created and those replicas are stored on different servers. Then, if one server malfunctions, all other servers can continue to access the NDS database from another database replica.
If the NDS database is too large, a network administrator may not want to store the entire database on multiple servers. In this case, the network administrator may create directory partitions. The partitions may also include subpartitions (“child partitions”) beneath them. Using partitions can improve network performance, especially if the network expands across low bandwidth links, such as in a wide area network (WAN). Partitions also make it easier for a network administrator to separately manage portions of the database.
The NDS directory service is based on a standard referred to as the X.500 directory services standard, which defines a directory services protocol. Lightweight Directory Access Protocol (LDAP) is another type of directory services database standard which is commonly used to communicate and store resource information over the Internet. Because directory services databases may benefit from replicas and partitioning, they may also suffer from the aforementioned synchronization problems. These problems may cause other problems for systems that rely upon directory services information.
The present invention overcomes these and other drawbacks of conventional systems by providing a system and method for synchronizing distributed databases that does not require connectivity between all database replicas. The present invention enables each server to track the state of each replica of a replica set. Changes to the replicas are then communicated between the servers along with their states. The states may be stored as an array of timestamps, each such timestamp indicating a time at which the replica on each server was last updated. In one embodiment, the timestamp may be a unique identifier for identifying a replica change performed on a particular replica.
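As an illustration only, the per-server state described above might be held as a small record with one timestamp per replica in the replica set. The class name `ReplicaState`, its fields, the use of integer timestamps, and the convention that 0 means "nothing known yet" are all assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicaState:
    """Hypothetical record of what one server knows about every replica."""

    server_id: str
    # One timestamp per replica in the replica set, keyed by server name.
    # Assumption: integer timestamps, with 0 meaning "nothing known yet".
    timestamps: Dict[str, int] = field(default_factory=dict)

    def record_local_change(self, timestamp: int) -> None:
        """Note that the local replica was last updated at `timestamp`."""
        current = self.timestamps.get(self.server_id, 0)
        self.timestamps[self.server_id] = max(current, timestamp)

    def last_update_of(self, other_server: str) -> int:
        """Return the last update time known for another server's replica."""
        return self.timestamps.get(other_server, 0)


state = ReplicaState("A", {"A": 0, "B": 0, "C": 0})
state.record_local_change(5)
```

Here server A knows its own replica was updated at time 5 and has no information yet that the replicas on B or C have changed.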
In a network wherein two servers (a first and a third server) cannot communicate directly, the first network server transmits a replica change to an intermediate (second) server including state information of the first network server, the second server transmits the change to the third server, and the third server updates its replica. The third server transmits its state information to the second server, and the state information is transmitted to the first server. Thus, the first server, by receiving and inspecting the third server's state information, can determine that the change in replica information was performed on the third server.
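The relay through an intermediate server can be sketched as a small simulation. The `Server` class, its method names, the integer timestamps, and the rule that a server keeps the larger of two timestamps for each replica are assumptions made for illustration; the text above does not prescribe them.

```python
class Server:
    """Hypothetical server holding one replica and a state entry per server."""

    def __init__(self, name, replica_set):
        self.name = name
        self.data = {}  # the local replica
        self.state = {peer: 0 for peer in replica_set}  # timestamp per replica

    def local_change(self, key, value, ts):
        """Apply a change made locally and stamp the local replica."""
        self.data[key] = value
        self.state[self.name] = ts

    def receive(self, change, ts, sender_state):
        """Apply a change relayed by another server, then absorb its state."""
        key, value = change
        self.data[key] = value
        self.state[self.name] = max(self.state[self.name], ts)
        self.merge_state(sender_state)

    def merge_state(self, other_state):
        """Assumed rule: keep the newer timestamp for every replica."""
        for peer, ts in other_state.items():
            self.state[peer] = max(self.state.get(peer, 0), ts)


names = ["first", "second", "third"]
first, second, third = (Server(n, names) for n in names)

# The first and third servers never talk to each other directly.
first.local_change("printer", "room-12", ts=1)
change = ("printer", "room-12")
second.receive(change, 1, first.state)  # first -> second
third.receive(change, 1, second.state)  # second -> third
second.merge_state(third.state)         # third's state flows back to second
first.merge_state(second.state)         # and from second back to first

# The first server inspects the third server's state entry.
confirmed = first.state["third"] >= 1
```

After the two hops in each direction, `confirmed` is true: the first server learns that the third replica holds the change without ever contacting the third server.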
In one aspect of the present invention, a system is provided for synchronizing replicas of a distributed database among a plurality of servers. The system includes means for storing, at a first server, a plurality of timestamps identifying a state of a plurality of replicas each located on one of the plurality of servers. Further, the system includes means, responsive to a change in a local replica at the first server, for transmitting the replica change to a second server, wherein transmission of the change is responsive to a timestamp of the replica of the first server and a timestamp of the replica of the second server. The system further includes means for updating, at the second server, the replica of the second server to reflect the change and means for storing, at the second server, a new timestamp indicating a time at which the replica of the second server was last updated.
In another aspect, a method for synchronizing replicas of a distributed database is provided. The replicas form a replica set, wherein each replica of the replica set is stored on one of a plurality of servers in a network. The method comprises steps of storing, at a first server, a plurality of timestamps associated with a plurality of replicas located on each of the plurality of servers and, responsive to a change in a local replica at the first server, transmitting the change to a second server, wherein transmission of the change depends upon a comparison of a timestamp of the replica of the first server and a timestamp of the replica of the second server. The method further comprises a step of updating, at the second server, the replica of the second server to reflect the change. The method further comprises a step of storing, at the second server, a new timestamp indicating a time at which the replica of the second server was last updated.
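One plausible reading of the timestamp comparison is that the first server sends only those changes that are newer than the timestamp it holds for the second server's replica. The sketch below follows that reading; the function names, the integer timestamps, and the "strictly newer" rule are assumptions, not language from the method itself.

```python
from typing import List, Tuple

Change = Tuple[int, str]  # (timestamp, description of the change)


def should_transmit(local_ts: int, known_remote_ts: int) -> bool:
    """Assumed rule: transmit only if the local replica is newer."""
    return local_ts > known_remote_ts


def changes_to_send(change_log: List[Change], known_remote_ts: int) -> List[Change]:
    """Select the logged changes the remote replica has not yet seen."""
    return [(ts, desc) for ts, desc in change_log if ts > known_remote_ts]


log = [(1, "add user alice"), (2, "rename printer"), (3, "delete group ops")]

# The first server believes the second server's replica is current up to 1.
pending = changes_to_send(log, known_remote_ts=1)
```

With these inputs, `pending` holds the changes stamped 2 and 3, and `should_transmit(3, 3)` is false because the second replica would already be current.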
In another embodiment, the method further comprises steps of, responsive to the change in the replica of the second server, transmitting the change to a third server, updating, at the third server, the replica of the third server to reflect the change and notifying the first server that the replica of the third server is updated to reflect the change. In another embodiment, the change is initiated by a user on the replica of the first server. In another embodiment, the change is initiated by an update received from another server.
In yet another embodiment, the first server stores the plurality of timestamps as a single value. In another embodiment, the method further comprises a step of storing, at the second server, a plurality of timestamps associated with the plurality of replicas, wherein transmitting the change to the third server depends upon a comparison of the new timestamp of the replica of the second server and a timestamp of the replica of the third server.
In another embodiment, the step of storing a plurality of timestamps comprises storing the timestamps as a first array of timestamps. In another embodiment, the method further comprises transmitting the first array of timestamps to the second server. In another embodiment, the method further comprises storing timestamps on the second server in a second array and merging the first array with the second.
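The merge of the two arrays is not spelled out here. A minimal sketch, assuming both arrays list the replicas in the same order and that the merge keeps the later timestamp at each position, could look like the following; the function name `merge_timestamp_arrays` is hypothetical.

```python
from typing import List


def merge_timestamp_arrays(first: List[int], second: List[int]) -> List[int]:
    """Merge two timestamp arrays position by position.

    Assumption: index i refers to the same replica in both arrays, and the
    merged entry is the later of the two timestamps.
    """
    if len(first) != len(second):
        raise ValueError("arrays must cover the same replica set")
    return [max(a, b) for a, b in zip(first, second)]


# Array received from the first server, and the second server's own array.
received = [5, 2, 0]
local = [3, 4, 0]
merged = merge_timestamp_arrays(received, local)
```

Taking the later timestamp per position means a merge never discards knowledge that either server already had about a replica.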
In another embodiment, the step of notifying includes determining, at the first server, a timestamp associated with the third replica received from the second server, the timestamp from the second server indicating that the third server has incorporated the change.
In still a further aspect of the present invention, a computer program product is disclosed. The product comprises a computer readable medium having computer program logic recorded thereon for enabling a processor in a computer system to synchronize replicas of a distributed database. The computer program product is adapted to cause the computer system to perform the steps of storing, at a first server, a plurality of timestamps associated with a plurality of replicas located on each of the plurality of servers, responsive to a change in a local replica at the first server, transmitting said change to a second server, wherein transmission of the change depends upon a comparison of a timestamp of the replica of the first server and a timestamp of the replica of the second server, and updating, at the second server, the replica of the second server to reflect the change. The computer system performs a step of storing, at the second server, a new timestamp indicating a time at which the replica of the second server was last updated.
In another embodiment, the computer system performs the steps of, responsive to the change in the replica of the second server, transmitting the change to a third server, updating, at the third server, the replica of the third server to reflect the change and notifying the first server that the replica of the third server is updated to reflect the change. In another embodiment, the change is initiated by a user on the replica of the first server. In another embodiment, the change is initiated by an update received from another server.
In yet another embodiment, the first server stores the plurality of timestamps as a single value. In another aspect, the computer system performs a step of storing, at the second server, a plurality of timestamps associated with the plurality of replicas, wherein transmitting the change to the third server depends upon a comparison of the new timestamp of the replica of the second server and a timestamp of the replica of the third server.
In another embodiment, the computer system performs the step of storing a plurality of timestamps by storing the timestamps as a first array of timestamps. In another embodiment, the computer system performs a step of transmitting the first array of timestamps to the second server. In another embodiment, the computer system performs a step of storing timestamps on the second server in a second array and merging the first array with the second.
In another embodiment, the step of notifying includes determining, at the first server, a timestamp associated with the third replica received from the second server, the timestamp from the second server indicating that the third server has incorporated the change.
In another aspect, an apparatus is provided for synchronizing a distributed database. The apparatus comprises a processor and a memory system configured to store a state of a first copy of a database. The apparatus further comprises a synchronization system configured to transmit a change in the first copy of the database and the state of the first database to an intermediate entity having a second copy of the database. The synchronization system is responsive to a message transmitted from the intermediate entity, the message indicating that the change has been incorporated in a third copy of the database by a third entity.
Advantageously, the present invention does not require communication between each of the servers to provide for convergence of the distributed database. Further, the aforementioned synchronization system and method may be used to perform changes to the database system such as during a partitioning operation. Because other servers participate in transferring changes to other servers, burden on the master server to perform partitioning operations is reduced. Such operations may include merging, deleting, creating, and editing partitions.
Further features and advantages of the present invention as well as the structure and operation of various embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, like reference numerals indicate like or functionally similar elements. Additionally, the left-most one or two digits of a reference numeral identify the drawing in which the reference numeral first appears.