Databases are well known and are used in a variety of applications. For example, a bug tracking database may be used to collect bug and failure data for software or firmware in computers and computer networks. Databases are typically used by different user groups, such as development and quality assurance (QA) teams, which use databases to manage defects, enhancements, and requests for service.
A problem with databases is that users are typically distributed worldwide. There could be hundreds or thousands of users from several different continents accessing a database. Consequently, access to the database from distant geographical locations can be inconveniently slow.
A conventional solution is to create replicas (exact reproductions of a database) at different server sites in order to enable local access to users everywhere in the world. Any transactions (updates to the database) at each server site are replicated at other replicas so that all of the replicas are continually updated. A replication (synchronization) program, which updates all of the replicas, typically involves an export function and import function. Typically, a synchronization packet (hereinafter referred to as a “packet”) that contains metadata for a transaction of a replica is exported from one server site and then imported at destination replicas (the replicas to be updated) located on other respective server sites.
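The export and import functions described above can be illustrated with a minimal sketch. It is not the implementation of any particular replication program. The function names `export_packet` and `import_packet`, the JSON packet layout, and the use of a plain dictionary as a replica are all assumptions made for illustration.

```python
import json


def export_packet(origin: str, seq: int, transaction: dict) -> str:
    """Serialize one transaction into a synchronization packet (hypothetical layout)."""
    return json.dumps({"origin": origin, "seq": seq, "transaction": transaction})


def import_packet(replica: dict, packet: str) -> None:
    """Apply the transaction carried by a packet to a destination replica."""
    data = json.loads(packet)
    tx = data["transaction"]
    replica[tx["record_id"]] = tx["fields"]


# Two replicas at different server sites, modeled here as plain dictionaries.
site_a = {}
site_b = {}

# A transaction is applied locally at site A, exported, then imported at site B.
tx = {"record_id": "BUG-1", "fields": {"status": "open"}}
site_a[tx["record_id"]] = tx["fields"]
packet = export_packet("site_a", 1, tx)
import_packet(site_b, packet)
```

After the import, the two replicas in this sketch hold the same record, which is the state the replication program is meant to maintain.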
A problem with this conventional solution is that packets often do not get imported at a particular destination replica. The most common reason for such a failure is that one or more packets preceding the current packet have been lost in transit or lost in some other manner. Typically, packets must be received in a particular sequence or they will not be applied. When a packet is lost, the import function of the synchronization process for that destination server site stops, and subsequent packets start accumulating in the in-box.
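The stall caused by a lost packet can be sketched as follows. This is a simplified model under stated assumptions: the `Replica` class, its integer sequence numbers, and its dictionary in-box are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical destination replica that applies packets strictly in order."""
    next_expected: int = 1                       # sequence number of the next packet to import
    inbox: dict = field(default_factory=dict)    # packets received but not yet imported
    applied: list = field(default_factory=list)  # sequence numbers imported so far

    def receive(self, seq: int, payload: str) -> None:
        """Place an incoming packet in the in-box, then import whatever is now in sequence."""
        self.inbox[seq] = payload
        self.import_pending()

    def import_pending(self) -> None:
        """Import consecutive packets and stop at the first gap in the sequence."""
        while self.next_expected in self.inbox:
            self.inbox.pop(self.next_expected)
            self.applied.append(self.next_expected)
            self.next_expected += 1


replica = Replica()
for seq in (1, 2, 4, 5, 6):   # packet 3 is lost in transit
    replica.receive(seq, f"transaction {seq}")
```

In this sketch, packets 1 and 2 are imported, while packets 4, 5, and 6 remain in the in-box until packet 3 arrives. Nothing in the model notifies anyone of the gap, which mirrors the situation described above.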
When packets accumulate in the in-box, neither the administrator at the sending server site nor the administrator at the destination server site is notified. The administrator at the sending server site will not notice the accumulated packets since they are at the destination server site. The administrator at the destination server site may notice the accumulated packets only if that administrator manually and routinely checks the server site for accumulated packets. Even then, that administrator would not be able to take any corrective action, because correcting this particular problem requires actions taken at the sending server site. The administrator at the destination server site typically does not have privileges to access the sending server site and thus would not be able to resolve the problem. Accordingly, that administrator has to request help from the administrator at the sending server site. This can take some time, especially if the administrators are in distant time zones. The administrator at the destination server site may have to resort to email communication, which could span hours. Consequently, the turnaround time for resolution of the problem is adversely affected.
Another problem with the conventional solution is that, after any problems are resolved at the sending server site, a new synchronization cycle has to be manually forced to restart the replication/synchronization process. To accomplish this, the administrator at the sending server site requires information that is available only at the destination server site. However, the administrator at the sending server site typically does not have privileges to access the destination server site and would thus not be able to retrieve the required information. Accordingly, the administrator at the sending server site would need to contact the administrator at the destination server site to get the information. As indicated above, this can take some time due to the administrators being in distant time zones or communicating by email, which could span hours. Consequently, the turnaround time to resolve such a problem is adversely affected.
Another reason that packets accumulate is that, at the destination server site, the database user account of the database server has been locked out (at the OS level). A lockout may occur, for example, if the password is changed periodically (e.g., every 90 days) or after a predetermined number of logins (e.g., after 300 logins) and a user attempts to log in with an old password. While the account is locked out, incoming packets are not imported at the replica at the destination server site. The administrator at the destination server site would learn of the problem only by manually and routinely checking whether incoming packets have accumulated. Then, by manually running import commands, the administrator can determine whether the user account is locked out. The administrator would then have to manually unlock the database user account and manually rerun the import function of the replication program.
A synchronization failure can also originate at the sending server site. For example, a glitch during a previous export may prevent all further exports from occurring. The administrator at the sending server site will learn of such a failure only if that administrator manually and routinely checks the status of the last scheduled export and then manually runs the export function to determine the cause of the failure. The administrator would then have to manually resolve the cause of the failure and manually force the synchronization (i.e., the export function).
Accordingly, what is needed is an improved method and system for reliably synchronizing replicas of a database. The method and system should be capable of being easily adapted to existing technology. The present invention addresses such a need.