1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to a system providing methodology for data replication resynchronization.
2. Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Addison Wesley, 2000.
Increasingly, businesses base their operations on mission-critical systems that store information on server-based database systems, such as Sybase® Adaptive Server® Enterprise (ASE) (available from Sybase, Inc. of Dublin, Calif.). As a result, the operations of these businesses depend on the availability of the data stored in their databases. Because of the mission-critical nature of these systems, their users need to protect themselves against loss of the data due to software or hardware problems, disasters such as floods, earthquakes, or electrical power loss, or temporary unavailability of systems resulting from the need to perform system maintenance.
One well-known approach that is used to guard against loss of critical business data maintained in a given database (the “primary database”) is to maintain one or more standby or replicate databases. A replicate database is a duplicate or mirror copy of the primary database (or a subset of the primary database) that is maintained either locally at the same site as the primary database, or remotely at a different location than the primary database. The availability of a replicate copy of the primary database enables a user (e.g., a corporation or other business) to work with a copy of the database in the event of the loss, destruction, or unavailability of the primary database.
Replicate database(s) are also used to facilitate access to and use of data maintained in the primary database (e.g., for decision support and other such purposes). For instance, a primary database may support a sales application and contain information regarding a company's sales transactions with its customers. The company may replicate data from the primary database to one or more replicate databases to enable users to analyze and use this data for other purposes without interfering with or increasing the workload on the primary database. The data that is replicated (or copied) to a replicate database may include all of the data of the primary database such that the replicate database is a mirror image of the primary database. Alternatively, only a subset of the data may be replicated to a given replicate database (e.g., because only a subset of the data is of interest in a particular application).
In recent years, the use of replication technologies has been increasing as users have discovered new ways of using copies of all sorts of data. Various types of systems, ranging from electronic mail systems and document management systems to data warehouse and decision support systems, rely on replication technologies for providing broader access to data. Over the years, database replication technologies have also become available in vendor products ranging from simple desktop replication (e.g., between two personal computers) to high-capacity, multi-site backup systems.
Database replication technologies comprise a mechanism or tool for replicating (duplicating) data from a primary source or “publisher” (e.g., a primary database) to one or more “subscribers” (e.g., replicate databases). The data may also be transformed during this process of replication (e.g., into a format consistent with that of a replicate database).
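By way of illustration only, the publisher/subscriber arrangement described above may be sketched as follows. The sketch is a minimal in-memory model and does not represent any particular vendor product; the class names `Publisher` and `Subscriber`, the `upsert` and `apply` methods, and the sample row are all hypothetical names introduced here for the example.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class Subscriber:
    """Hypothetical replicate database: an in-memory table keyed by primary key."""

    def __init__(self, transform: Optional[Callable[[Row], Row]] = None) -> None:
        self.rows: Dict[object, Row] = {}
        # With no transform given, the subscriber stores an unmodified copy.
        self.transform = transform or (lambda row: dict(row))

    def apply(self, key: object, row: Row) -> None:
        # Transform the incoming row into this replicate's own format, then store it.
        self.rows[key] = self.transform(row)


class Publisher:
    """Hypothetical primary database that forwards every change to its subscribers."""

    def __init__(self) -> None:
        self.rows: Dict[object, Row] = {}
        self.subscribers: List[Subscriber] = []

    def subscribe(self, subscriber: Subscriber) -> None:
        self.subscribers.append(subscriber)

    def upsert(self, key: object, row: Row) -> None:
        # Apply the change locally, then replicate it to each subscriber.
        self.rows[key] = dict(row)
        for subscriber in self.subscribers:
            subscriber.apply(key, row)


primary = Publisher()
mirror = Subscriber()  # full mirror copy of the primary
reporting = Subscriber(transform=lambda r: {"name": str(r["name"]).upper()})  # subset, reformatted
primary.subscribe(mirror)
primary.subscribe(reporting)
primary.upsert(1, {"name": "Ada", "salary": 100})
```

In this sketch, `mirror` receives the row unchanged, whereas `reporting` receives only the `name` field in upper case, corresponding to replication of a subset of the data with transformation into the replicate's format.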
In certain circumstances, a replicate database may no longer represent the contents of the primary database, for example because of corruption or contamination of the replicate database, so that it becomes desirable to repopulate the replicate database from the primary database and subsequently continue replication. Such resynchronization may be desired in other situations as well, e.g., when replication latency builds past tolerable limits, whether because of poor replication performance or because replication was disabled or inactive for some period of time.
Regardless of why resynchronization is needed, it involves suspending replication, re-populating the replicate database, and resuming replication from that point. Unfortunately, resynchronizing a replicate database currently requires either rebuilding the replication environment so that it behaves as in first-time materialization, or devising an individual manual process. The problem becomes more complex when activity on the primary database cannot be suspended to provide a clean delineation between the transactions that are contained in a database dump and those that are not. For example, when a primary database is used in a production environment in which business applications (e.g., financial trading) continuously generate large amounts of data every second, it becomes practically impossible to suspend the database.
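The three steps named above (suspension, re-population, resumption) and the delineation problem may be illustrated with a simplified sketch. It is a generic model offered for explanation only and is not the methodology of the present invention. It assumes a hypothetical per-transaction sequence number and a dump that records the sequence number of the last transaction it contains; the names `Primary`, `Replicate`, `resynchronize`, and `applied_seq` are introduced here for the example.

```python
from typing import Dict, List, Tuple

Txn = Tuple[int, str, int]  # (sequence number, key, value); a hypothetical log record


class Primary:
    """Hypothetical primary database with an append-only transaction log."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.log: List[Txn] = []

    def write(self, key: str, value: int) -> None:
        seq = len(self.log) + 1
        self.data[key] = value
        self.log.append((seq, key, value))

    def dump(self) -> Tuple[Dict[str, int], int]:
        # Return a snapshot plus the sequence number of the last transaction it contains.
        return dict(self.data), len(self.log)


class Replicate:
    """Hypothetical replicate database that applies transactions in sequence order."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.applied_seq = 0
        self.suspended = False

    def apply(self, txn: Txn) -> None:
        seq, key, value = txn
        # Ignore transactions while suspended, and those already reflected in the data.
        if self.suspended or seq <= self.applied_seq:
            return
        self.data[key] = value
        self.applied_seq = seq


def resynchronize(primary: Primary, replicate: Replicate) -> int:
    replicate.suspended = True         # 1. suspend replication
    snapshot, marker = primary.dump()  # 2. re-populate the replicate from a dump
    replicate.data = snapshot
    replicate.applied_seq = marker     #    remember which transactions the dump contains
    replicate.suspended = False        # 3. resume replication from that point
    return marker


primary = Primary()
replicate = Replicate()
primary.write("a", 1)
primary.write("b", 2)
replicate.data = {"a": 999}            # simulate a corrupted replicate
marker = resynchronize(primary, replicate)
primary.write("c", 3)                  # activity on the primary continues after the dump
for txn in primary.log:                # replaying the whole log skips what the dump covers
    replicate.apply(txn)
```

The toy `dump` is atomic, so the marker cleanly separates the transactions contained in the dump from those that are not. On a production primary that cannot be suspended, obtaining such a clean boundary is precisely the difficulty described above.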
Accordingly, a need exists for a manner of resynchronization that occurs with minimal interruption to the primary database environment, and with as little manual intervention within the replication domain as possible. The present invention addresses such a need.