1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to a system providing methodology for replication subscription resolution.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Addison Wesley, 2000.
Increasingly, businesses base their operations on mission-critical systems that store information on server-based database systems, such as Sybase® Adaptive Server® Enterprise (available from Sybase, Inc. of Dublin, Calif.). As a result, the operations of the business are dependent upon the availability of data stored in their databases. Because of the mission-critical nature of these systems, their users need to protect themselves against loss of data due to software or hardware problems; disasters such as floods, earthquakes, or electrical power loss; or temporary unavailability of systems resulting from the need to perform system maintenance.
One well-known approach that is used to guard against loss of critical business data maintained in a given database (the “primary database”) is to maintain one or more standby or replicate databases. A replicate database is a duplicate or mirror copy of the primary database (or a subset of the primary database) that is maintained either locally at the same site as the primary database, or remotely at a different location than the primary database. The availability of a replicate copy of the primary database enables a user (e.g., a corporation or other business) to reconstruct a copy of the database in the event of the loss, destruction, or unavailability of the primary database.
Replicate database(s) are also used to facilitate access to and use of data maintained in the primary database (e.g., for decision support and other such purposes). For instance, a primary database may support a sales application and contain information regarding a company's sales transactions with its customers. The company may replicate data from the primary database to one or more replicate databases to enable users to analyze and use this data for other purposes without interfering with or increasing the workload on the primary database. The data that is replicated (or copied) to a replicate database may include all of the data of the primary database such that the replicate database is a mirror image of the primary database. Alternatively, only a subset of the data may be replicated to a given replicate database (e.g., because only a subset of the data is of interest in a particular application).
In recent years, the use of replication technologies has been increasing as users have discovered new ways of using copies of all sorts of data. Many types of systems, ranging from electronic mail systems and document management systems to data warehouse and decision support systems, rely on replication technologies for providing broader access to data. Over the years, database replication technologies have also become available in vendor products ranging from simple desktop replication (e.g., between two personal computers) to high-capacity, multi-site backup systems.
Database replication technologies comprise a mechanism or tool for replicating (duplicating) data from a primary source or “publisher” (e.g., a primary database) to one or more “subscribers” (e.g., replicate databases). The data may also be transformed during this process of replication (e.g., into a format consistent with that of a replicate database). In the following discussion, the source(s) from which data is being replicated will be referred to as the “primary database” or “publisher”. A recipient which is receiving data replicated from the source (i.e., from the primary database) is referred to herein as a “replicate database” or “subscriber”.
In many cases, a primary database may publish (i.e., replicate) items of data to a number of different subscribers. Also, in many cases, each of these subscribers is only interested in receiving a subset of the data maintained by the primary database. Each subscriber may have an extensive list of items or types of data that are of interest (i.e., the items that the replicate database wants to receive from the primary database). Different subscribers may have different lists and may want different information from the tables of the primary database. In this type of environment, each of the subscribers specifies particular types or items of data (“subscribed items”) that the subscriber wants to receive from the primary database.
A problem in this type of environment involving a number of subscribers (i.e., replicate databases) wanting access to different types of data from the primary database is how to efficiently determine which subscribers should receive particular items of data generated by the primary database. In other words, given a particular item of data (“published item”) and a set of subscribers with a large number of subscribed items, one needs to be able to quickly and accurately find all of the subscriber(s) that are interested in receiving the published item. A particular complication is that the published item(s) that are published by the primary database are not known in advance.
One approach to address this problem is to compare each published item with the list of subscribed items of each of the subscribers. However, this approach requires a linear search whose time is proportional to the total number of subscribed items. For instance, if there are M subscribers, each with N subscribed items, resolving a single published item requires on the order of M*N comparisons. This is inefficient in environments involving a large number of subscribers and subscribed items.
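The linear comparison described above can be sketched as follows. This is only an illustration of the cost of the approach, not of any particular replication product: the function name, the subscriber names, and the "owner.table" item strings are hypothetical, and each subscriber is assumed to hold a plain list of exact item names.

```python
from typing import Dict, List, Set, Tuple


def resolve_by_linear_scan(
    published_item: str,
    subscriptions: Dict[str, List[str]],
) -> Tuple[Set[str], int]:
    """Return the interested subscribers and the number of comparisons made."""
    interested: Set[str] = set()
    comparisons = 0
    for subscriber, subscribed_items in subscriptions.items():
        for subscribed_item in subscribed_items:
            comparisons += 1
            if subscribed_item == published_item:
                interested.add(subscriber)
                break  # one match is enough for this subscriber
    return interested, comparisons


# Hypothetical data: M = 3 subscribers, N = 2 subscribed items each.
subscriptions = {
    "reporting": ["sales.orders", "sales.customers"],
    "archive": ["sales.orders", "hr.employees"],
    "billing": ["sales.invoices", "sales.customers"],
}

# No subscriber lists this item, so every one of the M*N = 6 entries is compared.
matched, cost = resolve_by_linear_scan("hr.payroll", subscriptions)
print(matched, cost)
```

In the worst case, where no subscriber has listed the published item, every subscribed item of every subscriber is examined, which is the M*N behavior noted above.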
An alternative approach is to build indexes on published or subscribed items to increase the efficiency of the search process. Historically, database indexes have been built on such items in an attempt to achieve improved search times. However, because the published items are not known in advance, it is not possible to pre-index all of them: a database index cannot be built for items that are not yet known.
Another issue is enabling a subscriber to define the items that are of interest (i.e., those to be replicated from the primary database) using wildcards or negations. For example, a subscriber may wish to use a negation to indicate that the subscriber is interested in all items from the primary database other than a particular type of item. Alternatively, the subscriber may wish to use a wildcard to request all items of a certain type. For example, a subscriber may want to receive a copy of data in all tables owned by a particular “owner”. Prior art replication systems do not have efficient methods for dealing with negations and searching for wildcards. In short, the traditional approach of comparing the published item with lists of subscribed items is inefficient and slow, and it does not effectively handle negations and wildcards in lists of subscribed items.
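To make the intended meaning of wildcards and negations concrete, the following sketch shows one way such subscriptions might be written and evaluated. The syntax is an assumption made purely for illustration: items are named "owner.table", "*" is a wildcard, and a leading "!" marks a negation. The function and subscriber names are likewise hypothetical. The sketch still scans every pattern of every subscriber, so it illustrates the semantics only and is not an efficient resolution method.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set


def is_interested(published_item: str, patterns: List[str]) -> bool:
    """True if the item matches a positive pattern and no negated pattern."""
    positives = [p for p in patterns if not p.startswith("!")]
    negatives = [p[1:] for p in patterns if p.startswith("!")]
    # A negation always wins: the subscriber has excluded this item.
    if any(fnmatchcase(published_item, n) for n in negatives):
        return False
    return any(fnmatchcase(published_item, p) for p in positives)


def resolve(published_item: str, subscriptions: Dict[str, List[str]]) -> Set[str]:
    """Return every subscriber whose patterns accept the published item."""
    return {
        name
        for name, patterns in subscriptions.items()
        if is_interested(published_item, patterns)
    }


# Hypothetical subscriptions (assumed syntax, see the note above).
subscriptions = {
    "reporting": ["dbo.*"],               # all tables owned by "dbo"
    "archive": ["*", "!dbo.audit_log"],   # everything except one table
    "billing": ["sales.invoices"],        # a single exact item
}

print(resolve("dbo.orders", subscriptions))
print(resolve("dbo.audit_log", subscriptions))
```

Under these assumptions, "dbo.orders" is accepted by both the wildcard subscriber and the negation subscriber, while "dbo.audit_log" is accepted only by the wildcard subscriber because the negation excludes it.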
What is needed is a solution which more efficiently resolves, given a particular item of data published by a primary database, the particular subscribers (i.e., replicate databases) to which the item of data should be replicated. The solution should be able to handle published items that are not known in advance. Ideally, the solution should enable subscribers to define the items of data they are to receive using wildcards and negations. The present invention provides a solution for these and other needs.