Today, companies are providing services (“online services”) over a variety of networks, such as the Internet, with increasing frequency. Illustrative examples of an online service include (a) an online service that allows customers to purchase products through a corporate web site, and (b) an online service that allows customers to receive a service, such as insurance or a newspaper subscription, by registering through an electronic interface, such as a user interface displayed on a cell phone.
Online services are typically implemented using one or more computer systems (collectively referred to herein as a “service system”) that store data in, and retrieve data from, more than one data repository. The data repositories used by such systems may take many forms, such as databases or file servers. There are many reasons why a service system may comprise more than one data repository. One reason is that a service system is typically composed of multiple sub-systems, and each sub-system may use its own data repository to store data. For example, a service system may have a billing sub-system that stores bill information in one data repository, and a customer management sub-system that stores customer information in another data repository. Each sub-system of a service system may need to store its own copy of a particular set of data. As a result, multiple copies of the same information may be stored and maintained throughout different sub-systems of a service system.
Another reason why service systems may use more than one data repository is to ensure that a service system implementing an online service can scale to support a large volume of users. By using multiple data repositories, the service system may store certain types of data close to where the data is likely to be requested to minimize the access time to that data. For example, the service system may store a first type of data in a location where the first type of data is used most frequently, while a second type of data may be stored in another location where the second type of data is used most frequently.
Still another reason why service systems may store data in more than one data repository involves how service systems are integrated together. It is common to integrate one computer system (such as a legacy system) into another computer system as the needs of the online service change. Different service systems, prior to integration, may each include a data repository storing the same or a similar type of data, e.g., each service system may have a billing system or a data repository that stores customer information. When one individual service system is integrated into another service system, the resulting service system may use all of the data repositories previously used by each individual service system, without combining those data repositories into a single data repository. Thus, after integration, the resulting service system may employ multiple data sources that each represent data in different ways, e.g., the resulting service system may now employ a database and a file server, or two databases that each store data using dramatically different schemas.
When a service system employing multiple data repositories is deployed in the real world, it is commonplace for inconsistencies to be introduced in the data stored in the multiple data repositories. An inconsistency between two data repositories refers to the condition when the state of data stored in a first data repository does not accurately reflect or is not synchronized with data stored in a different data repository.
To illustrate how such an inconsistency may be introduced into a service system, consider an illustrative service system used by an online service that allows customers to register for cable television service through a web site. The system employs numerous data repositories, including a first data repository that stores billing information and a second data repository that stores service information. Due to an error (or “bug”) in the system, after a user registers with the system, data for that user is successfully stored in the first data repository but not in the second data repository. As a result, the system is able to process the bill for the user, so the user will be billed for the service. However, the system will not be able to provide the service to the user, because the user's data was never correctly stored in the second data repository and therefore cannot be processed. Because of this inconsistency between the user's data in the first data repository and the second data repository, the user will be billed for a service that the user will not receive.
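The registration failure in this example can be sketched in a few lines of Python. This is a minimal illustration only: the two dictionaries stand in for the first (billing) and second (service) data repositories, and the names `billing_repo`, `service_repo`, `register`, and `fail_service_write`, as well as the plan name and amount, are hypothetical and do not come from any actual system.

```python
# Hypothetical in-memory stand-ins for the two data repositories.
billing_repo = {}   # first data repository: billing information
service_repo = {}   # second data repository: service information


def register(user_id, plan, fail_service_write=False):
    """Store a new user's data in both repositories, one write after the other."""
    # The first write always succeeds.
    billing_repo[user_id] = {"plan": plan, "amount_due": 49.99}
    if fail_service_write:
        # Simulated bug: the second write never happens, and the first
        # write is not undone, so the repositories now disagree.
        return
    service_repo[user_id] = {"plan": plan, "active": True}


register("alice", "basic-cable")
register("bob", "basic-cable", fail_service_write=True)

# Users who will be billed even though no service record exists for them.
billed_without_service = sorted(set(billing_repo) - set(service_repo))
print(billed_without_service)
```

In this sketch the user `bob` ends up with a billing record but no service record, which is the inconsistency described above.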
Such inconsistencies between the data repositories employed by a system lead to frustration and expense for both the users of the system (for example, due to unsatisfactory quality of service or non-performance of service) and the operators of the system (for example, in lost revenue and in the resources allocated by the operators to resolve such inconsistencies).
Currently, operators of a system typically employ IT administrators or specialized technologists (referred to collectively as “the inconsistency resolvers”) to manually diagnose inconsistencies between the plurality of data repositories employed by the system. The inconsistency resolvers typically write and execute specialized software programs to identify a particular inconsistency between the plurality of data repositories employed by the system. After an inconsistency resolver verifies the existence of a particular inconsistency using the specialized software programs, the inconsistency resolver typically writes and executes an additional set of specialized software programs to fix the identified inconsistency.
This approach, unfortunately, requires a large amount of time and effort for the inconsistency resolvers to identify and fix an inconsistency. In addition, the inconsistency resolvers need a high degree of technical proficiency to write and execute the specialized software programs required to identify and fix the inconsistencies. Further, the specialized software code used by the inconsistency resolvers only identifies or resolves the particular inconsistencies that the code is configured to correct. As a result, very often the specialized software code only resolves a small portion of the inconsistencies present between the data repositories used by the system, and many other inconsistencies remain undetected and unresolved in the system.
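The limitation of such hand-written programs can be illustrated with a short Python sketch. It assumes, purely for illustration, that each data repository is a dictionary keyed by user identifier; the function names, field names, and sample records are hypothetical. One function detects a single kind of inconsistency (a user who is billed but has no service record) and a second function fixes only that kind.

```python
def find_billed_without_service(billing, service):
    """Detect one specific inconsistency: billed users with no service record."""
    return sorted(uid for uid in billing if uid not in service)


def fix_billed_without_service(billing, service, user_ids):
    """Repair only that inconsistency by creating the missing service records."""
    for uid in user_ids:
        service[uid] = {"plan": billing[uid]["plan"], "active": True}


# Hypothetical sample data for the two repositories.
billing = {
    "alice": {"plan": "basic"},
    "bob": {"plan": "premium"},
    "carol": {"plan": "premium"},
}
service = {
    "alice": {"plan": "basic", "active": True},
    # "bob" is missing entirely: the case the programs above were written for.
    "carol": {"plan": "basic", "active": True},  # plan disagrees with billing
}

found = find_billed_without_service(billing, service)
fix_billed_without_service(billing, service, found)

# A different kind of inconsistency that the programs above never look for.
remaining_mismatches = sorted(
    uid for uid in billing if billing[uid]["plan"] != service[uid]["plan"]
)
print(found, remaining_mismatches)
```

Under these assumptions the missing record for `bob` is found and repaired, while the plan mismatch for `carol` passes through both programs untouched, because neither was written to look for it.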
Consequently, an approach for resolving the inconsistencies between multiple data repositories that avoids the disadvantages of the above-mentioned approaches is desirable. The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.