A database system typically consists of a managed platform and a data repository. The managed platform serves as an interface to the database system and provides two general functions. First, the managed platform deconstructs the data in an individual record according to the schema of the database system so that the data contained in the record can be stored efficiently. Second, the managed platform reconstructs the data entries stored in the database system to provide a complete record to another system or user seeking the record. These functions are controlled by Application Program Interface (API) commands that are processed by the managed platform. Specifically, APIs are high-level commands that encompass low-level access and management operations on the data repository. APIs are often vendor specific, so low-level access to a data repository that bypasses the APIs of the managed platform may not provide valid results.
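The two functions of the managed platform can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the schema, the table names, and the functions `store_record` and `fetch_record` are hypothetical and do not correspond to any vendor's API. The sketch only shows why reading one table directly would return an incomplete record.

```python
from typing import Any

# Hypothetical schema: maps each field of a record to the table that stores it.
SCHEMA = {
    "name": "people",
    "email": "contacts",
    "department": "assignments",
}

# Stand-in for the data repository: one dict per table, keyed by record id.
repository: dict[str, dict[str, Any]] = {table: {} for table in SCHEMA.values()}


def store_record(record_id: str, record: dict[str, Any]) -> None:
    """Deconstruct a record and store each field in the table the schema assigns."""
    for field, value in record.items():
        repository[SCHEMA[field]][record_id] = value


def fetch_record(record_id: str) -> dict[str, Any]:
    """Reconstruct a complete record from the entries spread across the tables."""
    return {
        field: repository[table][record_id]
        for field, table in SCHEMA.items()
        if record_id in repository[table]
    }


store_record("u1", {"name": "Ada", "email": "ada@example.com", "department": "R&D"})
print(fetch_record("u1"))
```

In this sketch, a caller that reads only the `contacts` table sees an email address with no name or department attached, which mirrors the incomplete results of low-level access described above.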
Enterprises today often contain multiple database systems, each of which may be provided by a different vendor. A management system is often included in an enterprise in order to effectively access any one of the constituent database systems and maintain the integrity of the data entries stored. The management system often includes a data store containing a local replica of at least some of the data stored in one or more database systems. In order to keep the information current, the local replica must be synchronized with data that is contained in the one or more database systems. The data synchronization process is often referred to as reconciliation.
An identity manager is but one example of a management system. Identity management in enterprises is a resource-demanding task. It involves keeping track of tens of thousands of identities across the enterprise and correlating these identities to individual employees, contractors, suppliers, customers, etc. When a change to a record is made in one portion of the enterprise, that change must be propagated through the enterprise in order to ensure that discrepancies are detected and resolved so that the data stored throughout the enterprise is current and accurate. Accordingly, an identity manager is arranged to receive information about changes made to stored data, store relevant information locally, and propagate those changes to other database systems within the enterprise.
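The three duties of the identity manager named above (receive a change, store it locally, propagate it to the other systems) can be sketched as follows. This is a toy model under stated assumptions: the class `IdentityManager`, its methods, and the use of plain dictionaries as stand-ins for database systems are all hypothetical and chosen only for illustration.

```python
class IdentityManager:
    """Toy identity manager; each connected system is modeled as a plain dict."""

    def __init__(self) -> None:
        # Local replica of identity data, keyed by identity id.
        self.local_store: dict[str, dict[str, str]] = {}
        # Connected database systems, keyed by a hypothetical system name.
        self.systems: dict[str, dict[str, dict[str, str]]] = {}

    def register(self, name: str, system: dict[str, dict[str, str]]) -> None:
        """Connect a database system so that it receives propagated changes."""
        self.systems[name] = system

    def receive_change(
        self, source: str, identity_id: str, attributes: dict[str, str]
    ) -> list[str]:
        """Store a change locally, then propagate it to every other system."""
        self.local_store.setdefault(identity_id, {}).update(attributes)
        updated = []
        for name, system in self.systems.items():
            if name == source:
                continue  # the source system already holds the change
            system.setdefault(identity_id, {}).update(attributes)
            updated.append(name)
        return updated


hr = {"e42": {"email": "new@example.com"}}
directory = {"e42": {"email": "old@example.com"}}
payroll: dict[str, dict[str, str]] = {}

manager = IdentityManager()
manager.register("hr", hr)
manager.register("directory", directory)
manager.register("payroll", payroll)
print(manager.receive_change("hr", "e42", {"email": "new@example.com"}))
```

A real identity manager would reach each system through that system's own APIs rather than by writing to its storage directly; the dictionaries here only keep the example self-contained.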
A number of reconciliation strategies have been developed previously. In a first approach, the reconciliation process includes a record-by-record comparison between the records stored in a particular database system and those stored in the management system. This approach is resource intensive and can be very inefficient when few changes have been made. According to a second approach, a data repository is monitored, and a change to the data in the data repository is considered a triggering event that initiates a synchronization process directly between the data repository and the management system. This approach is inefficient because the extracted data is often incomplete, since it has not been extracted using the APIs of the managed platform. As such, the data provided to the management system requires several transformations to convert the low-level data extracted from the data repository into a form that the management system can use at a high level. A third approach attempts to overcome the problems of the second approach by mapping changes to a data repository directly to a data store within the management system. Changes to the data in the data repository trigger a low-level synchronization process directly between the data store of the management system and the data repository, rather than first transforming the data from the data repository into a form that the management system can use at a high level. This approach may be no more efficient than the second approach when the storage schemas employed by the data repository and by the data store of the management system differ. In such cases, low-level mapping of data between the two may be as complex as transforming the data from the data repository into a form that the management system can use at a high level.
Furthermore, both the second and third approaches are adversely sensitive to reconfiguration and schema changes since they require the direct use of low-level access and management operations.
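The first approach described above, record-by-record comparison, can be sketched as follows. This is a minimal illustration and not a description of any particular product: the function `reconcile` and the dictionary-based record stores are hypothetical. Note that the loops visit every record on both sides even when only a few records differ, which is the inefficiency noted above.

```python
def reconcile(
    local: dict[str, dict], remote: dict[str, dict]
) -> dict[str, list[str]]:
    """Bring the local replica in line with the remote records, one by one."""
    changes: dict[str, list[str]] = {"added": [], "updated": [], "removed": []}

    # Compare every remote record against the local replica.
    for record_id, remote_record in remote.items():
        if record_id not in local:
            changes["added"].append(record_id)
            local[record_id] = dict(remote_record)
        elif local[record_id] != remote_record:
            changes["updated"].append(record_id)
            local[record_id] = dict(remote_record)

    # Remove local records that no longer exist remotely.
    for record_id in list(local):
        if record_id not in remote:
            changes["removed"].append(record_id)
            del local[record_id]

    return changes


local = {"1": {"name": "Ada"}, "2": {"name": "Bob"}, "3": {"name": "Cy"}}
remote = {"1": {"name": "Ada"}, "2": {"name": "Robert"}, "4": {"name": "Di"}}
result = reconcile(local, remote)
print(result)
```

The cost of this sketch grows with the total number of records rather than with the number of changes, so a replica that is already nearly current still pays for a full pass.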