1. Field of the Invention
This invention relates to technologies and methods for synchronizing two or more directories in a computer data storage system, and especially to enterprise directory management tools for managing information in numerous databases and directories in a unified manner.
2. Description of the Related Art
Computing enterprises, whether large or small, comprise numerous directories, network operating systems, and databases in which corporate data, client information, and employee data are stored.
In some scenarios, the data to be managed is contained in a homogeneous environment, e.g. the forms and formats of the data are similar or compatible. In such a case, a periodic “synchronization” process is executed which compares the contents of the distributed data objects, and selectively copies or updates all data sources to contain appropriate data.
For example, an email server's message storage format is usually the same as the storage formats on the email client machines. So, when a client machine logs into the email server, the server can quickly determine if there are any “new” messages (e.g. messages in the server's storage which have not been copied to the client's storage), and transmit those messages to the client machine.
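The email case above can be sketched as a simple set-difference check. This is only an illustration under stated assumptions: the stores are modeled as dictionaries keyed by a message identifier, and the function name `messages_to_send` is hypothetical rather than part of any email product.

```python
def messages_to_send(server_store, client_store):
    """Return the server messages whose IDs are not yet in the client's storage.

    Both stores are assumed to be dicts mapping message ID -> message body.
    Messages are returned in the server's insertion order.
    """
    return [body for msg_id, body in server_store.items()
            if msg_id not in client_store]


# Hypothetical sample data: the client already holds message "m1".
server = {"m1": "Welcome", "m2": "Meeting at 10", "m3": "Lunch?"}
client = {"m1": "Welcome"}
new_messages = messages_to_send(server, client)
```

Because both sides share one storage format, the comparison needs no translation step, which is what makes the homogeneous case inexpensive.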
Larger scale homogeneous database synchronization is enabled by many distributed database products, such as IBM's Lotus Notes [TM] product.
However, many data sources which contain related or partially related data objects are not homogeneous with each other, but rather are heterogeneous in nature. For example, information relating to a corporate employee “John Smith” may be contained in many different data stores within a corporate Intranet. His employee records (hire date, pay scale, home address, dependent names, etc.) may be contained in an Oracle database on a Human Resources server, while his current assignment information (department, manager's name, email address, etc.) may be stored in a Lotus Notes system on a departmental or divisional server.
One available technology for managing data objects in heterogeneous data sources is the Lightweight Directory Access Protocol (“LDAP”), an open industry standard for remotely querying and modifying data objects within an LDAP-enabled directory. This protocol reduces query and change operations to a uniform LDAP operation which can be interpreted by the LDAP-enabled servers in order to make changes to data objects in directories.
LDAP enables a user to locate organizations, individuals, and other resources such as files and devices in a network, whether on the public Internet or on a corporate intranet. LDAP is a minimized version of Directory Access Protocol (DAP), which is part of the X.500 standard for directory services in a network.
Some directories, such as LDAP directories, have support for a change log which records the changes that have been made to the directory. For directories which do not support change logs, users or administrators sometimes develop their own mechanisms for detecting changes in a directory. These techniques usually include polling the directory or directories, identifying any changes which have been made since the last poll operation, and, upon detection of a change, reporting that an entry has been modified, usually listing out all the attributes for the changed entry.
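A minimal sketch of such a user-developed polling mechanism follows. It assumes each poll yields a snapshot of the directory as a dictionary mapping an entry key to that entry's attributes; the function name `poll_for_changes` and the snapshot layout are hypothetical. Note that it reports the whole entry, not the individual attributes that changed.

```python
def poll_for_changes(previous, current):
    """Compare two directory snapshots and report modified or new entries.

    Each snapshot is assumed to be a dict: entry key -> dict of attributes.
    For every entry that differs from the previous poll, the report lists
    ALL of the entry's attributes, not just the ones that changed.
    """
    reports = []
    for key, entry in current.items():
        if previous.get(key) != entry:
            reports.append((key, dict(entry)))  # copy of the full entry
    return reports


# Hypothetical snapshots from two consecutive polls.
last_poll = {"jsmith": {"PhoneNumber": "838-1178", "Department": 5}}
this_poll = {"jsmith": {"PhoneNumber": "838-1180", "Department": 5}}
changes = poll_for_changes(last_poll, this_poll)
```

Here `changes` contains one report for `jsmith` carrying both attributes, even though only the phone number differs between the two polls.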
In the following example, a phone number in an entry for a person “John Smith” in a Human Resources database at XYZ corporation is to be updated to equal “838-1180”, and his department is to be changed to department “6”. The original entry with six fields may appear as shown in Table 1.
TABLE 1
Example Original Entry
    full_name=“John Smith”
    PhoneNumber=“838-1178”
    UserID=“jsmith”
    Division=92
    Department=5
    email=“jsmith@xyzcorp.com”
where the entry is of the format:
    full_name, PhoneNumber, UserID, Division, Department, email
A user-written script may poll the directory containing the changed entry, which generates a record in the change log. Records in the change log reflect the change to the entry as the series of LDAP modify operations shown in Table 2.
TABLE 2
Example LDAP Change Log
    DN: cn=John Smith, ou=Austin, o=xyz
    changetype: modify
    replace: PhoneNumber
    PhoneNumber: 838-1180
    -
    changetype: modify
    replace: UserId
    UserID: jsmith
    -
    changetype: modify
    replace: Division
    Division: 92
    -
    changetype: modify
    replace: Department
    Department: 6
    -
    changetype: modify
    replace: email
    email: jsmith@xyzcorp.com
The typical user-developed scripts do not attempt to identify the actual fields of data which were updated or modified. The resulting updates to the other directories in the metadirectory are simply made in their entirety to every data object, including fields which were not actually modified.
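The difference between the two behaviors can be sketched for the “John Smith” example. This is an illustration only, with hypothetical function names: `full_entry_ops` mimics a script that replaces every attribute (full_name is skipped because it appears in the entry's DN in the example change log), while `changed_field_ops` shows what identifying the modified fields would involve.

```python
def full_entry_ops(entry, skip=("full_name",)):
    """Emit a replace operation for every attribute, changed or not."""
    return [("replace", attr, value)
            for attr, value in entry.items() if attr not in skip]


def changed_field_ops(old, new):
    """Emit a replace operation only for attributes whose value differs."""
    return [("replace", attr, value)
            for attr, value in new.items() if old.get(attr) != value]


original = {
    "full_name": "John Smith", "PhoneNumber": "838-1178", "UserID": "jsmith",
    "Division": 92, "Department": 5, "email": "jsmith@xyzcorp.com",
}
# The two intended changes: new phone number and new department.
updated = dict(original, PhoneNumber="838-1180", Department=6)

all_ops = full_entry_ops(updated)               # five replace operations
minimal_ops = changed_field_ops(original, updated)  # two replace operations
```

For this entry the full-entry approach produces five replace operations where two would carry the same information.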
If the polling operation is relatively fast compared to a series of single-field modification operations, the user-developed solution will detect each individual change, and update each entry (all fields at once) multiple times throughout the metadirectory.
This often results in many redundant entry updates throughout the metadirectory just to achieve small, incremental changes in the actual data. When realistic organizations are considered, which may comprise hundreds of data sources each with several thousand entries, the system performance impact of these redundant updates is readily apparent.
Besides being an inefficient use of computing resources, this can cause considerable problems in overall system operation, as these updates are propagated over computer networks and consequently consume communications bandwidth and intermediate storage memory unnecessarily.
Another drawback of the LDAP approach is that legacy directories may be in existence indefinitely, and not all legacy directories may be upgraded to LDAP compatibility. Further, LDAP actually only provides a common access protocol (i.e. a remote method of accessing the directory), but does not in itself provide actual heterogeneous data source management functions.
While some LDAP replication standards are in the works, they are as yet unfinished, and many existing proprietary approaches are different and incompatible. In any case, replication and synchronization, whether proprietary or standards-based, are insufficient for meeting the needs of enterprise-wide heterogeneous data source directory management.
The term “metadirectory” refers to a class of enterprise directory management tools which provide means to manage and synchronize two or more directories containing heterogeneous data sources. In order to manage disparate heterogeneous data sources, a typical metadirectory product may require the individual data sources (e.g. directories, files, databases, etc.) to export their data to a common format, and then exchange that data with the metadirectory using file transfer, electronic mail, or other data transfer protocol. After the metadirectory receives the files from the data sources, an administrator can add or modify the data from the metadirectory. One such product is the VIA product, originally provided by the Zoomit Corporation, which was acquired by Microsoft Corporation.
Metadirectories are extremely useful for system administration and security management, as they can be used as an integration point to simplify existing solutions and to create new web-based applications. For example, every application has its own proprietary method or scheme of storing information associated with that application, whether it be user information, security information, configuration settings, etc.
Through use of metadirectories, these various data stores may be stored once and integrated so that they may be managed and administered as a single entity (according to the rules and constraints of the metadirectory), thereby reducing the total cost of maintaining this information while increasing the security and reliability with which it is handled.
Because current metadirectory products, however, may require the various data sources to be able to export their data into these “common” formats, data sources which do not support such export operations may be excluded from inclusion in a metadirectory.
Another problem in managing entries from multiple heterogeneous data sources according to the present processes is that the information may not have been entered consistently in these data sources, e.g. there may be logical synonyms within the entries that are not exact character-string matches. For example, in three data sources managed within a single metadirectory, each data source containing information for “Robert Smith”, the name “Robert Smith” may have been entered as follows:
    Robert Smith in data source 1
    Bob Smith in data source 2
    Rob Smith in data source 3.
Typical metadirectory products provide very little in the way of automatically resolving or detecting these alias or related entries, and often require the administrator to manually intervene to manage these data objects. Traditional approaches to propagating changes to such records containing synonyms would be to propagate a change for each record variant. In this example, to effectively update the mailing address for “Robert Smith” throughout the metadirectory, an administrator would first have to be aware of the three available synonyms, and then manually execute three separate changes which would be propagated throughout the metadirectory.
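To make the synonym problem concrete, the following sketch groups the three “Robert Smith” variants under one canonical key. It is not a feature of any product named here: the nickname table `NICKNAMES` and the functions `canonical_name` and `group_aliases` are hypothetical, and a real nickname mapping would need to be supplied and maintained by an administrator.

```python
# Hypothetical nickname table: nickname -> canonical first name.
NICKNAMES = {"bob": "robert", "rob": "robert"}


def canonical_name(full_name):
    """Lowercase the name and map a known nickname to its canonical form."""
    first, _, rest = full_name.strip().partition(" ")
    first = first.lower()
    return (NICKNAMES.get(first, first) + " " + rest.lower()).strip()


def group_aliases(entries):
    """Group (data_source, name) pairs that share a canonical name."""
    groups = {}
    for source, name in entries:
        groups.setdefault(canonical_name(name), []).append((source, name))
    return groups


entries = [
    ("data source 1", "Robert Smith"),
    ("data source 2", "Bob Smith"),
    ("data source 3", "Rob Smith"),
]
aliases = group_aliases(entries)
```

With such a grouping, a single address change could in principle be directed at all three records at once, instead of the administrator issuing three separate changes by hand.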
Therefore, there is a need in the art for a system and method which minimizes the system performance impact of propagating updates to entries in metadirectories. Further, there is a need in the art for this system to cooperate with and extend the capabilities of existing metadirectory tools and technologies, providing user or administrator configurability and control.