Processing and storage of electronic data is now essential to the daily operation of most organizations. With the advent of networking technology, organizations that utilize electronic data processing are becoming increasingly reliant upon “enterprise” computer networks in which processing and storage are distributed over a number of heterogeneous interconnected computers. In many enterprise systems, a member of the organization will have access to multiple resources across the system. For example, an employee of a corporation may use an email account, a Windows NT account, and a Unix account to access and process data stored on the enterprise system. Additionally, organizations will often wish to provide external users, such as distributors, business partners and suppliers, with accounts granting limited access to the data stored on the enterprise system. The administrative overhead required to manage the internal and external accounts often becomes more burdensome than managing the data that is actually of interest to the organization. This can lead to decreases in system efficiency and to high support costs.
Consequently, organizations are becoming increasingly interested in efficient systems management as it can provide, among other benefits, reduced information technology (“IT”) costs and increased efficiency in setting up and managing enterprise data. Currently, however, providing efficient systems management for enterprise computer networks, particularly those that contain legacy data, is a quixotic task. This is partly because many organizations, over time, have developed networks including a variety of heterogeneous computer systems storing myriad different data types. Further adding to the complexity of managing enterprise networks, organizations often store inconsistent data across the network. As just one example of data inconsistencies, a company may store one home phone number for an employee at a corporate human resources (“HR”) mainframe while storing a different home phone number at a departmental mainframe. Because the two mainframes may be heterogeneous (e.g., employ different hardware, operating systems, protocols, tools and/or applications), synchronizing between the two resources to eliminate inconsistencies can prove difficult.
Most prior art systems management techniques address these difficulties by centralizing data. Profile-based management systems, directory-based management systems, and meta-directories offer various approaches to centralizing data storage. FIG. 1 illustrates the limitations of prior art systems that rely on centralization of data. FIG. 1 is a diagrammatic representation of computer system 100 comprising an administrative system 110, including a centralized database 112, and resources including an email server 120 (such as a Microsoft Exchange server), a Unix system 125, a Windows NT system 130 and a mainframe 135. The resources are interconnected to each other and are connected to administrative system 110 via a network 145. Each resource can contain a collection of data items that represent entities or individuals. For example, email server 120 can contain a collection of email accounts 150, Unix system 125 can contain a collection of Unix accounts 155, Windows NT system 130 can contain a collection of Windows NT accounts 160 and mainframe 135 can contain a collection of data records 165.
These collections of data represent each resource's “view” of an individual or entity. In the case of an employee Jane Doe, for example, email server 120 may refer to her as janed (i.e., her email user name), Unix system 125 may refer to her as JaneD (i.e., her Unix account user name) or by her Unix identification (“UID”), and Windows NT system 130 may refer to Jane Doe as JANED (i.e., her Windows NT account user name). In addition to account information allowing Jane Doe to access data on computer system 100, information such as Jane Doe's department code, time keeper number, salary rate, and employee identification can be stored at mainframe 135. This may be information that is not personally used by Jane Doe but is used, instead, by her managers or other personnel. Thus, mainframe 135 would also maintain an identity for Jane Doe, based on her employee record, which could, for example, be stored under JANE_D.
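The per-resource “views” described above can be pictured with a minimal sketch. The dictionary keys, the `identifier` field, and the helper `distinct_identifiers` are hypothetical names chosen for illustration only; the identifier strings are the ones given in the example of Jane Doe. The sketch merely shows that the same person appears under a different key at each resource, so exact matching, and even simple case-insensitive matching, does not correlate all of the views.

```python
# Illustrative only: the dictionary keys and the "identifier" field are
# hypothetical names, not part of any system described in the text.
resource_views = {
    "email_server_120": {"identifier": "janed"},
    "unix_system_125": {"identifier": "JaneD"},
    "windows_nt_system_130": {"identifier": "JANED"},
    "mainframe_135": {"identifier": "JANE_D"},
}


def distinct_identifiers(views, normalize=None):
    """Return the set of identifiers, optionally normalized first."""
    normalize = normalize or (lambda s: s)
    return {normalize(view["identifier"]) for view in views.values()}


# Exact comparison: every resource appears to describe a different identity.
exact = distinct_identifiers(resource_views)

# Case-insensitive comparison still leaves the mainframe record unmatched.
case_folded = distinct_identifiers(resource_views, str.lower)

print(len(exact), sorted(case_folded))
```

Under these assumptions, some additional correlation step is needed before the four views can be treated as one individual.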
To illustrate the shortcomings of prior art systems that rely on centralization of data, assume that employee Jane Doe marries and changes her last name to Smith. One method of updating Jane Doe's name on system 100 would be to separately enter the updated information at each system. For an organization having a large number of users and/or a highly distributed computer system 100, this can be impractical. To ameliorate the inefficiencies of separately entering information at each resource, one prior art system replicates all the information identifying individuals or entities in a centralized database 112 (represented by replicated data 175). Thus the collection of email accounts 150, the collection of Unix accounts 155, the collection of Windows NT accounts 160 and the collection of data records 165 are typically replicated at centralized database 112. When a change is made to the data, the change can be entered into replicated data 175 and can then be pushed out to each of the resources. In the case of Jane Doe, then, replicated data 175 is modified to account for her name change, and the replicated data can then be pushed out to one or more resources. For an individual such as Jane Doe, the replicated data thus contains a “master copy” of her data.
While a system having a centralized database helps ensure data consistency for data entered through administrative system 110 and pushed out to each resource, it has several shortcomings. One such limitation involves the resolution of inconsistencies between data changed at the individual resources. Continuing with the example of newlywed Jane Doe, if Jane Doe changes her last name to Smith, her name may be inadvertently changed to Smyth at mainframe 135, while her name is changed to Smith at email server 120. When data from the resources is copied to centralized database 112, there will be three names for the same employee on computer system 100: Jane Doe, Jane Smyth and Jane Smith. Administrative system 110 must determine whether a name change is actually appropriate and, if so, which of the changes is correct. Once the specific change is selected, the change is distributed to the resources, overwriting local changes (or lack of changes) made at each resource. Thus, for example, if Jane Smyth was arbitrarily selected as the correct change, Jane Smyth would be distributed to each of the resources, overwriting the correct name, Jane Smith.
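The overwrite problem in the Jane Doe example can be sketched as follows. This is a simplified illustration, not an implementation of any particular prior art product: the function names `collect_names` and `push_master` and the dictionary keys are hypothetical, and the sketch assumes that the Unix and Windows NT systems still hold the unchanged name.

```python
def collect_names(resources):
    """Gather the distinct names copied from the resources to the central store."""
    return set(resources.values())


def push_master(resources, chosen_name):
    """Distribute one chosen value to every resource, overwriting local values."""
    return {resource: chosen_name for resource in resources}


# State after the local edits described in the example. The Unix and
# Windows NT entries are assumed (not stated in the text) to be unchanged.
resources = {
    "email_server_120": "Jane Smith",   # correct local change
    "mainframe_135": "Jane Smyth",      # inadvertent misspelling
    "unix_system_125": "Jane Doe",      # no change made
    "windows_nt_system_130": "Jane Doe",
}

names_seen = collect_names(resources)            # three competing names
after_push = push_master(resources, "Jane Smyth")  # arbitrary selection

print(sorted(names_seen))
print(after_push["email_server_120"])
```

In this sketch, once the misspelled value is selected as the master copy, the correct value held at the email server no longer exists anywhere in the system.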
Furthermore, if different resources are controlled by different groups within the organization, the decision to favor one resource over another can lead to political tension within the organization. As an additional limitation of this prior art system, in a large enough computer system 100, some subset of the resources will be unavailable at any given time due to connectivity issues or other technical problems. Therefore, only some of the resources will be updated, causing additional inconsistencies in Jane Doe's data.
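The effect of unavailable resources on a centralized push can be illustrated with a short sketch. The function `push_to_available`, the dictionary keys, and the choice of which resource is unreachable are all assumptions made for illustration.

```python
def push_to_available(resources, new_value, available):
    """Apply new_value only to resources that are currently reachable."""
    updated = dict(resources)
    for resource in resources:
        if resource in available:
            updated[resource] = new_value
    return updated


# All resources start with the old name.
resources = {
    "email_server_120": "Jane Doe",
    "unix_system_125": "Jane Doe",
    "windows_nt_system_130": "Jane Doe",
    "mainframe_135": "Jane Doe",
}

# Assume, hypothetically, that the mainframe is unreachable during the push.
available = {"email_server_120", "unix_system_125", "windows_nt_system_130"}

result = push_to_available(resources, "Jane Smith", available)

# Resources left holding the old value after the push.
stale = sorted(r for r, name in result.items() if name != "Jane Smith")
print(stale)
```

Under these assumptions, the push itself introduces a new inconsistency: the reachable resources hold the new name while the unreachable one retains the old name until some later synchronization succeeds.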
In order to implement a centralized database approach, prior art systems typically rely on an overall systems administrator or manager to identify resources used by members of the organization. This administrator's view of which resources members of the organization require, however, is typically limited and the administrator generally has little idea of which resources are used on a department-by-department basis or of the resources used on an individual level. Furthermore, prior art systems typically do not automate the process of retrieving, mapping, correlating and merging information from multiple heterogeneous sources. Because of this, administrators must typically write ad hoc queries and scripts to extract, map, correlate, merge and upload information into the centralized database 112, making implementation of the centralized database tedious.
Centralization of data typically requires replicating at least some subset of the data being managed. This type of system scales poorly because of the large amount of data that must be stored at the centralized database 112, and it further introduces problems with synchronizing the centralized database 112 with the resources. Furthermore, because these systems require data to be copied repeatedly back and forth between the resources and the centralized database 112, significant bandwidth demands are imposed upon the network. As yet another shortcoming, manually locating and organizing data from a number of resources typically requires significant investments of time and money. Thus, prior art systems are generally expensive and inefficient.