The present invention relates, in general, to related data records, and, more particularly, to a method and system that maintains referential integrity for related data records, whether the data is distributed or not.
Computer systems including business systems, entertainment systems, and personal communication systems are increasingly implemented as distributed software systems. These systems are alternatively referred to as "enterprise networks" and "enterprise computing systems". These systems include application code and data that are distributed among a variety of data structures, data processor systems, storage devices and physical locations. They are intended to serve a geographically diverse and mobile set of users. This environment is complicated because system users move about the distributed system, using different software applications to access and process data, different hardware to perform their work, and often different physical locations to work from. These trends create a difficult problem in providing a secure yet consistent environment for the users.
In general, distributed computing systems must scale well. This means that the system architecture desirably adapts to more users, more applications, more data, and more geographical distribution of the users, applications, and data. The cost in money and time to switch over a network architecture that is adapted to a smaller business to one suited for a larger business is often prohibitive.
A conventional computing system uses a client/server model implemented on a local area network (LAN). In such systems powerful server computers (e.g., application servers and file servers) are used to process and access data. The requested data is then transmitted to the client computer for further processing. To scale to larger networks, multiple LANs may be internetworked using, for example, leased data lines to create a wide area network (WAN). The equipment required to implement a WAN is expensive and difficult to administer. Also, as networks become larger to include multiple LANs and multiple servers on each LAN it becomes increasingly difficult to find resources (i.e., files, applications, and users) on any one of the LANs.
As computing power continues to become less expensive, clients tend to process and store their own data, using the server primarily as a file server for sharing data with other client computers. Each software application running on the client, or the client's operating system (OS), may save client-specific configuration data that is used by the client to fine-tune and define the user's software environment at runtime.
As used herein, the term "profile information" refers to any information or meta-data used by a particular piece of hardware, software application, or operating system to configure, initialize, shut-down, aid in making run-time decisions, or the like for a computer. The profile information may be associated with a particular application or group of applications, a particular hardware device or group of devices, as well as a particular user or group of users. Some operating systems store user profile information that is used during boot operations or at application start-up to tailor a limited number of the system characteristics to a particular machine user. However, this profile information is closely tied to a single machine and operating system. As a result, the profile information is not useful to a new user the first time that user logs onto a particular machine. Moreover, this information is not available to remote users that are accessing the LAN/WAN using remote access mechanisms.
Existing mechanisms tend to focus on a single type of profile information: user information or application information or hardware information. Also, because these mechanisms are very application specific they limit the number and type of attributes that can be retained. Further, the profile information is isolated and fails to indicate any hierarchical or relational order to the attributes. For example, it may be desirable that a user group is required to store all files created using a particular application suite to a specific file server. Existing systems, if such a service is available at all, must duplicate profile information in each application program merely to implement the required file storage location preference. Storage location direction based on a user-by-user or user group basis is difficult to implement and may in fact require a shell application running on top of the application suite. Even then, the system is not extensible to access, retrieve, and use profile information for a new user that has not used a particular machine before.
As in the example above, existing systems for storing configuration information lead to duplicative information stored in many locations. Each application stores a copy of its own configuration information, as does each hardware device and each user. Much of this information is identical. It is difficult to maintain consistency among these many copies in distributed data environments. For example, when the specified file storage location changes, each copy of the configuration information must be changed. The user or system administrator must manually track the location and content of each configuration file. An example of the inefficiencies of these types of systems is found in the Windows 95 registry file that holds profile information but has an acknowledged tendency to bloat over time with duplicative and unused data. Moreover, the registry file in such systems is so closely tied to a particular machine and instance of an operating system that it cannot be remotely accessed and used to configure other computers or devices. Hence, these systems are not generally extensible to manage multiple types of profile information using a single mechanism. A need exists for profile information that is readily accessible to all machines coupled to a network and to machines accessing the network through remote access mechanisms.
Peer-to-peer type networks are an evolutionary change to client/server systems. In a peer-to-peer network each computer on the LAN/WAN can act as a server for applications or data stored on that machine. A peer-to-peer network does not require, but is able to run alongside, a client/server system. Peer-to-peer architectures offer a potential of reduced complexity by eliminating the server and efficient use of resources available in modern client and workstation class computers. Peer-to-peer networks, however, remain dependent on a secure, closed network connection to implement the LAN/WAN. Such networks are difficult to scale upwardly.
Peer-to-peer solutions also do not scale well because, as the network becomes larger, it becomes increasingly difficult to identify which peer contains the applications and data needed by another peer. Moreover, security becomes more difficult to manage because the tasks of authorizing and authenticating users are distributed among the peer group rather than in a centralized entity. A need exists for a system and method that enables a peer-to-peer architecture to scale without reduced performance, ease of use, and security.
Another complicating influence is that networks are becoming increasingly heterogeneous on many fronts. Network users, software, hardware, and geographic boundaries are continuously changing and becoming more varied. For example, a single computer may have multiple users, each of whom works more efficiently if the computer is configured to meet his or her needs. Conversely, a single user may access a network using multiple devices such as a workstation, a mobile computer, a handheld computer, or a data appliance such as a cellular phone or the like. A user may, for example, use a full featured e-mail application to access e-mail while working from a workstation but prefer a more compact application to access the same data when using a handheld computer or cellular phone. In each case, the network desirably adapts to the changed conditions with minimal user intervention.
In order to support mobile users, the client/server or peer-to-peer network has to provide a gateway for remote access. Typically this has been provided by a remote access server coupled to a modem. Remote users would dial up the modem, comply with authorization/authentication procedures enforced by the server, then gain access to the network. In operation, the mobile user's machine becomes like a "dumb terminal" that displays information provided to it over the dial-up connection, but does not itself process data. For example, a word processing program is actually executing on the remote access server, and the remote user's machine merely displays a copy of the graphical user interface to the remote user. The remote user is forced to use the configuration settings and computing environment implemented by the remote access server. A need exists for a method and system for remote access that enables the remote user to process data on the remote machine without being confined to using configuration settings imposed by a remote access server.
There is increasing interest in remote access systems that enable a user to access a LAN/WAN using public, generally insecure communication channels such as the Internet. Further, there is interest in enabling LANs to be internetworked using public communication channels. This is desirable because the network administrator can provide a single high speed gateway to the Internet rather than a remote server/modem combination for each user and expensive WAN communication lines. The Internet gateway can use leased lines to access the Internet rather than more costly business phone lines. Also, the Internet gateway can be shared among a variety of applications and so the cost is not dedicated solely to providing remote access or wide area networking. The reduction in hardware cost and recurrent phone line charges would be significant if remote users could access the LAN/WAN in this manner.
In an enterprise system it is critical that distributed resources remain available. Access to profile information is often prefatory to using a particular system or software application for meaningful work. High availability is accomplished in most instances by replicating critical resources and managing the replicas so that they remain consistent. Replication leads to difficulties in keeping the replicas consistent with each other. This is particularly true for profile type information that may be controlled by or owned by a variety of entities/systems. For example, a user may own profile information related to that user's preferences, passwords, and the like. However, a workgroup administrator may own profile information related to group membership, group security policies, and the like. Further still, individual applications may own profile information describing that application's configuration operations. In an environment where any entity can change the information contained in any profile that it owns at any time, it quickly becomes an intractable problem to maintain consistency among multiple replicas. A need exists for a system and methods for maintaining profile information owned by a diverse set of entities in a highly available manner.
From a network user's perspective, these limitations boil down to a need to manually configure a given computer to provide the user's desired computing environment. From a remote user's perspective, these limitations require the user to manually reconfigure the remote access computer to mimic the desired computing environment or tolerate the generic environment provided by default by the remote access server. From a network administrator's perspective, these complications require software and operating systems to be custom configured upon installation to provide the desired computing environment. In each case, the time and effort consumed simply to get "up and running" is a significant impediment to efficient use of the distributed computing environment. What is needed is a system that readily adapts to the changing, heterogeneous needs of a distributed network computing environment.
One solution to the problem of finding resources in a distributed system is to use directories. Directories are data structures that hold information such as mail address book information, printer locations, public key infrastructure (PKI) information, and the like. Because of the range of functions and different needs of driving applications, most organizations end up with many different, disparate directories. These directories do not interact with each other and so contain duplicative information and are difficult to consistently maintain.
Meta-directories are a solution that provides directory integration to unify and centrally manage disparate directories within an enterprise. A meta-directory product is intended to provide seamless integration of the multiple disparate directories. However, existing solutions fall short of this seamless integration because the problems to be solved in directory integration are complex. Existing meta-directory solutions tend to require significant up front configuration effort to account for these complexities. Also, a meta-directory product must be aware of the data format for each of the data structures that it is supposed to integrate. This required knowledge makes meta-directories difficult to maintain in a computing environment that is rapidly changing. As a result, meta-directory solutions are not sufficiently extensible to account for the wide variety of resources available on a distributed network. In the past, meta-directory technology has not been used to catalog meta-data of a sufficiently general nature to meet the needs of a dynamically growing and changing distributed computing environment.
X.500 is one current model for managing on-line directories of users and resources (Directory Services) that includes the overall namespace as well as the protocol for querying and updating it. An X.500 directory is called a Directory Information Base ("DIB") and the program that maintains the DIBs is called a Directory System Agent ("DSA"). A Directory User Agent ("DUA") is used to search DSA sites for names and addresses.
The protocol generally used in conjunction with X.500 is the Directory Access Protocol ("DAP"), which operates over the OSI (Open Systems Interconnection) network protocol stack. Because a full DAP client is difficult to implement on smaller computer systems, the Lightweight Directory Access Protocol ("LDAP") was developed.
Like X.500, LDAP is both an information model and a protocol for querying and manipulating the information model. The overall data and namespace model is essentially that of X.500. A fundamental difference between DAP and LDAP is that the latter protocol is designed to run directly over the TCP/IP (Transmission Control Protocol/Internet Protocol) stack, and it lacks some of the DAP protocol functions such as security. In operation, LDAP enables a user to locate organizations, individuals, and other resources such as files and devices in a network, whether on the Internet or on a corporate intranet.
In a network, a directory is used to indicate where in the network something is located. On TCP/IP networks (including the Internet), the Domain Name System ("DNS") is the directory system used to relate the domain name to a specific network address or unique location on the network. If the domain name is not known, LDAP allows a user to initiate a search for, for example, an individual without knowing exactly where that individual is located. Simply stated, an LDAP directory is organized in a simple "tree" hierarchy and may consist, for example, of the following levels:
The "Root" directory (the starting place or the source of the tree), which branches out to
Countries, each of which branches out to
Organizations, which branch out to
Organizational units (divisions, departments, and so forth), which branch out to (include an entry for)
Individuals (which include people, files, and shared resources such as printers)
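By way of illustration only, the hierarchy above can be sketched as a path of levels joined into a single name for an entry. The helper `build_dn`, the sample values, and the attribute abbreviations (`c`, `o`, `ou`, `cn`, taken from common LDAP usage) are assumptions made for this sketch and do not appear in the text above. Escaping of special characters in values is omitted.

```python
# Minimal sketch: one path through a directory tree, root-first, joined into
# a distinguished-name style string. All names and values are hypothetical.

def build_dn(path):
    """Join (attribute, value) pairs given root-first into a name string.

    The string is written leaf-first, which is the customary order for
    LDAP distinguished names.
    """
    return ",".join(f"{attr}={value}" for attr, value in reversed(path))


# Country -> Organization -> Organizational unit -> Individual
path = [
    ("c", "US"),
    ("o", "Example Corp"),
    ("ou", "Engineering"),
    ("cn", "Jane Doe"),
]

dn = build_dn(path)
print(dn)
```

Each level of the tree contributes one component, so an entry's position in the hierarchy can be read directly from its name.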
An LDAP directory can be distributed among many servers, and each server can have a replicated version of the total directory that is synchronized periodically. When an LDAP server receives a request from a user, it takes responsibility for the request, passing it to other DSAs as necessary, but nevertheless ensuring a single coordinated response for the user.
The Internet Engineering Task Force (IETF) is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. IETF publishes specifications for various internet protocols including LDAP. The current LDAP protocol is specified in RFCs (Request For Comments) 1777 and 1778 while the string representation of LDAP search filters is specified in RFC 2254. The disclosures of RFC 1777, RFC 1778 and RFC 2254 are specifically incorporated herein by this reference.
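As an illustrative aside, the string form of a search filter described in RFC 2254 can be sketched as follows. The function names are hypothetical and the sketch covers only equality and AND filters. It assumes the escaping convention of RFC 2254, under which the characters `*`, `(`, `)`, `\` and NUL in a value are written as a backslash followed by two hexadecimal digits.

```python
# Minimal sketch of building RFC 2254 style filter strings.
# Helper names are hypothetical; only equality and AND filters are shown.

_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape characters that have special meaning inside a filter value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def equality_filter(attr, value):
    """Return an '(attr=value)' filter with the value escaped."""
    return f"({attr}={escape_filter_value(value)})"


def and_filter(*filters):
    """Combine already-built filters with the '&' (AND) operator."""
    return "(&" + "".join(filters) + ")"


flt = and_filter(
    equality_filter("objectClass", "person"),
    equality_filter("cn", "Smith (Jr.)*"),
)
print(flt)
```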
Finally, as to related data elements, whenever a data element in a data architecture will be accessed by different users or entities, and particularly in distributed data systems, it is desirable to minimize data synchronization issues. If multiple copies are maintained of the same data record, then there will always be issues relating to the synchronization of the various copies of the data record.
It is against this background, and the desire to solve the problems of the prior art, that the present invention has been developed.
Briefly stated, the present invention relates to a method of providing referential integrity in a data architecture, the data architecture including a plurality of data elements, wherein certain of the data elements are linked to other of the data elements. The method includes providing the ability for a first data element to depend from and be linked to a second data element and providing the ability for the first data element to also depend from and be linked to a third data element. The first data element is only stored in one location, the storage being associated with the second data element, with only a referential link between the first data element and the third data element.
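The single-storage arrangement described above may be sketched, purely as an illustration, in the following way. The class `Element`, the fields `bound` and `links`, and the helpers `bind`, `link`, and `resolve` are hypothetical names chosen for this sketch and are not a definitive implementation of the invention.

```python
# Minimal sketch, assuming an in-memory model: a first element is stored once,
# with the second element, and the third element holds only a reference.
from dataclasses import dataclass, field


@dataclass
class Element:
    resource_id: str
    data: dict = field(default_factory=dict)
    bound: dict = field(default_factory=dict)  # children stored with this element
    links: dict = field(default_factory=dict)  # child id -> id of element holding its storage


def bind(parent, child):
    """Store the child with the parent (the one and only copy)."""
    parent.bound[child.resource_id] = child


def link(parent, child, owner):
    """Record only a referential link from parent to a child stored with owner."""
    parent.links[child.resource_id] = owner.resource_id


def resolve(parent, child_id, registry):
    """Fetch a dependent element, following a referential link if needed."""
    if child_id in parent.bound:
        return parent.bound[child_id]
    owner = registry[parent.links[child_id]]
    return owner.bound[child_id]


second = Element("second")
third = Element("third")
first = Element("first", {"printer": "lab-2"})
registry = {e.resource_id: e for e in (second, third)}

bind(second, first)         # storage is associated with the second element
link(third, first, second)  # the third element holds a reference only

# A change made through one parent is visible through the other,
# because there is no second copy to synchronize.
resolve(second, "first", registry).data["printer"] = "lab-3"
print(resolve(third, "first", registry).data["printer"])
```

Because the third element stores an identifier and not a copy, no synchronization step is needed when the first element changes.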
The location of the first data element may be associated with the second data element through a ResourceID. The method may further include providing the ability for a fourth data element to depend from and be linked to the first data element. The fourth data element may depend from and be linked to any one of, or any combination of, the first, second, and third data elements. The second or third data element may depend from and be linked to the fourth data element. The second or third data elements can depend from and be linked to other data elements. The method may further include, when a given data element is to be deleted, checking to see if the given data element has other data elements depending from the given data element. The method may further include, if the given data element has other data elements depending therefrom, deleting the dependent data elements if the dependent data elements are not dependent on other data elements.
The method may further include, if the given data element has other data elements depending therefrom and the dependent data elements are dependent on other data elements, determining if the dependent nature of the dependent data element to the given data element is a true bind or is a link reference. The method may further include, if the dependent nature of the dependent data element to the given data element is a true bind, changing the storage of the dependent data element to a storage associated with one of the other data elements on which the dependent data element is dependent. The method may further include determining if the dependent data element is also dependent on data elements in addition to the one with which the dependent data element now has associated storage. The method may further include, if the dependent data element is also dependent on data elements in addition to the one with which the dependent data element now has associated storage, updating linking information related thereto.
The present invention also relates to a data architecture that automatically provides referential integrity. The data architecture includes a plurality of data elements including a first data element, a second data element, and a third data element, wherein certain of the data elements are linked to other of the data elements. The first data element depends from and is linked to the second data element. The first data element also depends from and is linked to the third data element. The first data element is only stored in one location, the storage being associated with the second data element, with only a referential link between the first data element and the third data element.
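The deletion steps described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: the class `Architecture` and its fields are hypothetical, dependencies are assumed to be acyclic, and the choice of the new storage location (the alphabetically first remaining parent) is arbitrary.

```python
# Minimal sketch of deletion with true binds and link references.
# All names are hypothetical; dependency cycles are not handled.

class Architecture:
    def __init__(self):
        self.owner = {}  # child id -> parent id holding its storage (true bind), or None
        self.links = {}  # child id -> {linking parent id: id of element holding the storage}

    def add(self, elem_id, owner=None, linked_from=()):
        self.owner[elem_id] = owner
        self.links[elem_id] = {parent: owner for parent in linked_from}

    def dependents(self, elem_id):
        """Elements that depend from elem_id by true bind or link reference."""
        return [c for c in self.owner
                if self.owner[c] == elem_id or elem_id in self.links[c]]

    def delete(self, elem_id):
        for child in self.dependents(elem_id):
            if child not in self.owner:
                continue  # already removed by an earlier cascade
            if self.owner[child] != elem_id:
                # Link reference only: drop the link, storage is untouched.
                del self.links[child][elem_id]
            elif not self.links[child]:
                # True bind and no other parent: delete the dependent too.
                self.delete(child)
            else:
                # True bind with other parents: move storage to one of them.
                new_owner = min(self.links[child])
                del self.links[child][new_owner]
                self.owner[child] = new_owner
                # Update linking information held by any remaining parents.
                for parent in self.links[child]:
                    self.links[child][parent] = new_owner
        del self.owner[elem_id]
        del self.links[elem_id]


arch = Architecture()
for top_level in ("second", "third", "fifth"):
    arch.add(top_level)
arch.add("first", owner="second", linked_from=["third", "fifth"])
arch.add("fourth", owner="first")

arch.delete("second")  # "first" is re-homed, not deleted
print(arch.owner["first"], arch.links["first"])
```

In this usage, deleting the element that holds the storage leaves the dependent element intact, and the one remaining link is updated to point at the new storage location.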
The data architecture may further include a fourth data element depending from and linked to the first data element. The fourth data element may depend from and be linked to any one of, or any combination of, the first, second, and third data elements. The second or third data element may depend from and be linked to the fourth data element. The data architecture may further include, when a given data element is to be deleted, checking to see if the given data element has other data elements depending from the given data element, and, if the given data element has other data elements depending therefrom, deleting the dependent data elements if the dependent data elements are not dependent on other data elements.
The data architecture may further include, if the given data element has other data elements depending therefrom and the dependent data elements are dependent on other data elements, determining if the dependent nature of the dependent data element to the given data element is a true bind or is a link reference, and, if the dependent nature of the dependent data element to the given data element is a true bind, changing the storage of the dependent data element to a storage associated with one of the other data elements on which the dependent data element is dependent. The data architecture may further include determining if the dependent data element is also dependent on data elements in addition to the one with which the dependent data element now has associated storage, and, if the dependent data element is also dependent on data elements in addition to the one with which the dependent data element now has associated storage, updating linking information related thereto.
The present invention also relates to a computer program product embodied on a propagating signal. The computer program product includes computer program devices readable by a data processor coupled to receive the propagating signal for providing referential integrity in a data architecture, the data architecture including a plurality of data elements, wherein certain of the data elements are linked to other of the data elements. The computer program devices include first program code devices configured to cause the data processor to provide the ability for a first data element to depend from and be linked to a second data element, second program code devices configured to cause the data processor to provide the ability for the first data element to also depend from and be linked to a third data element, and third program code devices configured to cause the data processor to allow the first data element to only be stored in one location, the storage being associated with the second data element, with only a referential link between the first data element and the third data element.