1. Field of the Invention
The present invention relates to a method and system for recording and managing relationships between elements of collections which may include previously stored elements. The invention further relates to a method and system for managing document descriptors and relationships of documents relative to one another.
2. Description of the Related Art
Information in many organizations is held in digital form in repositories which are not part of the same data library, the same computing systems or even the same administrative domain. This has hampered access to the information held in those separate repositories, even though the information held separately may be related. For example, an organization may have information residing in completely different data processing systems. These different data processing systems may be in place as a result of combining previous projects, or because of mergers or acquisitions of companies having different data processing systems. It is a common occurrence that valuable data resides and is used in separate and distinct libraries, computing systems or administrative domains.
A problem many such organizations face is that information held in heterogeneous data stores as discussed above, which may, in the minds of people within the organization, be related, remains unrelated at a data processing level. Hence, that information can be difficult to handle, and its full value goes unrealized, since a frequent paradigm in intellectual work is to identify new relationships among prior ideas and recorded facts for exploitation towards practical or academic ends. Collected into carefully managed records which include records of the newly related ideas, such information is at the core of what it means to be a library. If the collection is held in digital format, it is known as a digital library. Until now, to create a digital library the source information has had to be moved or copied from wherever it occurred into a specialized repository. If the information is moved, prior paths for finding the information and applying tools to manipulate it are disrupted; if the information is copied, referential integrity between the database and external copies must be maintained by users more or less manually. Either alternative imposes an annoying and sometimes tricky administrative burden.
A problem with conventional access control methods for file systems in such environments is that they fail to provide sufficient flexibility or scalability for these types of complex enterprises. For example, service organizations providing aircraft maintenance to both military and commercial customers need to administer authorization policies which differ for different customers, which must persist over decades, and which may evolve over the years. Such requirements cannot be satisfied with subsystems like IBM's program product RACF or with any of the access control mechanisms that have evolved for UNIX-like file systems.
Further, today's file systems fail to provide coordinated control for files in administratively separate file systems (such as might exist in different divisions of a company, especially after one company has acquired another), or heterogeneous file systems such as occur in projects using both mainframe and generic workstation-type (e.g., UNIX) systems.
Today, combining information distributed in a network to model abstractions in addition to those planned when the information was created typically requires that the information representing each individual object be collected into a single digital storage subsystem if prudent integrity and security are to be enforced. This step always adds cost, frequently is inconvenient, and inevitably adds administrative complexity either for synchronizing information in otherwise independent data stores or for application program updates, or both.
A means for enabling a database to manage data stored in external operating system files, as if the data were directly stored in the database, has been proposed in U.S. patent application Ser. No. 08/449,600 filed May 24, 1995, entitled "Method and Means for Linking a Database System With a System for Filing Data" to Cabrera et al. (hereinafter referred to as "Cabrera et al."), which is incorporated by reference herein. In Cabrera et al. a file in a file system is bound to a database tuple in a database. The database system acts as a centralized index for searching across enterprise-wide data that includes both enterprise data and extracted features of non-coded data, and large objects that can be distributed among several file servers. In order to accomplish this, Cabrera et al. employs a table 60, shown in FIGS. 1 and 2, in the database 16 which relates a file 19 in the file system 17 to attributes about the file held in the database table 60. One column 63 in the Cabrera et al. database table is a column for an "external file reference" (efr) data type. The efr data type contains information identifying a server (e.g., server i) that controls the file (e.g., 70 and 72). The efr data type binds the database tuple 61 to the file via the server name and file name 70. FIG. 1 shows a Cabrera et al. table in the database in which the table has an efr data type. Here, two file names 70 and 74, in file servers i and j, respectively, and their corresponding files are bound to tuples in the table. FIG. 2 shows a structure in which the table 60 and its associated database management system 15 are coupled to various file servers 17.
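The binding described above can be illustrated with a minimal sketch. It is not the Cabrera et al. implementation; it only assumes, for illustration, that an efr value is a pair of server name and file name and that each tuple carries attributes about its file. The names ExternalFileReference, FileTuple, FileTable, bind and find are hypothetical, as are the sample server and file names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExternalFileReference:
    """Hypothetical stand-in for the efr data type: server name plus file name."""
    server: str
    filename: str


@dataclass
class FileTuple:
    """One row of the table: attributes about the file plus the efr column."""
    attributes: Dict[str, str]
    efr: ExternalFileReference


class FileTable:
    """Toy in-memory table serving as a central index over files on several servers."""

    def __init__(self) -> None:
        self._rows: List[FileTuple] = []

    def bind(self, server: str, filename: str, **attributes: str) -> FileTuple:
        """Bind a file on a given server to a new tuple holding its attributes."""
        row = FileTuple(dict(attributes), ExternalFileReference(server, filename))
        self._rows.append(row)
        return row

    def find(self, **criteria: str) -> List[ExternalFileReference]:
        """Return the efr of every tuple whose attributes match all criteria."""
        return [
            row.efr
            for row in self._rows
            if all(row.attributes.get(k) == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    table = FileTable()
    # Two files on two different servers, as in the FIG. 1 description.
    table.bind("server_i", "/reports/engine.txt", author="smith")
    table.bind("server_j", "/images/wing.gif", author="jones")
    for ref in table.find(author="smith"):
        print(ref.server, ref.filename)
```

In this sketch a search runs against the attributes held in the table, and the result is the server and file name needed to reach the file itself. That is the level of indirection the passage attributes to the database.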
By providing a level of indirection, via the database, to a file in the file system, Cabrera et al. can safely bind information in file systems into object instances conforming to formal object models, and can provide referential integrity for those files. However, Cabrera et al. is limited to safely binding only files in the file systems. Organizations having heterogeneous data stores need more than just referential integrity for collections of files. They also need to provide flexible yet rigorous access control for the data elements, or targets, in the heterogeneous data stores. Furthermore, they need to support data models conducive to an enterprise system, such as information which is represented in a directed acyclic graph (DAG) form. A document model, for example, is a natural model for information held in heterogeneous stores. Documents can be readily handled by a library service subsystem executable on one or more electronic digital computers, as disclosed in U.S. Pat. No. 5,649,185 to Antognini et al., which is incorporated herein by reference.
A generalized access control method for large collections is disclosed in H. M. Gladney, Access Control for Large Collections, ACM Transactions on Information Systems (April 1997), hereinafter "Access Control for Large Collections," which is incorporated herein by reference. The Access Control for Large Collections reference describes a robust document access control method within the limits of a single library subsystem. This access control method improves over known access control methods, as discussed in section 4.6 of Access Control for Large Collections. However, more is needed than this conventional access control method to support access control for information stored in the heterogeneous enterprise-wide data storage environment described above, which typically spans more than one library system.