The present invention relates to computerized data storage and retrieval systems and in particular to a system for linking separate data files according to user-definable relationships.
Typically one of the most difficult aspects of large engineering or similar projects is record keeping. For instance in the design and construction of a nuclear plant, massive numbers of documents are generated, including preliminary studies, drawings, specifications, letters, reports and the like. These documents must be stored in a logical fashion so that they can be retrieved when needed. When the number of such documents becomes very large it is often difficult to find them once they are stored, particularly when only the nature of a document to be retrieved, and not a name or a reference number under which it is filed, is known.
In addition to problems associated with storing and retrieving documents, there are also considerations associated with the "ripple" effect that changing one project document may have on other project documents. For instance if a design drawing is changed, other drawings or specifications which relate to the drawing might also have to be altered. For a very complex project it is not easy to determine the other documents affected. Also it is often important to keep records of document changes, including not only prior versions of a document but in addition records as to why a document was changed and who changed it.
The use of computerized data base systems is well known. Data base systems permit documents to be characterized according to various attributes of the document such as "author", "document type", "subject matter", etc. For instance, to characterize a specification for a pump written by Smith, a user may assign character strings "Smith", "specification", and "pump" as the values of author, document type and subject matter attributes. Such data base systems typically include search routines for locating documents identified by assigned attribute values matching a list of selected attribute values provided by a user, thereby enabling a user to easily locate all documents sharing a common set of attributes such as all pump specifications written by Smith.
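The attribute-matching search described above can be illustrated with a minimal sketch. The catalogue contents, the document names and the function name `find_documents` are hypothetical and serve only to show the matching rule; they are not drawn from any particular data base system.

```python
# Hypothetical in-memory catalogue: each document name maps to its
# assigned attribute values. All names here are illustrative only.
documents = {
    "doc1": {"author": "Smith", "document type": "specification", "subject matter": "pump"},
    "doc2": {"author": "Smith", "document type": "drawing", "subject matter": "pump"},
    "doc3": {"author": "Jones", "document type": "specification", "subject matter": "valve"},
}


def find_documents(catalogue, selected):
    """Return the sorted names of documents matching every selected attribute value."""
    return sorted(
        name
        for name, attrs in catalogue.items()
        if all(attrs.get(key) == value for key, value in selected.items())
    )


# All pump specifications written by Smith.
smith_pump_specs = find_documents(
    documents,
    {"author": "Smith", "document type": "specification", "subject matter": "pump"},
)
print(smith_pump_specs)
```

In this sketch a document is returned only when every selected attribute value matches, so supplying fewer attributes widens the result set.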
More recently, the advent of rapid access bulk data storage devices and multiple computer networks has permitted computer systems to actually create and electronically store the documents as files in a bulk storage device in addition to keeping track of documents. To be effective for use in storing and retrieving documents associated with a large project, such file management systems should be capable not only of locating groups of files containing documents having common attributes, but also of finding groups of files which are related to a given file in some definable way. For instance, once a user has located a particular file containing Smith's pump specification, he may then wish to locate other files which contain reviewers' comments regarding the pump specification or which may contain pump drawings related to the pump specification. There is continuing activity in the area of computerized data storage and retrieval systems pertaining to "hypertext" systems which enable users to establish "links" between file pairs indicating that two files are related in some way. (The article "Reading and Writing the Electronic Book", by Nicole Yankelovich and Norman Meyrowitz, pages 15-30 in the October, 1985 issue of Computer, published by the Institute of Electrical and Electronics Engineers, is a good summary of prior art relating to systems of this type and is incorporated herein by reference.) A "link" can be visualized as a pointer from a first file to a second file indicating the second file is related to the first file. A link is implemented as a stored record containing data identifying the linked first and second files and containing link attribute data defining the nature of the relationship between the two files.
For instance, when the first file is a specification and the second file is a comment regarding the specification, a link record may be created which identifies the first and second files and which contains link attribute data indicating that the relationship between the files is one of "comment". A separate link record is provided for every pair of linked files. If three comments have been written about a particular specification and stored in separate files, three link records may be created, each indicating a "comment" relationship between the specification file and a corresponding one of the comment files. Link records may be grouped according to the files they link so that once a particular file is identified, such as the specification file, all other files, such as the comment files, to which the particular file is linked, can be quickly determined by reviewing only the link records associated with the particular file.
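A link record of the kind just described, together with the grouping of records by the file they lead from, might be sketched as follows. The class names `LinkRecord` and `LinkStore`, the field names and the file names are assumptions made for illustration; the description above does not prescribe any particular layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """One record per pair of linked files (hypothetical field names)."""
    source: str        # the first file, which the link leads "from"
    target: str        # the second file, which the link leads "to"
    relationship: str  # link attribute, e.g. "comment"


class LinkStore:
    """Groups link records by source file so lookups touch only that file's records."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, source, target, relationship):
        record = LinkRecord(source, target, relationship)
        self._by_source[source].append(record)
        return record

    def linked_from(self, source, relationship=None):
        """Return targets linked from `source`, optionally filtered by link attribute."""
        records = self._by_source.get(source, [])
        return [
            r.target
            for r in records
            if relationship is None or r.relationship == relationship
        ]


store = LinkStore()
# Three comments on one specification give three separate link records.
for comment_file in ("comment_1", "comment_2", "comment_3"):
    store.add("pump_spec", comment_file, "comment")
store.add("pump_spec", "pump_drawing", "drawing")

print(store.linked_from("pump_spec", "comment"))
```

Because the records are keyed by the source file, finding every comment on the specification requires examining only the records grouped under that one file.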
Files and links between files both may have assigned attributes characterizing the nature of the file or link, but the concept of a file attribute, as known in the art, differs somewhat from the concept of a link attribute. Although separate files may be related by file attributes, the relationship between such files is one of commonality and is non-directed in that the relationship does not involve a pointing from one file to another file. For instance, all files created by Smith are related by virtue of having a common author and this common feature of each such file may be denoted by using "Smith" as the value of an "author" file attribute for each file. In contrast, links describe relationships between pairs of files in a directed fashion, in the sense that a link leads a user "from" a first document "to" another document for a particular reason, the link attribute being descriptive of that reason. Thus the relationship indicated by a link attribute is not one of commonality but is rather one of "connectivity". For instance, the relationship between a first file containing a drawing and a second file containing a comment about the drawing cannot be easily described in terms of what both files have in common (i.e., by a file attribute) since the files differ; one file is a "drawing" and the other file is a "comment". But if the concept of "comment" is used to describe a link between the two files, rather than to describe the nature of one of the files, the relationship between the files is clearly specified.
Even though a file may be thought of as containing a "comment", and therefore may be assigned a file attribute value "comment", it is not particularly useful to do so since users are typically not interested in finding the group of all files which contain comments. Instead, users are usually more interested in finding files containing comments about a particular user-identified file. It is therefore much more useful to establish a link between two files where the link has a "comment" attribute.
Links give a collection of files structure by connecting file pairs to form a "web" or a "graph" wherein each file may be thought of as a "node" interconnected to other nodes by links. Some systems of the prior art are adapted to display a representation of the graph enabling a user to visualize how sets of files are interrelated in much the same way that a map depicts how towns are interconnected by roads or rivers. Thus, for instance, when a user decides to change one file ("node"), he may quickly determine all of the other files which might be affected by the change by inspecting other nodes which are linked to the node to be changed.
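As a rough illustration of inspecting linked nodes to find files that a change might affect, the sketch below walks the graph outward from the changed node. Two assumptions go beyond the text: links are followed in both directions, and the walk continues transitively through indirectly linked nodes. The function name `possibly_affected` and the file names are hypothetical.

```python
from collections import deque


def possibly_affected(links, changed):
    """Return sorted names of nodes reachable from `changed` through any chain of links.

    Assumption: links are treated as undirected for impact analysis and
    are followed transitively.
    """
    neighbours = {}
    for first, second in links:
        neighbours.setdefault(first, set()).add(second)
        neighbours.setdefault(second, set()).add(first)

    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    seen.discard(changed)  # the changed node itself is not reported
    return sorted(seen)


# Hypothetical graph: each pair is one link between two files.
links = [
    ("drawing_A", "spec_1"),
    ("spec_1", "comment_1"),
    ("drawing_B", "spec_2"),
]
print(possibly_affected(links, "drawing_A"))
```

Restricting the walk to directly linked nodes, or to links carrying particular link attributes, would be an equally plausible reading of the passage above.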
However, if the number of files associated with a project is large, the graphs become complex, difficult to display and difficult for a user to utilize. Therefore systems typically enable a user to reduce the size of a graph to be displayed by specifying to the system the attributes of various files of interest. The system then displays a "subgraph" which contains only nodes characterized by the specified attributes. For instance when a user is only interested in files relating to pumps, the system displays the nodes representing pump-related files along with their interconnecting links, thereby reducing the number of files the user might have to inspect in order to find a particular file of interest.
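The reduction of a graph to a subgraph can be sketched as a filter over nodes followed by a filter over links. The sketch assumes that a link is kept only when both of the files it joins are kept; the function name `subgraph` and the sample data are hypothetical.

```python
def subgraph(nodes, links, selected):
    """Keep nodes whose attributes match `selected`, plus links joining two kept nodes."""
    kept_nodes = {
        name
        for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in selected.items())
    }
    kept_links = [
        (source, target, relationship)
        for (source, target, relationship) in links
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_links


# Hypothetical graph: file attributes per node, and (source, target, link attribute) links.
nodes = {
    "pump_spec": {"subject matter": "pump"},
    "pump_drawing": {"subject matter": "pump"},
    "valve_spec": {"subject matter": "valve"},
}
links = [
    ("pump_spec", "pump_drawing", "drawing"),
    ("pump_spec", "valve_spec", "reference"),
]

pump_nodes, pump_links = subgraph(nodes, links, {"subject matter": "pump"})
print(sorted(pump_nodes), pump_links)
```

Here the link to the valve specification drops out of the displayed subgraph because one of its two files is not pump-related.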
While prior art systems help a user to organize and retrieve stored data, these systems still leave certain record keeping problems unresolved. One problem relates to the difficulty of preselecting the types of file or link attributes which may be most advantageous. In order for a system to be useful, the attributes and their values which a user can use to describe files and links must provide an appropriate basis for searches to be performed by the system. However, only a limited number of attributes is usually contemplated.
Hypertext systems also would be more useful if they included provisions for maintaining comprehensive records of how project documentation changes with time. Some computerized data storage systems store old versions of a document but for large projects documents often undergo so many revisions that it becomes impractical or impossible to store every version of every document. It would also be desirable to maintain a history of changes to file and link attributes. For instance a file attribute may indicate the name of a person responsible for approving changes to that document, and when another person assumes that responsibility the corresponding attribute value must be changed. But in doing so the identity of the person previously responsible for approving changes is lost unless some means can be provided for maintaining the information. An ideal system would be able to recreate the entire graph of a system as it existed at any previous time, including the contents of files at the time, the file attributes assigned to the files, the links between the files existing at the time and the link attributes existing at the time. This feature would be very useful in determining the cause of problems that arise in the course of a project but implementation thereof is generally impractical in prior art systems.
Another problem associated with the use of multi-user systems occurs when two people independently attempt to change the same file at the same time. A conflict arises as to which new version should be considered the latest version. Some systems prevent the conflict by blocking access to a file by more than one person at a time, but this approach inefficiently utilizes multiple user capability, particularly when one user wants only to read the file rather than change it.
Finally, it would be desirable to provide a data storage and retrieval system capable of notifying a user when another user attempts to access or change a file, a link, or other features of a graph. Such capability would facilitate informing interested parties when a file or other aspect of a graph has been, or is about to be, changed.