1. Field of the Invention
This invention relates in general to computer database systems, and in particular to a data management method for representing hierarchical functional dependencies.
2. Description of Related Art
A significant problem for computer database systems is the efficient management of large volumes of data, which generally reside on hard disks or similar storage media. One solution for this problem is the hierarchical, or indexed-sequential, method of data management. Hierarchical databases typically rely on the B-tree algorithm for the actual physical organization of data within hard disk storage. This algorithm is extremely efficient both for sequential access of all data and for indexed access of an individual data value. The B-tree algorithm organizes disk storage as a complicated hierarchy of pointers, indices and data. As data are stored and deleted, the algorithm may rearrange these internal structures. Thomas H. Cormen et al. discuss the B-tree algorithm in Chapter 19 of Introduction to Algorithms (Cambridge, Mass.: The MIT Press, 1990).
The Bruffey U.S. Pat. No. 4,945,475 describes a hierarchical filing system organized as a B-tree structure.
The intricacies of the actual physical organization of data are generally hidden from the programmer, who references data in a programming language by means of logical specifications that are translated to physical data addresses. MUMPS is an example of such a database system. MUMPS is a registered trademark of the Massachusetts General Hospital. It is the third language, after FORTRAN and COBOL, to be standardized by ANSI. In MUMPS, data are stored in hierarchical data structures in both secondary and main memory as entries which consist of a sequence of indices, expressed as a function, addressing one data value. For example, ^Complaint(Sam,1/1/90)=pallor is a simplified MUMPS expression for storing the data value, pallor, at the address indexed by Complaint(Sam,1/1/90). The "^" signifies storage within a MUMPS database. Since data are stored hierarchically, deletion of the data value addressed by an index also deletes all other data values addressed by extensions of that index. For the above example, deleting the value of Complaint(Sam) also deletes the value of Complaint(Sam,1/1/90).
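The hierarchical deletion behavior described above can be illustrated with a minimal Python sketch. It is not MUMPS and does not model B-tree storage; the class name HierarchicalStore and its methods set, get and kill are hypothetical names chosen for this illustration. Each entry is addressed by a sequence of indices, and deleting an index also deletes every entry whose index extends it.

```python
class HierarchicalStore:
    """Toy model of hierarchically indexed storage (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a tuple of indices, e.g. ("Complaint", "Sam", "1/1/90"), to a string value.
        self._data = {}

    def set(self, index, value):
        """Store one data value at the address given by a sequence of indices."""
        self._data[tuple(index)] = value

    def get(self, index, default=""):
        """Return the value stored at index, or the default if nothing is stored there."""
        return self._data.get(tuple(index), default)

    def kill(self, index):
        """Delete the value at index and every value addressed by an extension of index."""
        prefix = tuple(index)
        doomed = [key for key in self._data if key[:len(prefix)] == prefix]
        for key in doomed:
            del self._data[key]


store = HierarchicalStore()
store.set(("Complaint", "Sam", "1/1/90"), "pallor")
store.set(("Sex", "Sam"), "male")

# Deleting Complaint(Sam) also deletes Complaint(Sam,1/1/90); Sex(Sam) is unaffected.
store.kill(("Complaint", "Sam"))
```

A flat dictionary keyed by index tuples is used here only for brevity; a real hierarchical database would locate the extensions of an index through its ordered physical structure rather than by scanning every key.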
MUMPS has one data type: a character string. Numbers are also strings, and every string has a numerical value. The empty string is denoted by "" and has a numerical value of 0. MUMPS also has various string processing functions and operators, including: $E, which extracts a character; $L, which returns the length of a string; $P, which returns a delimited piece; and "_", which concatenates two strings.
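For readers unfamiliar with MUMPS, the string operations named above can be approximated in Python as follows. This is a rough sketch of the behavior as described in this section, not an implementation of the MUMPS standard; the function names extract, length, piece and concat are hypothetical, and the handling of out-of-range arguments and of an empty delimiter is an assumption of the sketch.

```python
def extract(s, start, end=None):
    """Rough analogue of $E: the characters at 1-based positions start..end."""
    if end is None:
        end = start              # default: extract a single character
    if end < start or end < 1:
        return ""                # assumption: an empty range yields the empty string
    start = max(start, 1)
    return s[start - 1:end]


def length(s):
    """Rough analogue of $L: the number of characters in the string."""
    return len(s)


def piece(s, delim, n=1):
    """Rough analogue of $P: the n-th (1-based) piece of s delimited by delim."""
    if delim == "":
        return ""                # assumption: an empty delimiter yields the empty string
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def concat(a, b):
    """Rough analogue of the "_" operator: concatenation of two strings."""
    return a + b


record = "Sam^male^1/1/90"          # hypothetical delimited record
second = piece(record, "^", 2)      # "male"
first_char = extract("pallor", 1)   # "p"
joined = concat("chest", " pain")   # "chest pain"
```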
Although hierarchical databases and their auxiliary programming tools facilitate database construction, much effort and programming expertise are required for the construction of individual database applications.
The Bachman U.S. Pat. No. 4,631,664 discusses programming problems related to the construction of database applications and describes various data models.
The Huber U.S. Pat. No. 4,791,561 also discusses the considerable effort required for the construction of database applications, as well as the desirability of having such applications constructed by users who themselves are not programmers.
Many difficulties of database construction are alleviated by relational databases, which organize data into tables. The simplicity of the relational model enables many applications to be constructed quickly without special expertise. For some applications, however, there exist relationships among the application data which cannot be naturally represented by the relational model.
For example, FIG. 1 illustrates a relation representing patients seen at a clinic. The name, gender, date of visit, and reason for the visit are entered respectively as values of the attributes Person, Sex, Visit and Complaint. Each column contains values belonging to one attribute. The first column, for example, contains values of the attribute Person. Each row contains associated values. The first row states that Sam, a male, visited the clinic on 1/1/90 because of pallor. The second and third rows of the relation are interpreted in a similar fashion. The fourth row, however, is meaningless, because it gives chest pain as the reason for a visit by Sue that did not occur.
The database of the example could have become corrupted in one of two ways: either no value was entered for Sue's Visit, or else a value was entered, but was subsequently deleted. The general problem, illustrated by this example, is called the problem of functional dependencies. The term "functional" is appreciated when Complaint is regarded as a data function, Complaint(Visit), which depends upon its argument, Visit.
For database applications having dependencies among attributes, values should not be stored without the values upon which they depend, and values should not be deleted without their dependents. In order for dependencies to be maintained, they must be represented in some fashion. Since the relational data model per se cannot represent dependencies, additional methods are required. One solution is to write programs which check for dependencies when data are stored or deleted. This solution requires programmers for database construction. The generally recommended solution is to design the database as a collection of smaller relations which are determined according to the methods of either the Boyce-Codd or the third normal form. Implementation of the recommended solution entails time and effort, and the resulting database is often impractical. Furthermore, the method is useful for initial database design, but inappropriate for modification of a database into which data have already been entered.
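The first solution mentioned above, a program which checks dependencies when data are stored or deleted, might look like the following Python sketch. It is an illustration under stated assumptions and not a method taken from any cited reference: the table DEPENDS_ON and the functions violations, store_row and delete_value are hypothetical, and only the dependency of Complaint upon Visit, as discussed for FIG. 1, is declared.

```python
# Hypothetical declaration: each dependent attribute maps to the attribute it depends upon.
DEPENDS_ON = {"Complaint": "Visit"}


def violations(row):
    """Return the attributes that have a value although the attribute they depend upon has none."""
    return [attr for attr, parent in DEPENDS_ON.items()
            if row.get(attr) and not row.get(parent)]


def store_row(table, row):
    """Append the row to the table only if no dependency is violated."""
    bad = violations(row)
    if bad:
        raise ValueError("missing predecessor for: " + ", ".join(bad))
    table.append(dict(row))


def delete_value(row, attr):
    """Delete the value of attr and, recursively, the values of all its dependents."""
    row[attr] = None
    for child, parent in DEPENDS_ON.items():
        if parent == attr and row.get(child) is not None:
            delete_value(row, child)


table = []
store_row(table, {"Person": "Sam", "Sex": "male", "Visit": "1/1/90", "Complaint": "pallor"})

# A row like the fourth row of FIG. 1 (a Complaint without a Visit) is rejected.
rejected = False
try:
    store_row(table, {"Person": "Sue", "Visit": None, "Complaint": "chest pain"})
except ValueError:
    rejected = True

# Deleting Sam's Visit also deletes the dependent Complaint.
delete_value(table[0], "Visit")
```

The sketch makes the drawback noted above concrete: the dependencies live in program code rather than in the data model, so each application needs a programmer to declare and enforce them.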
A related problem of the relational model is the difficulty of representing attributes which have multiple values. For example, in FIG. 1, Sam, on 1/1/90, complains both of pallor and of weakness. Since only one value may be entered into each cell of the relation, an entire row is needed to represent each of the multiple values. In the example, the first row represents pallor and the second row represents weakness. Except for the different values of Complaint, all other values of these two rows are identical, so that storage is wasted. The recommended solution for this problem is to replace the original relation with several smaller ones by the method of the fourth normal form. An example of such a solution is illustrated in FIG. 2, which shows two smaller relations which replace the relation of FIG. 1. Decomposing relations into smaller ones, however, mars the simplicity and ease of use of the relational model. The problems of functional and multivalued dependencies are discussed by Jeffrey D. Ullman in Chapter 7 of Principles of Database and Knowledge-Base Systems (Rockville, Md.: Computer Science Press, 1988).
An attribute with multiple values corresponds to a function whose value is a set. Such a function is multivalued, according to the terminology of Claude Berge, in Chapter 1 of The Theory of Graphs and its Applications (New York, N.Y.: John Wiley & Sons, Inc., 1964).
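The correspondence between a multivalued attribute and a set-valued function can be sketched in Python as follows. The two rows are those described above for Sam's visit of 1/1/90; the function name as_multivalued is hypothetical, and the grouping shown is only an illustration of the idea, not the decomposition of FIG. 2.

```python
from collections import defaultdict

# The first two rows of the relation as described in the text: (Person, Sex, Visit, Complaint).
rows = [
    ("Sam", "male", "1/1/90", "pallor"),
    ("Sam", "male", "1/1/90", "weakness"),
]


def as_multivalued(rows):
    """Group Complaint values into a set keyed by the remaining attribute values."""
    complaint = defaultdict(set)
    for person, sex, visit, value in rows:
        complaint[(person, sex, visit)].add(value)
    return dict(complaint)


complaints = as_multivalued(rows)
# The two rows collapse into one entry whose value is the set of both complaints.
```

Here the values shared by both rows are stored once, and the multiple values of Complaint appear together as a single set.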
Hierarchies, also called trees, forests, and acyclic graphs, are ubiquitous in computer software, and many different methods for representing them within computers are known and used. The representation chosen for a particular program is largely determined by the program's purpose and constraints.
J. Buckwold et al. in "A Database System for Capturing and Reporting Cardiac Catheterization Data" in Computers in Cardiology (in press: IEEE Computer Society, 1991) describe a natural language report application generated by DOC Version 2. DOC generally enforces hierarchical functional dependencies for applications which it generates. This is accomplished by a complex method: the sequence of data-entry forms forces entry of predecessors for dependent values; and, within storage, dependencies are maintained by a fixed address scheme with parameters that receive values according to their menu type. The method of DOC lacks a data model for representing dependencies, is not generally applicable, and introduces spurious dependencies for different values having the same menu type. However, within the context of usage for which these problems are avoided, the restriction to hierarchical dependencies does not appear to be a significant limitation. Thus, experience with DOC suggests that a solution for the special case of hierarchical functional dependencies may be adequate in practice.
The Hirose U.S. Pat. No. 4,794,528 describes a pattern matching method which converts an n-ary data tree into a vector of constant-length memory cells by placing, into each cell, in order of transverse search, the value of a node preceded by its position. This method is inappropriate for representing hierarchical dependencies of a database for several reasons. First, converting an entire database into one vector is impractical. Second, many values, such as names and addresses, are character strings of varying, multiple word length. Third, attributes do not generally form an n-ary tree. Finally, arranging the values in order of breadth-first search of the tree confers no obvious advantage for representing dependencies.
The Lowry U.S. Pat. No. 4,864,497 describes a common data structure for access by several application programs. The hierarchical storage management technique of the Lowry patent does not address the problem of functional dependencies.
The Galkowski U.S. Pat. No. 4,803,651 describes a document comparison method for encoding, as two separate lists, a hierarchy representing a formatted document.
The Potter U.S. Pat. No. 4,733,354 describes a hierarchical database organization to facilitate medical diagnosis.
Whatever the precise merits, features and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.