1. Field of the Invention
This invention relates to an improved database interface, database management system (DBMS) and database engine employing the "Entity-Attribute" data model. More particularly, it relates to such a database interface, management system and engine that incorporates a data model that corresponds closely to an information organization scheme that a human user would employ naturally. Most especially, the invention relates to such a database interface, management system and engine that is significantly faster in the execution of its data manipulation functions than conventional database management interfaces, systems and engines.
The invention further relates to improvements in interactive data storage and retrieval by computer with a video display terminal, keyboard, one or more direct access mass storage devices such as flexible or fixed disks, a processor, and random access memory. More particularly, it further relates to a method of externally displaying and internally representing computer-stored information which has advantages in: (1) ease of use; (2) ease of learning; (3) simplified combination and separation of databases that may have been created at different times or by different users; (4) simplified internal manipulation of data; and (5) increased performance. Even more specifically, the invention relates to the combination of the prior-art Entity-Attribute semantic data model, or some variant of it, with the supporting "Item Space" logical data structure. Most specifically, the invention relates to: (1) the display of Entity-Attribute structured information on an interactive video terminal or on paper in a value-ordered, one-row-per-Attribute mode; (2) the encoding of the connection between an Entity and an Attribute into one or more "composite keys", hereafter called "Items"; (3) the simplified internal manipulation of information so encoded, especially the directness of merging separate databases into one or of separating one database into several; (4) the increased performance that follows from the simplified internal manipulation of information so encoded; (5) the increased performance that follows from the compatibility of information so encoded with the "Engine", which uses an improved B-tree algorithm; and (6) the improvements to the B-tree algorithm utilized in the Engine.
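By way of illustration only, the encoding of Entity-Attribute connections into Items, and the directness of merging and separating databases so encoded, may be sketched as follows. The sketch rests on assumptions not stated above: an Item is modeled as a Python tuple (entity, attribute, value), a sorted list stands in for the Engine's B-tree, and the names make_item, merge_databases and split_database are hypothetical.

```python
import heapq
from itertools import groupby

def make_item(entity, attribute, value):
    # Assumed encoding: the composite key is simply a tuple, so ordinary
    # tuple comparison orders Items by entity, then attribute, then value.
    return (entity, attribute, value)

def merge_databases(*databases):
    # Each database is a sorted list of Items. heapq.merge interleaves
    # the sorted inputs; groupby drops Items present in more than one input.
    merged = heapq.merge(*databases)
    return [item for item, _ in groupby(merged)]

def split_database(items, belongs_in_first):
    # Separation is a single pass that routes each Item to one of two
    # outputs; both outputs stay sorted because input order is preserved.
    first = [item for item in items if belongs_in_first(item)]
    second = [item for item in items if not belongs_in_first(item)]
    return first, second

db_a = sorted([
    make_item("Ann", "child", "Bob"),
    make_item("Ann", "street", "1 Elm St"),
])
db_b = sorted([
    make_item("Ann", "child", "Bob"),   # duplicate of an Item in db_a
    make_item("Dee", "child", "Eve"),
])

combined = merge_databases(db_a, db_b)
ann_only, others = split_database(combined, lambda item: item[0] == "Ann")
```

Because every Item carries its own Entity and Attribute, no table definitions need to be reconciled before two such sequences are combined; in this sketch the merge is an ordinary merge of sorted sequences.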
2. Description of the Prior Art
A variety of data models are employed in prior art database management systems. The most widely known and employed data models in the prior art are of the hierarchical, network and relational types. The hierarchical model is the oldest of these models. IBM's Information Management System is representative of this type. In this approach, a plurality of subordinate records are organized under a root record, with as many levels as are appropriate. One of the major shortcomings of the hierarchical model is that real world situations frequently do not fit into a hierarchical structure. As a result of constraints imposed by the hierarchical model, such a database contains redundant information in its records, the consistency of which must be maintained manually. Insertions and deletions of some kinds of information produce anomalies, or unavoidable inconsistencies, in the database.
As a result of these and other shortcomings of the hierarchical model, the 1971 Conference on Data Systems and Languages resulted in the CODASYL model, which is the most widely used network data model. In the network data model, database queries follow the data in looped chains to find the requested information. While the network data model essentially eliminates the above difficulties of the hierarchical model, a major problem of this approach is the complexity of the database designs that typically result. Normalization difficulties occur with the network approach as well, which will be explained below in connection with the relational model.
A relational database consists of a series of tables, each table being composed of records of a certain type. The intuitiveness and simplicity of the relational model are immediately apparent. These characteristics give the relational model much of its appeal. Most of the important commercially available microcomputer database management systems at the present time are relational databases. One aspect of this model is the complete absence of explicit links between record occurrences. This is both a significant strength of relational database management systems, because it allows very simple and powerful query languages, and a significant weakness, because it makes relational database management systems notably slow. However, the generality of the model and the increased ease of producing both database designs and query procedures have made the relational model the most popular for recent database management systems.
An area of concern with relational database management systems is normalization. Normalization refers to the degree of semantic correctness in the database design. Consider a simple relational database having only one relation, i.e., Person. The fields of the relation are the person's name, street address, zip code, and child. This is satisfactory as long as the person has only one child. However, the real world situation of more than one child can be handled only by adding another complete instance of the relation, with all the fields the same except for the child field. This means that the database is not normalized. The problem with this example is solved by splitting the Person relation into two relations: PersonAddr and PersonKidz. This solves the normalization problem, but creates a new database. Construction of the new database requires enumeration of the entire database and splitting of each instance of the relation into its new pieces. Even for a simple data model, this can be very expensive in time and storage space.
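The Person example may be made concrete with a short sketch. The sample names and addresses are invented for illustration, and the function name split_person is hypothetical; only the relation names Person, PersonAddr and PersonKidz come from the text above.

```python
# Un-normalized Person relation: one complete row per (person, child) pair,
# so the address is repeated for every additional child.
person = [
    ("Ann", "1 Elm St", "94301", "Bob"),
    ("Ann", "1 Elm St", "94301", "Cy"),
    ("Dee", "9 Oak Ave", "10001", "Eve"),
]

def split_person(rows):
    """Enumerate every row and split it into the two normalized relations."""
    person_addr = {}   # PersonAddr: name -> (street, zip), stored once
    person_kidz = []   # PersonKidz: (name, child), one row per child
    for name, street, zip_code, child in rows:
        person_addr[name] = (street, zip_code)
        person_kidz.append((name, child))
    return person_addr, person_kidz

person_addr, person_kidz = split_person(person)
```

The loop visits every row of the old relation, which is the full enumeration described above; for a large database this pass, together with the storage for the new relations, is where the expense arises.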
The lack of normalization, though obvious in the above example, can be subtle in many database applications. Detecting a lack of normalization depends on the database designer's degree of understanding of the issues involved in normalization and his or her familiarity with the material to be represented in the database. Because a relational database is difficult to modify once its structure has been redesigned, what seems like a simple change, adding information to what is already there, becomes a process of creating a new database, into which the contents of the old database are dumped.
A variant of the relational model, called the binary relational model, breaks down the information in the database into the smallest possible pieces at the outset, to avoid normalization problems. This model has two fields: a key and an attribute. The key is used for retrieval and may be called an entity name. When a value is placed in an attribute value field, the result is a data model having entity-attribute-value triples. This model is called the entity-attribute model, and the present invention concerns improvements in that model.
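The entity-attribute-value triple may be sketched as follows. This is a minimal illustration, assuming an in-memory set of tuples as the store; the helper names add and values and the sample data are hypothetical and do not describe the invention's internal representation.

```python
# Every fact is one (entity, attribute, value) triple.
triples = set()

def add(entity, attribute, value):
    # Adding a fact never requires changing a table definition.
    triples.add((entity, attribute, value))

def values(entity, attribute):
    # All values recorded for one attribute of one entity, in sorted order.
    return sorted(v for e, a, v in triples if e == entity and a == attribute)

add("Ann", "street", "1 Elm St")
add("Ann", "child", "Bob")
add("Ann", "child", "Cy")    # a second child is simply one more triple

children = values("Ann", "child")
```

Since each triple holds a single fact, a second child is recorded by adding one triple rather than by repeating the address fields, so the normalization problem of the Person example does not arise in this sketch.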
The Entity-Attribute data model has many variants, and there are many systems in use which employ some form of it. Even the LISP programming language has a feature--property lists--which exhibits the fundamental characteristics of an Entity-Attribute system, although the terminology is different. Much of the recent work in the field of Artificial Intelligence has been in developing "knowledge representation languages" in order to encode general knowledge and facts for "expert systems". Knowledge representation languages and systems have proven the descriptive power of the Entity-Attribute or similar models. However, these systems address the needs of programmers and "knowledge engineers" rather than everyday users. The need for a truly simple user view into a database is as urgent as the need for database flexibility and representational power.
Relational model databases also abound. These systems organize data into tables or "mathematical" relations. Unfortunately, the mathematics of relations escapes most everyday users of databases, and the quest for ease-of-use amounts to little more than a tradeoff between representational power and simplicity. For example, relational systems for everyday users rarely allow true relational joins, and many can only use a single table at a time, even though the representational utility of the model fundamentally relies on the ability to decompose relations into multiple "normalized" relations.
Idea processors have a superficial similarity to the Item Editor, in that they allow what appears to be highly flexible data structuring. In reality, however, these systems are not databases at all, since they enforce no formal semantics, at least as "understood" by the idea processor. Instead, they merely serve as indexing methods for collections of "snippets" of text, or, even more simply, as improved text editors which can selectively hide certain levels in a user-defined "outline" hierarchy.
The value-ordered one-row-per-Attribute display, or "Item Editor", allows everyday users to construct and edit fully general Entity-Attribute databases in much the same way as they would edit text using modern word processors. In fact, an Item Editor scrolls the display "window" up and down over the sequence of Items much as a word processor scrolls the display window up and down over a document. The person perusing and editing the single sequence of Items in an Item Editor has a single, uniform visual image to contend with, either through the display or on paper--this contrasts with the non-visual, abstract, inquiry- or view-dependent concept of a database with which relational DBMS programmers and database administrators are familiar.
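The scrolling display window may be sketched as a slice over a single sorted sequence of Items. The sketch assumes each Item is a tuple (entity, attribute, value) held in a sorted Python list; the function names render_window and seek, the row layout, and the sample data are hypothetical.

```python
import bisect

# The single sequence of Items, kept in sorted order.
items = sorted([
    ("Ann", "child", "Bob"),
    ("Ann", "child", "Cy"),
    ("Ann", "street", "1 Elm St"),
    ("Dee", "child", "Eve"),
])

def render_window(items, top, height):
    # One display row per Item, showing `height` rows starting at `top`.
    return [f"{e}  {a}  {v}" for e, a, v in items[top:top + height]]

def seek(items, entity):
    # Position of the first Item belonging to `entity` (or where it would
    # be inserted); a one-element tuple sorts before any longer tuple that
    # begins with the same entity.
    return bisect.bisect_left(items, (entity,))

rows = render_window(items, top=1, height=2)   # scroll down by one row
dee_top = seek(items, "Dee")                   # jump the window to "Dee"
```

Scrolling in this sketch only changes the value of top; there is one sequence and one view of it, in contrast to the query-dependent views of a relational system.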