A database management system (DBMS) is a computerized system that stores and manipulates data. A DBMS provides software and/or end-user interfaces for manipulating data according to one or more data models, e.g., U.S. Pat. No. 5,713,014, column 1, lines 17-18).
A data model (also called a conceptual or semantic model) is a collection of concepts that can be used to describe the data types, relationships and constraints that should hold on the data. Data models strive to capture and represent certain aspects of human perceptions of the real world so that these aspects can be incorporated into an information system, Y. Wand, pages 494-495.
Most data models are concerned with things, which are often referred to as entities, and associations among things, often referred to as relationships, see Y. Wand, et al, pg. 494-495. A thing in the real world, like a person, has properties, like age, weight, gender, which are often represented by attributes in a data model. Certain properties imply the existence of other entities. For example, the property of being employed implies the existence of an employer. Such properties are called mutual properties, and are one way of describing an association between the two things, in this case a person and an employer. In contrast, the property of weight does not imply the existence of anything else; see Y. Wand, page 498.
Common data models include: the relational model, the object model, the entity-relationship (E-R) model, and the extended entity-relationship (EE-R) model. Other data models include the network model and the hierarchical model.
The relational model was introduced by Codd (1970) and has one modeling concept, the relation, which is a two dimensional table containing data. This data model is the basis for the data definition and structured query language, SQL, which itself the basis for most commercially available DBMS″s. In the relational model each row of a table corresponds to an entity and each column of the table contains the value for a particular attribute. Relationships between entities are represented by adding columns to one table that contain values uniquely identifying a row in another table. A table row, which in the relational model corresponds to an entity, is often informally called a “data record” or “record”. The relational model is described at length in most database textbooks, including Elmasri, Ullman, and Date.
An object model is defined by the Object Design Language (ODL) (ODMG 3.0 pages 61-78) and is also discussed in Ullman, pages 25-77. The E-R model is defined by a graphical notation called an E-R diagram, described in Elmasri page 63 and 73-93. The EE-R model augments the E-R model with modeling concepts for super/sub class relationships and predicate defined unions of entity classes. A formal set theoretic definition of EE-R is given in Elmasri pages 92-93. The network model and hierarchical model are widely accepted to be special cases of the E-R model, and are described in Ullman, pages 77-80.
Data models are often compared on the number of modeling concepts they support and the restrictions they impose on the use of those modeling concepts. An example of a modeling concept is the table in the relational model, or an entity in the E-R model. An example of a modeling restriction is the E-R models rule that all entities represented by a symbol at a particular location on an E-R diagram must belong to an entity type. An entity type determines the attributes the entity may have. This implies that in any E-R data model, if an entity has one of the attributes of the entity type, it must have all the attributes defined by the entity type.
In terms of modeling concepts and constraints, the object model and E-R model are considered nearly identical in the sense that every modeling concept in the object model can be duplicated by some combination of modeling concepts in the E-R model and vice versa. The EE-R model contains additional modeling concepts that match the object model concepts more directly, and some argue the EE-R model contains at least one modeling concept, the entity-union type, that cannot be easily duplicated in the object model, Elmasri, page 399.
Even though the relational model has only one modeling concept: the table, both the object and EE-R modeling concepts can by mapped to some combination of table definitions of the relational model. Chapter 9 in Elmasri and chapter 3 in Ullman cover this process in detail. Since the relational model is implemented by most commercial DBMS″s, the relational model is also considered to be an implementation model. Since the object model as described by ODL is implemented by several commercial OO-DBMS″s, the object model is considered to be an implementation model as well. The E-R and EE-R models are not presently known to have commercially available management systems, and therefore are not considered to be implementation models.
There are several limitations of the existing data and implementation models. One limitation of existing data and implementation models is the constraint that attributes are grouped by tables (in the relational model) or by entity types (in the E-R model). This limitation implies that if an entity has a value for one of the attributes in the group it must have values for all the attributes in the group. Take as an example a data model for a person. A person may have attributes of name, age, and weight. If a person is employed, the data model would contain attributes of employer and salary. If a person is a student, then the data model may contain attributes of school name, year in school, and study major. If we use a single table to hold all of these attributes, then each row in the table will require values for all the attributes, even if the particular row represents a person who is not a student, or a person who is not employed. In the E-R model and object models, each entity would similarly require values for all the attributes.
In these conventional models, the solution is to create three table or entity definitions, one containing the name, age, and weight, another containing employer and salary, and a third containing school name, year in school, and major. The person, employer and school tables or entity definitions would require additional link attributes that would allow an application program using the database to locate the relevant employee or student information for a particular person. One of the consequences of this limitation is the added complexity in deciding the correct partitioning of attributes into various tables or entity definitions in the conceptual model of the database, and in the added complexity of extracting and manipulating the data in application programs.
Because individual things in the real world are mapped to multiple entities in the E-R model or rows in multiple tables of an SQL DBMS, restoring a database representation of a single person including any employee or student information from a backup copy of a database is usually not possible. Restoration may be required due to an operator error that destroyed a particular data record. A restore operation would require an understanding of the relationships between the rows in the various tables representing a person to be able to restore the information for a single person. Restore operations typically do not have this capability. Restore operations typically reset the state of a whole table, not individual rows; but, since data in one table is dependent on data in other tables, restoring one table without the others will most often result in an database inconsistent with its design rules. Therefore SQL and object databases typically support only the backup and restore of a whole database, a time and resource consuming operation.
Another conventional implementation model limitation is the absence of any conceptual model for restrictions on values of attributes in the database. None of the models described above have any representation for the fact that an age should be between 0 and 120, or if an attribute “vehicle model” of a particular entity has value “Crown Victoria”, then an attribute “vehicle manufacturer” of the same entity must be “Ford”. Rather, in E-R, SQL and object models, the determination of permitted values is handled by the logic of a filtering program handing data before entry into the database or by an external data manipulation program which manipulates the data following retrieval and before presentation to the user. The legal conditions of the data are stored external to the database and therefore must be embodied in the external filtering or data manipulation program. This limitation leads to the prospect that two separate filtering or manipulation programs will operate inconsistently with respect to each other, and adds to the complexity of implementing and modifying such programs.
Another prior art limitation is the difficulty of modifying a data model in an operational database application. Many changes to the data model, such as adding or modifying attribute definitions or adding or modifying relationships between entities, require shutting down the operational database, unloading the affected data, re-engineering the data definitions, re-engineering and recompiling the application programs, and reloading the data into the redefined database. An example of a typical change that would require these steps is changing the data model from allowing one e-mail address per person to allowing multiple e-mail addresses per person. The main reason for this difficulty is that implementation of an attribute that may have multiple values requires additional table definitions in SQL and requires a different attribute definition in the object model.
Prior database schemas and management systems which are considered pertinent to the present invention include U.S. Pat. No. 4,479,196 to Ferrer et al, entitled: “Hyperedge Entity—Relationship Database Systems”; and, U.S. Pat. No. 5,713,014 to Durflinger et al., entitled: “Multi-Model Database Management System Engine for Database Having Complex Data Models.