1. Field of the Invention
The present invention relates to, and finds utility within, computerized database definition and management. More particularly, the present invention relates to a computerized database management system implementing an Attribute-Refinement model.
2. Background Relevant to the Prior Art
A database management system (DBMS) is a computerized system that stores and manipulates data. A DBMS provides software and/or end-user interfaces for manipulating data according to one or more data models (see, e.g., U.S. Pat. No. 5,713,014, column 1, lines 17-18).
A data model (also called a conceptual or semantic model) is a collection of concepts that can be used to describe the data types, relationships and constraints that should hold on the data. Data models strive to capture and represent certain aspects of human perceptions of the real world so that these aspects can be incorporated into an information system; see Y. Wand, pages 494-495.
Most data models are concerned with things, which are often referred to as entities, and associations among things, often referred to as relationships, see Y. Wand, et al, pg. 494-495. A thing in the real world, like a person, has properties, like age, weight, gender, which are often represented by attributes in a data model. Certain properties imply the existence of other entities. For example, the property of being employed implies the existence of an employer. Such properties are called mutual properties, and are one way of describing an association between the two things, in this case a person and an employer. In contrast, the property of weight does not imply the existence of anything else; see Y. Wand, page 498.
Common data models include: the relational model, the object model, the entity-relationship (E-R) model, and the extended entity-relationship (EE-R) model. Other data models include the network model and the hierarchical model.
The relational model was introduced by Codd (1970) and has one modeling concept, the relation, which is a two-dimensional table containing data. This data model is the basis for the data definition and structured query language, SQL, which is itself the basis for most commercially available DBMS's. In the relational model each row of a table corresponds to an entity and each column of the table contains the value for a particular attribute. Relationships between entities are represented by adding columns to one table that contain values uniquely identifying a row in another table. A table row, which in the relational model corresponds to an entity, is often informally called a "data record" or "record". The relational model is described at length in most database textbooks, including Elmasri, Ullman, and Date.
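By way of illustration only, the row, column, and identifying-column concepts described above can be sketched with Python's standard sqlite3 module. The table names, column names, and values below are hypothetical and are not taken from any cited reference.

```python
import sqlite3

# Hypothetical tables; all names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employer (employer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        name        TEXT,
        age         INTEGER,
        -- Column whose value uniquely identifies a row in another table.
        employer_id INTEGER REFERENCES employer(employer_id)
    );
    INSERT INTO employer VALUES (1, 'Acme');
    INSERT INTO person VALUES (10, 'Alice', 34, 1);
""")

# Following the relationship means matching the identifying column values.
row = conn.execute("""
    SELECT person.name, employer.name
    FROM person JOIN employer ON person.employer_id = employer.employer_id
""").fetchone()
print(row)
```

Each row of person stands for one entity, each column holds one attribute value, and the employer_id column carries the relationship.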
An object model is defined by the Object Design Language (ODL) (ODMG 3.0 pages 61-78) and is also discussed in Ullman, pages 25-77. The E-R model is defined by a graphical notation called an E-R diagram, described in Elmasri page 63 and 73-93. The EE-R model augments the E-R model with modeling concepts for super/sub class relationships and predicate defined unions of entity classes. A formal set theoretic definition of EE-R is given in Elmasri pages 92-93. The network model and hierarchical model are widely accepted to be special cases of the E-R model, and are described in Ullman, pages 77-80.
Data models are often compared on the number of modeling concepts they support and the restrictions they impose on the use of those modeling concepts. An example of a modeling concept is the table in the relational model, or an entity in the E-R model. An example of a modeling restriction is the E-R model's rule that all entities represented by a symbol at a particular location on an E-R diagram must belong to an entity type. An entity type determines the attributes the entity may have. This implies that in any E-R data model, if an entity has one of the attributes of the entity type, it must have all the attributes defined by the entity type.
In terms of modeling concepts and constraints, the object model and E-R model are considered nearly identical in the sense that every modeling concept in the object model can be duplicated by some combination of modeling concepts in the E-R model and vice versa. The EE-R model contains additional modeling concepts that match the object model concepts more directly, and some argue the EE-R model contains at least one modeling concept, the entity-union type, that cannot be easily duplicated in the object model, Elmasri, page 399.
Even though the relational model has only one modeling concept, the table, both the object and EE-R modeling concepts can be mapped to some combination of table definitions of the relational model. Chapter 9 in Elmasri and chapter 3 in Ullman cover this process in detail. Since the relational model is implemented by most commercial DBMS's, the relational model is also considered to be an implementation model. Since the object model as described by ODL is implemented by several commercial OO-DBMS's, the object model is considered to be an implementation model as well. The E-R and EE-R models are not presently known to have commercially available management systems, and therefore are not considered to be implementation models.
There are several limitations of the existing data and implementation models. One limitation of existing data and implementation models is the constraint that attributes are grouped by tables (in the relational model) or by entity types (in the E-R model). This limitation implies that if an entity has a value for one of the attributes in the group it must have values for all the attributes in the group. Take as an example a data model for a person. A person may have attributes of name, age, and weight. If a person is employed, the data model would contain attributes of employer and salary. If a person is a student, then the data model may contain attributes of school name, year in school, and study major. If we use a single table to hold all of these attributes, then each row in the table will require values for all the attributes, even if the particular row represents a person who is not a student, or a person who is not employed. In the E-R model and object models, each entity would similarly require values for all the attributes.
In these conventional models, the solution is to create three table or entity definitions, one containing the name, age, and weight, another containing employer and salary, and a third containing school name, year in school, and major. The person, employer and school tables or entity definitions would require additional link attributes that would allow an application program using the database to locate the relevant employee or student information for a particular person. One of the consequences of this limitation is the added complexity in deciding the correct partitioning of attributes into various tables or entity definitions in the conceptual model of the database, and in the added complexity of extracting and manipulating the data in application programs.
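The conventional three-table partitioning described above may be sketched as follows, using Python's standard sqlite3 module. The table and column names, the sample values, and the load_person helper are assumptions made for illustration; the sketch shows only that one real-world person is spread over several rows that an application program must reassemble through the link attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name

# Hypothetical schema: person_id is the link attribute shared by all three tables.
conn.executescript("""
    CREATE TABLE person   (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, weight REAL);
    CREATE TABLE employee (person_id INTEGER REFERENCES person(person_id), employer TEXT, salary REAL);
    CREATE TABLE student  (person_id INTEGER REFERENCES person(person_id), school TEXT, year INTEGER, major TEXT);
    INSERT INTO person   VALUES (1, 'Alice', 34, 61.0), (2, 'Bob', 19, 80.5);
    INSERT INTO employee VALUES (1, 'Acme', 50000.0);
    INSERT INTO student  VALUES (2, 'State U', 2, 'Physics');
""")

def load_person(person_id):
    """Reassemble one person from up to three rows via the link attribute."""
    result = {}
    for table in ("person", "employee", "student"):
        row = conn.execute(
            f"SELECT * FROM {table} WHERE person_id = ?", (person_id,)
        ).fetchone()
        if row is not None:
            result.update(dict(row))
    return result

print(load_person(1))  # person attributes plus employer and salary, no student attributes
print(load_person(2))  # person attributes plus school, year and major
```

The application, not the database, carries the knowledge that these three tables together describe one person.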
Because individual things in the real world are mapped to multiple entities in the E-R model or rows in multiple tables of an SQL DBMS, restoring a database representation of a single person, including any employee or student information, from a backup copy of a database is usually not possible. Restoration may be required due to an operator error that destroyed a particular data record. A restore operation would require an understanding of the relationships between the rows in the various tables representing a person to be able to restore the information for a single person. Restore operations typically do not have this capability. Restore operations typically reset the state of a whole table, not individual rows; but, since data in one table is dependent on data in other tables, restoring one table without the others will most often result in a database inconsistent with its design rules. Therefore SQL and object databases typically support only the backup and restore of a whole database, a time- and resource-consuming operation.
Another conventional implementation model limitation is the absence of any conceptual model for restrictions on values of attributes in the database. None of the models described above have any representation for the fact that an age should be between 0 and 120, or that if an attribute "vehicle model" of a particular entity has value "Crown Victoria", then an attribute "vehicle manufacturer" of the same entity must be "Ford". Rather, in E-R, SQL and object models, the determination of permitted values is handled by the logic of a filtering program handling data before entry into the database or by an external data manipulation program which manipulates the data following retrieval and before presentation to the user. The legal conditions of the data are stored external to the database and therefore must be embodied in the external filtering or data manipulation program. This limitation leads to the prospect that two separate filtering or manipulation programs will operate inconsistently with respect to each other, and adds to the complexity of implementing and modifying such programs.
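The risk of inconsistent external filtering can be illustrated with a short Python sketch. Both filter functions and the sample record are hypothetical; they stand for two separately written programs, each embodying only part of the legal conditions described above.

```python
def filter_a(record):
    """Hypothetical entry filter: enforces only the age range."""
    return 0 <= record["age"] <= 120

def filter_b(record):
    """Hypothetical second filter, written separately: enforces only the
    rule that a Crown Victoria must be manufactured by Ford."""
    if record.get("vehicle_model") == "Crown Victoria":
        return record.get("vehicle_manufacturer") == "Ford"
    return True

# Illustrative record with an out-of-range age but a consistent vehicle.
record = {"age": 150, "vehicle_model": "Crown Victoria", "vehicle_manufacturer": "Ford"}

accepted_by_a = filter_a(record)
accepted_by_b = filter_b(record)
print(accepted_by_a, accepted_by_b)
```

Because the rules live in the programs rather than in the database, the same record is rejected on one entry path and accepted on the other.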
Another prior art limitation is the difficulty of modifying a data model in an operational database application. Many changes to the data model, such as adding or modifying attribute definitions or adding or modifying relationships between entities, require shutting down the operational database, unloading the affected data, re-engineering the data definitions, re-engineering and recompiling the application programs, and reloading the data into the redefined database. An example of a typical change that would require these steps is changing the data model from allowing one e-mail address per person to allowing multiple e-mail addresses per person. The main reason for this difficulty is that implementation of an attribute that may have multiple values requires additional table definitions in SQL and requires a different attribute definition in the object model.
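The e-mail example above may be sketched with Python's standard sqlite3 module. The table names, column names, and addresses are assumptions for illustration; the sketch shows only the additional table definition and data movement that the change requires in SQL, not the shutdown or recompilation steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original hypothetical design: exactly one e-mail address per person.
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO person VALUES (1, 'Alice', 'alice@example.com');
""")

# Allowing multiple addresses requires a new table definition and a data copy.
conn.executescript("""
    CREATE TABLE person_email (
        person_id INTEGER REFERENCES person(person_id),
        email     TEXT
    );
    INSERT INTO person_email
        SELECT person_id, email FROM person WHERE email IS NOT NULL;
    INSERT INTO person_email VALUES (1, 'alice@work.example');
""")

# Application queries written against person.email must now be rewritten.
emails = [
    r[0]
    for r in conn.execute(
        "SELECT email FROM person_email WHERE person_id = 1 ORDER BY email"
    )
]
print(emails)
```

Every program that previously read the single email column has to be revised to read the new table instead.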
Prior database schemas and management systems which are considered pertinent to the present invention include U.S. Pat. No. 4,479,196 to Ferrer et al., entitled: "Hyperedge Entity-Relationship Database Systems"; and, U.S. Pat. No. 5,713,014 to Durflinger et al., entitled: "Multi-Model Database Management System Engine for Database Having Complex Data Models".
One object of the present invention is to provide an attribute-refinement data definition model including three basic modeling constructs: an attribute description, a constraint, and an entity.
One other object of the present invention is to provide a database definition method which has a more descriptive set of modeling concepts and fewer constraints on their use in order to provide a better correspondence to real world data typically encountered in the database design process.
Another object of the present invention is a method for forming databases with entities having potentially unique combinations of attributes such that a particular entity need not have any similarity in attribute description to any other entity within the particular database.
A further object of the present invention is a database definition method which causes restrictions on attribute values to become an intrinsic part of the database so that the database itself may guarantee consistency irrespective of characteristics of any particular filter program used to add or modify data in the database.
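The two preceding objects can be illustrated by a toy Python sketch. It is not the claimed system and makes no assumption about how the invention is implemented; the Store class, its methods, and the sample entities are hypothetical. It shows only the idea of entities with arbitrary attribute combinations and of constraints kept with the data and checked on every write.

```python
class Store:
    """Toy in-memory store; a hypothetical illustration only."""

    def __init__(self):
        self.entities = {}     # entity id -> attribute dictionary
        self.constraints = []  # constraints are kept alongside the data

    def add_constraint(self, description, predicate):
        self.constraints.append((description, predicate))

    def put(self, entity_id, attributes):
        # Every write is checked, whichever program performs it.
        for description, predicate in self.constraints:
            if not predicate(attributes):
                raise ValueError(f"constraint violated: {description}")
        self.entities[entity_id] = dict(attributes)

store = Store()
store.add_constraint(
    "age between 0 and 120",
    lambda a: "age" not in a or 0 <= a["age"] <= 120,
)
store.add_constraint(
    "Crown Victoria implies Ford",
    lambda a: a.get("vehicle_model") != "Crown Victoria"
    or a.get("vehicle_manufacturer") == "Ford",
)

# Entities need not share any attributes with one another.
store.put("p1", {"name": "Alice", "age": 34, "employer": "Acme"})
store.put("v1", {"vehicle_model": "Crown Victoria", "vehicle_manufacturer": "Ford"})

try:
    store.put("p2", {"name": "Bob", "age": 150})
    rejected = False
except ValueError:
    rejected = True
print(rejected, sorted(store.entities))
```

Since the check runs inside the store, no entry program can bypass it or apply a different rule.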
One further object of the present invention is to provide a database definition method capable of enabling changes and additions in data definitions over time without requiring revision of any preexisting data definitions.
One more object of the present invention is to provide a database management system which enables data entities of a database to be replicated and/or restored independently of other entities.
Another object of the present invention is to provide an OTree clustering and storage method which guarantees the storage of an entity, its attribute values, relationships and indexing information on a contiguous logical block of a computer disk drive, a disk drive array, or other persistent storage system, if a sum of the number of bytes required by these various entity components is less than the number of bytes of a logical block. In addition, the OTree clustering and storage method results in a storage utilization of approximately 50 percent or better.
Another object of the present invention is to provide an application which provides a file system interface to entities stored in an attribute-refinement (A-R) database management system (DBMS). The file system application stores files as A-R entities, and stores the data associated with a file as an attribute value of the A-R DBMS entity. The A-R DBMS file entities may possess other attribute values as well, and the file system application enables searches of the entities by these other attributes. A directory or folder in the file system application may correspond to the entities that satisfy a particular search criterion, and new files created in such directories and folders may automatically acquire attributes with values that satisfy the search criterion.
Another object of the present invention is a client/server embodiment of an A-R DBMS in which the OTree clustering and storage system resides on one computer system, and the application programmer interface, the entity manager, the constraint manager, the attribute description manager, the entity validator, and the search optimizer reside on a plurality of other computer systems connected to the OTree clustering and storage system via a computer network.
In accordance with principles of the present invention, a computerized attribute-refinement database method is provided for storing, retrieving and modifying data occupying data storage space comprising a database. The method comprises steps of locating available data storage space for storing new entities, reclaiming previously occupied data storage space upon deletion of an existing entity or deletion or reduction of an existing entity's attribute or attributes, and locating available data storage space when an entity acquires an additional attribute or when an entity's attribute is extended in length; locating available data storage space for storing multiple attribute descriptions, and reclaiming previously occupied data storage space after deletion or reduction of an attribute description; locating available data storage space for storing constraints where a constraint comprises a list of statements, reclaiming previously occupied data storage space upon deletion of a constraint, and locating available data storage space when an additional constraint is added; and, verifying consistency of entity attribute data upon input or modification with applicable constraints stored in the database.
In accordance with further aspects and principles of the present invention, an OTree clustering and storage schema provides general purpose data clustering and retrieval and is characterized by a property that all related components whose combined size is approximately less than a logical block size reside on the logical block containing the component packet and will therefore require only one storage array access for retrieval; a property that related components whose combined or individual size exceeds a logical block require at most one additional storage array access, in short, the property that the number of storage accesses rises in proportion to the size of the component; a minimum storage utilization of approximately 50%; and, a property that component data may itself be further partitioned into keyed data elements or keyed (nested) component packets, where the clustering of the keyed data elements or nested component packets obeys the properties described above, and that the access, insertion or deletion of any keyed data element or nested component packet will benefit from the efficiencies of keyed search structure algorithms, for example B+trees (Comer, 1979).
The foregoing objects and aspects of the present invention will be more fully understood and appreciated by those skilled in the art upon consideration of the following detailed description of preferred embodiments presented in conjunction with the accompanying drawings.