N/A
N/A
1. Field of the Invention
This invention pertains in general to data management, and in particular to a system and method of storing in one relational database with a fixed number of physical tables, data from an arbitrary number of arbitrarily designed relational schemas in such a manner that the structure of the database is unaffected by any changes in the structures of the schemas.
2. General Background and Prior Art
Conventional implementations of relational data models map logical entities to physical tables with fixed structures. A major disadvantage of such implementations is the isomorphism of logical constructs and physical objects. Changes in the logical data model, such as the addition and deletion of entities and attributes, require a restructuring of the database. Systems of storage have been devised that attempt to reduce the effects of schema evolution on the storage system. One strategy, to which the present invention belongs, is abstraction, a technique that treats metadata (such as table names like xe2x80x9cEmployeesxe2x80x9d and column names like xe2x80x9cPayroll_Numberxe2x80x9d) as data that can be manipulated at runtime. A review of prior art shows that current inventions using metadata abstraction are only partially independent from the architecture of the logical model; with physical tables still defined in terms of business- or content-specific constructs. Another shortcoming of prior art is the failure to address matters pertaining to data integrity policies, such as primary key, foreign key, unique key and the like as well as data access and security policies, which are of paramount importance in transaction-oriented commercial applications.
3. Statement of Need
Network and hierarchical data aggregationsxe2x80x94such as parts of a complex machinery like an aircraft, network of telecommunication devices, components of networked computer systems, corporate assetsxe2x80x94typically are characterized by a high degree of volatility. Component attributes can and do change. Likewise, the relationships among components also undergo many changes. Components are attached and detached from other components, or moved around from one location to another. New components are introduced, old ones retired. Such events typically require database restructure. Other data aggregations have sets of record types that for practical purposes may be considered unbounded, for example, intelligence data about contacts, customers, and competitions. One cannot identify at design time the full set of entities and attributes that will be of interest to users of such systems. When implemented using physical tables to represent entities, such systems require too much maintenance effort to catch up to user needs. Soon users are using unstructured multi-subject fields like xe2x80x9cNotes,xe2x80x9d xe2x80x9cComments,xe2x80x9d or xe2x80x9cRemarksxe2x80x9d to store otherwise structured data.
Of immediate interest is the problem posed by the need for a better way to store the data contained in structured documents, for example, XML documents. With the increasing use of XML documents as the preferred container of structured data for transmission in electronic transactions, the need has become apparent for a database system that is fully independent of the structural shapes of logical data models. While XML data inside XML documents can be processed, businesses still prefer to store structured data in relational tables for storage efficiency and ease of manipulation. Storing XML data in relational tables requires prior creation of matching tables. This can be impractical. An XML document contains its own logical data model (as specified in its data type definition or schema) that describes the elements of the documents and their relationships. The data model may change any number of times in the lifetime of the document. Processing an XML document that suddenly has a modified schema, under a system that stores classes of facts in separate fixed tables, requires database restructure. The essence of self-describing documents is the removal of the need for a preliminary handshake between document publisher and consumer. With self-describing documents, the expectation is that the consumer can comprehend the structure of the document by reading its schema. This convention permits unannounced modifications to the document data model.
What is needed is a dynamic storage system whose physical structure is independent of the architecture of the schemas that underlie the stored databases such that changes in the schemas do not require changes in the structure of the storage system. The present invention satisfies this need.
This invention is a method of storing data values of arbitrarily defined record types in a single fact table and using a system of indexes to retrieve record instances. The invention provides a single database storage system for any number of databases and one whose physical configuration is unaffected by changes to the architecture of the underlying data models. This storage system is referred to hereinafter as xe2x80x9cUniversal Database Storage Systemxe2x80x9d or xe2x80x9cUDSSxe2x80x9d.
To achieve this objective, the UDSS schema operates at the meta-metadata level with constructs like xe2x80x9cElementsxe2x80x9d, xe2x80x9cEntitiesxe2x80x9d, xe2x80x9c(Entity) Compositionxe2x80x9d, xe2x80x9c(Entity) Occurrencesxe2x80x9d, and xe2x80x9cData Valuesxe2x80x9d, and treats metadata such as xe2x80x9cEmployeesxe2x80x9d and xe2x80x9cEmployee Namexe2x80x9d as data, alongside with data values like xe2x80x9cJohn Jonesxe2x80x9d, xe2x80x9cJun. 1, 2000xe2x80x9d, and the like. Operating at this abstract level makes it possible to limit the number of physical tables to at most five, as follows:
Four structural tables:
The Elements table, which contains the inventory of elements metadata for all the implemented data models, consisting of two element types: simple elements (also referred to as xe2x80x9cfieldsxe2x80x9d) and complex elements (also referred to as xe2x80x9crecord typesxe2x80x9d), and attributes that apply to both or either of the element types;
The Entities table, which contains record types metadata for all the implemented data models, and attributes global to instances of record types (this table is required only if at least one entity can have more than one instance, for example, if entity versioning is required);
The Composition table, which specifies the elements that compose record types for all the implemented data models, and specifies attributes specific to instances of record types;
The Occurrences table, which contains information about occurrences (equivalent to the rows of a conventional relational table) of instances of record types for all the implemented data models; and
One fact table:
The Data Values table, which contains the values of the fields of occurrences of instances of record types for all the implemented data models.
As mentioned, an element may be a record type or a field type. A record type may consist of fields alone, or of other record types alone, or a combination of fields and record types. Each record type has a corresponding record in the Entities table. Each entity or record type in the Entities table has associated with it a number of elements, and this element-of relationship is specified in the Composition table. Each record in the Occurrences table contains information about specific occurrences of an instance of a record type. The field values of an occurrence of an instance of a record type are stored individually as rows in the Data Values table.
Each row in the Data Values table consists of a data value field, an Occurrences key value field that relates the field to a unique record in the Occurrences table, and a Composition key value that relates the field to a unique row in the Composition table. These two key values make possible the retrieval of structural information relating to the field and the record type itself as contained in the Composition, Entities, and Elements tables.
In essence, UDSS implements a relational database as a system of indexes to a single fact table. With a system of indexes rather than a set of physical tables, the addition, deletion, and modification of tables do not require physical restructuring of the database. A new application data table is added by simply defining it as a record type in the UDSS Elements definition table. The new data table""s fields are entered in the UDSS Elements definition table as simple or complex elements. The structure of the data table is then defined in the Composition table. Deletion of a data table simply requires deletion of the appropriate rows in the UDSS tables. Similar processes are involved in adding or deleting fields to a data table.
UDSS is an ideal storage system for structured data contained in self-describing structured documents like XML documents. In a non-UDSS type of storage, to store XML documents in permanent storage to support extensive data manipulation processes, a common practice is to fit data into physical tables. This presupposes the existence of appropriately defined tables. Thus, new document types cannot be processed until tables are first created. Similarly changes in the schema of previously recognized document types require database changes before the document can be processed. A UDSS-based storage is impermeable to such structural changes. The XML document schema directly parses into the Elements, Entities, and Composition tables. The body of the XML document supplies information on occurrences of record types, and provides the data values.
UDSS has conceptual similarities to a multi-dimensional database (MDDB). Both have a fact table. Both have pointer tables whose function is to provide index values to appropriate rows in the fact table (though this is only one of the functions of the UDSS structural tables). A major difference is in the nature of the physical storage system. The dimension tables of an MDDB are separate tables, and different fact types are stored in different fact tables. As more dimensions and fact types are brought into play in an MDDB, the physical system increases in complexity. A simple star schema grows into a constellation of star schemas. In contrast, the number of UDSS tables remains fixed at five no matter how many fact sets are added to the database. Indeed, an MDDB can be converted into a UDSS by folding the dimension tables into the four structural tables of a UDSS, and the fact tables into the single Data Values table.
N/A