This invention relates to databases in general and to database schema in particular.
A database represents a snapshot of the real world at some point in time. As the real world changes over time, so does the database schema, which is a description of the logical structure of the database. For example, engineering design applications require considerable flexibility in dynamically defining and modifying database schema, e.g., class inheritance structure and specifications of attributes and methods requiring application shutdown. This is necessary because design is an incremental process and evolves over time. Schema evolution is the ability to incorporate changes in the database schema while preserving existing information in the database. This is an important issue in a heterogeneous database environment. In such environments, a set of schemata may be used to model the underlying (federated) databases. Changes to one or more databases could result in cascading changes to one or more of the schemata. This in turn could trigger changes to other databases and their schemata. It is therefore very important to manage schema evolution, which includes identifying and propagating changes to the schemata and mapping these back to the set of underlying databases.
Major relational-database vendors support SQL-92 compliant Structured Query Languages (SQLs). In an SQL, schema management is handled by an ALTER TABLE command. The ALTER TABLE command provided by the Microsoft Access database is illustrative thereof:
ALTER TABLE TableName
ADD COLUMN ColName ColType [(size)] [(Single-ColumnConstraint] |
DROP COLUMN ColName |
ADD CONSTRAINT [Multi-Column Constraint] |
DROP CONSTRAINT [Multi-ColumnConstraint]
Relational databases store data in strictly tabular format. Consequently, the schema information is primarily stored in the data dictionary. When schema changes occur, the entire table is locked, and the data in each record are modified to meet the new schema. The length of time for which the table is locked depends on the number of records or rows in the table. For a huge table, the schema change process can take hours or even days. Users have to wait for the operation to complete before they can use the table again.
Even though database vendors provide support for schema changes via SQLs, their approaches are typically vendor-specific and only partial. For example, Informix provides a schema evolution approach that appends schema modifications to a list, and these modifications are then accessed and implemented during access of the impacted records. But Oracle requires that the database system be shut down and its tables be changed to reflect any modifications to the schema. And Versant, an object-oriented database, provides a command xe2x80x9csch 2dbxe2x80x9d to change a schema. Some database-application projects also implement their own approaches to schema changes; one known project uses a high-level approach grafted on to a commercial relational database that uses multiple tables to describe stored data, and involves modifying front mapping tables and altering data tables of corresponding data records. In general, applications developers must write different database records for different vendors"" databases or different individual databases. Consequently, application programming is complex and difficult, and the resulting applications are complex, difficult to maintain and upgrade, and not portable between different databases.
This invention is directed to solving these and other problems and disadvantages of the prior art. Generally according to the invention, there is defined a new database schema that is implementable on most if not all relational and object-oriented databases and that replaces their conventional schemata. This generic schema illustratively uses a tree structure to represent relationships among data attributes, and uses container data structures (nodes) to represent individual attributes (the branches) as collections of all of their instances (leaves) within the tree. All database items use this same one schema, including mapping nodes and root nodes. Hence, the same access and processing procedures can be used on all database items. This schema provides a unified way to handle schema changes independently of the underlying database. When a schema modification happens, a dictionary (in the case of object-oriented databases) or an ID (in the case of relational databases) of a database item is modified, and a container is added or deleted. However, the change is not made at the leaf level, and so the time for which users are locked out from using the database while the change is being effected is minimized. The schema does not require vendor-specific routines, multiple mapping-node modifications, or multiple data-node alterations, to handle schema changes. The solution works across all relational and object-oriented databases known to the inventors. Portability of data among databases is thus facilitated.
Specifically according to the invention, an apparatus comprises a memory that stores a database having a schema that includes the following. A plurality of leaf nodes, each representing an instance (e.g., defining a value) of an attribute. A plurality of branch container nodes, each corresponding to a different attribute and identifying those leaf nodes that represent instances of the container node""s attribute. At least one root node, each representing a database record and identifying a leaf node of at least one container node, which leaf node represents an instance of an attribute of the root node""s record. A plurality of mapping nodes, each corresponding to a different said attribute and identifying the branch container node that corresponds to said attribute. And a map container node, identifying the plurality of mapping nodes. It preferably also includes a root container node, representing the database and identifying each of the root nodes. Each leaf and root node is represented by a data structure that defines a name of an attribute, an instance (a value) of the named attribute, and an identifier of nodes that represent instances of attributes of the named attribute""s instance. Each identification preferably comprises either an Identifier value (in the case of a relational database) or a dictionary of names of the attributes of the named attribute""s instance and corresponding pointers that point to the nodes that represent the instances of those attributes of the subject instance (in the case of an object-oriented database). The apparatus preferably further includes instructions stored in the memory for execution by a processor, to cause the processor to modify the schema, for example by adding and/or deleting attributes to or from the schema, and to add records to, delete records from, and/or modify records of, the database.
These and other features and advantages of the invention will become apparent from the following description of an illustrative embodiment of the invention considered together with the drawing.