Predefined-schema data storage systems, such as relational databases such as SQLServer from Microsoft and Oracle from Oracle, for example, may be based on a relational data model in which the schema of the data model is defined in advance of use (i.e., at “data definition” time, which is separate from and earlier than “data manipulation” time). Applications written to use these systems must be aware of the schema, and changes to that schema are very disruptive because the model is predicated on “data definition” being advance of “data manipulation.”
There are, however, many applications—such as messaging systems like Exchange from Microsoft Corp, that are not able to describe their full schema in advance of use because the schema (e.g., the “MAPI” data model in the case of Exchange) may be modified at any point in time by any user. For example, it may be known that all “Messages” have a “property” called “PR_SUBJECT” (0x0037001E), but it might not be known that certain messages will be annotated with a new “property” called “PR_FOOBAR” (0x8011001F).
Applications have typically taken two different approaches to store such “highly flexible” schema data in less flexible database systems. In a first such approach, the data is stored in XML or blobs in the database, i.e., the ‘schema’ of the data is moved to a higher layer in the software stack. The database management system (DBMS) sees a very simple schema largely containing a single “dynamic schema” object. This allows the schema to be stored and manipulated in a consistent manner but robs the data store of the ability to use knowledge of the schema to optimize operations of the data store. In the absence of advanced database features such as XML data processing this approach is practically unusable for commercial applications because of the performance penalty imposed by eliminating the data store's ability to optimize its operations.
In a second such approach, some of the data, i.e., the portion that is known at “data definition” time, is stored in the data store as normal “not dynamic” schema objects, and the remainder is stored in either a blob/XML, or normalized into a “property/value” pair table. This allows the data store to know about as much of the schema as possible, and consequently to use that knowledge in operations. But it requires the application to be written knowing which parts of the schema as “normal” database objects and which are stored as blob/XML or property/value pairs.
The net result of these approaches is that the application writer must choose between exposing their “part structured and part variable” data store schema through their object model, or having a “fully variable” data schema and loose data store level optimization and processing. Consider MAPI as an example. There is no natural and intuitive way for the writer to know that PR_SUBJECT (0x0037001E) maps to Message.Subject in the object model, and column dbo.MessageView.Subject in the data store schema, while PR_FOOBAR (0x8011001F) is not represented in either the object model or data store schema because it is a “new” data property added dynamically.
It would be desirable, therefore, if a mechanism existed via which applications could be written such that the data store can have a “part structured and part variable” data schema. It would be particularly desirable if such a schema were not exposed through the object model.