Data has become an important asset in almost every application, whether it is, for example, a Line-of-Business (LOB) application browsing products and generating orders or a Personal Information Management (PIM) application scheduling a meeting between people. Applications are increasingly becoming data-centric: they plan a significant portion of their design-time and run-time experience around querying, manipulating, and presenting data. Many of these applications deal with data that is rich in semantics, such as structural integrity, data constraints, and relationships between data. Today's applications expend significant effort in procedural code to preserve these data semantics.
Consider, for example, a LOB application. Typically, such an application deals with Customers, Orders, OrderLines, Suppliers, Products, Employees, Shippers, Invoices, and so on. Each of these notions represents a separate rich data type with a specific structure. For example, the Customer type has fields such as CustomerID, Company Name, Contact Name, and Address; the Order type has fields such as OrderID, CustomerID, OrderDate, OrderLines, and DueDate. Any of these may have further requirements. For example, an Address may require a PostalCode which, within the USA, must be a zip code that is five characters long, where each character is a digit between zero and nine. In Canada, the PostalCode must be of the form “ANA NAN” where A is a letter and N is a digit. When modeling postal codes, it is thus not enough to merely specify that the value is a string; additional constraints must be placed on this string to restrict the range of possible values that it can take. Furthermore, there are usually relationships among data. For example, an Order always has a Customer associated with it; this is a many (Order)-to-one (Customer) relationship. Products and Suppliers bear a many-to-many relationship because multiple products can be supplied by a single supplier, and multiple suppliers can carry the same product.
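The constraints and relationships in this example can be sketched in procedural code, which is the kind of effort the passage says applications expend today. The following is a minimal illustration only: the Python names (`is_valid_postal_code`, `Address`, `link`, and so on) are hypothetical, and the behavior for countries other than the USA and Canada is an assumption, since the text gives no rule for them.

```python
import re
from dataclasses import dataclass, field

# Patterns for the two postal code formats described in the text.
US_ZIP = re.compile(r"[0-9]{5}")                                   # five digits
CA_POSTAL = re.compile(r"[A-Za-z][0-9][A-Za-z] [0-9][A-Za-z][0-9]")  # "ANA NAN"


def is_valid_postal_code(code: str, country: str) -> bool:
    """Return True if `code` satisfies the constraint for `country`."""
    if country == "USA":
        return US_ZIP.fullmatch(code) is not None
    if country == "Canada":
        return CA_POSTAL.fullmatch(code) is not None
    # Assumption: no rule is given for other countries, so accept any
    # non-empty string.
    return bool(code)


@dataclass
class Address:
    postal_code: str
    country: str

    def __post_init__(self) -> None:
        # Enforce the constraint when the value is created.
        if not is_valid_postal_code(self.postal_code, self.country):
            raise ValueError(
                f"invalid postal code {self.postal_code!r} for {self.country}"
            )


@dataclass
class Customer:
    customer_id: int
    company_name: str
    address: Address


@dataclass
class Order:
    order_id: int
    customer: Customer  # many Orders refer to one Customer


@dataclass
class Supplier:
    name: str
    # Excluded from repr and comparison to avoid recursing through the cycle.
    products: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Product:
    name: str
    suppliers: list = field(default_factory=list, repr=False, compare=False)


def link(product: Product, supplier: Supplier) -> None:
    """Record one edge of the many-to-many Product/Supplier relationship."""
    product.suppliers.append(supplier)
    supplier.products.append(product)
```

Note that nothing here is declarative: the postal code rule and both sides of the many-to-many relationship have to be maintained by hand, which is the burden a richer data model is meant to remove.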
A data model describes the structure and semantics of, and relationships among, the various pieces of data that an application is interested in. While relational models and systems have been very successful in data management, they have failed to capture application data models. Traditional client-server applications relegate query and persistence operations on their data to database systems. The database system operates on data in the form of rows and tables, while the application operates on data in terms of higher-level programming language constructs such as classes and rich data types. This impedance mismatch in data manipulation services between the application and the database tier was tolerable in traditional systems. With the advent of service-oriented architectures (SOA), application servers, and multi-tier applications, the need for rich data access and manipulation services that are well integrated with programming environments and can operate in any tier has increased tremendously.
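To make the mismatch concrete, the sketch below shows the hand-written mapping an application typically needs in order to turn flat rows into a nested object. It is an illustration under stated assumptions, not a description of any particular system: the table names (`orders`, `order_lines`), their columns, and the helper functions `load_order` and `make_demo_db` are all hypothetical, and an in-memory SQLite database stands in for the database tier.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer_id: int
    lines: list = field(default_factory=list)


def load_order(conn: sqlite3.Connection, order_id: int):
    """Assemble a nested Order object from rows in two flat tables."""
    row = conn.execute(
        "SELECT order_id, customer_id FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    if row is None:
        return None
    order = Order(order_id=row[0], customer_id=row[1])
    # The database returns the order lines as separate rows; the
    # application must stitch them back into the object.
    for product, quantity in conn.execute(
        "SELECT product, quantity FROM order_lines "
        "WHERE order_id = ? ORDER BY line_no",
        (order_id,),
    ):
        order.lines.append(OrderLine(product, quantity))
    return order


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE order_lines (
            order_id INTEGER, line_no INTEGER, product TEXT, quantity INTEGER
        );
        INSERT INTO orders VALUES (1, 42);
        INSERT INTO order_lines VALUES (1, 1, 'widget', 3);
        INSERT INTO order_lines VALUES (1, 2, 'gadget', 1);
        """
    )
    return conn
```

Calling `load_order(make_demo_db(), 1)` yields an `Order` whose `lines` list holds two `OrderLine` objects. Every type in the application needs a similar mapping routine, and each one has to be kept in step with the table layout by hand.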
Most applications and application frameworks roll their own data model on top of systems based on the relational data model, in order to bridge the impedance mismatch between the data and the application programming environment. This is because most applications, whether LOB, PIM, Information Worker, or otherwise, require data model concepts like rich structure, relationships, behaviors, and extensibility. These concepts are not adequately supported by existing data models. Moreover, adequate query languages do not presently exist for accessing data organized according to a more advanced data model.
Exemplary modern candidates for a data meta-model include the 1999 version of the Structured Query Language (SQL99), the Common Language Runtime (CLR), the Unified Modeling Language (UML), and XML Schema Definition (XSD). However, the CLR is an object-oriented, imperative-programming runtime, and has no native data model or notions of integrity constraints, relationships, or persistence. SQL99 lacks data modeling concepts like relationships, and does not have good programming language integration. The XSD specification does not support concepts like keys, relationships, and persistence; it is also complex and maps awkwardly to both the runtime and relational database models. The UML is too general: it requires application developers to add precise semantics, especially for persistence.
There is an unmet need in the industry for a data model and corresponding support framework that provides better application access to rich data types. There is a further need for an extensible query language with support for rich data types as may be supported by such a data model.