User-Defined Types
As known and appreciated by those of skill in the art, MICROSOFT SQL SERVER is a comprehensive database management platform that provides extensive management and development tools, an extraction-transformation-loading (ETL) tool, business intelligence and analysis services, and other capabilities. In addition, the MICROSOFT WINDOWS .NET Framework Common Language Runtime (CLR) was recently integrated into the SQL SERVER database.
The CLR is the heart of the MICROSOFT .NET Framework, and provides the execution environment for all .NET code. Code that runs within the CLR is referred to as “managed code.” The CLR provides various functions and services required for program execution, including just-in-time (JIT) compilation, allocating and managing memory, enforcing type safety, exception handling, thread management and security. The CLR is now loaded by SQL SERVER upon the first invocation of a .NET routine.
In previous versions of SQL SERVER, database programmers were limited to using Transact-SQL when writing code on the server side. Transact-SQL is an extension of the Structured Query Language (“SQL”) as defined by the International Standards Organization (ISO) and the American National Standards Institute (ANSI). Using Transact-SQL, database developers can create, modify and delete databases and tables, as well as insert, retrieve, modify and delete data stored in a database. Transact-SQL is specifically designed for direct structural data access and manipulation. While Transact-SQL excels at structural data access and management, it is not a full-fledged programming language as are VISUAL BASIC .NET and C#. For example, Transact-SQL does not support arrays, collections, for-each loops, bit shifting, or classes.
With the CLR integrated into the SQL SERVER database, database developers have been able to perform tasks that were impossible or difficult to achieve with Transact-SQL alone. Both VISUAL BASIC .NET and C# are modem programming languages offering full support for arrays, structured exception handling, and collections. Developers can leverage CLR integration to write code that has more complex logic and is more suited for computation tasks using languages such as VISUAL BASIC .NET and C#. These programming languages offer object-oriented capabilities such as encapsulation, inheritance and polymorphism. Related code can be easily organized into classes and namespaces.
Managed code is better suited than Transact-SQL for number crunching and complicated execution logic, and features extensive support for many complex tasks, including string handling and regular expressions. With the functionality found in the .NET Framework Base Class Library (BCL), database developers have access to thousands of pre-built classes and routines which can be easily accessed from any stored procedure, trigger or user-defined function.
Another benefit of managed code is type safety. Before managed code is executed, the CLR verifies that the code is safe. This process is known as “verification.” During verification, the CLR performs several checks to ensure that the code is safe to run. For example, the code is checked to ensure that no memory is read that has not be been written to. The CLR will also prevent buffer overflows.
When writing managed code, the deployment unit is called an assembly. An assembly is packaged as a dynamic link library (DLL). Managed DLL assemblies can be loaded into and hosted by SQL SERVER. The CREATE ASSEMBLY statement is used to register an assembly in the server. Here is an example:                CREATE ASSEMBLY YukonCLR        FROM ‘C:\MyDBApp\YukonCLR.dll’In this example, the FROM clause specifies the pathname of the assembly to load.        
SQL SERVER has traditionally supported “built-in” scalar types such as integer, floating point number, date, time, and character string. These built-in types also come with a set of built-in operations such as +, −, *, / as well as built-in functions over these types. These types, operations, and functions are “built-in” in the sense that they are implemented and packaged by the product and users cannot define their own types.
It is desirable for a database system, such as SQL SERVER, to allow users to extend the type system of the database system such that users could create new types that act as scalar types in the database system but which contain more complex structure and behavior such as to create a “Point” type consisting of X and Y coordinates, among other things. In this regard, the SQL standard and some database management system (DBMS) products have used the term “user-defined type” to describe several forms of type extensibility. For example, the SQL-99 standard describes a “distinct type,” which is a type that can be defined by a user to have an internal representation that is a value of an existing SQL built-in data type. A distinct type may optionally share comparison and arithmetic operators, type conversions, and aggregate (column) functions (e.g., max, min, average) with an existing scalar type. The distinct type may allow constraints to be defined on its values. In addition, a distinct type may expose behaviors beyond those of an existing scalar type by defining new functions specific to the distinct type via user-defined functions. With respect to type checking, the distinct type and existing scalar types are considered to be different types.
An advantage of distinct types is ease of definition. If the internal representation of the new type has a single data member that can be described with an existing built-in type and the built-in type already implements most of the behaviors required on the new type, then distinct types are an attractive alternative. The user does not have to worry about implementing the behaviors required to manage the on-disk storage of the type, constructors, comparison operators (used for ordering and indexing), arithmetic operators, and type conversion (casting) operators. The user only needs to choose what functionality of the underlying built-in type needs to be exposed on the distinct type and optionally add constraints on values or additional functions on the new type. Another advantage of distinct types is that all of the query processing available for built-in types, such as computation of histograms, can be readily used on columns of distinct types. A disadvantage of distinct types, however, is that they can not easily be used to create more complex types.
The SQL-99 standard also described a “structured type,” which is a type that can be defined by user and that has an internal representation that is a collection of data members, each of which may be of a different SQL built-in or user-defined type. This is similar to the notion of a struct in C and C++. SQL-99 describes a style for defining structured types by which a user only needs to define the type in terms of its internal structure. The system automatically generates accessor and mutator functions on its data members, constructor, and functions to manage the on-disk representation of instances of the type.
Specific advantages of structured types as defined in SQL are (a) ease of definition of basic behaviors of the type, and (b) the flexibility to define more complex types. However, a significant disadvantage of structure types is the complexity of definition of type-specific methods, which are usually defined via external functions written in a general-purpose programming language like C or C++. In order to define a structured type fully, the definer of the type needs to straddle a line between SQL and some other programming language.
While the distinct type and structured type features of SQL 99 offer some advantages to users in terms of enabling them to extend the existing scalar type system of a SQL database, there has been a need for improved systems and methods for enabling a user to extend the scalar type system of a database system through user-defined types that act as scalar types but that contain more complex structure and behavior.
The invention disclosed in U.S. patent application Ser. No. 10/775,282, filed on Feb. 10, 2004, entitled “SYSTEM AND METHOD FOR PROVIDING USER DEFINED TYPES IN A DATABASE SYSTEM,” (hereinafter, the UDT Patent Application) is directed to a system and method that allows a user to extend the scalar type system of a database system by creating user-defined types that act as scalar types but that contain more complex structure and behavior. According to that invention, a user writes program code in a high-level programming language that implements a class that defines the structure of a user-defined type and methods that can be invoked on instances of the user-defined type. As used therein (and herein), the term “structure,” when referring to a user-defined type, encompasses the set of fields or properties that implement the type. The type of each field in the UDT Patent Application could be a scalar SQL type or any previously defined user-defined type. The class defining a user-defined type is then compiled and registered with the database system. Specifically, a CLR class defining a user-defined type may be compiled into an assembly which is then registered with the database system via a CREATE ASSEMBLY data definition statement. After the assembly is registered, a user can register the class within the assembly that defines the user-defined type using a CREATE TYPE data definition statement.
For the invention of the UDT Patent Application, the database system enforces a specific contract that the class must implement to enable the user-defined type to act as a scalar in the SQL type system. The term “contract,” as used therein (and herein), refers to a technique that is used at runtime in object-oriented programming environments to check that code to be executed satisfies certain pre-conditions or requirements to ensure that it will execute properly. According to that invention, the contract against which a class that defines a user-defined type is compared comprises several requirements. First, the class must specify one of a plurality of different formats for persisting instances of the user-defined type in a database store. Second, the class must be capable of returning a null value for the user-defined type. Third, the class must provide a method for converting the user-defined type to and from another type, such as a string type. Once these requirements are satisfied, the database system enables instances of the user-defined type to be created. In one embodiment of that application, the user-defined type can be instantiated as a column value in a table, a variable, a parameter of a routine, or a return value of a routine. The database system stores metadata about the class defining the user-defined type for subsequent use in creating instances of the type. In another embodiment of that invention, the verification of the user-defined type contract is performed using the metadata describing the class that defines the type.
The plurality of different formats for persisting instances of the user-defined type comprises a first format in which an instance of the user-defined type is automatically serialized in accordance with a native format of the database system, and a second format in which an instance of the user-defined type is serialized in a manner defined by the user authored class. Additionally, when the invention of the UDT Patent Application is embodied within MICROSOFT SQL SERVER, in which the MICROSOFT .NET CLR is integrated, a third format is available in which an instance of the user-defined type is serialized in accordance with a method provided by the MICROSOFT .NET Framework.
For the invention of UDT Patent Application, expressions in the query language of the database system can include one or more references to an instance(s) of a user-defined type, such that evaluation of the expression requires invocation of a method on the instance of the user-defined type. When the database system receives such a query language expression, it translates the expression into a sequence of program code instructions that, when executed, invoke the required method on the instance of the user-defined type. The database system then returns the result of the method invocation as the result of evaluation of the query language expression. In one embodiment, the instance of the user-defined type is deserialized prior to invoking the method on the instance.
Another feature of the that invention is the ability to change the value of an instance of a user-defined type through invocation of a mutator method. Specifically, the author of the class that defines the user-defined type includes a mutator method as part of the class. When invoked on an instance of the user-defined type, the mutator method enables a value of the user-defined type to be changed. This process may comprise deserializing the instance of the user-defined type, invoking the mutator method to change the value of the deserialized data of the instance, and then serializing the modified instance of the user-defined type to persist the changes.
Another feature of that invention is that the class defining a user-defined type may further contain an attribute that specifies that serialized binary representations of instances of the user-defined type will be binary ordered. This allows binary comparisons to be made on instances of the type and also enables indexing to be performed on instances of the type. Specifically, for instances of a user-defined type that are binary ordered, when a query language expression that requires some comparison of two instances of the type is received by the database system (e.g., >, <, or =), the serialized binary representations of the two instances can be used to evaluate the expression, without deserializing either instance. Additionally, for a user-defined type that is binary ordered, a table in the database store can be created that has a column defined as the user-defined type. An index can then be created on the column. An index can also be created over a query language expression that references the user-defined type. In this case, a computed column is first generated over the expression, and then an index is created over the computed column.
Collections
Data-intensive applications use object data models to capture portions of the real-world (also known as “object data modeling”). Entities (e.g., “Items” in an Item-based hardware/software interface system) and relationships are two common data modeling constructs. For example, if Author and Document are two types of entities, there may be several relationships between Authors and Documents such as “all documents written by an author” and “all documents reviewed by an author,” to name a few.
An entity contains a set of properties which may be scalar-valued and/or collection-valued (or set-valued). For example, an Author entity may contain scalar-valued properties for the name and birthday of the author, as well as collection-valued properties for a set of addresses as well as a set of phone numbers for the author. These collection-valued properties, like the scalar-valued properties, are embedded in the entity—that is, they are considered to be part of that entity and not just a defined relationship from the entity to another entity. Each of these collections (a.k.a. “embedded collections”) usually contain a small number of members (or “elements”), and in some instances the members of these collections may have a relative order of importance (e.g., when an author has a primary address, a secondary addresses, and so forth). As used herein, an unordered collection is referred to as a “multiset” while an ordered collection is referred to as a “list,” the latter being an extension of the former that further account for relative order of the members within the collection.
Extendable Relational Data Store
Although collections have been implemented in existing database products, these products commonly model collections using value-based relationships such as primary key-foreign key concepts and, as such, cannot be treated as relational queriable objects. Moreover, queries to collections—for example, “return all contacts who live in city X”—generally require traversing all of the paths of the collection since said collections are not relational queriable objects.
An extendable relational data store (ERDS) is a relational data store that allows entities to be modeled as user-defined types (UDTs) as set forth in U.S. patent application Ser. No. 10/775,282, filed on Feb. 10, 2004 and entitled “SYSTEM AND METHOD FOR PROVIDING USER DEFINED TYPES IN A DATABASE SYSTEM,” which describes the implementation of UDTs. In an ERDS, it would be more natural and convenient to model these collections as part of a user-defined type (UDT) where the collections UDT could be treated as both a scalar and a relation as appropriate. Therefore, what is needed in the art is method for modeling collections as part of a UDT in an ERDS as well as a collection that can be processed as a scalar or queried as a relation.