Microsoft SQL SERVER is a comprehensive database management platform that provides extensive management and development tools, a powerful extraction, transformation, and loading (ETL) tool, business intelligence and analysis services, and other capabilities. Two improvements to SQL SERVER have recently been implemented. First, the Microsoft Windows .NET Framework Common Language Runtime (CLR) has been integrated into the SQL SERVER database, and second, a new object, referred to as a User Defined Type (UDT), can now be created with managed code in the CLR environment and persisted in the database store.
The CLR is the heart of the Microsoft .NET Framework, and provides the execution environment for all .NET code. Thus, code that runs within the CLR is referred to as “managed code.” The CLR provides various functions and services required for program execution, including just-in-time (JIT) compilation, allocating and managing memory, enforcing type safety, exception handling, thread management and security. The CLR is now loaded by SQL SERVER upon the first invocation of a .NET routine.
In previous versions of SQL SERVER, database programmers were limited to using Transact-SQL when writing code on the server side. Transact-SQL is an extension of the Structured Query Language as defined by the International Standards Organization (ISO) and the American National Standards Institute (ANSI). Using Transact-SQL, database developers can create, modify and delete databases and tables, as well as insert, retrieve, modify and delete data stored in a database. Transact-SQL is specifically designed for direct structural data access and manipulation. While Transact-SQL excels at data access and management, it is not a full-fledged programming language in the way that Visual Basic .NET and C# are. For example, Transact-SQL does not support arrays, collections, for each loops, bit shifting or classes.
With the CLR integrated into the SQL SERVER database, database developers can now perform tasks that were impossible or difficult to achieve with Transact-SQL alone. Both Visual Basic .NET and C# are modem programming languages offering full support for arrays, structured exception handling, and collections. Developers can leverage CLR integration to write code that has more complex logic and is more suited for computation tasks using languages such as Visual Basic .NET and C#.
In addition to CLR integration, SQL SERVER also adds support for User Defined Types (UDT)—a new mechanism that enables a developer to extend the scalar type system of the database. UDTs provide two key benefits from an application architecture perspective: they provide strong encapsulation (both in the client and the server) between the internal state and the external behaviors, and they provide deep integration with other related server features. Once a UDT is defined, it can be used in all the contexts that a system type can be used in SQL SERVER, including in column definitions, variables, parameters, function results, cursors, triggers, and replication.
The process of defining a UDT on a database server is accomplished as follows:                a) create a class in managed code that follows the rules for UDT creation;        b) load the Assembly that contains the UDT into a database on the server using the CREATE ASSEMBLY statement; and        c) create a type in the database using the CREATE TYPE statement that exposes the managed code UDT.At this point, the UDT can be used in a table definition.        
When a UDT definition is created in managed code, the type must meet the following requirements:                a) it must be marked as Serializable;        b) it must be decorated with the SqlUserDefinedTypeAttribute;        c) the type should be NULL aware by implementing the INullable interface;        d) the type must have a public constructor that takes no arguments; and        e) the type should support conversion to and from a string by implementing the following methods:                    1. Public String ToString( ); and            2. Public Shared <type> Parse (SqlString s).                        
Co-pending, commonly assigned, patent application Ser. No. 10/692,225, filed herewith, entitled “System And Method For Object Persistence In A Database Store”, which is hereby incorporated by reference in its entirety, describes another feature of UDTs in which the fields and behaviors of a CLR class definition for a UDT are annotated with storage attributes that describe a layout structure for instances of the UDT in the database store. Specifically, each field of a CLR class that defines a UDT is annotated with a storage attribute that controls the storage facets of the type, such as size, precision, scale, etc. In one embodiment, this is achieved by annotating each field with a custom storage attribute named SqlUdtField( ). This attribute annotates fields with additional storage directives. These directives are enforced when the object is serialized to disk. In addition, every managed behavior (e.g., a method that can be invoked on the UDT object, for example, to return the value of a field) defined in the CLR class is annotated with an attribute that denotes an equivalent structural access path for that managed behavior. In one embodiment, the custom attribute used for this purpose is named SqlUdtProperty( ), and the database server (e.g., SQL SERVER) assumes that the implementation of properties annotated with this custom attribute will delegate to a field specified as part of the attribute definition. This lets the server optimize access to the property structurally without creating an instance and invoking the behavior on it.
FIG. 1 is an exemplary code listing of a CLR class that defines a UDT. As shown, the CLR class has been annotated with the SqlUdtField( ) and SqlUdtProperty( ) custom attributes as described above. Specifically, the SqlUdtField( ) custom attribute has been added at lines 5, 8, 37, and 49 to annotate the respective fields of the exemplary UDT class definition. The SqlUdtProperty( ) custom attribute has been added at lines 11 and 24 to annotate the respective managed behaviors of the class.
The CLR class that defines the UDT is then compiled into a dynamic link library (dll). An Assembly containing the compiled class may then be created using the following T-SQL script commands:                create assembly test        from ‘c:\test.dll’        go        
The following T-SQL script commands may then be used to create the UDT on the server:                create type BaseItem        external name [test]:[BaseItem]        go        
Once the UDT has been created on the server, a table (e.g., “MyTable”) can be created defining an attribute of the table as the UDT type, as follows:                create table MyTable        (                    Item BaseItem,            ItemId as Item::ID                        )        go        
A new item can be added to the table, as follows:                declare @i BaseItem        set @i=convert(BaseItem,”)        insert into MyTable values (@i)        go        
The UDT expression can then be used in a query such as: SELECT Item.ID, Item.Name FROM MyTable.
With the integration of the CLR into SQL SERVER and the ability to define UDTs from a class definition in managed code, applications can now instantiate objects of the type defined by the managed code class and have those objects persisted in the relational database store as a UDT. Moreover, the class that defines the UDT can also include methods that implement specific behaviors on objects of that type. An application can therefore instantiate objects of a type defined as a UDT and can invoke managed behaviors over them.
When an object of a class that has been defined as a UDT is instantiated in the CLR, the object can be persisted in the database store through the process of object serialization, wherein the values of the variables of the class are transferred to physical storage (e.g., hard disk). FIG. 2 illustrates the serialization of an object in memory to its persisted form on disk. The object may be persisted in the database store in a traditional relational database table of the format illustrated in FIG. 3. As shown, the table comprises a column of the specified UDT. The serialized values of a persisted object of the specified UDT occupy a cell of the UDT column.
Referring again to FIG. 2, when an application generates a query that includes a predicate or an expression that references a managed behavior of a UDT object that has been persisted in the database store (e.g., a behavior that returns the value of a field of the UDT object), the persisted object must be de-serialized (sometimes also referred to as “hydrating”) and the CLR must allocate memory for the full object in order to receive its stored values. The CLR must then invoke the actual method (i.e., behavior) of the UDT class that returns the value(s) that is the subject of the query. As described in the aforementioned co-pending application Ser. No. 10/692,225, the SqlUdtField( ) and SqlUdtProperty( ) annotations in the CLR class definition of a UDT can be used by the database server to also allow direct structural access to the values of certain UDT fields without the need for object hydration.
One new technology that takes advantage of the CLR integration and the provision of UDTs in SQL SERVER is the storage platform described in co-pending, commonly assigned, patent application Ser. No. 10/646,646, filed Aug. 21, 2003, entitled “Storage Platform For Organizing, Searching, And Sharing Data,” the disclosure of which is hereby incorporated by reference in its entirety. FIG. 4 is a block diagram illustrating the architecture of the storage platform 300 described in this co-pending application. The storage platform is sometimes referred to as “WinFS.” As shown in FIG. 4, the storage platform 300 comprises a data store 302 implemented on a database engine 314. In one embodiment, the database engine comprises a relational database engine, such as the Microsoft SQL SERVER relational database engine.
The data store 302 implements a data model 304 that supports the organization, searching, sharing, synchronization, and security of data in the form of Items and relationships between items, as described more fully below. Specific types of Items are described in schemas, such as schemas 340, and the storage platform 300 provides tools 346 for deploying those schemas as well as for extending those schemas, as described more fully below.
A change tracking mechanism 306 implemented within the data store 302 provides the ability to track changes to the data store. The data store 302 also provides security capabilities 308 and a promotion/demotion capability 310. The data store 302 also provides a set of application programming interfaces 312 to expose the capabilities of the data store 302 to other storage platform components and application programs (e.g., application programs 350a, 350b, and 350c) that utilize the storage platform.
The storage platform still further comprises an application programming interface (API) 322, which enables application programs, such as application programs 350a, 350b, and 350c, to access the capabilities of the storage platform and to access the data stored in the database. The storage platform API 322 may be used by application programs in combination with other APIs, such as the OLE DB API 324 and the Microsoft WINDOWS Win32 API 326.
The storage platform 300 may also provide a variety of services 328 to application programs, including a synchronization service 330 that facilitates the sharing of data among users or systems. For example, the synchronization service 330 may enable interoperability with other data stores 340 having the same format as data store 302, as well as access to data stores 342 having other formats. The storage platform 300 also provides file system capabilities that allow interoperability of the data store 302 with existing file systems, such as the WINDOWS NTFS files system 318.
In at least some embodiments, the storage platform 320 may also provide application programs with additional capabilities for enabling data to be acted upon and for enabling interaction with other systems. These capabilities may be embodied in the form of additional services 328, such as an Info Agent service 334 and a notification service 332, as well as in the form of other utilities 336.
In at least some embodiments, the storage platform is embodied in, or forms an integral part of, the hardware/software interface system of a computer system. For example, and without limitation, the storage platform of the present invention may be embodied in, or form an integral part of, an operating system, a virtual machine manager (VMM), a Common Language Runtime (CLR) or its functional equivalent, or a Java Virtual Machine (JVM) or its functional equivalent.
Through its common storage foundation, and schematized data, the storage platform enables more efficient application development for consumers, knowledge workers and enterprises. It offers a rich and extensible programming surface area that not only makes available the capabilities inherent in its data model, but also embraces and extends existing file system and database access methods.
In the following description, and in various ones of the figures, the storage platform 300 of the present invention may be referred to as “WinFS.” However, use of this name to refer to the storage platform is solely for convenience of description and is not intended to be limiting in any way.
The data model of the WinFS platform defines units of data storage in terms of Items, Item extensions, and Relationships. An “Item” is the fundamental unit of storage information. The data model provides a mechanism for declaring Items and Item extensions and for establishing relationships between Items. Items are the units that can be stored and retrieved using operations such as copy, delete, move, open, and so forth. Items are intended to represent real-world and readily-understandable units of data like Contacts, People, Services, Locations, Documents (of all various sorts), and so on. Item extensions are a way to extend the definition of an existing Item, and Relationships are a defined link between Items.
In WinFS, different Item types are defined for storing information. For example, Item types are defined for Contacts, People, Locations, Documents, etc. Each Item type is described by a schema that defines the properties and characteristics of a given Item. For example, a “Location” Item may be defined as having properties such as EAddresses, MetropolitanRegion, Neighborhood, and PostalAddresses. Once a schema is defined for a given Item type, deployment tools are used to translate the schema into a corresponding CLR class definition for that Item type, and then a UDT is created in the database store from the CLR class definition (in the manner described above) in order for instances of the WinFS Item type to be persisted in the database store. Using the WinFS API 322, applications (e.g., applications 350a, 350b, 350c, etc.) can create instances of the Item types supported by the data store in order to store and retrieve information from the storage platform data store. Each instance of an Item type stored in the data store has a unique identifier (e.g., Item_ID) associated with it; in one embodiment, each item identifier is a globally unique identifier, i.e. “guid.” Thus, the WinFS platform leverages the CLR integration and UDT capabilities of the database store to provide a platform for storing Items of information.
As with any instance of a UDT in SQL SERVER, instances of WinFS Items are ultimately stored in tables of the database store in the manner illustrated in FIG. 3. Applications can then submit queries to the WinFS platform to search for and retrieve Items from the data store that satisfy the search criteria. FIG. 5 illustrates how a query is executed against the data store to retrieve instances of an Item type called “Person.” In step (1), an application uses a “FindAll” method of the WinFS API 322 to initiate a query for all Items that satisfy a particular search criteria—in this case, all instances of the Person type in which the value in a “Birthday” field of the type is greater than a particular date (e.g., Dec. 31, 1999). At step (2), the WinFS API 322 translates the “FindALL” operation into a SQL query and submits it to the underlying database engine, e.g., SQL SERVER. In step (3), the database engine executes the query against the corresponding instances of the Person UDT and returns the stored values for each matching instance of the Person UDT. In this example, at step (4), ADO.Net turns the bits returned from the database store into CLR objects (i.e., the process of object hydration discussed above) and returns them to the WinFS API 322. ADO.Net is a component of the Microsoft .NET Framework that provides managed code access via the CLR to data sources such as SQL SERVER. The WinFS API then wraps the Person UDT objects and returns them to the application as Items of the Person type.
While the ability to create user defined types (UDTs) in a database store is a powerful capability, it would be desirable to enhance that capability to provide support for storing certain data types, such as large data types including, for example, digital images, video, audio, etc., as defined fields of a UDT. Furthermore, it would be desirable to provide “out of band” access to the large data fields of a UDT so that they could be accessed via traditional file system calls (open, close, etc.) without using the underlying query language of the database store. It would be particularly desirable for these capabilities to be provided in the context of the WinFS storage platform described above. Heretofore, these capabilities have not existed.
While the Microsoft SQL SERVER product provides the ability to designate an entire column of a relational database table as a type referred to as FILESTREAM, such that the data in the cells of that column is stored in a separate file from the relational database table, there is no ability to designate individual fields of a user defined type to be stored in that manner.
IBM's DB2 database product has a “datalinks” feature that supports the ability to link a column in a table to a file in a file system by storing a reference to the file. However, it provides only an N to 1 reference model between the cells of the column and the reference file, whereas a need exists for a 1-1 model between a cell and a file, as described above. The “datalinks” feature is also disadvantageous for the following additional reasons: (i) the programming model does not allow uniform storage and query of the separate file along with the regular relational data inside a user defined type, and (ii) the “datalinks” feature of DB2 does not allows updating of the column data stored in the referenced file via the file system.
Oracle Corporation's “IFS” product uses middle-tier software to provide access to data over a number of protocols such as SMB, HTTP, FTP, SMTP etc. The data is ultimately stored in a database. Although Oracle IFS provides an “integrated” view of different kinds of data, including large data types, the solution is implemented in middle-tier software, as opposed to in a relational database engine, and it does not satisfy the needs discussed above.
Finally, ISO/IEC 9075-9:2003 (aka SQL 2003 MED) proposes “datalinks” as a new data type. According to the proposed standard, a datalink is a value of the DATALINK data type. A datalink references some file that is not part of the SQL-environment. The file is assumed to be managed by some external file manager. A datalink is conceptually represented by a character string forming a reference to an external file; the reference is accessible by invoking operators defined in this section of ISO/IEC 9075. The character set of the reference, referred to as the datalink character set, is implementation-defined. This proposed standard does not address the desirable features discussed above.
Thus, a need still exists for a system and method that provides support for storing certain data types, such as large data types including, for example, digital images, video, audio, etc., as defined fields of a user defined type (UDT), as well as providing “out of band” access to those large data types via traditional file system calls (open, close, etc.). The present invention satisfies these needs.