Recently there has been a tremendous increase in the use of large unstructured data types such as pictures, videos and movies, with an accompanying need to store these large streams of data in an efficient manner. Traditionally, data has been stored in a construct such as a file system or in a database.
A file system is a hierarchical structure of data in which files are stored within folders on a storage medium such as a hard disk. A part of the operating system maintains the file system and controls access to the files in the file system. File systems are good at streaming large quantities of unstructured data in and out of files. One of the problems with the file system as it is presently known is that files must be manually organized into groups (folders and sub-folders) and if the user forgets where he has stored a particular file, it may be difficult to find the file again. This problem is exacerbated by technology drivers such as advances in disk technology which have seen the development of larger and larger hard disks. The sheer volume of data capable of storage on a single disk can make keeping track of files in a file system an extremely difficult task.
The other widely used method of data organization is a database. A database system stores data as one or more tables in which each row of the table comprises groups of related data elements concerning an entity and columns that represent useful pieces of information about the entity that is the subject of the row. For example, a database of human resources information may be maintained in which each row of the human resources database represents an employee and each column in the human resources database represents data elements such as employee name, employee social security number and employee pay rate.
Databases provide a number of useful advantages over the file system organization of data. Database management systems are good at storing, finding and retrieving small pieces of structured data. Typically, in addition, there are highly flexible means of searching for and accessing specified portions of the data stored in the database. However, databases have not handled the storage and access of large pieces of unstructured data, referred to as BLOBs (binary large objects), particularly well.
Specifically, when a database includes a BLOB column, typically the BLOB is torn apart into small pieces that are scattered across the disk. The entry in the database column contains a pointer to the first of the pieces of the BLOB rather than containing the BLOB itself. This situation leads to inefficiencies in retrieving the data in the BLOB because the different pieces of the BLOB must be found and reassembled. Typically, to reduce the impact of these inefficiencies, a pointer to the first piece of the BLOB would be returned instead of instantly retrieving the BLOB itself.
For example, suppose a database of employee information includes a BLOB column for an employee photograph. Suppose a user requests a particular employee photograph, and is returned a pointer to the photograph. The pointer represents a physical location—e.g., a 16-byte hexadecimal value representing the actual disk address of the sector of the disk where the first piece of the photograph is stored. Several problems can arise in this situation. In addition to the disk address being unintelligible to the user, if the operating system reorganizes the data on the disk, the photograph may no longer exist at that location, in which case a “not found” message will be returned to the user.
Recently, another way to store BLOBs has been developed where a BLOB is stored as a contiguous file or “FILESTREAM” in the file system. A FILESTREAM data storage attribute is provided that can be used to tag a column in a relational table. The FILESTREAM attribute indicates that the data for that column will be stored as a file in the file system of the operating system (OS). The database management system manages the creation and deletion of the file in the file system. There is a 1:1 reference between the file in the file system and a cell (intersection of row and column). The data in the FILESTREAM column can be manipulated the same way as the data in other columns using a programming language, such as SQL or MICROSOFT®'s T-SQL.
Thus, a FILESTREAM column is used in a database for large unstructured data. The use of a FILESTREAM data storage attribute enables large unstructured data to be stored as a contiguous file in the file system while remaining accessible to a database. Such a database management system needs to maintain consistency of the links (i.e., “link-level consistency”) between a database row having a FILESTREAM attribute and its corresponding FILESTREAM data to ensure data integrity and to avoid corruption of the database. For example, if a fault such as a power failure or system crash occurs prior to the time at which a change is committed to disk (or “flushed”), several problems may result. For example, the database may not reflect the existence of a file or directory that exists in the file system; or, alternatively, the database may reflect the existence of a file or directory that does not exist in the file system. Thus, a broken link between the database and the file compromises the integrity of the database because a database user cannot be assured that the database accurately reflects the current state of the data represented by the FILESTREAM cell in the database column.
Typically, maintenance of link-level consistency has been achieved by way of two different approaches: consistency check and repair, and logging and recovery. In consistency check and repair, a crawling task searches a database and file system to check for inconsistencies and potentially repairs them. Such an approach is time-consuming, not well-targeted, and consumes excessive system resources.
Conventional logging and recovery methods may use logging in the database log or may coordinate with a Transacted File System. In the former method, file system operations are logged in the database log along with database data updates. In such an approach, when the database management system recovers the database, redo and undo operations may be triggered for logged file system operations as well to make file system data consistent with database data in the same database recovery framework. A drawback of such an approach is that the database management system is usually not tightly integrated with the file system, and therefore has no knowledge of the file system's data flushing to the disk. Without the ability to coordinate data flushing, the database management system has to force the log to flush the log record for every file system operation to achieve proper Write-Ahead-Logging, which helps to maintain transactional link-level consistency in a crash recovery. This log forcing results in one disk I/O operation per file system operation, which is typically an unacceptable amount of performance overhead.
The latter method involves coordination with a Transacted File System, where the file system itself is transactional and capable of recovering. The database management system participates in a distributed transaction that is coordinated by a superior Transaction Manager (“TM”). During crash recovery, the superior TM resolves in-doubt transactions and ensures consistency between the database and file system resource managers. A shortcoming of this approach is that Transacted File Systems are not available on commercial operating systems. Therefore, for many database management systems on many OS platforms, this approach is unavailable. In addition, this method has the shortcoming of the added complexity and performance costs in implementation.
Accordingly, there is a need for a mechanism that maintains link-level consistency between database columns and their corresponding FILESTREAM data in the file system while addressing the above shortcomings. The present invention satisfies this need.