The invention relates generally to database systems and systems for filing data, and particularly to linkage between data stored in a database system and files stored in a file system that is external to the database system.
Generally, a file system is used to "file away" information which a user will later retrieve for processing. With reference to H. M. Deitel, OPERATING SYSTEMS (Second Edition, 1990), Chapter 13, a file system provides a user with the ability to create a file that is "a named collection of data". Normally, a file resides in directly accessible storage and may be manipulated as a unit by file system operations. As Deitel teaches, a file system affords a user the means for accessing data stored in files, the means for managing files, the means for managing direct access storage space where files are kept, and the means for guaranteeing the integrity of files. As is known, there is a class of applications where large data objects such as digitized movies, digitized images, digitized video, and computer-generated graphics are typically captured, processed, and stored in file systems.
With reference to the IEEE Mass Storage Systems Reference Model Version 4 (May 1990, developed by the IEEE Technical Committee on Mass Storage Systems and Technology), a Mass Storage System is used to store and administer data objects known as "bitfiles". A bitfile is an uninterpreted sequence of bits, of arbitrary length, possessing attributes relating to unique identification, ownership, and other properties of the data present in the bitfile, such as its length, time of creation, and a description of its nature. A Mass Storage System is able to administer a hierarchy of storage devices for the storage of bitfiles to provide cost-effective storage.
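By way of illustration only, the attributes of a bitfile described above might be modeled as in the following minimal Python sketch. The class name `Bitfile` and every field name are hypothetical and are not drawn from the Reference Model. The sketch also assumes that the bits are held in whole bytes, which is narrower than the "arbitrary length" the Reference Model permits.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Bitfile:
    """Hypothetical model of a bitfile: opaque bits plus descriptive attributes."""

    data: bytes            # uninterpreted content (assumption: whole bytes only)
    owner: str             # ownership attribute
    description: str = ""  # free-text description of the nature of the data
    # Unique identification, generated when the bitfile is created.
    bitfile_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Time of creation, in seconds since the epoch.
    created: float = field(default_factory=time.time)

    @property
    def length_bits(self) -> int:
        """Length of the bitfile in bits, derived from the stored bytes."""
        return len(self.data) * 8
```

Nothing in the sketch interprets `data`; the filing system records only attributes about the bits, which is the property on which the remainder of this background relies.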
When used herein, a system for filing data (also, "a filing system") encompasses file systems and mass storage systems as defined above. The term "file" is hereafter used to denote data stored in a filing system.
C. J. Date, in AN INTRODUCTION TO DATABASE SYSTEMS (Sixth Edition, 1995), Chapter 1, defines a database system as "basically a computerized record-keeping system . . . ". The contents of a database system (records) are defined, organized, and accessed according to some scheme such as the well-known relational model.
A file management component of a file system normally operates at a level above an operating system; access to the contents of the file system requires knowledge at least of the identity of a file. A database system, on the other hand, operates at a level above a file management system. Indeed, as Date points out, a database management system (DBMS) component of a database system typically operates on top of a file management system ("file manager").
According to Date, while the user of a file system may enjoy the ability to create, retrieve, update, and destroy files, the file system is not aware of the internal structure of a file and, therefore, cannot provide access to file contents in response to requests that presume knowledge of such structure. In this regard, if the file system stores movies, it would be able to locate and retrieve a file in which a digitized version of "The Battleship Potemkin" is stored, but would not be able to respond to a request to return the titles of all Russian-language movies directed by Sergei Eisenstein, a request that is well within the ability of a database system to answer.
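The contrast between the two kinds of access may be sketched as follows. This is an illustrative sketch only: an in-memory SQLite database stands in for a database system, a temporary directory stands in for a file system, and the table name `movies`, its columns, the file names, and the function names are all hypothetical.

```python
import os
import sqlite3
import tempfile


def build_demo(directory):
    """Create a hypothetical file store and a hypothetical movie catalog."""
    # The file system knows each file only by name; contents are opaque bytes.
    for name in ("potemkin.mpg", "october.mpg", "metropolis.mpg"):
        with open(os.path.join(directory, name), "wb") as f:
            f.write(b"\x00" * 16)  # placeholder for digitized movie data

    # The database system stores structured records describing the movies.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE movies "
        "(title TEXT, director TEXT, language TEXT, file_name TEXT)"
    )
    db.executemany(
        "INSERT INTO movies VALUES (?, ?, ?, ?)",
        [
            ("The Battleship Potemkin", "Sergei Eisenstein", "Russian", "potemkin.mpg"),
            ("October", "Sergei Eisenstein", "Russian", "october.mpg"),
            ("Metropolis", "Fritz Lang", "German", "metropolis.mpg"),
        ],
    )
    return db


def retrieve_file(directory, file_name):
    """File-system style access: the caller must know the identity of the file."""
    with open(os.path.join(directory, file_name), "rb") as f:
        return f.read()


def titles_by(db, director, language):
    """Database style access: the request is phrased in terms of record contents."""
    rows = db.execute(
        "SELECT title FROM movies "
        "WHERE director = ? AND language = ? ORDER BY title",
        (director, language),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = build_demo(d)
        print(len(retrieve_file(d, "potemkin.mpg")))
        print(titles_by(db, "Sergei Eisenstein", "Russian"))
```

In the sketch, `retrieve_file` succeeds only when the file name is already known, whereas `titles_by` answers a request about director and language without the caller knowing any file name.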
It may, therefore, be asked whether a database system might not be used to index and provide access to large objects in a file system (such as files that contain digitized versions of Russian-language movies). In fact, a database can provide such a capability. However, in order to provide access to files containing the large objects, the DBMS must possess the facilities to store indexed information of which the objects are composed. Manifestly, such functions would waste the resources of a general purpose database system set up to store, access, and retrieve relatively short objects such as records. Moreover, the raw content of a large object captured in a file system may be so vast as to be impractical to structure for a database request. Typically, features of such an object (such as a digitized image) would be extracted from the file, formatted according to the database system structure, and then used by the database system to support the search of stored objects based on the extracted features. See, for example, the query by image content (QBIC) system and method disclosed in U.S. patent application Ser. No. 07/973,474, filed Nov. 9, 1992, now abandoned, and U.S. patent application Ser. No. 08/216,986, filed Mar. 23, 1994, now U.S. Pat. No. 5,579,471, both of which are incorporated herein by reference.
Such combinations of a database system with a file system, moreover, do not provide referential integrity for data stored by the database system. In this context, "referential integrity" refers to the guarantee that the database system will not contain any unmatched foreign key values. This guarantee is based upon the consistency of the contents and structure of a database system. Referential integrity guarantees, for example, that if a reference to a file titled "The Battleship Potemkin" is included in a database system response to a request to list all Russian-language movies directed by Sergei Eisenstein, the movie itself (or its digitized form) will exist in the file system and will be named identically in the database and file systems.
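The absence of referential integrity between the two systems may be illustrated by the following minimal Python sketch. It assumes a hypothetical table `movies` with a `file_name` column, held in an in-memory SQLite database, and a temporary directory standing in for the file system. The function names are likewise hypothetical, and the check shown merely detects a dangling reference after the fact; it is not a mechanism for preventing one.

```python
import os
import sqlite3
import tempfile


def unmatched_references(db, directory):
    """Return file names referenced by the database but absent from the directory."""
    present = set(os.listdir(directory))
    rows = db.execute("SELECT file_name FROM movies ORDER BY file_name")
    return [name for (name,) in rows if name not in present]


def demo():
    """Show a database reference becoming unmatched when its file is deleted."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "potemkin.mpg")
        with open(path, "wb") as f:
            f.write(b"\x00")  # placeholder for digitized movie data

        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE movies (title TEXT, file_name TEXT)")
        db.execute(
            "INSERT INTO movies VALUES (?, ?)",
            ("The Battleship Potemkin", "potemkin.mpg"),
        )

        before = unmatched_references(db, d)
        # The file system is unaware of the database record, so nothing
        # prevents the referenced file from being removed.
        os.remove(path)
        after = unmatched_references(db, d)
        return before, after


if __name__ == "__main__":
    print(demo())
```

Before the deletion the database reference is matched by a file; afterward the same record refers to a file that no longer exists, and the database system has no means of knowing that the change occurred.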
Accordingly, there is a need to link the power of a database system to search data records with the capacity of a file management system to store large data objects, while providing referential integrity to the linkage between the database system and the file management system.