1. Field of the Invention
This invention relates generally to database management systems and systems for the storage of data objects, and particularly to efficiently managing access and control over data that is linked to a database system and stored remotely in a file system or other object repository.
2. Description of the Related Art
Data is typically maintained for storage and retrieval in computer file systems, wherein a file comprises a named collection of data. A file management system provides a means for accessing the data files, for managing such files and the storage space in which they are kept, and for ensuring data integrity so that files are kept intact and separate. Applications (software programs) access the data files through a file system interface, also referred to as the application program interface (API). Unfortunately, management of computer data using file management systems can be difficult, because such systems do not typically provide sufficient information on the characteristics of the files (information called metadata). File management systems do not incorporate general purpose query engines for easily searching through the file contents for data of interest.
With reference to the IEEE Mass Storage Systems Reference Model Version 3, May 1990 (developed by the IEEE Technical Committee on Mass Storage Systems and Technology), a Mass Storage System is used to store and administer data objects known as "bitfiles". A bitfile is an arbitrarily long uninterpreted sequence of bits, possessing attributes relating to unique identification, ownership and other properties of the data present in the bitfile, such as its length, time of creation and a description of its nature. A Mass Storage System is able to administer a hierarchy of storage devices for the storage of bitfiles to provide cost effective storage. When used herein the phrases "a system for storing files", "a file system", or "an object repository" encompass the definition of a Mass Storage System as described above.
A database management system (DBMS) is a type of computerized record-keeping system that stores data according to a predetermined schema, such as the well-known relational database model that stores information as a collection of tables having interrelated columns and rows. A relational database management system (RDBMS) provides a user interface to store and retrieve the data, and provides a query methodology that permits table operations to be performed on the data. One such query methodology is the Structured Query Language (SQL) interface. In general, a DBMS performs well at managing data in terms of data record (table) definition, organization, and access control. A DBMS performs well at data management because it associates its data records with metadata that includes information about the storage location of a record, the configuration of the data in the record, and the contents of the record.
As part of its data management function, a DBMS performs many automatic backup and copying operations on its tables and records to ensure data integrity and recoverability. Currently, DBMSs are poorly suited to the management of large data objects. To operate on large objects stored in a database, the DBMS needs to export the object to a file system or other data management system. The object is then modified and reimported to the database. For large objects, the backup, logging and copying tasks that the DBMS automatically performs when manipulating data become prohibitively expensive. This makes the DBMS overhead too large and the performance insufficient to justify its use for managing most large data objects. Thus, a DBMS often is not useful for managing large amounts of digitized data, including the so-called binary large object (BLOB) data type. A BLOB is a DBMS data type that typically is used for storing arbitrarily large data objects, such as multimedia data files that contain digitized audio and video information. The limitations of the BLOB data type make this database column type unsuitable for applications such as computer aided design and manufacturing (CAD/CAM) and digital libraries. An alternative approach, described in this invention, is to keep large data objects stored as files in a file system and to link references to these external files from the database. This alleviates most of the copying tasks and, with file system support, all of the database consistency and integrity guarantees can be met.
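The general idea of keeping a large object in a file and holding only a reference to it in a database record can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not the mechanism of the invention: the table name `documents`, its columns, and the helper functions are hypothetical, SQLite stands in for the DBMS, and the sketch provides none of the consistency or integrity guarantees discussed above.

```python
import os
import sqlite3
import tempfile


def store_linked_object(conn, directory, name, payload):
    """Write the payload to a file and record only its path and size in the database."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    # Hypothetical schema: the row holds a reference to the file, not the bytes.
    conn.execute(
        "INSERT INTO documents (name, file_path, size_bytes) VALUES (?, ?, ?)",
        (name, path, len(payload)),
    )
    conn.commit()
    return path


def load_linked_object(conn, name):
    """Look up the file reference in the database, then read the file itself."""
    row = conn.execute(
        "SELECT file_path FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    with open(row[0], "rb") as f:
        return f.read()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "name TEXT PRIMARY KEY, file_path TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
)

workdir = tempfile.mkdtemp()
payload = b"\x00" * 1_000_000  # stand-in for a large data object
store_linked_object(conn, workdir, "drawing.cad", payload)
restored = load_linked_object(conn, "drawing.cad")
```

In this sketch the database never copies or logs the object bytes; it stores one small row per object, which is the property that makes the linked approach attractive for large objects.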
A file management system or file system is used to store data on computer systems. In general, file systems store data in a hierarchical name space. Files are accessed, located and referenced by their unique name in this hierarchical name space. File systems are efficient for storing and accessing data for applications that are aware of the unique name of a file and use the file management system interface.
While file systems are efficient at accessing data, data stored in file systems do not typically have the consistency and integrity guarantees of data stored in databases. Furthermore, by storing object data in a database, additional information about the object may be associated with it in the database record. This metadata about the object permits database applications to make generalized inquiries about the metadata of the object. Such general inquiries may be made through a DBMS interface language, such as SQL. The database system typically incorporates metadata for each file in the underlying file management system and uses the metadata to more easily retrieve a particular file, keep track of file versions, and retrieve files based on contents.
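A generalized inquiry over file metadata of the kind described above might look like the following sketch. The table `file_metadata`, its columns, the sample rows, and the function `latest_versions` are all hypothetical and chosen only for illustration; SQLite is used as a convenient stand-in for a DBMS with an SQL interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metadata table: one row per version of each externally stored file.
conn.execute(
    "CREATE TABLE file_metadata ("
    "file_path TEXT, version INTEGER, media_type TEXT, size_bytes INTEGER, "
    "PRIMARY KEY (file_path, version))"
)
conn.executemany(
    "INSERT INTO file_metadata VALUES (?, ?, ?, ?)",
    [
        ("/objects/clip.mpg", 1, "video", 52_000_000),
        ("/objects/clip.mpg", 2, "video", 53_500_000),
        ("/objects/part.cad", 1, "cad", 4_200_000),
    ],
)


def latest_versions(conn, media_type):
    """Return (file_path, newest version) for every file of the given media type."""
    return conn.execute(
        "SELECT file_path, MAX(version) FROM file_metadata "
        "WHERE media_type = ? GROUP BY file_path ORDER BY file_path",
        (media_type,),
    ).fetchall()


latest_video = latest_versions(conn, "video")
# latest_video == [("/objects/clip.mpg", 2)]
```

The query selects files by their recorded characteristics rather than by name, which a plain file system interface does not offer.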
With respect to the BLOB data type, BLOBs are typically accessed through a database system, using an interface such as SQL. Because BLOBs are usually extremely large files and often contain digitized data, it can be relatively difficult to keep track of different BLOB versions. It can be especially difficult to maintain a log of BLOB versions through successive changes. Many DBMSs choose not to log changes to BLOBs, and past versions of BLOBs are generally not recoverable. This differs from all other database data types, which are logged when changed and are therefore recoverable to previous values. Certain aspects of managing data files are handled well by database management systems, and other aspects are better handled by file management systems.