1. Field of the Invention
The present invention relates generally to a method and apparatus for storing, retrieving, and distributing various kinds of data. More specifically, the present invention relates to a physical storage architecture for a shared file environment and method for using such.
2. Art Background
Over the last 30 years, computers have become increasingly important in storing and managing information. This has led, in turn, to the widespread sharing and communication of data such as electronic mail and documents over computer networks. To support the sharing of data, client-server architectures that allow users to access files on a server have become increasingly commonplace. In particular, it has become common to enable many users to access the same database that resides in a server or servers.
Most current database architectures are designed for continuous access to a single set of data files. The single set of files can be shared directly or indirectly as in a client-server network. This approach encounters difficulties when users at many physical sites need to access the same data simultaneously at different client computers.
There are three common approaches to the problem of simultaneous access. According to the first approach, all the users must access a single site, typically a computer mainframe. According to the second approach, each site has an exact copy of the data at the other sites, all of which copies are kept synchronized in real-time using algorithms such as two-phase commit. According to the third approach, each site has a copy of the data at the other sites, the copies are not always the same, and the copies must be synchronized at some regular interval. This is known as asynchronous replication.
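The third approach can be illustrated with a minimal sketch. The `Site` class, the `synchronize` function, and the rule that the entry with the newer timestamp wins are assumptions made only for illustration; the text above does not specify how conflicting copies are reconciled.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site holding its own copy of the shared data (hypothetical)."""
    name: str
    # Maps key -> (logical timestamp, value).
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # A local update; no connection to any other site is needed.
        self.data[key] = (timestamp, value)


def synchronize(a, b):
    """Reconcile two sites, keeping the entry with the newer timestamp.

    "Newest wins" is an assumed policy, chosen only to keep the sketch short.
    """
    for key in set(a.data) | set(b.data):
        entries = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[0])
        a.data[key] = newest
        b.data[key] = newest


east = Site("east")
west = Site("west")
east.write("doc", "draft 1", timestamp=1)
west.write("doc", "draft 2", timestamp=2)  # the copies now differ
west.write("memo", "hello", timestamp=3)
diverged = east.data != west.data
synchronize(east, west)                    # the periodic synchronization step
```

Between synchronizations each site can be read and updated without a connection to the others, which is the property that distinguishes this approach from the first two.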
Current database architectures are designed for continuous access to all data files, and hence work well with the mainframe and two-phase commit approaches. In situations when continuous access is not guaranteed, however, the systems operating according to these approaches do not function properly.
Client-server systems designed for desktop information management and local area networks uniformly use one of the first two approaches described above. These approaches tend to provide an imbalanced load on the server and typically require locking of the shared files on the remote server which further hampers performance. In addition, the files resident on the server typically require a connection to the client and thus updates may not occur without such a connection. The first two approaches also tend to be relatively slow for updates as updates must be synchronized in real-time.
The present invention overcomes the limitations of the prior art by providing a flexible, efficient and fast physical storage system that combines the advantages of asynchronous replication with the need for direct access to central data. It is designed to be used as a file system that allows users to share files on networks and across different storage media such as hard drives, CD-ROMs and WORM drives.
Current physical storage systems suffer from limitations in addition to the synchronization problems previously discussed. A physical storage system must store data items, such as a database record, in a non-volatile memory until such time as an application requires access to such data. This process typically involves `flattening` the contents of data items and writing them to the storage medium. The storage medium is generally divided into fixed size blocks, each of which has a location.
According to prior art storage systems, there are two restrictions that can ease the design of such a system. The first restriction is that each data item be a fixed length. The second restriction is that only the most recent version of each data item need be stored. Prior art storage systems generally operate according to one or both of these restrictions. In a typical storage system, a block of memory is found that is large enough to hold a data item, which is then written to that block. When an item is deleted, the other items in the block are reorganized to free up the maximum amount of space, ready for another data item. A new block is created only when no existing block has enough space for a new data item.
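The typical prior art behavior described above can be sketched as follows. The `BlockStore` class, the 16-byte block size, and the first-fit search order are illustrative assumptions, not details taken from any particular prior art system.

```python
BLOCK_SIZE = 16  # bytes per block; an illustrative value


class BlockStore:
    """Hypothetical model of a block-based store that keeps only current data."""

    def __init__(self):
        self.blocks = []  # each block is a list of (item_id, payload) pairs

    def _used(self, block):
        return sum(len(payload) for _, payload in block)

    def insert(self, item_id, payload):
        if len(payload) > BLOCK_SIZE:
            raise ValueError("item does not fit in a single block")
        # Look for an existing block with enough free space (first fit).
        for index, block in enumerate(self.blocks):
            if BLOCK_SIZE - self._used(block) >= len(payload):
                block.append((item_id, payload))
                return index
        # Create a new block only when no existing block has room.
        self.blocks.append([(item_id, payload)])
        return len(self.blocks) - 1

    def delete(self, item_id):
        for block in self.blocks:
            # Rebuilding the list stands in for reorganizing the block so
            # that the freed space is contiguous. The old data is lost.
            block[:] = [(i, p) for i, p in block if i != item_id]


store = BlockStore()
first = store.insert("a", b"x" * 10)   # goes into block 0
second = store.insert("b", b"y" * 10)  # block 0 has 6 bytes free, so block 1
store.delete("a")
third = store.insert("c", b"z" * 12)   # reuses the space freed in block 0
```

After the deletion nothing of item "a" remains, and reusing its space requires rewriting block 0 in place.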
The prior art approach has numerous disadvantages. Prior art systems do not readily support variable length data and previous versions of a data item are not available, so that no `undo` function is available to the user. Further, the prior art methods may not be used in conjunction with append-only media such as write-once read-many (WORM) disks.
As will be described, the present invention overcomes the limitations of prior art storage systems by providing a system that easily supports variable length data items without erasing older versions of data items while occupying a relative minimum of disk space.
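For orientation only, one generic way to keep older versions of variable length items is an append-only log. The sketch below shows that general technique under assumed names (`AppendOnlyLog`, `put`, `get`) and an assumed 4-byte length prefix; it is not a description of the storage architecture of the present invention.

```python
import io
import struct


class AppendOnlyLog:
    """Hypothetical append-only store that never overwrites earlier data."""

    def __init__(self):
        self.medium = io.BytesIO()  # stands in for a disk or WORM medium
        self.versions = {}          # item_id -> list of (offset, length)

    def put(self, item_id, payload):
        # Always write at the end of the medium; earlier bytes are untouched.
        offset = self.medium.seek(0, io.SEEK_END)
        self.medium.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
        self.medium.write(payload)
        self.versions.setdefault(item_id, []).append((offset + 4, len(payload)))

    def get(self, item_id, version=-1):
        # version=-1 is the latest; 0 is the first version written.
        offset, length = self.versions[item_id][version]
        self.medium.seek(offset)
        return self.medium.read(length)


log = AppendOnlyLog()
log.put("note", b"short")
log.put("note", b"a much longer revision")
latest = log.get("note")
previous = log.get("note", version=0)
```

Because every write is an append, items of any length can be stored and the earlier version remains readable, at the cost of space that is never reclaimed in this simple form.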
Many database products have been developed to allow users to store and manipulate information and to search for desired information. The continuing growth of the information industry creates a demand for more powerful databases.
The database products have evolved over time. Initially, databases comprised a simple "flat file" with an associated index. Application programs, as opposed to the database program itself, managed the relationships between these files and a user typically performed queries entirely at the application program level. The introduction of relational database systems shifted many tasks from application programs to database programs. The currently existing database management systems comprise two main types, those that follow the relational model and those that follow the object oriented model.
The relational model sets out a number of rules and guidelines for organizing data items, such as data normalization. A relational database management system (RDBMS) is a system that adheres to these rules. RDBMS databases require that each data item be uniquely classified as a particular instance of a `relation`. Each set of relations is stored in a distinct `table`. Each row in the table represents a particular data item, and each column represents an attribute that is shared over all data items in that table.
The pure relational model places a number of restrictions on data items. For example, each data item cannot have attributes other than those columns described for the table. Further, an item cannot point directly to another item. Instead, `primary keys` (unique identifiers) must be used to reference other items. Typically, these restrictions cause RDBMS databases to include a large number of tables that require a relatively large amount of time to search. Further, the number of tables occupies a large amount of computer memory.
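The table and primary key conventions described above can be illustrated with a small example using Python's built-in `sqlite3` module. The `author` and `document` tables and their columns are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each relation is stored in its own table with a fixed set of columns.
conn.execute("CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE document ("
    "doc_id INTEGER PRIMARY KEY, "
    "title TEXT, "
    "author_id INTEGER REFERENCES author(author_id))"
)

conn.execute("INSERT INTO author VALUES (1, 'Ada')")
# The document row cannot point at the author row directly; it stores the
# author's primary key instead.
conn.execute("INSERT INTO document VALUES (10, 'Storage notes', 1)")

# Following the reference requires a join across the two tables.
rows = conn.execute(
    "SELECT document.title, author.name "
    "FROM document JOIN author ON document.author_id = author.author_id"
).fetchall()
```

Even this two-entity example needs two tables and a join, which suggests how the number of tables and the search effort grow as more kinds of items are added.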
The object oriented database model, derived from the object-oriented programming model, is an alternative to the relational model. As in the relational model, each data item must be classified uniquely as belonging to a single class, which defines its attributes. Key features of the object-oriented model are: 1) each item has a unique system-generated object identification number that can be used for exact retrieval; 2) different types of data items can be stored together; and 3) predefined functions or behavior can be created and stored with a data item.
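The three features listed above can be sketched in a few lines. The class names (`StoredObject`, `Note`, `Contact`), the counter used to generate identifiers, and the dictionary used as the store are all illustrative assumptions rather than features of any specific object oriented database.

```python
import itertools

# Source of system-generated object identification numbers (assumed scheme).
_next_oid = itertools.count(1)


class StoredObject:
    def __init__(self):
        self.oid = next(_next_oid)  # feature 1: unique system-generated id


class Note(StoredObject):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def summary(self):              # feature 3: behavior stored with the item
        return self.text[:10]


class Contact(StoredObject):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def summary(self):
        return f"Contact: {self.name}"


# Feature 2: items of different classes are kept together in one store.
store = {}
for obj in (Note("Call the supplier on Monday"), Contact("Ada")):
    store[obj.oid] = obj

note_summary = store[1].summary()     # exact retrieval by identifier
contact_summary = store[2].summary()
```

Each item still belongs to exactly one class that fixes its attributes, which is the restriction the object oriented model shares with the relational model.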
Apart from the limitations previously described, both the relational and object oriented models share important limitations with regard to data structures and searching. Both models require data to be input according to a defined field structure and thus do not completely support full text data entry. Although some databases allow records to include a text field, such text fields are not easily searched. The structural requirements of current databases require a programmer to predefine a structure and subsequent data entry must conform to that structure. This is inefficient where it is difficult to determine the structure of the data that will be entered into a database.
Conversely, word and image processors that allow unstructured data entry do not provide efficient data retrieval mechanisms and a separate text retrieval or data management tool is required to retrieve data. Thus, the current information management systems do not provide the capability of integrating full text or graphics data entry with the searching mechanisms of a database.
The separation of database from other programs such as word processors has created a large amount of text and other files that cannot be integrated with current databases. Various database, spreadsheet, image, word processing, electronic mail and other types of files may not currently be accessed in a single database that contains all of this information. Various programs provide integration between spreadsheet, word processing and database programs but, as previously described, current databases do not support effective searching in unstructured files.
The present invention overcomes the limitations of both the relational database model and object oriented database model by providing a database with increased flexibility, faster search times and smaller memory requirements and that supports text attributes. Further, the database of the present invention does not require a programmer to preconfigure a structure to which a user must adapt data entry. Many algorithms and techniques are required by applications that deal with these kinds of information. The present invention provides for the integration, into a single database engine, of support for these techniques, and shifts the programming from the application to the database, as will be described below. The present invention also provides for the integration, into a single database, of preexisting source files developed under various types of application programs such as other databases, spreadsheets and word processing programs. In addition, the present invention allows users to control all of the data that are relevant to them without sacrificing the security needs of a centralized data repository.