1. Field of the Invention
The present invention relates to the field of database storage and access generally, and in particular, to the field of storage and access of very large relational or object-oriented databases.
2. Description of the Related Art
Generally, a database is a collection of data, such as a telephone book or the card catalog at a library. Databases are very powerful when used with computers. A computer database organizes data so that various computer programs can access and update the information, which typically resides on a computer-readable medium in one central store or file. The data in a computer database is stored as records, each having a number of fields.
One specialized type of computer database is called a relational database. The relational database's organizational scheme treats the database as multiple tables of data in which rows represent variable-length records and columns represent fields. Relational databases give the user the flexibility to link information stored in many tables. They allow users to interchange and cross-reference information between two different types of records, such as comparing the information in a group of invoices to the information in an inventory. One popular relational database access standard is the relational database language SQL (Structured Query Language). SQL allows users to present near-English queries to the database manager to view or retrieve specific information from the database in a variety of different ways.
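By way of illustration only, the cross-referencing of invoices against an inventory described above might be sketched as follows in Python using the standard sqlite3 module. The table names, column names, sample rows, and function names are hypothetical and are chosen solely for this example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inventory (part_no TEXT PRIMARY KEY, on_hand INTEGER);
        CREATE TABLE invoices (invoice_no INTEGER, part_no TEXT, quantity INTEGER);
        INSERT INTO inventory VALUES ('A100', 5), ('B200', 0);
        INSERT INTO invoices VALUES (1, 'A100', 3), (2, 'B200', 2), (3, 'A100', 9);
        """
    )
    return conn


def invoices_exceeding_stock(conn: sqlite3.Connection) -> list:
    """Link the two tables on part_no and return invoices that ask for
    more units than the inventory table shows on hand."""
    query = """
        SELECT invoices.invoice_no, invoices.part_no
        FROM invoices
        JOIN inventory ON invoices.part_no = inventory.part_no
        WHERE invoices.quantity > inventory.on_hand
        ORDER BY invoices.invoice_no
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(invoices_exceeding_stock(build_demo_db()))
```

The single SELECT statement links rows of one record type (invoices) to rows of another (inventory) through a shared field, which is the kind of cross-table query the relational model is intended to support.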
In many applications, it is desirable to keep data available in an on-line database that is readily accessible to users for long periods of time. This goal can present problems, however. A relational database containing, for example, records of all telephone calls for a particular area may grow by as much as 110 gigabytes per month. It is expensive and impractical to store a relational database this large on a single, fast, storage device.
FIG. 1 is a graph 100 showing frequency of database access versus the length of time records have been present in the database. The frequency of access for a record is highest in the first time slot 102. Thus, during time slot 102, which may be one to three months, records should be available to database users with little delay in access time. As the age of the data progresses through time slots 104 (1 to 3 months), 106 (3 to 4 months), 108 (5 to 6 months), and 110 (greater than 6 months), the frequency with which the records are accessed decreases. The less frequently one needs to access a record, the longer the wait one can tolerate.
FIG. 2 is a pyramid graph 202 showing the relationship between storage media cost, performance, and capacity. At the top of the pyramid is RAID (Redundant Array of Inexpensive Disks) storage 203, followed in succession by magnetic disk storage 204, optical storage 205, tape storage 206, and finally, off-line shelf storage 207. FIG. 2 illustrates that the performance of a storage medium (measured by, for example, access time and read time) increases with the cost of the medium. Also, as the cost and performance increase, the capacity of the medium tends to decrease.
One solution to storing large amounts of data, which takes advantage of the access and storage media properties illustrated in FIGS. 1 and 2, is to periodically move the older, less frequently accessed data to slower, less expensive storage devices. For example, a file may migrate from a magnetic disk, to an optical disk library, to an automated tape library, and finally to shelf storage magnetic tape. This solution permits large amounts of data to be stored on a reasonably priced set of storage devices. The disadvantage of the slow rate of reading data from magnetic tape is minimal because accesses to the magnetic tape occur relatively infrequently.
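By way of illustration only, the age-based migration described above may be sketched as follows in Python. The tier names follow the migration path given in the example above; the age thresholds, function names, and file names are hypothetical assumptions made solely for this sketch.

```python
# Migration path from the example above, fastest and most expensive first.
# The age cut-offs (in months) are illustrative assumptions only.
TIERS = [
    (3, "magnetic disk"),
    (6, "optical disk library"),
    (12, "automated tape library"),
]
FINAL_TIER = "shelf storage magnetic tape"


def tier_for_age(age_months: float) -> str:
    """Return the storage tier on which data of the given age should reside."""
    for limit, name in TIERS:
        if age_months < limit:
            return name
    # Data older than every cut-off ends up on the slowest, cheapest tier.
    return FINAL_TIER


def plan_migration(file_ages: dict) -> dict:
    """Map each file name to its target tier, given its age in months."""
    return {name: tier_for_age(age) for name, age in file_ages.items()}


if __name__ == "__main__":
    # Hypothetical file names and ages.
    ages = {"calls_recent.dat": 1, "calls_older.dat": 5, "calls_oldest.dat": 30}
    for name, tier in plan_migration(ages).items():
        print(f"{name}: {tier}")
```

A periodic job could call plan_migration and move any file whose current location differs from its target tier, so that only the most frequently accessed data occupies the fastest storage.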
Storing large amounts of data across several storage devices creates its own problem, however. There is presently no way to split a relational database across multiple storage devices while maintaining the ability to query the complete relational database as a single entity. To date, organizations have typically retained only the most current information in an on-line, searchable database and archived the older data. This archiving process is not consistent with ongoing needs for timely on-line access to corporate information.