1. Field of Invention
The invention relates generally to computer databases, and more specifically to the quick retrieval of data records based upon contiguous consolidation according to access frequency.
2. Background of Invention
Large amounts of data are typically stored in computer databases. Various database architectures are currently known in the art. These architectures were generally developed at a time when computer hardware storage (both disk storage and random access memory) was relatively limited and expensive. However, these technologies have recently become significantly more powerful and less expensive. As a result, the hardware environment has changed considerably from when current database technologies were developed. Only a short time ago, the standard hard disk size was two gigabytes, and database server computers did not support more than two gigabytes of random access memory. Currently, the standard server computer is equipped with up to eighteen gigabytes of disk space, and over two gigabytes of random access memory. As a consequence of these hardware developments, much larger databases are not only now possible, but are expected by users and customers of database systems.
With the increase in size, new demands are placed on databases. Typically, a larger database results in significantly slower access time. Known database technologies were developed when less static and dynamic storage was readily available, and thus are not designed to take advantage of contemporary hardware configurations. For example, known database technologies typically rely on a highly partitioned storage strategy that permits data to be distributed across multiple disks. Generally, each partition of the database is less than one gigabyte in size. Thus, in a large database, data records will be stored in multiple partitions, on multiple physical areas on a disk, and on multiple disks. Consequently, access to data records becomes slower as the size of the data (and thus the physical distribution) increases.
To access a data record, first the record must be located, then the read-head must be moved to the physical location of the record in order to read it. A similar set of operations is required in order to write an updated record to disk. The mechanical operation of physically moving a disk-head is much slower than executing an electronic operation, such as loading data from a disk into random access memory. Databases typically include the requirement that data records must be able to be accessed in a random rather than sequential manner (in other words, a user needs to be able to access any record, in any order). The data records of traditional databases are distributed across various physical locations. Therefore, randomly accessing a plurality of data records involves significant disk-head movement that is essentially random, resulting in lengthy average seek times. The larger the database, the greater the record distribution, and hence the slower the data access will be. In practice, when known database architectures are used to store large databases, the system spends a significant amount of time moving the disk-heads to various physical locations on multiple disks, resulting in slow access time. Therefore, what is needed is a method and a system for facilitating quick access of data records in large databases.
According to an embodiment of the present invention, the most recently accessed data records are stored contiguously on static media, and the least recently accessed data records are stored contiguously on static media. Additionally, a buffer in random access memory is used to store a subset of the data records, preferably those that have been most recently accessed. When a data record is accessed, it is stored in the buffer. From time to time, the most recently accessed records from the buffer are flushed to static media, ensuring their contiguous storage. The least recently accessed records, which are stored on static media, are consolidated, such that they too are contiguously stored.
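By way of illustration only, the buffer, flush, and consolidation steps described above may be sketched as follows. This is a minimal in-memory model and not the disclosed implementation. The class name ContiguousStore, its methods, the use of a Python list to stand in for static media, and the policy of flushing whenever the buffer exceeds its capacity are all assumptions made for the sketch.

```python
from collections import OrderedDict


class ContiguousStore:
    """Toy model (hypothetical): the 'disk' is a list whose index is a position."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed maximum number of buffered records
        self.buffer = OrderedDict()   # RAM buffer, ordered oldest -> newest access
        self.pending = set()          # keys accessed since the last flush
        self.disk = []                # stand-in for static media: (key, value) slots
        self.index = {}               # key -> position of the current copy on disk

    def write(self, key, value):
        """Store a new or updated record; it lands in the buffer first."""
        self._touch(key, value)

    def read(self, key):
        """Return a record, from the buffer if present, otherwise from disk."""
        if key in self.buffer:
            value = self.buffer[key]
        else:
            value = self.disk[self.index[key]][1]  # KeyError if the key is unknown
        self._touch(key, value)
        return value

    def _touch(self, key, value):
        """Record an access: the record becomes the newest buffer entry."""
        self.buffer[key] = value
        self.buffer.move_to_end(key)
        self.pending.add(key)
        if len(self.buffer) > self.capacity:
            self.flush()
            self.buffer.popitem(last=False)  # evict the oldest buffered record

    def flush(self):
        """Append recently accessed records to the end of the disk as one block."""
        for key, value in self.buffer.items():
            if key in self.pending:
                self.index[key] = len(self.disk)
                self.disk.append((key, value))
        self.pending.clear()

    def consolidate(self):
        """Remove superseded copies so the remaining older records are contiguous."""
        live = [(k, v) for pos, (k, v) in enumerate(self.disk)
                if self.index[k] == pos]
        self.disk = live
        self.index = {k: pos for pos, (k, _) in enumerate(live)}
```

In this sketch, each flush writes one contiguous block at the tail of the disk, so the most recently accessed records sit together there. A record that is flushed again leaves a superseded copy behind, and consolidate() closes the resulting holes so that the records accessed least recently also end up contiguous, at the front. For example, with a capacity of two, writing records "a", "b", and "c", then reading "a" and consolidating, leaves the disk holding "b", "c", "a" in that order.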
Access of data records is thus much faster than in a traditional database. The most recently accessed records are, statistically, the most likely records to be subsequently accessed. Because the most recently accessed records are in the buffer, they are accessed extremely quickly. Because these records are stored contiguously on static media, whenever one of these records is accessed from there, it is also likely that the read-head will already be proximate to the location of the record. In a traditional database system, an accessed record is not likely to be physically located proximately to a previously accessed record, and thus more disk-head movement is required. Because disk-head movement is slow, the substantial minimization thereof results in greatly increased performance.
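The effect of contiguous storage on disk-head movement may be illustrated with a simple calculation. The following sketch assumes a one-dimensional disk in which the cost of an access is the distance between consecutive record positions. It ignores rotational latency, multiple disks, and the buffer, and the function names and parameter values are hypothetical.

```python
import random


def head_travel(positions, accesses):
    """Total distance the head moves when visiting records in access order."""
    travel, head = 0, 0  # the head is assumed to start at position 0
    for key in accesses:
        travel += abs(positions[key] - head)
        head = positions[key]
    return travel


def compare_layouts(n_hot=100, disk_size=1_000_000, n_accesses=1_000, seed=0):
    """Return (scattered_travel, packed_travel) for the same access sequence."""
    rng = random.Random(seed)
    # Traditional layout (assumed): frequently accessed records spread over the disk.
    scattered = dict(enumerate(rng.sample(range(disk_size), n_hot)))
    # Contiguous layout: the same records packed into slots 0..n_hot-1.
    packed = {key: key for key in range(n_hot)}
    accesses = [rng.randrange(n_hot) for _ in range(n_accesses)]
    return head_travel(scattered, accesses), head_travel(packed, accesses)


if __name__ == "__main__":
    scattered_travel, packed_travel = compare_layouts()
    print("scattered:", scattered_travel, "packed:", packed_travel)
```

In the packed layout, no single move can exceed n_hot - 1 positions, so total travel is bounded by the number of accesses multiplied by that span, whatever the access order. In the scattered layout, each move may cover a large fraction of the disk.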
Additionally, both a write process and a consolidator process store data records on static media contiguously in blocks. Therefore, there is very little movement of the disk write-head compared to that required by a traditional database system to store the same number of records across multiple physical disk locations.
The features and advantages described in this summary and the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.