A modern computer system typically comprises a central processing unit (CPU) and supporting hardware necessary to store, retrieve and transfer information, such as communications buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU is the heart of the system. It executes the instructions which comprise a computer program and directs the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
A computer system is a very complex machine having numerous components which interact with each other. While the CPU is the driving engine, the overall speed, or throughput, of the system can be affected by various other components, which either cause the CPU to wait or impose additional workload on the CPU. E.g., where the CPU needs data from memory, it may have to wait several cycles to access memory. Where the CPU needs data which is not in memory but is stored on a storage device, such as a hard disk drive, it executes operating system functions to access the data in storage, and the operating system often switches execution to another task or thread while waiting for the data from storage. These operations, although they do not necessarily cause the CPU to be idle, impose additional workload on the CPU which can affect system performance. They also introduce delay which may cause a human interacting with a computer to wait longer for a response.
Many large computer systems are used primarily or substantially to support database applications. A database application is a program which organizes, accesses and maintains a large pool of data. Typically, the database application services requests from multiple users to access small pieces of information in the database, either for purposes of reading them or updating them. These accesses do not necessarily occur in any sequential order, and may appear to be scattered to random portions of the database. Because the database is usually very large, it is generally impractical to keep the entire database, or even a major portion of it, in main memory at all times. Therefore a database application is usually characterized by a large number of storage access operations, most of which individually are small, and which are scattered among the storage addresses of the system. Under these conditions, the performance of the computer system is heavily dependent on the collective performance of the storage device(s).
Faster storage hardware will in many cases improve the performance of computer systems used for servicing large databases. For a given set of storage hardware characteristics, however, it is possible to improve performance further by reducing the number of storage access operations, by performing some operations when the storage hardware is less busy, or by utilizing the available storage hardware more efficiently.
One well-known technique for supporting database changes is journaling. Journaling involves writing the change operations sequentially to a special storage device or devices, or special portion of a storage device. Journaling does not reduce the number of storage operations performed, but operates on the principle that the storage hardware is used more efficiently. Specifically, the typical storage device is the rotating magnetic disk drive. For small, random accesses to a disk drive, most of the time required to access data will be devoted to seeking to a track (a seek) and waiting for the disk to rotate to the desired angular position (latency). If, however, data is always written sequentially (to the next sector or track), then these seek and latency times are virtually eliminated, and the same amount of data can be written in a much smaller time interval. Unfortunately, sequential writing means that the data in the journal is not organized according to the organizational structure of the database. Therefore, journaling only amounts to saving the data temporarily on a non-volatile storage device. Ultimately, the same data updates must be performed on the organized data (original copy) in storage, which generally means many small write operations. Changed data is typically kept in memory until the original copy in non-volatile storage is updated. This update is performed from the data in memory, since journaled data is organized in a different fashion. Keeping changed data in memory longer may reduce the total number of write accesses to the original copy of the data on disk, because a memory page may be updated multiple times before the ultimate write to storage is performed.
In other cases, this delayed buffering may allow the storage write operation to be executed at a time when the storage device has become less busy, or may allow multiple write operations to be combined, or other forms of efficiency improvement to the storage write operation.
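The timing argument behind journaling can be illustrated with a simple back-of-the-envelope model. The following is a minimal sketch only: the `DiskModel` class and all of its parameter values (average seek time, rotational speed, transfer rate) are hypothetical figures chosen for illustration and are not taken from any particular drive. It compares many small random writes, each of which pays a seek and a rotational delay, with the same amount of data written sequentially after a single positioning operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskModel:
    """Hypothetical drive parameters; the figures are illustrative, not measured."""

    avg_seek_ms: float = 8.0          # assumed average seek time
    rpm: float = 7200.0               # assumed rotational speed
    transfer_mb_per_s: float = 100.0  # assumed sustained transfer rate

    def avg_rotational_latency_ms(self) -> float:
        # On average the platter must turn half a revolution.
        return 0.5 * 60_000.0 / self.rpm

    def transfer_ms(self, nbytes: int) -> float:
        # Time to move nbytes once the head is in position.
        return nbytes / (self.transfer_mb_per_s * 1_000_000.0) * 1000.0

    def random_write_ms(self, count: int, nbytes_each: int) -> float:
        # Every small write pays a seek and a rotational delay.
        per_write = (self.avg_seek_ms
                     + self.avg_rotational_latency_ms()
                     + self.transfer_ms(nbytes_each))
        return count * per_write

    def sequential_write_ms(self, count: int, nbytes_each: int) -> float:
        # One initial positioning, then the data streams to consecutive sectors.
        return (self.avg_seek_ms
                + self.avg_rotational_latency_ms()
                + self.transfer_ms(count * nbytes_each))


if __name__ == "__main__":
    disk = DiskModel()
    scattered = disk.random_write_ms(count=1000, nbytes_each=4096)
    journaled = disk.sequential_write_ms(count=1000, nbytes_each=4096)
    print(f"1000 scattered 4 KB writes:  {scattered:.1f} ms")
    print(f"1000 sequential 4 KB writes: {journaled:.1f} ms")
```

Under these assumed figures the sequential case is more than a hundred times faster, because positioning time dominates each small random write. The model deliberately ignores caching, queuing and track-to-track movement during the sequential write, so it indicates only the general shape of the effect.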
One of the design goals of many large modern computer systems is data preservation or redundancy, i.e., data should not be lost as a result of a system malfunction (whether due to an external cause such as loss of power or an internal cause such as a component failure). Another design goal is availability, i.e., that the system be available to users as much as possible. In some cases, there is a need for constant availability, i.e., the system must be designed so that it is always available, come what may. In other systems, some amount of down time, or some amount of time when the system operates at reduced performance, may be acceptable.
In general, there is some trade-off between data preservation and availability on the one hand and maximum utilization of hardware resources for productive work on the other. Journaling is one example of this maxim. The journal enhances data preservation and availability by saving data in a non-volatile location pending a write of changed data to structured non-volatile storage, but journaling itself requires hardware resources in the form of storage devices and supporting hardware, and may consume portions of the available bandwidth of other resources, such as buses, channels and processors. As another example of this maxim, it is well known to store data in a redundant fashion on multiple storage devices in any of various schemes known as “RAID”, meaning “redundant array of independent disks”, but all of these schemes sacrifice some of the storage capacity of the disks in order to achieve redundancy, and in some cases may adversely affect storage access times when compared with non-redundant storage schemes.
There are further design trade-offs in the way in which a journal is implemented. If every database change entry is written immediately to the journal disk, the journal is burdened with a large number of small write operations. Typically, some journal change entries are buffered or cached in memory, so that multiple entries are written at a time to the journal. The greater the number of entries which are cached before writing to the journal, the fewer the write operations and the smaller the consequent impact on system performance. However, the longer one waits before writing the journal entries to non-volatile storage, the more one reduces the journal's beneficial effects, i.e., more data is exposed to loss in the event of a failure.
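The trade-off described above can be made concrete with a toy model. The sketch below is illustrative only and does not represent any particular journal implementation: the `BufferedJournal` class, its `flush_threshold` parameter and the `simulate` helper are hypothetical names introduced for this example. The model counts how many journal write operations are issued and the largest number of entries that were ever held only in volatile memory, and would therefore be exposed to loss.

```python
from typing import List


class BufferedJournal:
    """Toy model of a journal that caches entries in memory before writing."""

    def __init__(self, flush_threshold: int) -> None:
        if flush_threshold < 1:
            raise ValueError("flush_threshold must be at least 1")
        self.flush_threshold = flush_threshold
        self.pending: List[str] = []   # entries held only in volatile memory
        self.durable: List[str] = []   # stands in for non-volatile journal storage
        self.write_count = 0           # number of journal write operations issued
        self.max_exposed = 0           # largest number of entries ever at risk

    def append(self, entry: str) -> None:
        self.pending.append(entry)
        self.max_exposed = max(self.max_exposed, len(self.pending))
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # All cached entries go out in a single sequential write.
        self.durable.extend(self.pending)
        self.pending.clear()
        self.write_count += 1


def simulate(num_entries: int, flush_threshold: int) -> BufferedJournal:
    """Append num_entries hypothetical change entries and return the journal."""
    journal = BufferedJournal(flush_threshold)
    for i in range(num_entries):
        journal.append(f"change-{i}")
    return journal


if __name__ == "__main__":
    for threshold in (1, 10, 50):
        j = simulate(num_entries=100, flush_threshold=threshold)
        print(f"threshold={threshold:3d}  writes={j.write_count:3d}  "
              f"max entries exposed={j.max_exposed}")
```

In this model a larger threshold reduces the number of journal writes in direct proportion, while the number of entries exposed to loss grows by the same factor. A practical design would presumably also flush on a timer or at transaction boundaries, which this sketch does not attempt to model.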
A need exists, not necessarily recognized, for a means for managing these competing considerations so that a system achieves reasonable levels of performance, availability and data preservation.