1. Field of the Invention
This invention relates to a method for managing data in a data storage hierarchy and a hierarchy employing such a method. More particularly, the method relates to destaging operations in a data storage hierarchy.
2. Description of the Related Art
It is desirable to store computer data in such a manner that it be immediately available when required. Fast access to data can be achieved by using a very large high speed data storage device. However, the price of data storage increases as both the speed and capacity of the data storage device increase. Therefore, high speed memories are typically of a relatively small capacity, which is often exceeded by the amount of data required to be stored for a given application. When a given application requires data storage exceeding that of high speed memory, it becomes impractical to rely on a single low speed, high capacity data storage device because access time to the data becomes prohibitively large.
The access time to data may be improved by employing a data storage hierarchy in place of a single data storage device. A data storage hierarchy consists of multiple levels of data storage devices. The highest level, or first level, is typically the smallest, fastest, and most expensive form of data storage. The size of data storage increases and the speed and cost of data storage typically decrease as the level of storage in the hierarchy decreases. Examples of data storage devices employed in hierarchies include: semiconductor device main and cache memory, magnetic tape drives, magnetic drums, magnetic disk drives, and optical disk drives. These devices may be used in a variety of combinations and levels to create a data storage hierarchy. In addition, a level of the data storage hierarchy may be comprised of a magnetic tape, magnetic disk, or optical disk library. A library, or mass storage system, includes one or more data storage devices, a plurality of storage cells, and a mechanism for automatically transferring data storage media between the storage cells and the storage devices. For example, an optical disk library could include one or more optical disk drives, a plurality of storage cells for storing optical disks, and mechanized means for transferring the disks between the storage cells and the optical disk drives. The existence of libraries is well known, as evidenced by an article by Dorrell and Mecklenburg. (Mass Storage Device, IBM Technical Disclosure Bulletin, Vol. 15, No. 9, Feb. 1973, pp. 2943-45.)
Typically, a system including a data storage hierarchy is programmed such that all data contained therein are initially stored in the highest level of the hierarchy. Over time, according to rules programmed into the hierarchy, data are transferred between different levels of the hierarchy to meet the system storage and access needs. When the host processor requires particular data, the location of the data in the hierarchy is first determined. If the data required is stored in the highest level of the hierarchy, the data is retrieved and used. If the data is not stored in the highest level of the hierarchy, it can be retrieved for use directly from its present location, if possible, or first transferred to the highest level of the hierarchy and then retrieved from that level. The movement of data from a relatively low level of the hierarchy to a relatively high level of the hierarchy is known as "staging". The data is staged so as to permit the system rapid access to the data as required in the future. Since data that has recently been used is often likely to be used again shortly thereafter, the presence of the data in the highest level of the hierarchy increases the overall speed of the system. The ability to directly access the data at a lower level of the hierarchy depends on the system connections and type of data storage devices at each level. Data that is accessed directly from a lower level of the hierarchy, rather than staged, is typically data that has been determined to be relatively unlikely to be accessed frequently.
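The access sequence described above (locate the data, retrieve it from the highest level if present, otherwise stage it and then retrieve it) may be illustrated by the following minimal sketch. It is an illustration only and is not taken from any cited reference; the class name TwoLevelHierarchy, its attributes, and the use of in-memory dictionaries to stand in for storage levels are all assumptions, and the capacity of the highest level is ignored.

```python
class TwoLevelHierarchy:
    """Toy hierarchy: a small fast level backed by a large slow level.

    All names here are hypothetical; dictionaries stand in for devices.
    """

    def __init__(self, low_level):
        self.high = {}              # highest level (capacity ignored here)
        self.low = dict(low_level)  # lower level holding every record
        self.stage_count = 0        # number of staging operations performed

    def read(self, key):
        # Step 1: determine where the data currently resides.
        if key in self.high:
            return self.high[key]
        # Step 2: the data is not in the highest level, so stage it
        # upward first and then retrieve it from the highest level.
        self.high[key] = self.low[key]
        self.stage_count += 1
        return self.high[key]


h = TwoLevelHierarchy({"a": 1, "b": 2})
first = h.read("a")   # staged from the lower level
second = h.read("a")  # served from the highest level, no second staging
```

In this sketch the second read of the same record costs no staging operation, which reflects the rationale that recently used data is likely to be used again shortly thereafter.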
A common problem in data storage hierarchies is the relative size of each level of the hierarchy. The high cost of high speed memory requires that the size of the highest or higher levels of the hierarchy be limited. As a result, although data storage hierarchies nevertheless improve the speed of data access over single data storage devices, the capacity of the highest levels of the hierarchy can be exceeded. Use of the highest levels of the data storage hierarchy can be optimized by prioritizing the storage of data at each level. For example, the system may be designed such that data is rated according to its likelihood of use and the level at which it is generally stored is determined thereby. In addition, data may be transferred from relatively higher levels of the hierarchy to relatively lower levels of the hierarchy, as it becomes accessed less frequently over time. The movement of data from a relatively higher level of the hierarchy to a relatively lower level of the hierarchy is known as "destaging". As with staging, destaging may be prioritized according to the frequency of use of particular data. Data not likely to be accessed frequently can be destaged to a relatively low level of the hierarchy for archival purposes.
The destaging of data may be used for several purposes. As previously mentioned, data may be destaged as it ages and becomes less likely to be accessed. In addition, there is always the risk that the capacity of the relatively higher levels of the hierarchy can be exceeded, despite the aforementioned prioritization of the storage of data at each level of the hierarchy. When the system requires a staging or destaging operation to be performed such that data is to be transferred to a level of the hierarchy for which the storage capacity has been exceeded, data in that level must first be destaged to create storage availability for the data desired to be staged or destaged. Thus, system optimization requires management techniques for both the staging and destaging of data.
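The requirement that a full level first be relieved by destaging before it can receive new data may be illustrated by the following minimal sketch. The names Level and place are hypothetical, each level is modeled as a simple queue with a fixed capacity, and the choice of the oldest entry as the data to be destaged is an assumption made only to keep the example short.

```python
from collections import deque


class Level:
    """One level of a toy hierarchy (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()  # oldest entry on the left


def place(levels, index, item):
    """Store item in levels[index], destaging one entry downward if full."""
    level = levels[index]
    if len(level.items) >= level.capacity:
        if index + 1 >= len(levels):
            raise RuntimeError("lowest level is full")
        # Assumption: the oldest entry is chosen for destaging.
        victim = level.items.popleft()
        # The next level down may itself be full, so this can cascade.
        place(levels, index + 1, victim)
    level.items.append(item)


# Levels are listed from highest (index 0) to lowest.
levels = [Level(1), Level(1), Level(4)]
for record in ["r1", "r2", "r3"]:
    place(levels, 0, record)
```

After the three placements, the newest record occupies the highest level and the two older records have each been pushed one level further down, showing how a single store request can trigger a chain of destaging operations.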
Techniques for efficiently destaging data in a data storage hierarchy are known. The simplest destaging technique includes random choice of the data to be destaged, as disclosed in U.S. Pat. No. 3,588,839. It is also known to choose data for destaging on a first-in first-out (FIFO) basis. See, for example, Boland, L. J., Buffer Store Replacement Control, IBM Technical Disclosure Bulletin Vol. 11, No. 12, May 1969, pp. 1738-39; Kinard, et al., Data Move Optimization in Mass Storage Systems, IBM Technical Disclosure Bulletin Vol. 21, No. 6, Nov. 1978, pp. 2246-49; and May, C. M., Management Technique for Memory Hierarchies, IBM Technical Disclosure Bulletin Vol. 24, No. 1A, June 1981, pp. 333-335. It is also known to stage data in a manner such that the number of destaging operations is minimized. This may be accomplished by staging data in large units, as opposed to merely the exact data currently required to be staged, on the theory that data stored physically or logically nearby data currently requiring access is more likely to be accessed in the future than data stored elsewhere in the hierarchy. By staging a larger unit of data than actually required, the need to stage again in the future is eliminated. Since only a single staging operation is required, what otherwise would have been two separate destaging operations are efficiently combined into a single destaging operation. A sample unit used in such a staging and destaging technique would be a complete track of a magnetic storage disk.
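The difference between random and first-in first-out selection of the data to be destaged may be illustrated by the following minimal sketch. It is not drawn from the cited patent or bulletins; the function names choose_random and choose_fifo are hypothetical, and a level is modeled as an ordered mapping whose order is the order of entry.

```python
import random
from collections import OrderedDict


def choose_random(level, rng):
    """Pick any resident key with equal probability."""
    return rng.choice(list(level))


def choose_fifo(level):
    """Pick the key that entered the level earliest."""
    return next(iter(level))


level = OrderedDict()
for key in ["t1", "t2", "t3"]:
    level[key] = b"data"

# Rewriting an existing entry does not change its order of entry,
# so FIFO selection ignores how recently the data was used.
level["t1"] = b"updated"

fifo_victim = choose_fifo(level)
random_victim = choose_random(level, random.Random(0))
```

Neither policy consults access history: the FIFO choice is still the first record entered even though it was just rewritten, and the random choice may fall on any resident record.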
The least recently used (LRU) technique is another known destaging technique. For example, U.S. Pat. Nos. 4,020,466 and 4,077,059 disclose a system in which data to be destaged is determined by the time at which the stored data was last accessed. Only data which has been accessed since it was stored at its current level of the hierarchy can be destaged, with such destaging performed beginning with the data least recently accessed. Similar systems are shown in U.S. Pat. Nos. 4,530,054 and 4,463,424. Modifications of the least recently used destaging technique are known. For example, U.S. Pat. No. 4,636,946 discloses first determining the least recently used data for destaging, and then destaging along with that data other data having certain characteristics in common therewith. The common characteristics may be, for example, storage in the same physical or logical location of the level of the hierarchy. By destaging multiple records at one time, staging and destaging operations are minimized.
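Least recently used selection, and the modification in which related data is destaged together with the least recently used data, may be illustrated by the following minimal sketch. It is a simplified illustration and not the implementation of any cited patent: the class LruLevel and its methods are hypothetical, a track number is assumed as the common characteristic, and the restriction that only data accessed since it was stored may be destaged is omitted.

```python
from collections import OrderedDict


class LruLevel:
    """Toy level that tracks recency of use (hypothetical model)."""

    def __init__(self):
        # Maps key -> track; least recently used entry comes first.
        self.records = OrderedDict()

    def store(self, key, track):
        self.records[key] = track
        self.records.move_to_end(key)

    def access(self, key):
        self.records.move_to_end(key)  # mark as most recently used
        return self.records[key]

    def destage_lru(self):
        """Remove only the least recently used record; return its key."""
        key, _ = self.records.popitem(last=False)
        return key

    def destage_lru_group(self):
        """Remove the LRU record plus every record on the same track."""
        lru_key = next(iter(self.records))
        track = self.records[lru_key]
        group = [k for k, t in self.records.items() if t == track]
        for k in group:
            del self.records[k]
        return group


level = LruLevel()
level.store("a", 1)
level.store("b", 2)
level.store("c", 1)
level.access("b")                  # recency order is now a, c, b
group = level.destage_lru_group()  # LRU record "a" is on track 1
```

Here record "c" is destaged along with "a" because it shares track 1, even though it is not itself the least recently used record, so one operation frees the space that two separate destaging operations would otherwise free.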
Several problems are associated with the least recently used destaging technique. First, the technique is complex in that both the time of entry of the data in the current level of the hierarchy and the time of access of the data must be available. Also, where the size of the data records to be destaged is typically much larger than the amount of data which can be interpreted by the host processor at any given time, the host processor may be tied up during a long series of destaging operations. If the hierarchy includes a write-once recording medium for archival purposes, destaging will not eliminate the data from the upper levels of the hierarchy. Such archival purposes include the storage of data not likely to be frequently accessed, such as business records. Finally, in hierarchies including a library, the likelihood of future access to data may not correlate particularly well to the time of recent accesses or even to the time of entry of the data into the library.