1. Field of the Invention
This invention is generally related to the area of storage hierarchies and more particularly to the management of cache storage in a storage hierarchy.
2. Description of the Prior Art
The performance of data processing systems has improved dramatically through the years. While new technology has brought performance improvements to all functional areas of data processing systems, the advances in some areas have outpaced the advances in other areas. For example, advancements in the rate at which computer instructions can be executed have far exceeded improvements in the rate at which data can be retrieved from storage devices and supplied to the instruction processor. Thus, applications that are input/output intensive, such as transaction processing systems, have been constrained in their performance enhancements by data retrieval and storage performance.
The relationship between the throughput rate of a data processing system, input/output (I/O) intensity, and data storage technology is discussed in "Storage Hierarchies" by E. I. Cohen, et al., IBM Systems Journal, Vol. 28, No. 1 (1989), pp. 62-76. The concept of the storage hierarchy, as discussed in the article, is used here in the discussion of the prior art. In general terms, the storage hierarchy consists of data storage components within a data processing system, ranging from the cache of the central processing unit at the highest level of the hierarchy, to direct access storage devices at the lowest level of the hierarchy. I/O operations are required for access to data stored at the lowest level of the storage hierarchy.
Caching takes place at various levels of the storage hierarchy. An instruction processor cache caches data stored in main memory, and main memory essentially caches data stored in secondary storage. A second level cache between an instruction processor cache and the main memory is used in the 2200/900 Series data processing system from Unisys Corporation. Secondary storage devices, such as disk subsystems, are also available with a cache between the electromechanical storage device and the main memory of the data processing system.
Present caching techniques are typically implemented according to the physical characteristics of the level of the storage hierarchy being cached and without regard to the logical relationship of the data being cached. As a result, the cache system may be unable to provide the expected performance benefit in certain scenarios.
Present cache systems, such as that described in U.S. Pat. No. 4,394,733 entitled, "Cache/Disk Subsystem", to Robert Swenson, are aware of the physical disk address of the data presently in cache, but are unaware as to which data in cache is logically related. For example, storage may be allocated to a file by the operating system in fixed units of storage called segments. The first segment of the file has a file relative segment offset of 0, the second segment of the file has a file relative segment offset of 1, and so on. Further consider that the physical segments of disk storage allocated to a file are not guaranteed to be contiguous. That is, it cannot be guaranteed that segment 0 of a file resides in the physical disk segment immediately preceding the physical disk segment allocated to segment 1 of the file.
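By way of illustration only, the distinction between file relative segment offsets and physical disk segments may be sketched as follows. The segment numbers and the names file_segment_map, physical_segment, and physically_contiguous are hypothetical; they are assumptions made for this sketch and are not taken from the referenced patent.

```python
# Hypothetical allocation for one file. The list index is the file relative
# segment offset; the value is the physical disk segment (made-up numbers).
file_segment_map = [1040, 17, 5231]


def physical_segment(file_relative_offset):
    """Return the physical disk segment holding the given file segment."""
    return file_segment_map[file_relative_offset]


def physically_contiguous(segment_map):
    """True only if each file segment immediately follows the previous one on disk."""
    return all(b == a + 1 for a, b in zip(segment_map, segment_map[1:]))


# A cache indexed only by physical disk address sees 1040, 17 and 5231 as
# unrelated segments, although they are segments 0, 1 and 2 of the same file.
cached_physical_segments = {physical_segment(i) for i in range(len(file_segment_map))}
```

In this sketch the only link between the three physical segments is the allocation table kept by the operating system, which is information a cache keyed solely on physical disk addresses does not have.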
The inability to recognize the logical relationship between physical disk segments in cache storage may adversely impact the performance benefits of the cache system in certain scenarios. For example, some applications create very large files in the course of their processing. In particular, a sort-merge application combines the contents of two files and outputs a third sorted file. In the context of a cache disk system, the third file does not yet exist, so every write request results in a write-miss status from the cache disk. Because the sort-merge process is able to generate write requests very quickly, the available cache storage may be monopolized by the sort-merge application. To the extent that the sort-merge application is utilizing cache disk storage, other applications have less cache storage available for their use. For the purposes of this invention disclosure, this file behavior is referred to as "surging". As a result of one file surging, the other applications seeking access to the disk will have only limited cache storage available, thereby causing a substantial decrease in their overall throughput rate.
In the Cache/Disk System of U.S. Pat. No. 4,394,733, a counter is maintained for the number of segments which have been written-to. When this counter reaches a predetermined threshold, further write requests are rejected until the number of written-to segments falls below the selected threshold.
The shortcoming of this approach is that a single application may effectively monopolize cache storage if it generates write requests too quickly. Other applications must wait for written-to segments from the single application to be destaged before they are allowed to write new segments to cache storage. Thus, all applications are adversely impacted because of the rate at which a single application is writing data to a file.
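The effect of a single, global written-to threshold may be illustrated with the following minimal sketch. It is not the implementation of U.S. Pat. No. 4,394,733; the class GlobalThresholdCache, its methods, the file names, and the threshold of 4 are all hypothetical and chosen only to make the behavior visible.

```python
from collections import deque


class GlobalThresholdCache:
    """Hypothetical cache limiting the total count of written-to segments, regardless of file."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.written_to = deque()  # (file_id, segment) pairs awaiting destage

    def write(self, file_id, segment):
        """Accept the write unless the global written-to count has reached the threshold."""
        if len(self.written_to) >= self.threshold:
            return False  # rejected; the requester must retry after destaging
        self.written_to.append((file_id, segment))
        return True

    def destage_one(self):
        """Destage the oldest written-to segment to disk, freeing one slot."""
        if self.written_to:
            self.written_to.popleft()


cache = GlobalThresholdCache(threshold=4)
# One fast writer (for example, a sort-merge output file) uses the whole allowance.
surging_results = [cache.write("sort_output", seg) for seg in range(4)]
# A write from an unrelated application is now rejected.
other_rejected = cache.write("payroll", 0)
# Only after a segment of the surging file is destaged can the other write proceed.
cache.destage_one()
other_accepted = cache.write("payroll", 0)
```

Because the counter in this sketch does not distinguish between files, the unrelated application is delayed even though it has written nothing, which is the behavior the preceding paragraph identifies as the shortcoming.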
It would be desirable to identify when a file is surging, limit further writing to the single file until segments can be destaged, and eliminate the adverse impact on the performance of other applications when a single file is surging.