1. Field of the Invention
The present invention relates generally to a computer system and, more particularly, to cache management in a computer system.
2. Description of the Prior Art
Data storage systems may be coupled to one or more host processors and provide storage services to each host processor. An example data storage system may include one or more data storage devices, such as those of the Symmetrix™ family, that are connected together and may be used to provide common data storage for one or more host processors in a computer system.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data stored therein.
Performance of a storage system may be improved by using a cache. Cache memory may be used to store frequently accessed data for rapid access. Typically, it is time-consuming to read or compute data stored in the disk data storage devices. However, once data is stored in the cache memory, subsequent requests can be served from the cached copy rather than from the disk data storage device, which lowers the average access time to the data.
One technique for expediting read requests involves prefetching data-units so that more data-units are available from cache memory rather than from disk storage. Typically, prefetching is implemented by reading data-units in blocks in response to one or more requests to read a data-unit. Since a request to read a specific data-unit increases the likelihood that access to other, related data-units will soon be required, the read request for the data-unit may trigger a prefetch request to read related data-units as well, particularly when a read request results in reading a data-unit off-cache rather than from the cache memory.
When, in the course of executing a read request, the requested data-unit is found in-cache, the operation constitutes a “Hit.” If the requested data-unit is not found in-cache, the operation constitutes a “Miss.”
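The Hit and Miss terminology can be illustrated with a minimal sketch. It assumes a dictionary standing in for disk storage and a second dictionary standing in for the cache memory; the class name `ReadCache` and its attributes are hypothetical and are not taken from any of the cited references.

```python
class ReadCache:
    """Toy read-through cache keyed by data-unit address (hypothetical names)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for disk storage
        self.slots = {}                     # address -> cached data-unit
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: the requested data-unit is found in-cache.
        if address in self.slots:
            self.hits += 1
            return self.slots[address]
        # Miss: read the data-unit off-cache, then keep a cached copy.
        self.misses += 1
        data = self.backing_store[address]
        self.slots[address] = data
        return data


disk = {n: f"unit-{n}" for n in range(8)}
cache = ReadCache(disk)
cache.read(3)  # Miss: first access to address 3
cache.read(3)  # Hit: address 3 is now cached
cache.read(5)  # Miss: first access to address 5
print(cache.hits, cache.misses)
```

In this sketch the second read of address 3 is served from the cached copy, which is the case that lowers average access time.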
Prefetching requires a significant number of cache-slots to be available in the cache memory. When long sequences of data-units are prefetched into the cache memory, other data-units typically have to be removed from the cache memory in order to make room for the newly prefetched data-units.
Prefetching also raises the possibility that data-units to which the host processor requires access may be replaced by data-units to which the host processor does not and never will require access. It is, therefore, important to remove cache data that is not likely to still be required by the data storage system. Cache Pollution is defined to be the population of the cache memory with data-units that are not required for re-accessing.
Sequential prefetching, which involves reading blocks of adjacent data-units, assumes that data-units that are adjacent to a requested data-unit are also likely to be accessed. In fact, access requests often involve data-units that are sequential. Because the physical devices, such as disks, on which data is stored off-cache are organized and segmented into one or more logical volumes (LVs), the addresses of the adjacent data-units may not be physically sequential on the disk, but they will be sequential on the logical volume to which the data on the disk is mapped.
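A minimal sketch of sequential prefetching triggered by a Miss follows. It assumes a logical volume modeled as a list indexed by logical address and a cache modeled as a dictionary; the function name `read_with_prefetch` and the `prefetch_count` parameter are hypothetical, chosen only for illustration.

```python
def read_with_prefetch(cache, volume, address, prefetch_count=4):
    """Return (data_unit, was_hit) for a logical address.

    On a Miss, the requested data-unit and up to `prefetch_count`
    logically adjacent data-units are read into the cache.
    """
    if address in cache:
        return cache[address], True
    # Addresses are sequential on the logical volume, not necessarily on disk.
    last = min(address + prefetch_count, len(volume) - 1)
    for adjacent in range(address, last + 1):
        cache[adjacent] = volume[adjacent]
    return cache[address], False


volume = [f"unit-{n}" for n in range(16)]  # hypothetical logical volume
cache = {}
results = [read_with_prefetch(cache, volume, a)[1] for a in (2, 3, 4, 7)]
print(results)
```

Under these assumptions, the reads of addresses 3 and 4 are Hits because the Miss on address 2 already brought them into the cache. Note that this sketch never removes anything from the cache, so it does nothing to limit cache pollution.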
A feature known as “tail cutting” or simply “cutting” may be used to reduce cache pollution. Typically, tail cutting uses two pointers to track the “oldest” and “newest” data. In tail cutting, a maximum number of data-units may be stored in the cache memory pursuant to a prefetch task. Once the maximum number has been prefetched into cache memory, certain data-units will be removed from the cache memory to make room for data-units prefetched pursuant to the prefetch task or pursuant to another prefetch task. Techniques used in connection with cache management, including the use of “Tagged Based Cache” (TBC) and the use of timestamps, are disclosed in U.S. Pat. No. 7,143,393 entitled Method For Cache Management For Positioning Cache Slot, Ezra, et al., which is hereby incorporated by reference. Management of data in a computer system by a data storage system is disclosed in U.S. patent application Ser. No. 11/726,744 entitled Methods And Systems For Incorporating Sequential Stream Read Requests Into Prefetch Management for Data Storage Having A Cache Memory, Orit Levin-Michael, et al., which is also hereby incorporated by reference.
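The two-pointer bookkeeping described above can be sketched as follows. This is an illustration only, not the method of the incorporated references: it assumes that the data-unit removed is always the oldest one belonging to the same prefetch task, and the class name `PrefetchTask`, its attributes, and the `max_units` parameter are all hypothetical.

```python
class PrefetchTask:
    """Toy sequential prefetch task with tail cutting (hypothetical names)."""

    def __init__(self, cache, volume, start, max_units):
        self.cache = cache          # shared dict: address -> data-unit
        self.volume = volume        # list indexed by logical address
        self.oldest = start         # oldest prefetched address still cached
        self.newest = start - 1     # most recently prefetched address
        self.max_units = max_units  # cap on cached units for this task

    def prefetch_next(self):
        """Prefetch one more data-unit; cut the tail if the cap is reached."""
        nxt = self.newest + 1
        if nxt >= len(self.volume):
            return False
        if self.newest - self.oldest + 1 >= self.max_units:
            # Assumption: the oldest unit of this task is the one released.
            self.cache.pop(self.oldest, None)
            self.oldest += 1
        self.cache[nxt] = self.volume[nxt]
        self.newest = nxt
        return True


volume = [f"unit-{n}" for n in range(10)]
cache = {}
task = PrefetchTask(cache, volume, start=0, max_units=3)
for _ in range(5):
    task.prefetch_next()
print(sorted(cache), task.oldest, task.newest)
```

With a cap of three data-units, five prefetches leave only addresses 2 through 4 in the cache; addresses 0 and 1 were cut from the tail as newer data-units arrived.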
Cache memory methods that allocate cache memory locations in blocks associated with a prefetch task incur administrative overhead when releasing less than the originally allocated block size. Detailed knowledge of each prefetch task is also required to deallocate old data efficiently. This makes it difficult to architect a background task that releases old prefetched data. It would be advantageous to provide a prefetch implementation utilizing tail cutting in which the data storage system prefetches, tracks, and stores data-units in the cache memory in such a way as to easily identify and release old prefetched memory locations.