1. Technical Field
This application generally relates to data storage, and more particularly to techniques used in connection with data prefetching operations in a data storage system.
2. Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device, and the storage device provides data to the host systems, also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units, logical devices, or logical volumes (LVs). The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit enables the host systems to share data stored therein.
When a requester requests data from a disk, there may be considerable latency incurred in the process of retrieving the data from the disk. A cache memory, which is a relatively fast memory that is separate from the disks, may be used to address some of the latency issues associated with disks. The cache memory may contain recently fetched or requested data. Upon receiving a request for data, the data storage system first checks to see if the requested data is already in the cache memory. If so, the data storage system retrieves the data directly from the cache without having to access the disk and can avoid latencies and other delays associated with reading from a physical disk drive. Retrieving data which is already in cache allows the data to be returned to the requester in less time than if the data is not already in cache and is accessed on the disk.
In some cases, the data storage system may determine that the desired data is not in the cache memory but, instead, is on a disk. In this instance, the data storage system instructs a disk controller to retrieve the desired data from an appropriate track on a disk, store the retrieved data in cache, and return the retrieved data to the host. The foregoing is undesirable because such an operation is subject to latencies associated with mechanical motion within the disk drive and possible latencies associated with data transmission between the cache memory and the disk drive.
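The read path described above, in which the cache is checked first and the disk is accessed only on a miss, may be illustrated by the following minimal sketch. The class name `CachedStore`, its fields, and the use of dictionaries to stand in for the disk and the cache are assumptions made for illustration only and do not describe any particular data storage system.

```python
class CachedStore:
    """Hypothetical model of a cached read path.

    A dict stands in for the physical disk (slow path) and a second
    dict stands in for the cache memory (fast path).
    """

    def __init__(self, disk):
        self.disk = disk      # track number -> data, assumed slow to access
        self.cache = {}       # track number -> data, assumed fast to access
        self.hits = 0
        self.misses = 0

    def read(self, track):
        # Read hit: the data is already in cache, so the disk is not accessed.
        if track in self.cache:
            self.hits += 1
            return self.cache[track]
        # Read miss: retrieve from disk, store in cache, then return the data.
        self.misses += 1
        data = self.disk[track]
        self.cache[track] = data
        return data


store = CachedStore({0: b"track-0", 1: b"track-1"})
store.read(0)   # miss: data is brought into cache from the disk
store.read(0)   # hit: data is served from cache
```

In this sketch, only the first read of a given track incurs the disk access; the counters `hits` and `misses` are included solely to make that behavior observable.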
Different techniques may be used to populate the cache with data that the host or other requester is expected to request. A data storage system may perform data prefetching operations which prefetch data from a device and store the prefetched data in cache. Data prefetching relates to obtaining data from a device and storing the retrieved data in cache prior to receiving an actual request for the data, such as a request from a host. If data is not in cache when requested, the data may then be retrieved from the disk, stored in cache, and returned to the host. Data prefetching techniques try to identify or recognize a pattern of I/O requests in a stream in order to predict what data will be requested next and prefetch data based on such prediction. One pattern is a sequential I/O stream. A sequential I/O stream may be characterized as a sequence of I/O requests accessing data sequentially from the requester's point of view. A sequential I/O stream involves operating on one data portion, such as a track, immediately after the preceding one or more tracks of data in the stream. For example, a data prefetching technique may observe a number of recently received I/O requests to try to identify a sequential I/O stream. If such a sequence is identified, the data prefetching technique may then obtain the next one or more data portions which are expected in the sequence prior to the data portions actually being requested.
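One simple way such recognition might be performed is sketched below. The function name `tracks_to_prefetch` and the parameters `min_run` (the number of consecutive tracks required before a stream is treated as sequential) and `prefetch_count` (the number of tracks to prefetch) are hypothetical and chosen only for illustration; an actual implementation may use different criteria.

```python
def tracks_to_prefetch(recent_tracks, min_run=3, prefetch_count=2):
    """Return the track numbers to prefetch, or [] if no sequential run is seen.

    recent_tracks: track numbers of recently received read requests,
                   oldest first (an assumed representation).
    """
    if len(recent_tracks) < min_run:
        return []
    # Examine only the most recent min_run requests.
    tail = recent_tracks[-min_run:]
    # The tail is sequential if each track immediately follows the previous one.
    is_sequential = all(b == a + 1 for a, b in zip(tail, tail[1:]))
    if not is_sequential:
        return []
    last = tail[-1]
    # Predict that the tracks immediately following the last one come next.
    return list(range(last + 1, last + 1 + prefetch_count))


print(tracks_to_prefetch([7, 8, 9]))    # [10, 11]
print(tracks_to_prefetch([7, 9, 10]))   # []
```

In this sketch, a larger `min_run` makes recognition more conservative, while a larger `prefetch_count` brings more data into cache ahead of the requester at the cost of cache space if the prediction is wrong.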
Existing data prefetching implementations may have a problem recognizing sequential I/O streams due to the complexity of the data storage configuration, with multiple layers of logical device mappings in the data storage system, RAID striping, and the like. Not all of the information needed to recognize a sequential I/O stream may be available to the component in the data storage system performing the recognition and associated prefetch processing. In a data storage system such as one provided by EMC Corporation, a backend disk adapter (DA) or director as included in a disk controller may read and write data to the physical devices. The DA may implement the data prefetching technique and perform processing to recognize a sequential I/O stream. The DA may only have access to information regarding the LV to physical device mappings and may otherwise not have access to information regarding other logical mappings and logical entities, as defined on the data storage system, which may be referenced in a host I/O request. As such, the DA may not be able to properly recognize a sequential I/O stream from the requester's (e.g., host's) point of view in order to trigger any appropriate prefetch processing. As an example, a data storage system may define a metavolume which consists of multiple LVs. The metavolume appears to the host as a single logical device that may be used in connection with the host's I/O requests. A host may issue I/O requests to consecutive tracks of data on the metavolume in which the consecutive tracks span two LVs. The foregoing may be a sequential I/O stream when evaluated across the two LVs in the context of the metavolume. However, the DA may not have knowledge regarding the metavolume and, thus, may not recognize the foregoing sequential stream to trigger any appropriate prefetch processing.
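The metavolume example above may be illustrated with the following sketch. It assumes, purely for illustration, a metavolume formed by concatenating LVs of 100 tracks each; the constant `TRACKS_PER_LV` and the functions `to_lv`, `split_by_lv`, and `longest_run` are hypothetical. Other metavolume layouts, such as striped layouts, would map tracks differently.

```python
TRACKS_PER_LV = 100  # assumed size of each LV in the concatenated metavolume


def to_lv(meta_track):
    """Map a metavolume track to (LV index, track within that LV)."""
    return divmod(meta_track, TRACKS_PER_LV)


def split_by_lv(host_tracks):
    """Group the host's metavolume tracks by LV, as a per-LV observer sees them."""
    per_lv = {}
    for track in host_tracks:
        lv, offset = to_lv(track)
        per_lv.setdefault(lv, []).append(offset)
    return per_lv


def longest_run(tracks):
    """Length of the longest run of consecutive track numbers in arrival order."""
    best = run = 1 if tracks else 0
    for a, b in zip(tracks, tracks[1:]):
        run = run + 1 if b == a + 1 else 1
        best = max(best, run)
    return best


host_tracks = [98, 99, 100, 101]       # sequential from the host's point of view
per_lv = split_by_lv(host_tracks)      # {0: [98, 99], 1: [0, 1]}

print(longest_run(host_tracks))                       # 4
print(max(longest_run(t) for t in per_lv.values()))   # 2
```

Under these assumptions, the host's stream contains a run of four consecutive tracks, but a component that evaluates each LV separately observes runs of only two, which may fall below whatever threshold is used to trigger prefetch processing.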
Existing data prefetching techniques may also include inefficiencies. For example, in one existing implementation, the DA may maintain a list of information with an entry in the list for each I/O task the DA is servicing. In connection with determining whether to prefetch additional data subsequent to an initial prefetch, the DA may continuously evaluate each entry on the list to determine whether to perform additional prefetching for the associated task. Such polling of the list may be time-consuming and may reduce the amount of time and data storage system resources available to perform data prefetching.
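The cost of such polling may be seen in the following sketch, which is not a description of any actual DA implementation. The function `poll_tasks` and the entry fields `id`, `prefetched_remaining`, and `threshold` are hypothetical names introduced only to show that every entry is examined on every pass, whether or not the associated task needs additional prefetching.

```python
def poll_tasks(tasks):
    """Perform one polling pass over the task list.

    tasks: list of dicts with hypothetical fields
           'id', 'prefetched_remaining', and 'threshold'.
    Returns (number of entries examined, ids of tasks needing more prefetch).
    """
    examined = 0
    needs_prefetch = []
    for task in tasks:
        examined += 1  # every entry is inspected on every pass
        # Assumed criterion: prefetch more when few prefetched tracks remain unread.
        if task["prefetched_remaining"] <= task["threshold"]:
            needs_prefetch.append(task["id"])
    return examined, needs_prefetch


tasks = [
    {"id": "A", "prefetched_remaining": 5, "threshold": 1},
    {"id": "B", "prefetched_remaining": 1, "threshold": 1},
    {"id": "C", "prefetched_remaining": 4, "threshold": 1},
]
print(poll_tasks(tasks))   # (3, ['B'])
```

In this sketch, three entries are examined to find the single task that needs attention, and the work per pass grows with the number of tasks being serviced rather than with the number of tasks that actually require additional prefetching.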
As such, it may be desirable to utilize a data prefetching technique which provides for improved sequential stream recognition and is efficient in connection with maintaining information and using resources in connection with prefetch processing. It may also be desirable that such techniques be flexible and adjustable for use in connection with different sequential stream characteristics.