1. Technical Field
This application relates to application aware cache management.
2. Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives (referred to as “disks” or “drives”), and disk interface units. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data in the device. In order to facilitate sharing of the data on the device, additional software on the data storage systems may also be used.
Typically, a memory in a modern digital data processing system such as a data storage system consists of a hierarchy of storage elements, extending from large-capacity but relatively slow storage elements to various levels of lower-capacity and relatively fast storage devices. The large-capacity and relatively slow devices include such types of devices as disk or tape storage devices which store information on a magnetic medium; such devices are relatively inexpensive on a cost per unit of storage basis. Intermediate in the hierarchy, both in terms of speed and storage capacity, are random-access memories, which are somewhat faster than the disk or tape devices, but which are also more expensive on a cost per unit of storage basis. At the fastest end of the hierarchy are cache memories, which are also the most expensive and thus generally the smallest.
Generally, during processing operations, a processor will enable information to be processed to be copied from the slower devices to the increasingly faster devices for faster retrieval. Generally, transfers between, for example, disk devices and random-access memories are in relatively large blocks, and transfers between the random-access memories and cache memories are in somewhat smaller “cache lines.” In both cases, information is copied to the random-access memory and cache memory on an “as needed” basis, that is, when the processor determines that it needs particular information in its processing, it will enable blocks or cache lines which contain the information to be copied to the respective next faster information storage level in the memory hierarchy. Certain prediction methodologies have been developed to attempt to predict whether a processor will need information for processing before the processor actually needs the information, and to enable the information to be copied to the respective next faster information storage level. However, generally at some point in the processing operations, the processor will determine that information required for processing is not available in the faster information storage level, that is, a “read miss” will occur, and the processor will need to delay its processing operations until the information is available. Generally, the rate at which read misses will occur with storage element(s) at a particular level in the hierarchy will be related to the storage capacity of the storage element(s) at the particular level, as well as the pattern with which the processor accesses the information in the respective storage level.
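The dependence of the read miss rate on both storage capacity and access pattern can be illustrated with a minimal sketch. The sketch assumes a least-recently-used (LRU) replacement policy, which the description above does not specify, and the function name read_miss_rate and the access trace are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

def read_miss_rate(accesses, capacity):
    """Fraction of accesses that miss in a cache holding `capacity` blocks.

    Assumption: LRU replacement; the text above names no particular policy.
    """
    if not accesses:
        return 0.0
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                    # read miss: block must be copied in
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(accesses)

# Hypothetical access pattern: a loop over four blocks, repeated five times.
trace = [0, 1, 2, 3] * 5
rates = {capacity: read_miss_rate(trace, capacity) for capacity in (2, 4, 8)}
```

For this looping trace, a capacity smaller than the loop misses on every access, while a capacity of four blocks misses only on the first pass, and a further increase to eight blocks gives no additional benefit. A different access pattern would yield a different relationship between capacity and miss rate.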
In any case, to enhance the processing efficiency of a digital data processing system, it is generally helpful to be able to assess the effect of changing the capacity of the memory element(s) at a particular level in the memory hierarchy on the rate of read misses at the particular level. (Similarly, with respect to a write cache, if a write cache is full of data that has not yet been destaged to slower storage elements, a write pends until a portion of the cache is flushed. In another case, write-through may be used when the cache is full. In either case, a similar delay to that experienced for the read miss is endured.)
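The two write cache behaviors noted above, a write that pends until a portion of the cache is flushed and a write-through when the cache is full, can be sketched as follows. This is a simplified model under stated assumptions: the class WriteCache, its method names, and the choice to destage the oldest pending block first are hypothetical and are not drawn from any particular data storage system.

```python
class WriteCache:
    """Toy write cache; `backing` stands in for the slower storage elements."""

    def __init__(self, capacity, write_through_when_full=False):
        self.capacity = capacity
        self.write_through_when_full = write_through_when_full
        self.dirty = {}    # block -> data not yet destaged (insertion ordered)
        self.backing = {}  # block -> data on the slower storage

    def flush_one(self):
        """Destage the oldest dirty block (an assumed policy) to backing storage."""
        block = next(iter(self.dirty))
        self.backing[block] = self.dirty.pop(block)

    def write(self, block, data):
        """Return 'cached', 'write-through', or 'pended' for this write."""
        if block in self.dirty or len(self.dirty) < self.capacity:
            self.dirty[block] = data  # room available, or block already cached
            return "cached"
        if self.write_through_when_full:
            self.backing[block] = data  # bypass the full cache
            return "write-through"
        self.flush_one()                # the write waits for a flush to make room
        self.dirty[block] = data
        return "pended"
```

In either of the latter two outcomes, the write incurs an access to the slower storage before it completes, which corresponds to the delay described above.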
Caching controllers that interface with host computers or the like for directing data exchanges with data storage systems such as large arrays of magnetic data storing disks, or other storage media, have been developed for providing a storage medium for large quantities of digital information. These controllers respond to read and write commands from a remote computer system to receive and/or deliver data over interconnecting busses. They often employ expensive solid state storage, such as RAM, to cache host data to minimize the relatively long latency of the disk subsystem.
The caching controller functions so that it minimizes delays and demands on the host system, while including the ability to recover wherever possible from errors from single points of failure. System configurations and operations capable of dynamically overcoming single points of failure are sometimes referred to as fault tolerant systems. Such redundant fault tolerant systems and operations in a disk array controller environment are described in commonly-assigned U.S. patent application Ser. No. 08/561,337, filed Nov. 21, 1995 entitled “Improved Fault Tolerant Controller System and Method” by W. A. Brant, M. E. Nielson and G. Howard; Ser. No. 08/363,132 entitled “A Fault Tolerant Memory System” by G. Neben, W. A. Brant and M. E. Nielson; and Ser. No. 08/363,655 entitled “Method and Apparatus for Fault Tolerant Fast Writes Through Buffer Dumping” by W. A. Brant, G. Neben, M. E. Nielson and D. C. Stallmo (a continuation-in-part application of U.S. Ser. No. 08/112,791 by Brant and Stallmo which is itself a continuation-in-part of application Ser. No. 638,167 filed Jan. 6, 1991 by Brant, Stallmo, Walker and Lui, the latter of which is now U.S. Pat. No. 5,274,799).
The cache controller avoids wait time by the host computer, or central processor, in reading or writing relative to a disk by buffering write data into a protected fast memory, and servicing most read data from fast memory. A system, as described in the above-referenced patent applications, can include redundant storage media array controllers for responding to host computer requests for transferring data between that host computer and an arrangement for low cost but large quantity data storage.
“Computer Architecture, A Quantitative Approach” by D. A. Patterson and J. L. Hennessy (Morgan Kaufmann Publishers, Inc., Second Edition, 1990, 1996) discusses processor memory, or RAM, and how it is cached. It describes the caching disciplines, such as direct mapped, set associative, and the like.
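As a brief illustration of the difference between these disciplines, the following sketch computes where a byte address may be placed in a cache. It follows the conventional textbook formulation and is not taken from the cited reference; the function name placement and the example sizes (64-byte lines, 8 lines) are assumptions made for illustration.

```python
def placement(address, line_size, num_lines, ways=1):
    """Return (set_index, tag) for a byte address.

    ways=1 models a direct-mapped cache, in which each set holds one line;
    ways>1 models a set-associative cache, in which each set holds `ways` lines.
    """
    num_sets = num_lines // ways
    line_number = address // line_size  # which line-sized block of memory
    return line_number % num_sets, line_number // num_sets

# Two hypothetical addresses that fall in the same set in both organizations.
direct = [placement(a, 64, 8) for a in (0x1000, 0x1200)]
two_way = [placement(a, 64, 8, ways=2) for a in (0x1000, 0x1200)]
```

In the direct-mapped case the two addresses compete for the single line of their shared set, so caching one evicts the other. In the two-way case they also share a set, but the set has room for both.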
Different tasks may be performed in connection with a data storage system. For example, a customer may perform data storage configuration and provisioning tasks. Such tasks may include, for example, configuring and provisioning storage for use with an email application. Tasks may include allocating cache and storage, specifying the logical and/or physical devices used for the storage allocation, specifying whether the data should be replicated, the particular RAID (Redundant Array of Independent or Inexpensive Disks) level, and the like. Given the number of such options, a customer may not have the level of sophistication and knowledge needed to perform configuration and provisioning tasks appropriately.
Thus, it may be desirable to utilize a flexible technique which assists customers in connection with performing data storage management tasks such as those related to data storage configuration and provisioning. It may be desirable that the technique be adaptable to the particular knowledge level of the user to provide for varying degrees of automation of data storage configuration and provisioning in accordance with best practices that may vary with the underlying data storage system and application.