A data storage system is typically able to service “data write” or “data read” requests issued by a host computer. A host may be connected to the storage system's external controller or interfaces (IF) through various channels, either directly or via a data network, that transfer both data and control information (i.e. control signals). Physical non-volatile media in which data may be permanently or semi-permanently stored include arrays of disk devices, magnetic or optical, which are relatively less expensive than semiconductor-based volatile memory (e.g. Random Access Memory) but are relatively much slower to access. As the costs of fabricating and producing ever-larger arrays of non-volatile memory cells continue to drop, these arrays may also be used as, and considered to be, mass data storage media or devices.
A cache memory is a high-speed buffer typically located between an IF and its associated mass storage device(s), which is meant to reduce the overall latency of Input/Output (IO) activity between the storage system and a host accessing data on the storage system. Whenever a host requests data stored in a memory system, the request may be served with significantly lower latency if the requested data is already found in cache, since this data does not have to be located and retrieved from the relatively slower mass data storage device(s). For example, as of the year 2004, latencies of IO transactions involving disk activity are typically on the order of 5-10 milliseconds, whereas latencies of IO transactions involving cache (e.g. RAM) access are on the order of several nanoseconds.
The relatively high latency associated with disk activity derives from the mechanical nature of the disk devices. In order to retrieve requested data from a disk-based device, a disk controller must first cause a disk reading arm to physically move to a track containing the requested data. Once the head of the arm has been placed at the beginning of a track containing the data, the time required to read the accessed data on the relevant track is relatively short, on the order of several microseconds.
One parameter often used to measure the efficiency of a cache memory system or implementation is referred to as the hit ratio. The hit ratio of a specific implementation is the percentage of “data read” requests issued by the host for which the requested data is already found in cache and which consequently do not require time-intensive retrieval from disk. An ideal cache system would be one reaching a 100% hit ratio. One way known in the art to improve performance by enhancing the hit ratio includes implementing intelligent algorithms that attempt to guess in advance which portions of data stored on a disk device will soon be requested by a host. Once it has been estimated that some specific data will soon be requested, the algorithm(s) “pre-fetch” the data into the cache in anticipation of the request, prior to actually receiving a request for the data.
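The effect of pre-fetching on the hit ratio can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the class `PrefetchingCache`, the least-recently-used eviction policy, the fixed `read_ahead` parameter and the purely sequential request trace are hypothetical and do not describe any particular prior-art system.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy LRU block cache with optional sequential read-ahead (illustrative only)."""

    def __init__(self, capacity, read_ahead=0):
        self.capacity = capacity      # maximum number of cached blocks
        self.read_ahead = read_ahead  # blocks to pre-fetch after each miss (assumed policy)
        self.blocks = OrderedDict()   # block number -> True, kept in LRU order
        self.hits = 0
        self.requests = 0

    def _insert(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block):
        """Serve one "data read" request; return True on a cache hit."""
        self.requests += 1
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
            return True
        # Miss: fetch the requested block, then guess that its successors come next.
        self._insert(block)
        for nxt in range(block + 1, block + 1 + self.read_ahead):
            self._insert(nxt)
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


def run(trace, capacity, read_ahead):
    cache = PrefetchingCache(capacity, read_ahead)
    for block in trace:
        cache.read(block)
    return cache.hit_ratio()


trace = list(range(16))  # a purely sequential read of blocks 0..15
no_prefetch = run(trace, capacity=8, read_ahead=0)    # every request misses
with_prefetch = run(trace, capacity=8, read_ahead=3)  # one miss per four requests
print(no_prefetch, with_prefetch)  # 0.0 0.75
```

On this trace, each miss brings in the next three blocks, so three of every four requests are served from cache. A less predictable request pattern would benefit less, and pre-fetched blocks that are never requested would only displace useful data.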
In some memory storage and retrieval systems, one or more cache memories are connected to or otherwise associated with a plurality of mass data storage devices (e.g. disk devices). The cache circuits and the associated prefetch circuits in such systems may not be aware that data to be cached may be stored on different mass data storage devices, and the criteria which dictate which blocks of data are to be prefetched into the cache may be indifferent to the fact that the data is divided between a plurality of mass data storage devices. In addition, such systems, and in particular the cache circuits and the associated prefetch circuits of such systems, may be incapable of identifying prefetch triggers and servicing prefetch requests referring to data stored on more than one disk drive, or of creating a prefetch cluster of data blocks or data units comprised of data blocks (or units) retrieved from two or more different mass data storage devices.
There is a need for a method, circuit and system for retrieving some or all of the data blocks associated with a logically related set of data, such as a file or group of related files, from two or more different mass storage devices into one or more caches. A logically related set of data may be any data (e.g. data blocks, data bytes, etc.) which has some kind of functional autonomy within the system, including but not limited to data that can be read together as a whole.
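The need stated above can be made concrete with a short sketch. It assumes, purely for illustration, that each data block of a logically related set is addressed by a hypothetical (device identifier, block number) pair. The function `plan_prefetch` and the device names are not taken from any described system, and the sketch is not intended to represent the method of the invention.

```python
from collections import defaultdict


def plan_prefetch(block_locations, cached):
    """Group the not-yet-cached blocks of one logical set by storage device.

    block_locations: iterable of (device_id, block_number) pairs (hypothetical layout).
    cached: set of (device_id, block_number) pairs already held in cache.
    Returns a dict mapping device_id to a sorted list of block numbers to fetch.
    """
    plan = defaultdict(list)
    for device_id, block in block_locations:
        if (device_id, block) not in cached:
            plan[device_id].append(block)
    return {device_id: sorted(blocks) for device_id, blocks in plan.items()}


# One file whose blocks are spread over three devices (made-up example data).
file_blocks = [("disk0", 10), ("disk1", 4), ("disk0", 11), ("disk2", 7), ("disk1", 5)]
already_cached = {("disk0", 10)}
plan = plan_prefetch(file_blocks, already_cached)
print(plan)  # {'disk1': [4, 5], 'disk0': [11], 'disk2': [7]}
```

A prefetch mechanism that is unaware of the division between devices could not build such a per-device plan, which is the limitation described in the preceding paragraph.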