The present invention relates to a method and apparatus for determining information that is to be prefetched in a multi-stream environment.
Modern storage adapters and controllers typically have some cache memory to capture temporal locality, which is the property that if a page has been referenced recently, it is likely to be referenced again in the near future. However, certain data streams also exhibit spatial locality. That is, when a page is referenced, the next few pages are likely to be referenced soon. Sequential prefetch is a well-known mechanism to capture spatial locality, and it has been applied in single-stream environments such as file system and database software systems. A single-stream environment is an environment in which logical information is available to identify potential sequential streams of reference. For example, in a file system, the references to a file are likely to be sequential in nature.
Problems arise when implementing prefetch in a multi-stream storage environment. In a multi-stream environment, each stream is independently referencing storage locations, resulting in a reference stream presented to the storage device that is an aggregate of the individual streams. The aggregate stream will likely not possess significant spatial locality, even though each individual stream may. A further problem is that the prefetch scheme must operate in a small amount of memory, because some storage adapters have limited memory.
A need arises for a technique which can detect sequential streams from among the aggregate reference stream and yet requires relatively little memory to operate.
The present invention is a system and method for determining information that is to be prefetched in a multi-stream environment which can detect sequential streams from among the aggregate reference stream and yet requires relatively little memory to operate. It is uniquely adapted for use in a multi-stream environment, in which multiple data accessing streams are performing sequential accesses to information independently of each other. The present invention detects patterns of sequential accesses from among the jumble of accesses that the aggregate access stream presents to the storage system.
In accordance with the method of the present invention, a reference address referencing stored information is received. A matching run is found. A count corresponding to the run is updated. If the count exceeds a predetermined threshold, an amount of information to prefetch is determined. If a predetermined fraction of the determined amount of information to prefetch must still be retrieved, the determined amount of information is retrieved. A matching run may be found by searching a stack comprising a plurality of entries to find an entry corresponding to the reference address. Each of the plurality of entries may be associated with a maximum accessed address, a forward range, and a backward range, and the searching step may comprise searching the plurality of stack entries in one direction starting at an end of the stack and determining whether the reference address is between (maximum accessed address − backward range) and (maximum accessed address + forward range) for each stack entry until a matching stack entry is found.
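By way of illustration only, the run-matching step described above might be sketched as follows in Python. The names `Run`, `find_run`, and `reference`, the default values for the threshold, ranges, and stack capacity, the inclusive treatment of the range bounds, and the handling of a reference that matches no run (a new entry is created at the top of the stack with a count of 1) are all assumptions made for this sketch and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Run:
    max_addr: int   # maximum accessed address seen for this run
    forward: int    # forward range
    backward: int   # backward range
    count: int = 0  # number of references matched to this run


def find_run(stack, addr):
    """Search the stack in one direction, starting at index 0, and return
    the index of the first entry whose window contains addr, or None."""
    for i, run in enumerate(stack):
        low = run.max_addr - run.backward
        high = run.max_addr + run.forward
        if low <= addr <= high:  # inclusive bounds are an assumption
            return i
    return None


def reference(stack, addr, threshold=2, forward=4, backward=2, capacity=8):
    """Process one reference address. Returns True when the matching run's
    count exceeds the threshold, i.e. when a prefetch would be considered."""
    idx = find_run(stack, addr)
    if idx is None:
        # Assumed behaviour: no run matches, so start a new one at the top
        # and drop the bottom entry if the stack is full.
        if len(stack) >= capacity:
            stack.pop()
        stack.insert(0, Run(addr, forward, backward, count=1))
        return False
    run = stack[idx]
    run.count += 1
    run.max_addr = max(run.max_addr, addr)
    return run.count > threshold


# Two interleaved sequential streams, as seen by the storage device.
stack = []
decisions = [reference(stack, a) for a in [100, 5000, 101, 5001, 102, 5002]]
print(decisions)
```

In this sketch the two streams end up as two separate stack entries, so each one reaches the threshold independently even though consecutive addresses in the aggregate stream are far apart.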
The method may further comprise rearranging the plurality of stack entries according to a replacement policy. The replacement policy may be a first-in, first-out replacement policy. Alternatively, other replacement schemes may be used. The plurality of stack entries may further be rearranged so as to make the referenced information eligible for immediate replacement.
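The rearrangement of stack entries may be illustrated, under stated assumptions, with a Python list in which index 0 is the top of the stack and the last element is the next candidate for replacement. The function names `insert_fifo`, `promote`, and `demote` are hypothetical. `promote` shows one possible alternative scheme (moving a matched entry to the top), and `demote` shows one way an entry could be made eligible for immediate replacement.

```python
def insert_fifo(stack, entry, capacity):
    """First-in, first-out: a new entry goes on top and, if the stack is
    full, the oldest entry (at the bottom) is replaced. Returns the
    evicted entry, or None if nothing was evicted."""
    evicted = None
    if len(stack) >= capacity:
        evicted = stack.pop()
    stack.insert(0, entry)
    return evicted


def promote(stack, index):
    """Alternative scheme (an assumption): move a matched entry to the top
    so that recently matched runs are replaced last."""
    stack.insert(0, stack.pop(index))


def demote(stack, index):
    """Move an entry to the bottom so that it is the next one replaced."""
    stack.append(stack.pop(index))


stack = []
for name in ["a", "b", "c"]:
    insert_fifo(stack, name, capacity=3)
evicted = insert_fifo(stack, "d", capacity=3)  # stack is full, "a" is oldest
promote(stack, 2)                              # bring the bottom entry to the top
demote(stack, 0)                               # and send it back to the bottom
print(stack, evicted)
```

Under pure first-in, first-out replacement a match does not reorder the stack, so only `insert_fifo` would be used.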
An amount of information to prefetch may be determined based on the count corresponding to the run and on a size of the prefetch buffer. The count may be updated for each reference address matching the run or for each unique reference address matching the run.
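As a hedged illustration of the sizing step, the following Python sketch derives a prefetch amount from the run count and the prefetch buffer size, and applies the fraction test described earlier before issuing a fetch. The linear formula, the parameter names `pages_per_hit` and `min_fraction`, their default values, and the `RunCounter` class are assumptions chosen for this example. The description states only which inputs are used, not how they are combined.

```python
def prefetch_amount(count, buffer_pages, pages_per_hit=2):
    """Assumed policy: grow the prefetch amount linearly with the run
    count, capped at the size of the prefetch buffer (in pages)."""
    return min(count * pages_per_hit, buffer_pages)


def should_fetch(amount, already_buffered, min_fraction=0.5):
    """Fetch only if at least min_fraction of the determined amount
    still has to be retrieved."""
    missing = amount - already_buffered
    return amount > 0 and missing >= min_fraction * amount


class RunCounter:
    """Counts either every matching reference or only unique addresses.
    The set of seen addresses is used for clarity only; a memory-limited
    adapter would need a cheaper test."""

    def __init__(self, unique_only=False):
        self.unique_only = unique_only
        self.count = 0
        self._seen = set()

    def update(self, addr):
        if self.unique_only:
            if addr in self._seen:
                return self.count  # repeated address: count unchanged
            self._seen.add(addr)
        self.count += 1
        return self.count


counter = RunCounter(unique_only=True)
for addr in [10, 10, 11]:
    counter.update(addr)
amount = prefetch_amount(counter.count, buffer_pages=16)
print(counter.count, amount, should_fetch(amount, already_buffered=1))
```

Counting only unique addresses keeps a stream that rereads the same page from inflating its count and triggering a larger prefetch than its sequential progress warrants.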