In a computing system, the rate at which data is accessed from rotating media (e.g., a hard disk drive or an optical disk drive; hereinafter “disk”) is generally slower than the rate at which a processor processes the same data. Thus, despite a processor's capability to process data at higher rates, the disk often limits overall system performance, since the processor can only process data as fast as the data can be retrieved from the disk.
A cache system may be implemented to at least partially reduce the disk performance bottleneck by storing selected data in a high-speed memory location designated as the disk cache. Then, whenever data is requested, the system will look for the requested data in the cache before accessing the disk. This implementation improves system performance since data can be retrieved from the cache much faster than from the disk.
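By way of illustration only, the lookup order described above (cache first, disk only on a miss) may be sketched as follows. The class name `DiskCache`, the `read_from_disk` callback, and the least-recently-used eviction policy are assumptions made for this sketch; the description above does not specify how cached data is selected or evicted.

```python
from collections import OrderedDict


class DiskCache:
    """Hypothetical read-through block cache (LRU eviction is an assumption)."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity              # maximum number of cached blocks
        self.read_from_disk = read_from_disk  # slow path: callable(block_id) -> data
        self.blocks = OrderedDict()           # block_id -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        # Look in the cache before accessing the disk.
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]

        # Cache miss: fall back to the disk and keep a copy of the data.
        self.misses += 1
        data = self.read_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data
```

In this sketch, a second read of the same block is served from memory and does not invoke `read_from_disk`.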
Certain access patterns, however, may decrease the efficiency of the cache system. For example, applications that repeatedly flush or overwrite the contents of the cache without using any of the cached data may render the cache system useless. When such access patterns arise, it may be better to circumvent the cache and access the disk directly.
Streams may be used to detect regular access patterns where it may be better to access the disk directly instead of first looking in the disk cache. A stream is a sequential, time-ordered set of read or write requests. Each stream is associated with a request size. A stream's request size is the amount of data to be read or written by the first request in the stream, though this request size may change over the life of the stream.
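As a non-limiting sketch, a stream may be tracked by checking whether each new request begins where an earlier request of the same type ended. The names `Request`, `Stream`, and `StreamTracker` are hypothetical, and treating "sequential" as "contiguous byte offsets" is an assumption of this sketch. For simplicity, the sketch keeps the request size of the first request and does not model later changes to it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str      # "read" or "write"
    offset: int  # starting byte offset on the disk
    size: int    # number of bytes requested


@dataclass
class Stream:
    op: str
    next_offset: int   # offset at which the next sequential request would start
    request_size: int  # size of the first request in the stream
    count: int = 1     # number of requests observed so far


class StreamTracker:
    """Hypothetical tracker that groups contiguous requests into streams."""

    def __init__(self):
        self.streams = []

    def observe(self, req):
        # A request extends a stream if it has the same type and starts
        # exactly where that stream's previous request ended.
        for stream in self.streams:
            if stream.op == req.op and stream.next_offset == req.offset:
                stream.next_offset += req.size
                stream.count += 1
                return stream

        # Otherwise the request is treated as the first request of a new stream.
        stream = Stream(req.op, req.offset + req.size, req.size)
        self.streams.append(stream)
        return stream
```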
Currently, a stream's request size is used to determine whether the stream is suitable for direct disk access. For example, a stream having a small request size may not be suitable for direct disk access, because small requests tend to involve data that is accessed frequently and is therefore worth caching. On the other hand, a stream having a large request size may be suitable for direct disk access, because large requests tend to involve data that is accessed infrequently and is therefore not worth caching.
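The size-based decision described above may be sketched as a single threshold comparison. The function name `should_bypass_cache` and the 64 KiB threshold are assumptions chosen for illustration; no particular threshold is stated in the description.

```python
# Hypothetical cutoff between "small" and "large" requests (an assumption).
LARGE_REQUEST_THRESHOLD = 64 * 1024  # bytes


def should_bypass_cache(request_size, threshold=LARGE_REQUEST_THRESHOLD):
    """Return True if a stream with this request size should access the disk directly."""
    # Large requests are presumed to touch infrequently used data, so they bypass the cache.
    return request_size >= threshold
```

Under these assumed values, a stream of 4 KiB requests would be routed through the cache, while a stream of 1 MiB requests would access the disk directly.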
Notwithstanding the above, some streams (e.g., streams generated by applications that access the entire disk or a large portion of the disk, such as backup, virus scan, or desktop search software) have small request sizes but are not good candidates for caching. When data accessed by such streams is cached, the cached data is flushed before it can be used, rendering the cache system useless.
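The effect may be illustrated with a small simulation, offered as a sketch only. It assumes a least-recently-used cache, which the description does not specify, and the helper name `count_hits` is hypothetical. A scan that touches more blocks than the cache can hold evicts each block before the scan returns to it, so even a repeated scan produces no cache hits.

```python
from collections import OrderedDict


def count_hits(accesses, capacity):
    """Count cache hits for a sequence of block accesses under LRU eviction."""
    cache = OrderedDict()  # block_id -> True, oldest first
    hits = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
            hits += 1
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits


# Two full passes over 1000 blocks with a cache that holds only 100 blocks:
# every block is evicted before the second pass reaches it.
scan = list(range(1000)) * 2
print(count_hits(scan, capacity=100))  # prints 0
```

By contrast, the same cache serves nearly every access when the working set fits within its capacity, which is why a scan of this kind also harms other streams by displacing their cached data.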
Such a result is undesirable. Therefore, systems and methods are needed that can overcome the above shortcomings.
Features, elements, and aspects of the invention that are referenced by the same numerals in different figures represent the same, equivalent, or similar features, elements, or aspects, in accordance with one or more embodiments.