A storage system is a computer that provides storage services relating to the organization of information on storage devices, such as disks. A storage system typically accesses one or more storage volumes. A storage volume comprises physical storage devices defining an overall logical arrangement of storage space, and each volume is usually associated with its own file system. A storage system typically includes a storage operating system that logically organizes the information as a set of data blocks stored on disks. In a file-based deployment, such as a network attached storage (NAS) environment, a storage system may be a file server, the operating system of which implements a file system to logically organize the data blocks as a hierarchical structure of addressable files and directories on the disks. A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may also opt to maintain a near optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks.
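The write-anywhere behavior described above can be illustrated with a minimal sketch. The class name `WriteAnywhereVolume`, its fields, and the simple "next free block" allocator are assumptions made for illustration only; an actual write-anywhere file system uses far more elaborate allocation and reclaims old blocks. The sketch shows only that a modified block is written to a new location rather than overwritten in place.

```python
class WriteAnywhereVolume:
    """Toy model of a write-anywhere layout (hypothetical, for illustration)."""

    def __init__(self):
        self.disk = {}          # on-disk location -> block contents
        self.block_map = {}     # logical file block -> current on-disk location
        self.next_free = 0      # naive allocator: always use the next free location

    def write(self, file_block, data):
        # Never overwrite: every write lands at a freshly allocated location.
        location = self.next_free
        self.next_free += 1
        self.disk[location] = data
        self.block_map[file_block] = location
        return location

    def read(self, file_block):
        # Follow the map to wherever the latest copy of the block lives.
        return self.disk[self.block_map[file_block]]


volume = WriteAnywhereVolume()
old_location = volume.write(0, b"old")
new_location = volume.write(0, b"new")   # "dirtied" block goes to a new location
```

In this toy model the old copy remains on disk until some separate mechanism reclaims it, which the sketch does not attempt to show.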
A storage system may be configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the storage system. The storage system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet links, that allow clients to remotely access the shared information (e.g., files) on the storage system. The clients typically communicate with the storage system by exchanging discrete frames or packets of data formatted according to predefined network communication protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). In this context, a protocol consists of a set of rules defining how the interconnected computer systems interact with one another.
In a file-based deployment, clients employ a semantic level of access to files and file systems stored on the storage system. For instance, a client may request to retrieve (“read”) or store (“write”) information in a particular file stored on the storage system. The client requests identify one or more files to be accessed without regard to the specific locations, e.g., data blocks, in which the requested data is stored on disk. The storage system converts the received client requests from file-system semantics to corresponding ranges of data blocks on the storage disks. In the case of a client “read” request, the data blocks containing the client's requested data are retrieved and the requested data is then returned to the client.
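As a rough illustration of the conversion from file-level semantics to block ranges, the following sketch maps a byte range within a file to the block numbers that cover it. The 4096-byte block size and the function name `byte_range_to_blocks` are assumptions; the text does not specify either.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; not specified in the text


def byte_range_to_blocks(offset, length, block_size=BLOCK_SIZE):
    """Return the range of block numbers covering bytes [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)                       # an empty read touches no blocks
    first = offset // block_size              # block holding the first byte
    last = (offset + length - 1) // block_size  # block holding the last byte
    return range(first, last + 1)
```

A read that straddles a block boundary, even by a single byte, touches both blocks, which is why the last block is computed from the final byte rather than from the length alone.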
A read stream is defined as a predictable sequence of read operations. In other words, after the read stream's first request is received, every subsequent client request in the read stream logically “extends” a contiguous sequence of file offsets in the file accessed by the stream's previous request. Accordingly, a read stream may be construed by the file system as a sequence of client requests that directs the storage system to retrieve a sequence of data blocks assigned to consecutively numbered file block numbers (fbns). For instance, the first request in the read stream may retrieve a first set of data blocks assigned to the fbns 10 through 19, the stream's second request may retrieve data blocks whose fbns equal 20 through 25, the third request may retrieve the data blocks assigned to the fbns 26 through 42, and so on. It is noted that client requests in the read stream may employ file-based or block-based semantics, so long as they instruct the storage system to retrieve data from the stream's logically contiguous range of file offsets. A long sequential read may be divided into multiple sequential read operations. A read stream composed of sequential reads separated by unread regions, e.g., reads for fbns 10 through 20, 30 through 40, and 50 through 60, may be referred to as spanning reads.
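The distinction between sequential and spanning reads can be sketched as a simple classifier over the fbn ranges of successive requests. The function name `classify_stream` and the "random" label for any backward jump are simplifying assumptions for illustration; a real file system would likely tolerate some reordering.

```python
def classify_stream(requests):
    """Classify a list of (first_fbn, last_fbn) requests, both ends inclusive."""
    if len(requests) < 2:
        return "unknown"                      # one request does not make a pattern
    gaps = []
    for (_, prev_last), (start, _) in zip(requests, requests[1:]):
        # Number of unread blocks between the previous request and this one.
        gaps.append(start - (prev_last + 1))
    if all(gap == 0 for gap in gaps):
        return "sequential"                   # each request extends the previous one
    if all(gap >= 0 for gap in gaps):
        return "spanning"                     # forward-only, with unread regions
    return "random"                           # at least one backward jump or overlap


sequential = classify_stream([(10, 19), (20, 25), (26, 42)])
spanning = classify_stream([(10, 20), (30, 40), (50, 60)])
```

The two calls use the fbn ranges given as examples in the paragraph above.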
Operationally, the storage system typically identifies a read stream based on an ordered sequence of client accesses to the same file. Upon identifying a read stream, the storage system may employ speculative readahead operations to retrieve data blocks that are likely to be requested by future client read requests. These “readahead” blocks are typically retrieved from disk and stored in memory (i.e., buffer cache) in the storage system, where each readahead data block is associated with a different file-system volume block number (vbn). Conventional readahead algorithms are often configured to “prefetch” a predetermined number of data blocks that logically extend the read stream. For instance, for a read stream whose client read requests retrieve a sequence of data blocks assigned to consecutively numbered fbns, the file system may invoke readahead operations to retrieve additional data blocks assigned to fbns that further extend the sequence, even though the readahead blocks have not yet been requested by client requests in the read stream.
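A conventional fixed-size readahead of the kind described above might look like the following sketch. The function name `readahead_fbns` and its parameters are hypothetical; the sketch assumes the buffer cache can be modeled as a set of fbns and that the file length in blocks is known.

```python
def readahead_fbns(last_requested_fbn, prefetch_count, cached, file_blocks):
    """Return the fbns to prefetch after a request ending at last_requested_fbn.

    prefetch_count: predetermined number of blocks to read ahead.
    cached:         set of fbns already held in the buffer cache.
    file_blocks:    total number of blocks in the file (fbns 0..file_blocks - 1).
    """
    start = last_requested_fbn + 1
    end = min(start + prefetch_count, file_blocks)   # never read past end of file
    # Skip blocks already in memory so they are not fetched from disk again.
    return [fbn for fbn in range(start, end) if fbn not in cached]
```

Because the prefetch count is fixed, this approach retrieves the same amount of data whether or not the client goes on to request it, which is the limitation that adaptive approaches aim to address.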
A file system may utilize a component responsible for “prefetching” data blocks from mass storage devices that are local to the storage system. Such component may be termed a readahead engine. A storage system, such as a file server, may implement a file system with a readahead engine configured to optimize the amount of readahead data retrieved from a local device for each read stream managed by the file system. The readahead engine could rely on various factors to adaptively select an optimized readahead size for each read stream. Such factors may include the number of read requests processed in the read stream, an amount of client-requested data requested in the read stream, a read-access style associated with the read stream, and so forth. The readahead engine could also be configured to minimize cache pollution (i.e., loading data into the cache that will not be reused before it is evicted) by adaptively selecting when readahead operations are performed for each read stream and determining how long each read stream's retrieved data is retained in memory.
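To make the idea of adaptive selection concrete, the following sketch picks a readahead size from the three factors named above: the number of requests processed, the amount of data requested, and the read-access style. The heuristic itself (doubling per observed request, a cap of 256 blocks, no readahead for random access) is entirely an assumption for illustration and is not drawn from the text.

```python
from dataclasses import dataclass

MAX_READAHEAD_BLOCKS = 256  # hypothetical upper bound


@dataclass
class StreamStats:
    request_count: int      # read requests processed in the stream so far
    blocks_requested: int   # total client-requested blocks in the stream
    access_style: str       # "sequential", "spanning", or "random"


def select_readahead_size(stats):
    """Return a readahead size in blocks (illustrative heuristic only)."""
    if stats.access_style == "random" or stats.request_count == 0:
        return 0                                   # prefetching would pollute the cache
    average = max(1, stats.blocks_requested // stats.request_count)
    if stats.access_style == "spanning":
        return min(average, MAX_READAHEAD_BLOCKS)  # stay conservative across gaps
    # Sequential: grow confidence with each observed request, up to four doublings.
    factor = 2 ** min(stats.request_count, 4)
    return min(average * factor, MAX_READAHEAD_BLOCKS)
```

Returning zero for random access is one simple way to limit cache pollution; a fuller design would also decide when to issue the readahead and how long to retain the prefetched blocks.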
Such an optimized readahead module has been utilized to process requests that require access to locally stored data, but not for remote requests. An existing technique arbitrarily extends the length of remote reads in hopes that the client read access pattern comprises long sequential reads. This approach, however, sacrifices performance on random or spanning reads and fails to tune the read length to the clients' access pattern.
A system where one file server is used as a caching server and another file server is used as an origin server may be referred to as a multi-node caching system.
In a multi-node caching system, the system that is in direct contact with the client (e.g., a file server acting as a caching intermediary between a client and an origin file server) may have the best information as to the client's intent, as the front-end (client-facing) system will have observed all of the client's past transactions. On the other hand, the back-end (remote/origin) system may have gaps in its knowledge of client access patterns due to the effects of caching at the front-end: client requests for data that can be satisfied by the file system that is local to the caching server (cache hits) are not observed by the back-end system. Moreover, existing readahead engines do not implement a method to pass information about the client access patterns to the back-end system, which is therefore unable to optimize inputs/outputs (I/Os) to its data drives. It is desirable to utilize an optimized readahead module to process read requests without regard to whether the read request requires local or remote access.
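The knowledge gap at the back-end system can be shown with a small simulation. The function name `origin_observed` and the block-at-a-time request model are assumptions made for illustration; the sketch only demonstrates that cache hits at the front-end hide part of a client's access pattern from the origin.

```python
def origin_observed(client_fbns, initially_cached):
    """Return the fbns that reach the origin server, in order.

    client_fbns:      fbns requested by the client, one block per request.
    initially_cached: fbns already held by the caching (front-end) server.
    """
    cache = set(initially_cached)
    observed = []
    for fbn in client_fbns:
        if fbn in cache:
            continue              # cache hit: served locally, origin never sees it
        observed.append(fbn)      # cache miss: forwarded to the origin server
        cache.add(fbn)            # the caching server keeps the fetched block
    return observed


# The client reads fbns 10 through 19 strictly sequentially.
seen_by_origin = origin_observed(range(10, 20), {12, 13, 14, 17})
```

Here the origin server sees requests for fbns 10, 11, 15, 16, 18, and 19 only, so a perfectly sequential client stream appears to it as a series of short reads separated by gaps.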