A storage system is a computer that provides storage service relating to the organization of information on storage devices, such as disks. The storage system includes a storage operating system that logically organizes the information as a set of data blocks stored on the disks. In a block-based deployment, such as a conventional storage area network (SAN), the data blocks may be directly addressed in the storage system. However, in a file-based deployment, such as a network attached storage (NAS) environment, the operating system implements a file system to logically organize the data blocks as a hierarchical structure of addressable files and directories on the disks. In this context, a directory may be implemented as a specially formatted file that stores information about other files and directories.
The storage system may be configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the storage system. The storage system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet links, that allow clients to remotely access the shared information (e.g., files) on the storage system. The clients typically communicate with the storage system by exchanging discrete frames or packets of data formatted according to predefined network communication protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). In this context, a protocol consists of a set of rules defining how the interconnected computer systems interact with one another.
In a file-based deployment, clients employ a semantic level of access to files and file systems stored on the storage system. For instance, a client may request to retrieve (“read”) or store (“write”) information in a particular file stored on the storage system. Clients typically request the services of the file-based storage system by issuing file-system protocol messages (in the form of packets) formatted according to conventional file-based access protocols, such as the Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols. The client requests identify one or more files to be accessed without regard to specific locations, e.g., data blocks, in which the requested data are stored on disk. The storage system converts the received client requests from file-system semantics to corresponding ranges of data blocks on the storage disks. In the case of a client “read” request, the data blocks containing the client's requested data are retrieved and the requested data are then returned to the client.
In a block-based deployment, client requests can directly address specific data blocks in the storage system. Some block-based storage systems organize their data blocks in the form of databases, while other block-based systems may store their blocks internally in a file-oriented structure. Where the data is organized as files, a client requesting information maintains its own file mappings and manages file semantics, while its requests (and corresponding responses) to the storage system address the requested information in terms of block addresses on disk. In this manner, the storage bus in the block-based storage system may be viewed as being extended to the remote client systems. This “extended bus” is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block-based access protocols, such as the Small Computer Systems Interface (SCSI) protocol encapsulated over FC (FCP) or encapsulated over TCP/IP/Ethernet (iSCSI).
Each storage device in the block-based system is typically assigned a unique logical unit number (lun) by which it can be addressed, e.g., by remote clients. Thus, an “initiator” client system may request a data transfer for a particular range of data blocks stored on a “target” lun. Illustratively, the client request may specify a starting data block in the target storage device and a number of successive blocks in which data may be stored or retrieved in accordance with the client request. For instance, in the case of a client “read” request, the requested range of data blocks is retrieved and then returned to the requesting client.
In general, a file system does not directly access “on-disk” data blocks, e.g., blocks assigned respective disk block numbers (dbns) in a dbn address space. Instead, there is typically a one-to-one mapping between data blocks stored on disk, e.g., in a dbn address space, and the same data blocks organized by the file system, e.g., in a volume block number (vbn) space. For instance, N on-disk data blocks may be managed within the file system by assigning each data block a unique vbn between zero and N−1. Furthermore, the file system may associate a set of data blocks (i.e., vbns) with a file or directory managed by the file system. In this case, the file system may assign each data block in the file or directory a corresponding “file offset” or file block number (fbn). Illustratively, the file offsets in the file or directory may be measured in units of fixed-sized data blocks, e.g., 4 kilobyte (kB) blocks, and therefore can be mapped one-to-one to fbns in that file or directory. Accordingly, each file or directory is defined within the file system as a sequence of data blocks assigned to consecutively numbered fbns, e.g., where the first data block in each file or directory is assigned to a predetermined starting fbn, such as zero. Here, it is noted that the file system assigns sequences of fbns on a per-file basis, whereas the file system assigns vbns over a typically larger volume address space.
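The relationship between byte offsets, fbns and vbns described above may be illustrated with a minimal Python sketch. The names `BLOCK_SIZE`, `offset_to_fbn` and `FileBlockMap`, as well as the sample vbn values, are hypothetical and chosen only for illustration; the sketch assumes 4 kB blocks and a starting fbn of zero, and it does not model the vbn-to-dbn mapping.

```python
BLOCK_SIZE = 4096  # assumed fixed block size: 4 kB


def offset_to_fbn(byte_offset: int) -> int:
    """Return the file block number (fbn) containing the given byte offset."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return byte_offset // BLOCK_SIZE


class FileBlockMap:
    """Hypothetical per-file table: list index = fbn, stored value = vbn."""

    def __init__(self, vbns):
        self._vbns = list(vbns)

    def vbn_for(self, fbn: int) -> int:
        """Look up the volume block number assigned to a file block number."""
        if not 0 <= fbn < len(self._vbns):
            raise IndexError("fbn is outside the file")
        return self._vbns[fbn]


# A three-block file: fbns 0, 1, 2 are consecutive within the file, while
# the (made-up) vbns are scattered across the larger volume address space.
example_file = FileBlockMap([907, 12, 5531])
```

In this sketch, fbns are dense and restart at zero for every file, whereas the vbns stored in the table are drawn from the volume-wide address space.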
A read stream is defined as a set of one or more client requests that instructs the storage system to retrieve data from a logically contiguous range of file offsets within a requested file. In other words, after the read stream's first request is received, every subsequent client request in the read stream logically “extends” a contiguous sequence of file offsets in the file accessed by the stream's previous request. Accordingly, a read stream may be construed by the file system as a sequence of client requests that directs the storage system to retrieve a sequence of data blocks assigned to consecutively numbered fbns. For instance, the first request in the read stream may retrieve a first set of data blocks assigned to the fbns 10 through 19, the stream's second request may retrieve data blocks whose fbns equal 20 through 25, the third request may retrieve the data blocks assigned to the fbns 26 through 42, and so on. It is noted that client requests in the read stream may employ file-based or block-based semantics, so long as they instruct the storage system to retrieve data from the stream's logically contiguous range of file offsets.
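The contiguity condition in this definition may be expressed as a short check. The following Python sketch is illustrative only: the function name `is_read_stream` and the representation of each request as an inclusive `(first_fbn, last_fbn)` pair are assumptions made for the example.

```python
def is_read_stream(requests):
    """Return True if the requests, in arrival order, form one read stream.

    Each request is an inclusive (first_fbn, last_fbn) pair. Every request
    after the first must begin at the fbn immediately following the last
    fbn retrieved by the previous request.
    """
    if not requests:
        return False  # a read stream has one or more requests
    if any(last < first for first, last in requests):
        return False  # malformed range
    return all(nxt[0] == prev[1] + 1
               for prev, nxt in zip(requests, requests[1:]))


# The example from the text: fbns 10-19, then 20-25, then 26-42.
text_example = [(10, 19), (20, 25), (26, 42)]
```

Applied to `text_example`, the check succeeds; a request sequence that skips a block, such as `[(10, 19), (21, 25)]`, fails it.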
Operationally, the storage system typically identifies a read stream based on an ordered sequence of client accesses to the same file. As used hereinafter, a file is broadly understood as any set of data in which zero or more read streams can be established. Accordingly, the file may be a traditional file or directory stored on a file-based storage system. Conventionally, the storage system can only monitor one file read stream at a time. To that end, the storage system determines whether a client's currently requested file data requires the storage system to retrieve a set of data blocks that logically extends a read stream already established in the file. If so, the client request may be associated with the read stream, and the read stream may be extended by the number of retrieved data blocks.
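A conventional detector of this kind, which monitors a single read stream at a time, might be sketched as follows. The class name `SingleStreamTracker` and its fields are hypothetical. The behavior on a non-extending request (discarding the old stream and starting a new one at the current request) is an assumption, since the description above does not specify it.

```python
class SingleStreamTracker:
    """Tracks at most one read stream in a file (illustrative sketch)."""

    def __init__(self):
        self.next_fbn = None  # fbn that would extend the stream; None = no stream
        self.length = 0       # number of blocks in the current stream

    def on_read(self, first_fbn: int, count: int) -> bool:
        """Process a read of `count` blocks starting at `first_fbn`.

        Returns True if the request logically extends the established stream.
        """
        if count <= 0:
            raise ValueError("count must be positive")
        if self.next_fbn is not None and first_fbn == self.next_fbn:
            # The request extends the stream by the number of retrieved blocks.
            self.next_fbn += count
            self.length += count
            return True
        # Assumption: any other request replaces the tracked stream.
        self.next_fbn = first_fbn + count
        self.length = count
        return False
```

For example, reads of 10 blocks at fbn 10, 6 blocks at fbn 20 and 17 blocks at fbn 26 yield one stream of 33 blocks, whereas a subsequent read at an unrelated fbn resets the tracker.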
Upon identifying a read stream, the storage system may employ speculative readahead operations to retrieve data blocks that are likely to be requested by future client read requests. These “readahead” blocks are typically retrieved from disk and stored in memory (i.e., buffer cache) in the storage system, where each readahead data block is associated with a different file-system vbn. Conventional readahead algorithms are often configured to “prefetch” a predetermined number of data blocks that logically extend the read stream. For instance, for a read stream whose client read requests retrieve a sequence of data blocks assigned to consecutively numbered fbns, the file system may invoke readahead operations to retrieve additional data blocks assigned to fbns that further extend the sequence, even though the readahead blocks have not yet been requested by client requests in the read stream.
Typically, the readahead operations are “triggered” whenever a file's read stream reaches one of a predefined set of file offsets or memory addresses. For example, suppose the predefined set of file offsets consists of every 32nd file offset in the file (i.e., file block numbers 0, 32, 64, etc.). Further suppose that an existing read stream begins at fbn 4 and extends to fbn 27. If a client read request is received that instructs the storage system to retrieve fbns 28 through 34, the request extends the read stream past the predefined fbn 32, thereby triggering readahead operations. Accordingly, the conventional readahead algorithm retrieves a predetermined number of data blocks, e.g., 288 data blocks, beginning with fbn 35, from disk for storage in cache in anticipation of future read requests in that read stream.
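The trigger arithmetic in this example may be sketched in Python as follows. The names `TRIGGER_INTERVAL`, `READAHEAD_BLOCKS`, `crosses_trigger` and `plan_readahead` are hypothetical. The sketch assumes that a request triggers readahead when its inclusive fbn range contains a multiple of the trigger interval, and that readahead begins at the block following the request.

```python
TRIGGER_INTERVAL = 32    # every 32nd fbn is a trigger point (0, 32, 64, ...)
READAHEAD_BLOCKS = 288   # predetermined readahead size from the example


def crosses_trigger(first_fbn: int, last_fbn: int,
                    interval: int = TRIGGER_INTERVAL) -> bool:
    """True if some fbn in [first_fbn, last_fbn] is a multiple of interval."""
    # Smallest multiple of `interval` that is >= first_fbn (ceiling division).
    next_trigger = -(-first_fbn // interval) * interval
    return next_trigger <= last_fbn


def plan_readahead(first_fbn: int, last_fbn: int):
    """Return the range of fbns to prefetch, or None if nothing is triggered."""
    if not crosses_trigger(first_fbn, last_fbn):
        return None
    start = last_fbn + 1
    return range(start, start + READAHEAD_BLOCKS)
```

With these assumptions, a request for fbns 28 through 34 contains the trigger point 32 and produces a readahead of fbns 35 through 322, while a request for fbns 5 through 27 triggers nothing.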
One disadvantage of current storage systems is their inability to identify a read stream whose ordered sequence of read requests has been “interrupted” by other client requests. For instance, if the ordered sequence is interrupted, e.g., by one or more random read requests or by requests in other read streams, then the storage system cannot distinguish the read stream's requests from the non-read stream requests. As a result, the storage system cannot perform readahead operations for the unidentified read stream. For example, suppose a client issues “overlapping” read requests in different read streams. To a conventional storage system, the interleaved client requests appear to be random, non-ordered requests rather than belonging to separate read streams. In such a case, the storage system cannot perform readahead operations for either of the interleaved read streams. Similarly, the storage system also may not perform readahead operations for read streams whose requests are interleaved with random client write requests.
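The effect of interleaving on a detector that follows only one stream may be demonstrated with a small Python sketch. The function `count_extensions`, the `(first_fbn, count)` request representation and the sample fbn values are hypothetical. The detector modeled here simply compares each request with the one immediately before it, which is an assumed simplification of a conventional system.

```python
def count_extensions(requests):
    """Count requests that extend the immediately preceding request.

    Each request is a (first_fbn, block_count) pair. A request counts as an
    extension only if it starts where the previous request ended.
    """
    expected = None
    hits = 0
    for first_fbn, count in requests:
        if expected is not None and first_fbn == expected:
            hits += 1
        expected = first_fbn + count
    return hits


# Two individually sequential streams in the same file.
stream_a = [(10, 10), (20, 10), (30, 10)]
stream_b = [(100, 10), (110, 10), (120, 10)]

# Requests arrive alternately: A, B, A, B, A, B.
interleaved = [req for pair in zip(stream_a, stream_b) for req in pair]
```

Taken alone, each stream yields two recognized extensions. Once interleaved, no request follows its own predecessor, so the detector recognizes none and the requests appear random.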
Another disadvantage of conventional storage systems is their inability to identify a read stream whose read requests are received “nearly sequentially.” Disordering of one or more of the read stream's requests may occur for various reasons. For instance, the client may issue the read-stream requests non-sequentially. Alternatively, the storage system may receive the client requests sequentially, although inherent latencies in retrieving the client-requested data cause the storage system to process the read-stream requests non-sequentially. In general, the storage system is not configured to identify nearly-sequential read requests as belonging to the same read stream, and thus readahead operations are not performed for the unidentified read stream.
It is therefore desirable for a storage system to identify an ordered sequence of read requests as belonging to the same read stream, even when the requests are interleaved with non-read stream requests or arranged nearly sequentially. Further, the storage system should be able to concurrently manage readahead operations for multiple read streams without negatively affecting the system's performance.