Different networks can have different characteristics, such as the available bandwidth, the latency of their links, and the reliability of those links. Difficulties often arise when protocols and operations designed for high-bandwidth, low-latency, robust networks are used over low-bandwidth, high-latency and/or unreliable networks, but sometimes such use is unavoidable. For example, computer users in an organization might need to access, from their local desktop computers, files, data, etc. that are obtained through a network. Many of the protocols designed for local area networks assume that the network has high bandwidth and low latency, yet those protocols are often used over wide area networks, which might have lower bandwidth and higher latency. A local area network (LAN) is but one example of a high-bandwidth, low-latency network, while a wide area network (WAN) is but one example of a low-bandwidth, high-latency network. Of course, some WANs may perform better than some LANs, but these examples are used herein nonetheless.
Typically, devices that connect to a network operate in a client-server mode, wherein one device, the client device, initiates a transaction with a server, while the server waits for requests from client devices and responds to those requests. Thus, a transaction can be a client making a request of a server and the server responding to that request.
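A single transaction can be illustrated with a minimal Python sketch, assuming a local socket pair stands in for the network; the request text and the `RESPONSE:` prefix are hypothetical. The client blocks in `recv` until the server has answered, which is the request/response serialization that becomes costly when the path between client and server has high latency.

```python
import socket
import threading

def serve_one(conn: socket.socket) -> None:
    """Server side: wait for one request, then respond to it."""
    request = conn.recv(1024)             # blocks until the client sends a request
    conn.sendall(b"RESPONSE:" + request)  # hypothetical response format

def transaction(request: bytes) -> bytes:
    """Client side: one transaction = send a request, then block until the reply arrives."""
    client, server = socket.socketpair()  # local stand-in for a network connection
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    try:
        client.sendall(request)
        reply = client.recv(1024)         # the client cannot proceed until this returns
    finally:
        worker.join()
        client.close()
        server.close()
    return reply

print(transaction(b"READ file.txt"))
```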
Where the network path between a client and a server includes LAN portions and WAN portions, a transaction accelerator pair might be used to accelerate the WAN portion. Examples of such transaction accelerators are described in McCanne I and McCanne III. Transaction accelerators have also been recently called by other names, such as WAN accelerators, WAN optimizers, WAN optimization controllers (WOCs), wide-area data services (WDS) appliances, WAN traffic optimizers (WTOs) and so forth. In recent times, transaction acceleration has also been called transaction pipelining, protocol pipelining, request prediction, application flow acceleration, protocol acceleration, and so forth. Herein, the terms transaction acceleration and protocol acceleration are used interchangeably, unless otherwise indicated.
Parallel Access to Network File Systems
Traditional network file access protocols allow a file system client to access a file server over a network. Such protocols include the Network File System (NFS), originally proposed by Sun Microsystems in the 1980s, and the Common Internet File System (CIFS), used in Microsoft's Windows operating system and implemented in various other storage systems such as the well-known, open source Samba server. Details of NFS are described in R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, “Design and Implementation of the SUN Network File System”, in Proceedings of the Summer 1985 USENIX Conference, 1985.
Traditionally, these protocols require that all client accesses to the file system occur through a single network endpoint at the file server. This requirement creates a performance bottleneck at the file server because all I/O (input/output) operations are handled by a centralized server. To facilitate scaling, and to allow a large common storage pool to be easily allocated to different server functions, large capacity network file systems have been built on locally networked clusters of storage targets that allow block-level access to their data, so-called storage area networks (SANs). SANs offer high-speed parallel access to large amounts of block-level data, but do not export an actual file system. It is left to a network file server to organize a file system on top of these block storage devices and to mediate client access to those devices. In this approach, the benefits of high-speed parallel access to a SAN's storage targets are seen only by the file server, not the file system clients directly.
A recently proposed architecture for network file systems called “parallel NFS” or pNFS attempts to relieve the file server bottleneck by separating, at least in part, the file system's control and data flows and by leveraging the scalability of a SAN. pNFS is a sub-protocol of the recently ratified version 4 of NFS. NFSv4 is described in “NFSv4 Minor Version 1, Internet Draft”, draft-ietf-nfsv4-minorversion1-02.txt, and pNFS is described in D. Black, S. Fridella, and J. Glasgow, “pNFS Block/Volume Layout”, NFSv4 Working Group, Internet Draft, draft-ietf-nfsv4-pnfs-block-06.txt, February 2008 (hereinafter “Black et al.”). In the examples used herein, NFS version 4 is referred to as “NFSv4” while NFSv4 with pNFS extensions is referred to simply as “pNFS” for readability. NFSv4 supports different variants of pNFS and there is currently work underway within the IETF to standardize one or more such variants.
In pNFS, the file system client sends control traffic to the file server control device called the “controller” and, in parallel, accesses one or more storage “targets” directly over a SAN to get block-level data. In this example, the file system client would communicate simultaneously with the file server controller via NFS and its pNFS extensions and with the storage target over a SAN protocol. Examples of SAN protocols include the Fibre Channel Protocol (FCP) and the Internet Small Computer Systems Interface (iSCSI), the latter of which is described in J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka, and E. Zeidner, “Internet Small Computer Systems Interface (iSCSI)”, Request for Comments (RFC) 3720, April 2004.
Even if pNFS/iSCSI is available, a client might still perform normal NFS I/O over the controller connection, but enhanced performance is achieved when the client interacts with the storage target directly using pNFS. In iSCSI terminology, the client component that interacts with the storage target is called the “initiator”. This dual-connected approach allows the storage and bandwidth scalability of the SAN to be extended to file system clients, possibly over disjoint networks from the control connection to the file server. A SAN file system thus is a network file system based on split data/control channels. A SAN file system might use protocols for the control channel and the data channel that are also used for other communications outside the use of a SAN file system, such as to support other sorts of networked storage clusters. The data need not be strictly block-oriented, but might be object-oriented as described in B. Halevy, B. Welch, and J. Zelenka, “Object-based pNFS Operations”, NFSv4 Working Group, Internet Draft, draft-ietf-nfsv4-pnfs-obj-05.txt, February 2008 and related documents.
In this pNFS/iSCSI architecture, a file system client interacts with the controller to learn where and how to make direct storage accesses for a particular file through the use of “layouts”. A layout describes the mapping between a file and its underlying representation on a disk-based storage subsystem and is typically organized as an ordered list of block storage addresses corresponding to particular file byte ranges, or “extents”. To operate directly against the SAN volume, a file system client requests from the file system controller a layout for a particular extent of a file. Once the layout is obtained, the client can perform I/O on that extent by reading from and/or writing to the storage subsystem directly, using the block addresses on the target indicated in the layout.
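A layout of this kind can be sketched as an ordered extent list with a lookup from a file byte offset to a block address on a target. This is a minimal Python sketch under stated assumptions: the `Extent` fields, the 4096-byte block size, block-aligned extents, and the target names are all hypothetical and are not taken from any pNFS specification.

```python
from bisect import bisect_right
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical block size in bytes

@dataclass(frozen=True)
class Extent:
    file_offset: int  # first file byte covered (assumed block-aligned)
    length: int       # number of file bytes covered
    target: str       # hypothetical storage target identifier
    start_block: int  # first block address of this extent on the target

class Layout:
    """Ordered list of extents mapping file byte ranges to block addresses."""

    def __init__(self, extents: list[Extent]) -> None:
        self.extents = sorted(extents, key=lambda e: e.file_offset)
        self._starts = [e.file_offset for e in self.extents]

    def locate(self, offset: int) -> tuple[str, int, int]:
        """Return (target, block address, byte within block) for a file offset."""
        i = bisect_right(self._starts, offset) - 1  # last extent starting at or before offset
        if i < 0 or offset >= self.extents[i].file_offset + self.extents[i].length:
            raise KeyError(f"offset {offset} is not covered by this layout")
        ext = self.extents[i]
        rel = offset - ext.file_offset
        return ext.target, ext.start_block + rel // BLOCK_SIZE, rel % BLOCK_SIZE

layout = Layout([
    Extent(0, 8192, "target-A", 1000),     # file bytes 0..8191 -> blocks 1000-1001 on A
    Extent(8192, 4096, "target-B", 52),    # file bytes 8192..12287 -> block 52 on B
])
print(layout.locate(5000))
```

A client holding this layout would issue its SAN read or write for file byte 5000 against block 1001 on `target-A`, without involving the controller.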
There are various types of layouts permitting read-only access, read-write access, and so forth. These access modes are explicitly indicated in the layout grant messages issued by the controller. A layout may be held across multiple open/close sequences so long as it remains valid. Should the underlying mapping of a file's data change, the controller sends invalidating messages to any clients holding affected layouts.
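The lifetime rules above can be sketched as a client-side cache that records the access mode from each grant, is unaffected by file open/close, and drops an entry when the controller sends an invalidating message. The class and method names are hypothetical; this is an illustrative Python sketch, not a pNFS client implementation.

```python
from enum import Enum

class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"

class LayoutCache:
    """Hypothetical client-side layout cache.

    Entries are independent of file open/close, so a layout obtained during one
    open can be reused after a later re-open until the controller invalidates it.
    """

    def __init__(self) -> None:
        self._held: dict[tuple[str, int], Access] = {}  # (file id, extent index) -> mode

    def grant(self, file_id: str, extent: int, mode: Access) -> None:
        """Record a grant; the mode is the one stated in the controller's grant message."""
        self._held[(file_id, extent)] = mode

    def can_read(self, file_id: str, extent: int) -> bool:
        return (file_id, extent) in self._held  # any valid layout permits reads

    def can_write(self, file_id: str, extent: int) -> bool:
        return self._held.get((file_id, extent)) is Access.READ_WRITE

    def invalidate(self, file_id: str, extent: int) -> None:
        """Handle an invalidating message: the mapping changed, so drop the stale layout."""
        self._held.pop((file_id, extent), None)

cache = LayoutCache()
cache.grant("report.dat", 0, Access.READ_ONLY)
print(cache.can_read("report.dat", 0), cache.can_write("report.dat", 0))
cache.invalidate("report.dat", 0)
print(cache.can_read("report.dat", 0))
```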
Unlike a file system, which provides very fine-grained access control and permissions for files, SAN volumes typically include only very coarse-grained access control. For example, in a file system, a file is typically owned by a particular user, and that user can specify, on a per-file basis, who may modify the file and in what ways. However, a storage target has no notion of files. Instead, a storage array typically implements a very rudimentary mechanism dictating which servers can connect to which logical units (LUNs) on which targets. As such, a SAN file system must assume that the file system client is a trusted entity and can be depended upon to perform only those I/O operations that have been authorized by the file system controller. In fact, in such deployments, the file system client typically is implemented within a trusted operating system service that is tightly managed and controlled by a system operator.
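The gap between per-file permissions and target-level access control can be shown with a toy storage target whose only check is a table of which initiators may reach which LUNs. The class, the initiator names, and the table format are hypothetical; the point of this Python sketch is that the target accepts any block write on a permitted LUN, because it cannot tell which file the block belongs to.

```python
class StorageTarget:
    """Toy storage target whose only access control is per (initiator, LUN).

    The target knows initiators, LUNs, and block addresses, but nothing about
    files, owners, or per-file permissions.
    """

    def __init__(self, lun_access: dict[str, set[int]]) -> None:
        self.lun_access = lun_access                     # initiator -> LUNs it may use
        self.blocks: dict[tuple[int, int], bytes] = {}   # (LUN, block address) -> data

    def write(self, initiator: str, lun: int, block: int, data: bytes) -> None:
        if lun not in self.lun_access.get(initiator, set()):
            raise PermissionError(f"{initiator} may not access LUN {lun}")
        # Any block on a permitted LUN is writable; whether the block belongs to a
        # file this client was authorized to modify is invisible at this layer.
        self.blocks[(lun, block)] = data

target = StorageTarget({"client-1": {0}})
target.write("client-1", 0, 1001, b"data")  # accepted, whatever file block 1001 holds
```

Because the coarse check above is all the target enforces, correctness of per-file access depends on the client honoring the layouts granted by the controller.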
Concurrent Access to SAN File Systems
In addition to describing the location of file data, layouts also serve as grants of read or write access permission to the data by the file server. This is how the controller coordinates simultaneous access to a file or extents within a file, and ensures that multiple clients receive a consistent view of a shared extent. Multiple readers of a given extent are accommodated by the controller issuing multiple layouts granting read access. Exclusive write access is enforced by the controller issuing a single read-write layout. Should another client request conflicting access to the same storage, the controller first recalls the original layout, giving the previous client the opportunity to write back any pending changes, before granting the new layout request. In this case, the clients must synchronously perform all I/O operations against the centralized volume in order to achieve consistent file system operations. This is analogous to the “opportunistic locking” mechanism used in the CIFS network protocol, but at the granularity of file system blocks rather than whole files, with grant coordination handled by the controller and access enforcement handled by both the controller and the data storage devices.
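The grant rules described above (many readers or a single writer per extent, with a recall before any conflicting grant) can be sketched in Python as follows. The `Controller` class and its methods are hypothetical, and a recall is only logged here; in the scheme described above, the recalled client would first write back its pending changes.

```python
class Controller:
    """Hypothetical sketch of layout-grant coordination.

    Per extent, either any number of clients hold read-only layouts, or exactly
    one client holds a read-write layout. Conflicting requests trigger recalls first.
    """

    def __init__(self) -> None:
        self.readers: dict[int, set[str]] = {}     # extent -> clients with read-only layouts
        self.writer: dict[int, str] = {}           # extent -> client with the read-write layout
        self.recalls: list[tuple[str, int]] = []   # recalls issued, in order

    def _recall(self, client: str, extent: int) -> None:
        # A real recall would let the client write back pending changes before
        # returning the layout; here we only record that the recall happened.
        self.recalls.append((client, extent))

    def request(self, client: str, extent: int, write: bool) -> str:
        # Another client's read-write layout conflicts with every new request.
        holder = self.writer.get(extent)
        if holder is not None and holder != client:
            self._recall(holder, extent)
            del self.writer[extent]
        if write:
            # Exclusive write access: recall all other readers (sorted for determinism).
            for reader in sorted(self.readers.pop(extent, set()) - {client}):
                self._recall(reader, extent)
            self.writer[extent] = client
            return "read-write"
        self.readers.setdefault(extent, set()).add(client)
        return "read-only"

ctl = Controller()
ctl.request("A", 7, write=False)
ctl.request("B", 7, write=False)   # concurrent readers: no recall needed
ctl.request("C", 7, write=True)    # writer: readers A and B are recalled first
print(ctl.recalls)
```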
Accessing a SAN File System over a WAN
The protocols for network file systems, and consequently SAN file systems, have customarily been designed assuming there is a Local Area Network (LAN) link or similar high-performance network between the client and the file server. That is, the intervening network path is assumed to have high bandwidth, low latency, and negligible packet loss rates. Typically, many of the protocol message exchanges between a file system client and server are serialized and result in a “chatty” interaction over the network, because efficient use of the network was not a design requirement. While a given request or acknowledgement is in flight, processing often must be suspended until the reply message has been received. Some transactions may require several such back-and-forth exchanges. The sustained data transfer rate that a client and server attain over a network link, taking into account such request/response serialization, is commonly called the system's or application's throughput. On a low-latency LAN, this serialization often has a negligible impact on throughput, and aggregate rates are largely determined by the bandwidth at which individual blocks of data can be transferred through the system end points and across the network path.
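The effect of request/response serialization on throughput can be estimated with a simple model, assumed here for illustration: each block costs one full round trip plus the time to put its bytes on the link, with no pipelining or overlap. The block size, round-trip times, and link speeds below are hypothetical example figures, not measurements.

```python
def serialized_throughput(block_bytes: float, rtt_s: float, bandwidth_bps: float) -> float:
    """Throughput in bytes/s when every block waits for its own request/response.

    Simplified model (an assumption): per-block time = one round trip plus the
    transmission time of the block; nothing is pipelined or overlapped.
    """
    per_block_s = rtt_s + block_bytes / (bandwidth_bps / 8)  # bits/s -> bytes/s
    return block_bytes / per_block_s

lan = serialized_throughput(4096, 0.0001, 1e9)      # hypothetical LAN: 0.1 ms RTT, 1 Gbit/s
wan = serialized_throughput(4096, 0.1, 10e6)        # hypothetical WAN: 100 ms RTT, 10 Mbit/s
wan_fast = serialized_throughput(4096, 0.1, 100e6)  # same WAN with ten times the bandwidth
print(f"LAN/WAN throughput ratio: {lan / wan:.0f}")
print(f"gain from 10x WAN bandwidth: {wan_fast / wan:.3f}x")
```

With these assumed figures the LAN case is several hundred times faster than the WAN case, while a tenfold WAN bandwidth upgrade improves throughput by only about 3%, because the round-trip term dominates each block's cost.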
However, it is often desirable to be able to access such file systems across a Wide Area Network (WAN) or other network that has similar characteristics. An example is where a business's information infrastructure is centralized but must be made accessible to remote offices or users. Likewise, in scientific computing applications (e.g., data analysis for oil exploration, simulations for drug design, weather and climate modeling, and so forth), very large data sets are manipulated through computational processes that are distributed across many nodes, and in certain cases it becomes desirable to coordinate massive numbers of computation nodes across a WAN. In such a scenario, it is desirable to have computational nodes in different geographic regions accessing a shared centralized storage pool, and it is important for such WAN-based storage access to be as efficient as possible.
WAN links typically operate with bandwidths that are a few orders of magnitude less than LANs and latencies that are a few orders of magnitude more than LANs. In many cases, higher WAN bandwidth can be purchased, but little if anything can be done about network latency. Because of the request-response nature of network file system protocols, network latency is often the more constraining limitation imposed upon system throughput compared to bandwidth.
In a sequence of exchanges in which latency dominates the transmission time (as can be the case with small messages such as a request or an acknowledgement), a chatty protocol's throughput decays rapidly with increasing network latency. In fact, it can be shown that in some cases throughput decays hyperbolically with respect to increasing network latency. For example, if network latency is doubled, then the performance of a latency-bound process is halved. Likewise, if network latency increases one-thousand-fold, as can happen when going from a LAN to a WAN, then performance decreases by a factor of 1000. When latency is the bottleneck in this way, increasing network bandwidth often has little or no effect on overall throughput.
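The hyperbolic relationship can be made concrete with a worked equation under a simplified model, which is an assumption rather than something stated above: each block of $S$ bytes requires one serialized round trip of latency $R$ over a link of bandwidth $B$, so throughput is $T(R) = S/(R + S/B)$. When the round trip dominates, $T$ is inversely proportional to $R$, and even unlimited bandwidth cannot raise throughput above $S/R$.

```latex
\begin{aligned}
T(R) &= \frac{S}{R + S/B} \;\approx\; \frac{S}{R} \qquad \text{when } R \gg S/B,\\[4pt]
\frac{T(2R)}{T(R)} &\approx \frac{1}{2}, \qquad \frac{T(1000R)}{T(R)} \approx \frac{1}{1000},\\[4pt]
\lim_{B \to \infty} T(R) &= \frac{S}{R}.
\end{aligned}
```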
Examples of such chattiness particular to SAN file system protocols include request/acknowledgement exchanges for block reads and writes, layout request/grant exchanges, controller recalls of prior grants, and synchronization points between client, file server, and data storage when a client's access permission for a given block of data must be verified at the time of access.
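The cost of these exchanges can be sketched by counting serialized round trips. The Python sketch below assumes, purely for illustration, a fixed set of one-time setup exchanges drawn from the kinds listed above plus one request/acknowledgement per block written; the exact sequence and counts are hypothetical and not taken from any protocol specification.

```python
# Hypothetical per-extent setup exchanges, modeled on the kinds of chattiness
# listed above; the sequence and count are illustrative assumptions only.
SETUP_EXCHANGES = (
    "layout request/grant",
    "controller recall of a prior grant",
    "access-permission synchronization",
)

def round_trips(n_blocks: int) -> int:
    """Serialized round trips: one-time setup plus a request/ack per block written."""
    return len(SETUP_EXCHANGES) + n_blocks

def latency_bound_time(n_blocks: int, rtt_s: float) -> float:
    """Lower bound on elapsed time when every exchange waits out one round trip."""
    return round_trips(n_blocks) * rtt_s

lan_s = latency_bound_time(100, 0.0001)  # hypothetical 0.1 ms LAN round trip
wan_s = latency_bound_time(100, 0.1)     # hypothetical 100 ms WAN round trip
print(f"LAN: {lan_s * 1000:.1f} ms, WAN: {wan_s:.1f} s")
```

Under these assumptions, writing 100 blocks takes about 10 ms of round-trip waiting on the LAN but about 10 s on the WAN, even before any payload is transferred.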