A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers, such as files and logical units, stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing file-based and block-based protocol messages (in the form of packets) to the system over the network.
A plurality of storage systems may be interconnected to provide a storage system cluster configured to service many clients. Each storage system or node may be configured to service one or more volumes, wherein each volume stores one or more data containers. Communication among the nodes involves the exchange of information between two or more entities interconnected by communication links. These entities are typically software programs executing on the nodes. The nodes communicate by exchanging discrete packets or messages of information according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Each node generally provides its services through the execution of software modules, such as processes. A process is a software program that is defined by a memory address space. For example, an operating system of the node may be implemented as a single process with a large memory address space, wherein pieces of code within the process provide operating system services, such as process management. Yet, the node's services may also be implemented as separately-scheduled processes in distinct, protected address spaces. These separate processes, each with its own process address space, execute on the node to manage resources internal to the node and, in the case of a database or network protocol, to interact with various network entities.
Services that are part of the same process address space communicate by accessing the same memory space. That is, information exchanged between services implemented in the same process address space is not transferred, but rather may be accessed in a common memory. However, communication among services that are implemented as separate processes is typically effected by the exchange of messages. For example, information exchanged between different address spaces of processes is transferred as one or more messages between different memory spaces of the processes. A known message-passing mechanism provided by an operating system to transfer information between process address spaces is the Inter Process Communication (IPC) mechanism.
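By way of illustration only, the following minimal Python sketch shows information being transferred as a message between two separate process address spaces. It is not part of the system described herein. It assumes a child interpreter launched as a separate process and uses operating-system pipes as the transfer mechanism; the names `CHILD_CODE` and `exchange_message`, and the upper-casing reply, are hypothetical.

```python
import subprocess
import sys

# Code run by the child process (a separate address space): it reads a
# message from its standard input and writes a reply to its standard output.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def exchange_message(message: str) -> str:
    """Send a message to a separate process over a pipe and return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=message,        # copied into the child through a pipe
        capture_output=True,  # the reply is copied back through another pipe
        text=True,
        check=True,
    )
    return result.stdout
```

The message and the reply are each copied between the two memory spaces, in contrast to services within a single process, which would simply read the same memory.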
Resources internal to the node may include communication resources that enable a process on one node to communicate over the communication links or network with another process on a different node. The communication resources include the allocation of memory and data structures, such as messages, as well as a network protocol stack.
The network protocol stack, in turn, comprises layers of software, such as a session layer, a transport layer and a network layer. The Internet protocol (IP) is a network layer protocol that provides network addressing between nodes, whereas the transport layer provides a port service that identifies each process executing on the nodes and creates a connection between those processes that indicate a willingness to communicate. Examples of conventional transport layer protocols include the reliable connection (RC) protocol and the Transmission Control Protocol (TCP).
Broadly stated, the connection provided by the transport layer, such as that provided by TCP, is a reliable, securable logical circuit between pairs of processes. A TCP process executing on each node establishes the TCP connection in accordance with a conventional “3-way handshake” arrangement involving the exchange of TCP message or segment data structures. The resulting TCP connection is identified by port numbers and IP addresses of the nodes. The TCP transport service provides reliable delivery of a message using a TCP transport header. The TCP protocol and establishment of a TCP connection are described in Computer Networks, 3rd Edition, particularly at pgs. 521-542, which is hereby incorporated by reference as though fully set forth herein.
Flow control is a protocol function that controls the flow of data between network protocol stack layers in communicating nodes. At the transport layer, for example, flow control restricts the flow of data (e.g., bytes) over a connection between the nodes. The transport layer may employ a fixed sliding-window mechanism that specifies the number of bytes that can be exchanged over the network (communication link) before acknowledgement is required. Typically, the mechanism includes a fixed sized window or buffer that stores the data bytes and that is advanced by the acknowledgements.
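The fixed sliding-window mechanism described above may be sketched as follows. This is a simplified illustration under stated assumptions, not an implementation of any particular transport protocol: the class name `SlidingWindowSender`, its methods, and the use of cumulative byte offsets as acknowledgements are all hypothetical.

```python
class SlidingWindowSender:
    """Fixed-size sliding window over a byte stream (illustrative only)."""

    def __init__(self, data: bytes, window_size: int):
        self.data = data
        self.window_size = window_size
        self.base = 0      # offset of the oldest unacknowledged byte
        self.next_seq = 0  # offset of the next byte to send

    def sendable(self) -> bytes:
        """Return the bytes that may be sent now without exceeding the window."""
        limit = min(self.base + self.window_size, len(self.data))
        chunk = self.data[self.next_seq:limit]
        self.next_seq = limit
        return chunk

    def acknowledge(self, ack_offset: int) -> None:
        """Advance the window: every byte before ack_offset is acknowledged."""
        if ack_offset < self.base or ack_offset > self.next_seq:
            raise ValueError("acknowledgement outside the sent range")
        self.base = ack_offset

    def in_flight(self) -> int:
        """Number of bytes sent but not yet acknowledged."""
        return self.next_seq - self.base


sender = SlidingWindowSender(b"abcdefghij", window_size=4)
first = sender.sendable()    # the window admits the first four bytes
blocked = sender.sendable()  # window is full, so nothing more may be sent
sender.acknowledge(2)        # two bytes acknowledged; window advances by two
second = sender.sendable()   # two further bytes become sendable
```

In this sketch the sender stalls once a full window of bytes is unacknowledged and resumes only as acknowledgements advance the window.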
The session layer manages the establishment or binding of an association between two communicating processes in the nodes. In this context, the association is a session comprising a series of interactions between the two communicating processes for a period of time, e.g., during the span of a connection. Upon establishment of the connection, the processes take turns exchanging commands and data over the session, typically through the use of request and response messages. Flow control in the session layer concerns the number of outstanding request messages (requests) that are allowed over the session at a time. Laggard response messages (responses) or long-running requests may force the institution of session layer flow control to limit the flow of requests between the processes, thereby adversely impacting the session.
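The effect of session layer flow control on outstanding requests may be illustrated with the following minimal sketch. It assumes a fixed limit on outstanding requests per session; the class name `Session`, its methods, and the request payloads are hypothetical and chosen only for illustration.

```python
class Session:
    """Session that limits the number of outstanding requests (illustrative)."""

    def __init__(self, max_outstanding: int):
        self.max_outstanding = max_outstanding
        self.outstanding = {}  # request id -> payload awaiting a response
        self.next_id = 0

    def send_request(self, payload):
        """Return a request id, or None if flow control blocks the request."""
        if len(self.outstanding) >= self.max_outstanding:
            return None
        request_id = self.next_id
        self.next_id += 1
        self.outstanding[request_id] = payload
        return request_id

    def receive_response(self, request_id) -> None:
        """Retire the request whose response has arrived."""
        del self.outstanding[request_id]


session = Session(max_outstanding=2)
slow = session.send_request("long-running request")
fast = session.send_request("short request")
session.receive_response(fast)                  # the short request completes
third = session.send_request("another request")
stalled = session.send_request("one more")      # None: the limit is reached
```

Here the long-running request continues to occupy one of the two available slots, so the session cannot accept further requests once the remaining slot is in use.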
A solution that enables a session to continue to perform at high throughput even in the event of a long-running request or a lost request or response is described in the above-referenced U.S. Pat. No. 7,443,872 entitled SYSTEM AND METHOD FOR MULTIPLEXING CHANNELS OVER MULTIPLE CONNECTIONS IN A STORAGE SYSTEM CLUSTER. Here, a network protocol employs multiple request channels within a session to allow high levels of concurrency, i.e., to allow a large number of requests to be outstanding within each channel. Multiple channels further allow a plurality of sessions to be multiplexed over the connections to thereby insulate the sessions from lost throughput due to laggard responses or long-running requests.
Broadly stated, each channel is embodied as a request window that stores outstanding requests sent over the connection. Each request window has a predetermined initial sequence window size and the total number of outstanding requests in a session is the sum of the window sizes of all the channels in the session. In addition, each request has a sequence number that is unique for that request and specifies its sequence in the channel. Coupling the sequence number with a defined sequence window size provides flow and congestion control, limiting the number of outstanding requests in the channel. However, if the sequence number is also used to specify an order of execution of requests, then no requests can be executed out-of-order or concurrently within the channel. Requests on different channels can be executed concurrently or out-of-order with respect to each other, but there is no way to enforce an ordering of the requests in different channels with respect to each other. It is desirable to be able to specify that a number of requests can be executed in arbitrary order, but then occasionally insert a barrier that requires that all requests up to a certain point be executed before any request after that point. Additionally, it is desirable to specify an exact order of execution, while occasionally allowing out-of-order execution or, alternatively, to permit any intermediate degree of control from completely ordered execution to completely arbitrary execution ordering.
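The desired barrier behavior may be pictured with the following minimal sketch, which is offered only as an illustration of the ordering semantics and does not describe any particular protocol or implementation. The `BARRIER` marker and the function `execution_groups` are hypothetical names. Requests within one group may be executed in arbitrary order or concurrently, whereas every request in a group must be executed before any request in a later group.

```python
BARRIER = object()  # hypothetical marker separating ordered groups of requests


def execution_groups(stream):
    """Split a request stream into groups separated by barriers.

    Requests inside a group may execute in any order; the groups
    themselves must execute in the order returned.
    """
    groups = []
    current = []
    for item in stream:
        if item is BARRIER:
            if current:  # ignore leading or repeated barriers
                groups.append(current)
                current = []
        else:
            current.append(item)
    if current:
        groups.append(current)
    return groups


# Arbitrary ordering: no barriers, so all requests form a single group.
unordered = execution_groups(["r1", "r2", "r3"])

# Intermediate control: r1 and r2 in any order, then r3 and r4 in any order.
mixed = execution_groups(["r1", "r2", BARRIER, "r3", "r4"])

# Completely ordered execution: a barrier after every request.
ordered = execution_groups(["r1", BARRIER, "r2", BARRIER, "r3"])
```

Varying the placement of barriers spans the range from completely arbitrary to completely ordered execution.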