In a virtual data center environment, data compute nodes (e.g., virtual machines) may be deployed in a cluster. A cluster includes a number of network nodes, including hosts to run the data compute nodes. In one embodiment, the cluster includes or is otherwise associated with switches to handle storage, local, and wide-area network traffic, and storage for the data compute nodes. The deployment and management of such a cluster includes making sure the storage is accessible by the hosts. For example, an administrator creates a unique initiator identifier for each host and sets up various parameters that will be used by the host for protocol negotiation. Similarly, an administrator creates a unique target identifier for each storage device and sets up various parameters that will be used by the storage for the protocol negotiation. The hosts discover storage targets by querying a name server (e.g., within a switch) and negotiate parameters for use in a session established between hosts and storage. For example, the negotiation between a host and storage may include the host transmitting transport protocol login service parameter values (e.g., for queue depth, encryption, etc.) to storage, and storage responding with the corresponding parameter values that storage supports. As such, the deployment and management of a cluster requires a significant amount of manual configuration, followed by potentially heavy network traffic (e.g., in a large deployment) from the bidirectional communications used to negotiate parameters.
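The negotiation described above can be illustrated with a minimal sketch. The names `LoginParams`, `negotiate`, and `negotiation_messages` are hypothetical, and the resolution rule (the lower queue depth is used, and encryption is enabled only if both sides support it) is an assumption made for illustration; the actual rules depend on the transport protocol in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginParams:
    """Hypothetical login service parameters exchanged during negotiation."""
    queue_depth: int
    encryption: bool


def negotiate(host: LoginParams, storage: LoginParams) -> LoginParams:
    """Return the session parameters after the host proposes and storage responds.

    Assumed rule (not taken from any specific protocol): use the smaller
    queue depth, and enable encryption only when both sides support it.
    """
    return LoginParams(
        queue_depth=min(host.queue_depth, storage.queue_depth),
        encryption=host.encryption and storage.encryption,
    )


def negotiation_messages(num_hosts: int, num_targets: int) -> int:
    """Count messages if every host negotiates with every target.

    Assumes one request and one response per host/target pair.
    """
    return 2 * num_hosts * num_targets


session = negotiate(LoginParams(queue_depth=64, encryption=True),
                    LoginParams(queue_depth=32, encryption=False))
total = negotiation_messages(num_hosts=100, num_targets=20)
```

Under these assumptions, 100 hosts negotiating with 20 storage targets exchange 4,000 messages before any session carries data, which is the kind of traffic growth the paragraph above refers to for large deployments.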
Additionally, network nodes within a cluster may be configured for replication to a disaster recovery data center. For example, when applications running within a cluster configured for replication issue write commands directed to primary storage, those write commands are copied and sent over a wide area network (WAN) to storage within the disaster recovery data center. Consequently, WAN bandwidth between the primary and disaster recovery data centers is important to ensure data is copied to the disaster recovery data center. WAN bandwidth, however, is expensive and can become congested. To handle congestion, a switch may transmit a pause frame to one or more network nodes at the previous hop. The pause frame is pushed to the initiators directly or indirectly connected to the switch, causing all of those initiators to pause or stop sending input/output (I/O) traffic. As a result, a single application or data compute node, or a minority of them, may be the primary cause of the congestion, yet all initiators have their data traffic paused in response to the congestion.
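The effect of pause frame propagation can be sketched as a traversal from the congested switch back toward the initiators. The topology, the node names, and the function `paused_initiators` below are hypothetical and serve only to illustrate the behavior described above; the sketch assumes every node that receives a pause frame forwards it to each of its own previous-hop nodes.

```python
from collections import deque


def paused_initiators(previous_hops, congested_switch, initiators):
    """Return the initiators reached by a pause frame sent hop by hop upstream.

    previous_hops maps each node to the nodes that send traffic to it.
    """
    seen = {congested_switch}
    queue = deque([congested_switch])
    while queue:
        node = queue.popleft()
        for prev in previous_hops.get(node, ()):
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return sorted(seen & set(initiators))


# Hypothetical topology: three hosts reach the WAN switch through two switches.
previous_hops = {
    "wan_switch": ["switch_1", "switch_2"],
    "switch_1": ["host_a", "host_b"],
    "switch_2": ["host_c"],
}
initiators = {"host_a", "host_b", "host_c"}

# Suppose only host_a generates heavy replication traffic.
paused = paused_initiators(previous_hops, "wan_switch", initiators)
```

In this example, `paused` contains all three hosts even though only `host_a` is assumed to cause the congestion, because the pause frame carries no information about which initiator is responsible.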