FIG. 1 depicts a simplified version of a data center of the general type that an enterprise requiring high availability and network storage (e.g., a financial institution) might use. Data center 100 includes redundant Ethernet switches with redundant connections for high availability. Data center 100 is connected to clients through firewall 115 and network 105. Network 105 may be, e.g., an enterprise intranet, a DMZ and/or the Internet. Ethernet is well suited for TCP/IP traffic between clients (e.g., remote clients 180 and 185) and a data center.
Within data center 100, there are many network devices. For example, many servers are typically disposed on racks having a standard form factor (e.g., one “rack unit” would be 19″ wide and about 1.75″ high). A “Rack Unit” or “U” is an Electronic Industries Alliance (or more commonly “EIA”) standard measuring unit for rack mount type equipment. This term has become more prevalent in recent times due to the proliferation of rack mount products showing up in a wide range of commercial, industrial and military markets. A “Rack Unit” is equal to 1.75″ in height. To calculate the internal usable space of a rack enclosure, one simply multiplies the total number of Rack Units by 1.75″. For example, a 44 U rack enclosure would have 77″ of internal usable space (44×1.75). Racks within a data center may have, e.g., about 40 servers each. A data center may have thousands of servers, or even more. Recently, some vendors have announced “blade servers,” which allow even higher-density packing of servers (on the order of 60 to 80 servers per rack).
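The rack-space calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function and constant names are hypothetical and do not come from any standard or library.

```python
# Height of one rack unit in inches, per the EIA figure given in the text.
RACK_UNIT_HEIGHT_IN = 1.75


def usable_height_inches(rack_units: int) -> float:
    """Return the internal usable height, in inches, of a rack enclosure.

    rack_units is the enclosure size in "U" (a hypothetical parameter name).
    """
    if rack_units < 0:
        raise ValueError("rack_units must be non-negative")
    return rack_units * RACK_UNIT_HEIGHT_IN


if __name__ == "__main__":
    # The 44 U example from the text: 44 x 1.75 = 77 inches.
    print(usable_height_inches(44))
```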
However, with the increasing numbers of network devices within a data center, connectivity has become increasingly complex and expensive. At a minimum, the servers, switches, etc., of data center 100 will typically be connected via Ethernet. For high availability, there will be at least two Ethernet connections, as shown in FIG. 1.
Moreover, it is not desirable for servers to include a significant storage capability. For this reason and other reasons, it has become increasingly common for enterprise networks to include connectivity with storage devices such as storage array 150. Historically, storage traffic has been implemented over SCSI (Small Computer System Interface) and/or FC (Fibre Channel).
In the mid-1990s, SCSI traffic was only able to go short distances. A topic of key interest at the time was how to make SCSI go “outside the box.” Greater speed, as always, was desired. At the time, Ethernet was moving from 10 Mb/s to 100 Mb/s. Some envisioned a future speed of up to 1 Gb/s, but this was considered by many to be nearing a physical limit. With 10 Mb/s Ethernet, there were the issues of half duplex and of collisions. Ethernet was considered to be somewhat unreliable, in part because packets could be lost and because there could be collisions. (Although the terms “packet” and “frame” have somewhat different meanings as normally used by those of skill in the art, the terms will be used interchangeably herein.) FC was considered to be an attractive and reliable option for storage applications, because under the FC protocol packets are not intentionally dropped and because FC could already be run at 1 Gb/s. However, during 2004, both Ethernet and FC reached speeds of 10 Gb/s. Moreover, Ethernet had evolved to the point that it was full duplex and did not have collisions. Accordingly, FC no longer had a speed advantage over Ethernet. However, congestion in a switch may cause Ethernet packets to be dropped, and this is an undesirable feature for storage traffic.
During the first few years of the 21st century, a significant amount of work went into developing iSCSI, in order to implement SCSI over a TCP/IP network. Although these efforts met with some success, iSCSI has not become very popular: iSCSI has about 1%-2% of the storage network market, as compared to approximately 98%-99% for FC.
One reason is that the iSCSI stack is somewhat complex as compared to the FC stack. Referring to FIG. 7A, it may be seen that iSCSI stack 700 requires five layers: Ethernet layer 705, IP layer 710, TCP layer 715, iSCSI layer 720 and SCSI layer 725. TCP layer 715 is a necessary part of the stack because Ethernet layer 705 may lose packets, yet SCSI layer 725 does not tolerate packets being lost. TCP layer 715 provides SCSI layer 725 with reliable packet transmission. However, TCP is a difficult protocol to implement at speeds of 1 to 10 Gb/s. In contrast, because FC does not lose frames, there is no need to compensate for lost frames by a TCP layer or the like. Therefore, as shown in FIG. 7B, FC stack 750 is simpler, requiring only FC layer 755, FCP layer 760 and SCSI layer 765.
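The difference between the two stacks can be made concrete with a small Python sketch. It simply lists the layers named above, bottom to top, and reports what sits beneath SCSI in each case. The variable and function names are hypothetical, and the sketch models only the layer ordering, not any protocol behavior.

```python
# Layers listed bottom to top, as described for FIG. 7A and FIG. 7B.
ISCSI_STACK = ("Ethernet", "IP", "TCP", "iSCSI", "SCSI")
FC_STACK = ("FC", "FCP", "SCSI")


def layers_below_scsi(stack: tuple) -> tuple:
    """Return the layers a SCSI command traverses beneath the SCSI layer."""
    return stack[: stack.index("SCSI")]


if __name__ == "__main__":
    for name, stack in (("iSCSI", ISCSI_STACK), ("FC", FC_STACK)):
        below = layers_below_scsi(stack)
        print(f"{name}: {len(stack)} layers, below SCSI: {', '.join(below)}")
```

In this representation the iSCSI stack carries four layers beneath SCSI, one of which (TCP) exists to recover from loss in the Ethernet layer, whereas the FC stack carries two.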
Accordingly, the FC protocol is normally used for communication between servers on a network and storage devices such as storage array 150. Therefore, data center 100 includes FC switches 140 and 145, provided by Cisco Systems, Inc. in this example, for communication between servers 110 and storage array 150.
1 RU servers and blade servers are very popular because they are relatively inexpensive, powerful and standardized, and can run any of the most popular operating systems. It is well known that in recent years the cost of a typical server has decreased and its performance level has increased. Because of the relatively low cost of servers and the potential problems that can arise from having more than one type of software application run on one server, each server is typically dedicated to a particular application. The large number of applications run on a typical enterprise network continues to increase the number of servers in the network.
However, because of the complexities of maintaining various types of connectivity (e.g., Ethernet and FC connectivity) with each server, each type of connectivity preferably being redundant for high availability, the cost of connectivity for a server is becoming higher than the cost of the server itself. For example, a single FC interface for a server may cost as much as the server itself. A server's connection with an Ethernet is typically made via a network interface card (“NIC”) and its connection with an FC network is made with a host bus adaptor (“HBA”).
The roles of devices in an FC network and an Ethernet network are somewhat different with regard to network traffic, mainly because packets are routinely dropped in response to congestion in a TCP/IP network, whereas frames are not intentionally dropped in an FC network. Accordingly, FC will sometimes be referred to herein as one example of a “no-drop” network, whereas Ethernet will be referred to as one manifestation of a “drop” network. When packets are dropped on a TCP/IP network, the system will recover quickly, e.g., in a few hundred microseconds. However, the protocols for an FC network are generally based upon the assumption that frames will not be dropped. Therefore, when frames are dropped on an FC network, the system does not recover quickly and SCSI may take minutes to recover.
Currently, a port of an Ethernet switch may buffer a packet for up to about 100 milliseconds before dropping it. As 10 Gb/s Ethernet is implemented, each port of an Ethernet switch would need approximately 100 MB of RAM in order to buffer 100 milliseconds of traffic arriving at line rate. This would be prohibitively expensive.
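The buffer estimate above follows from multiplying the line rate by the holding time. The Python sketch below works through that arithmetic under the assumption that a port receives traffic continuously at full line rate and must hold all of it for the stated interval. The function name and parameters are hypothetical. In decimal units the result for 10 Gb/s is 125 MB, which is on the order of the approximately 100 MB cited above.

```python
def buffer_bytes(line_rate_bps: int, hold_time_ms: int) -> int:
    """Return the bytes of buffering needed to hold line-rate traffic.

    line_rate_bps: port speed in bits per second (assumed fully utilized).
    hold_time_ms:  how long traffic is held before being dropped, in ms.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    bits = line_rate_bps * hold_time_ms // 1000  # bits arriving in the interval
    return bits // 8                             # convert bits to bytes


if __name__ == "__main__":
    for label, rate in (("1 Gb/s", 1_000_000_000), ("10 Gb/s", 10_000_000_000)):
        needed = buffer_bytes(rate, 100)
        print(f"{label} for 100 ms: {needed // 1_000_000} MB per port")
```

Because the requirement scales linearly with line rate, each tenfold increase in port speed implies a tenfold increase in per-port buffer memory for the same holding time.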
For some enterprises, it is desirable to “cluster” more than one server, as indicated by the dashed line around servers S2 and S3 in FIG. 1. Clustering causes a group of servers to be seen as a single server. For clustering, it is desirable to perform remote direct memory access (“RDMA”), wherein the contents of one virtual memory space (which may be scattered among many physical memory spaces) can be copied to another virtual memory space without CPU intervention. The RDMA should be performed with very low latency. In some enterprise networks, there is a third type of network that is dedicated to clustering servers, as indicated by switch 175. This may be, for example, a “Myrinet,” a “Quadrics” or an “Infiniband” network.
Therefore, clustering of servers can add yet more complexity to data center networks. However, unlike Quadrics and Myrinet, Infiniband allows for clustering and provides the possibility of simplifying a data center network. Infiniband network devices are relatively inexpensive, mainly because they use small buffer spaces, copper media and simple forwarding schemes.
However, Infiniband has a number of drawbacks. For example, there is currently only one source of components for Infiniband switches. Moreover, Infiniband has not been proven to work properly in the context of, e.g., a large enterprise's data center. For example, there are no known implementations of Infiniband routers to interconnect Infiniband subnets. While gateways are possible between Infiniband and Fibre Channel and between Infiniband and Ethernet, it is very improbable that Ethernet will be removed from the data center. This also means that the hosts would need not only an Infiniband connection, but also an Ethernet connection.
Accordingly, even if a large enterprise wished to ignore the foregoing shortcomings and change to an Infiniband-based system, the enterprise would need to have a legacy data center network (e.g., as shown in FIG. 1) installed and functioning while the enterprise tested an Infiniband-based system. Therefore, the cost of an Infiniband-based system would not be an alternative cost, but an additional cost.
It would be very desirable to simplify data center networks in a manner that would allow an evolutionary change from existing data center networks. An ideal system would provide an evolutionary path for consolidating server I/O while providing low latency and high speed at a low cost.