Computing systems are typically viewed as a processing core that is coupled to a plurality of “Input/Output” (I/O) devices. The processing core is often viewed as the central intelligence function of the computing system, while the I/O devices are often viewed as a means for sending information to the processing core and/or receiving information from the processing core.
A good example is a large computing system such as a UNIX-based server or workstation. The processing core of a large computing system is usually implemented as a plurality of general purpose processor chips and a system memory that together execute the system's software routines. The I/O devices of a server or workstation are often implemented as some sort of “plug in” device (peripheral or otherwise). Examples of I/O devices within a server environment include a graphics display, a networking interface, a data storage device (e.g., disk array unit), etc.
Large computing systems have traditionally used a bus to communicatively couple most, if not all, of the I/O devices to the processing core. For example, if a server's software requires a file from a disk drive unit, the file is transported from the disk drive unit to the processing core over a bus. Because a bus is a passive group of wires that are physically coupled to a plurality of I/O devices (or a plurality of I/O device connectors), a number of different I/O devices are typically designed to communicate with the processing core over the same bus.
As such, system congestion (wherein two or more different I/O devices contend for the resources of the bus) is not an unlikely occurrence. For example, if a disk drive unit and a networking adapter card share the same bus, and if both have information to send to the processing core at approximately the same time, then one of the I/O devices has to wait for the other before its communication can commence (e.g., the networking adapter card, before sending information to the processing core, has to wait until the disk drive unit has sent its information to the processing core).
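The waiting behavior described above can be illustrated with a minimal Python sketch. It assumes a simple first-ready-first-served arbitration policy on a single shared bus; the `Transfer` type, the device names, and the time values are hypothetical and chosen only for illustration, not taken from any particular bus standard.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    device: str      # hypothetical device name
    ready_at: float  # time at which the device has data to send (arbitrary units)
    duration: float  # time the transfer occupies the bus


def schedule_on_shared_bus(transfers):
    """Serialize transfers on one bus, first-ready-first-served (an assumed policy).

    Returns a dict mapping device -> (start, finish, waited).
    """
    bus_free_at = 0.0
    result = {}
    for t in sorted(transfers, key=lambda t: t.ready_at):
        start = max(t.ready_at, bus_free_at)  # wait while the bus is busy
        finish = start + t.duration
        result[t.device] = (start, finish, start - t.ready_at)
        bus_free_at = finish  # the bus carries only one transfer at a time
    return result


schedule = schedule_on_shared_bus([
    Transfer("disk", ready_at=0.0, duration=4.0),
    Transfer("nic", ready_at=1.0, duration=2.0),
])
for device, (start, finish, waited) in schedule.items():
    print(f"{device}: start={start} finish={finish} waited={waited}")
```

In this sketch the networking device is ready shortly after the disk drive unit begins its transfer, so it is held off until the bus is released, which is the contention scenario the example in the text describes.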
In cases where the processing core is of lower performance, no real loss in computing system performance is observed. That is, in a sense, if the processing core is only capable of handling the information from the I/O devices “one at a time” (e.g., if the processing core in the above example does not possess the resources to process the networking adapter card's information even if it were received “in parallel” with the disk drive unit's information), then the computing system may be said to be “processing core constrained”, and there is no real loss in system performance as a result of the inefficiencies associated with the communication of the I/O devices over a shared bus.
The trend, however, is that processing core performance of large computing systems is outpacing bus performance. Semiconductor manufacturing technology improvements (which provide faster and more functionally robust processor chips) as well as “multi-processor” processing core designs (e.g., wherein a plurality of processor chips are designed to work together as a cooperative processing whole) have resulted in high performance processing core implementations that can simultaneously handle the transmissions from two or more I/O devices.
As such, true losses in computing system performance are being observed for those high performance systems having a bus design between the processing core and the I/O devices of the system. In order to combat this trend, various system design approaches that “work around” the use of a bus as the principal means of communication between the processing core and the I/O devices have been proposed. One of these, referred to as “InfiniBand”, embraces the use of a switching fabric between the processing core and the I/O devices. FIG. 1 shows an example of an InfiniBand or other switching fabric based architecture.
The processing core of the computing system 100 shown in FIG. 1 may be viewed as the collection of hosts 101₁ through 101₆. Each of the hosts 101₁ through 101₆ has an associated processor 103₁ through 103₆ that may be assumed to have its own associated memory. Each of the hosts 101₁ through 101₆ is coupled to a switching fabric 104 via its own host channel adapter (HCA) 102₁ through 102₆. In a sense, each of the HCAs 102₁ through 102₆ acts as a media access layer for its corresponding processor (e.g., by preparing and receiving packets that are sent/received to/from the switching fabric 104).
The I/O devices of the computing system are referred to as its “targets” 107₁ through 107₆. Each of the targets 107₁ through 107₆ has an associated I/O unit 108₁ through 108₆ (e.g., a gateway to another network, a file server, a disk array, etc.) and target channel adapter (TCA) 109₁ through 109₆. Similar to the HCAs 102₁ through 102₆, the TCAs 109₁ through 109₆ act as a media access layer for their corresponding I/O units (e.g., by preparing and receiving packets that are sent/received to/from the switching fabric 104).
The I/O units 108₁ through 108₆ are communicatively coupled to the processors 103₁ through 103₆ through the switching fabric 104. The switching fabric 104 is a network of switching nodes such as switching nodes 105₁ through 105₅. Consistent with the use and purpose of a network, the switching nodes 105₁ through 105₅ are responsible for directing packets toward their appropriate destination. For example, if I/O unit 108₆ desires to send information to processor 103₁, one or more packets that contain the information are directed over the switching fabric 104 from network access link 106₁₂ to network access link 106₁. As such, switching node 105₅ will direct these packets (upon their reception from access link 106₁₂) toward switching node 105₂ (e.g., by directing them to switching node 105₁, which subsequently directs them to switching node 105₂).
A number of sophisticated computer architecture approaches are possible through the use of the switching fabric 104. These include (among possible others): 1) the implementation of a multi-processor computing system (because the switching fabric 104 allows the processors 103₁ through 103₆ to efficiently communicate with one another); 2) intelligent I/O units (because the switching fabric 104 allows the I/O units 108₁ through 108₆ to efficiently communicate with one another); 3) scalability (i.e., if an increase in processing performance is desired, more processors can be coupled to the network; if I/O needs to be expanded, more I/O units can be added to the fabric, with the fabric being expanded to meet the increased connectivity; and/or, if faster communication is desired through the network 104, more switches can be added to the network 104); and 4) partitioning (wherein a subset of processors is identified as being part of a unique multi-processing core resource that can operate privately).
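As a rough illustration of the hop-by-hop forwarding described above, in which a packet arriving at one switching node is passed through an intermediate node to the node that serves the destination, the following Python sketch models each switching node as a table that maps a destination to a next hop. The node names (`sw5`, `sw1`, `sw2`, `host1`), the table contents, and the `max_hops` limit are hypothetical; the sketch assumes a topology merely consistent with the example and is not drawn from FIG. 1 or from any fabric specification.

```python
# Hypothetical per-switch forwarding tables: destination -> next hop.
# The names and links are illustrative assumptions only.
FORWARDING = {
    "sw5": {"host1": "sw1"},
    "sw1": {"host1": "sw2"},
    "sw2": {"host1": "host1"},  # sw2 is assumed to own the access link to host1
}


def route(destination, ingress_switch, tables, max_hops=16):
    """Follow next-hop entries from the ingress switch to the destination.

    Returns the list of nodes visited, ending with the destination.
    """
    path = [ingress_switch]
    node = ingress_switch
    for _ in range(max_hops):
        next_hop = tables[node][destination]  # each switch decides locally
        path.append(next_hop)
        if next_hop == destination:
            return path
        node = next_hop
    raise RuntimeError("no route within hop limit (possible forwarding loop)")


path = route("host1", "sw5", FORWARDING)
print(" -> ".join(path))
```

Each switching node consults only its own table, so no single node needs to know the full path; this is one common way to realize the "directing packets toward their appropriate destination" role, though other routing schemes are equally possible.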
The switching fabric 104 also provides a performance advantage over bus architectures because a large number of communications can be simultaneously carried between the various processors and I/O units. That is, a particular processor or I/O unit typically does not have to “wait” to send information until another unit has completed its own transmission of information. As a result, the various units are allowed to simultaneously inject their information into the network.
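The concurrency advantage can be sketched by treating each transfer as occupying only the links along its own path, so that two transfers delay one another only when their paths share a link. The Python model below is a deliberate simplification: it reserves every link on a path for the whole duration of a transfer, and the link names, unit names, and timings are hypothetical.

```python
def schedule_on_fabric(transfers):
    """Schedule transfers that each occupy a set of links.

    transfers: list of (name, ready_at, duration, links).
    A transfer starts once it is ready and every link on its path is free.
    Returns a dict mapping name -> (start, finish).
    """
    link_free_at = {}
    result = {}
    for name, ready_at, duration, links in sorted(transfers, key=lambda t: t[1]):
        start = max([ready_at] + [link_free_at.get(link, 0.0) for link in links])
        finish = start + duration
        for link in links:
            link_free_at[link] = finish  # simplification: hold the whole path
        result[name] = (start, finish)
    return result


fabric_schedule = schedule_on_fabric([
    ("disk_to_host1", 0.0, 4.0, ["link12", "link1"]),
    ("gateway_to_host2", 0.0, 2.0, ["link11", "link2"]),   # disjoint path
    ("array_to_host1", 0.0, 3.0, ["link10", "link1"]),     # shares link1
])
for name, (start, finish) in fabric_schedule.items():
    print(f"{name}: start={start} finish={finish}")
```

In this model the first two transfers use disjoint links and proceed at the same time, while the third waits only because it shares the final access link with the first. This is in keeping with the statement that a unit "typically" does not have to wait: contention is reduced to the links that are actually shared rather than eliminated altogether.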