Over time, various interconnects and protocols have been developed to address the interconnectivity issues associated with computing. Examples of such interconnectivity include server-based clustering, storage networks, intranet networks, and many others.
Today, it is common for a single installation to have a plurality of interconnects for these various interconnectivity solutions. For example, FIG. 1 shows a typical environment where a plurality of servers 10a, 10b, 10c are connected to one another via high speed Ethernet, such as through a switch 20. This switch 20 allows the various servers to exchange data with one another. This switch 20 may also connect to a second switch or gateway 30, which provides a second interface, such as Fibre Channel, to the storage devices 40a, 40b.
In another embodiment, such as that shown in FIG. 2, a cluster switch 50 is used to connect a plurality of servers, such as mail server 60a, application server 60b and database server 60c, together.
FIG. 3 shows a typical computer architecture, showing the interconnect between a plurality of servers, such as servers 110a-110f. In this embodiment, each server 110 has a PCI Express bridge device. These devices each communicate with a respective switch 120a, 120b, each of which has a dedicated PCI Express bridge device for each server. As shown in FIG. 4, each of the switches 120a, 120b may include an upstream link 124 and a number of downstream links 123. Internal to each switch 120 is a plurality of PCI Express bridge devices 121, which may operate in transparent or non-transparent mode. These internal PCI Express bridge devices 121 are connected to one another using one or more internal PCI busses or other parallel busses 122. While FIG. 4 shows eight PCI Express bridge devices 121, the number of bridges contained within a switch 120 is not limited. A peer-to-peer communication bus 125 may also be used.
As shown in FIG. 4, each link of the virtual PCI Express bridge 121 is configurable as a downstream port, which is tantamount to a Transparent Bridge (TB), or alternatively as a Non-Transparent Bridge (NTB). When a particular link is connected to a PCI Express endpoint, or connected to another switch 120 in the context of a hierarchy of switch clusters, it is typically configured as a Transparent Bridge (TB). When the link is connected to a server 110, and consequently to its Root Complex Processor (RCP), it is typically configured as a Non-Transparent Bridge (NTB). An NTB consists of two back-to-back PCI Express endpoints. An NTB allows the isolation of the two domains respectively belonging to the RCP of the server and the RCP connected to the upstream port of the switch 120.
In the TB mode, each port has a Base Address Register (BAR) and a Limit Register that are used to direct PCI Express packets; each packet carries, embedded as a field, the starting address at which data is to be accessed. Every port in the switch 120 that forms one of its links has its individual BAR and Limit Registers. These registers are initialized by the system software at boot time. The BARs and Limit Registers then direct the traffic to any other port of the switch 120.
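The address-routing behavior described above can be sketched as follows. This is a minimal illustration, not the actual switch logic; the function and port names, window values, and the fallback-to-upstream behavior are assumptions made for the example.

```python
# Sketch (hypothetical names and values) of TB-mode routing: each port
# has a BAR/Limit window, and a packet's starting address is compared
# against each window to select the egress port.

def route_packet(start_address, ports):
    """Return the port whose [BAR, Limit) window contains start_address.

    `ports` maps a port name to a (bar, limit) tuple: the Base Address
    Register value and the end of that port's address window.
    """
    for name, (bar, limit) in ports.items():
        if bar <= start_address < limit:
            return name
    return None  # no local window matches; assume forwarding upstream

# Example windows, as would be initialized by system software at boot time.
ports = {
    "downstream_0": (0x1000_0000, 0x2000_0000),
    "downstream_1": (0x2000_0000, 0x3000_0000),
}

print(route_packet(0x1500_0000, ports))  # falls in downstream_0's window
```

A packet whose starting address falls outside every configured window would then be sent toward the upstream link, which is the case exploited in the routing example later in this section.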
In NTB mode, there are extra hardware resources per port. In addition to the BAR and Limit Registers, there are Address Translation Registers that are implemented as a Look-Up Table (LUT). This Address Translation Table allows the starting address of a PCI read or write message arriving on one side of the link to be translated before it passes to the other side of the link.
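The LUT-based translation can be sketched as below. The entry layout and the particular base addresses and window sizes are assumptions chosen for illustration, not the register format of any actual device.

```python
# Sketch (hypothetical layout) of NTB address translation: a look-up
# table maps an address window on one side of the link to a different
# base address on the other side, keeping the two domains isolated.

# Each LUT entry: (incoming_base, window_size, translated_base).
lut = [
    (0x8000_0000, 0x0010_0000, 0x4000_0000),
    (0x9000_0000, 0x0010_0000, 0x5000_0000),
]

def translate(address, lut):
    """Translate the starting address of a PCI read/write crossing the NTB."""
    for incoming_base, size, translated_base in lut:
        if incoming_base <= address < incoming_base + size:
            # Preserve the offset within the window, swap the base.
            return translated_base + (address - incoming_base)
    raise ValueError("address not covered by any LUT entry")

print(hex(translate(0x8000_0040, lut)))  # -> 0x40000040
```

The key point the sketch captures is that the same bus address can mean different physical locations on each side of the NTB, which is what isolates the two RCP domains.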
Each link on the switch 120 represents a connection with one of these internal PCI Express bridge devices 121. The RCP of each server 110 may be attached to a respective NTB port of the switch 120. FIG. 3 shows two switches 120a, 120b, each in communication with three servers 110. However, the number of servers that can be served by a single switch is not limited by this disclosure. If the number of servers 110 that are to be clustered exceeds the number of ports available on the switch 120, multiple switches 120 may be used. For example, the upstream link of each switch 120 may be in communication with a downstream link of another switch 120. If more than two switches 120 are needed, a hierarchy of switches may be employed, where each switch 120 is in communication with a central switch 130. The central switch 130 is then used to connect these switches 120 together.
Therefore, in operation, the CPU on the server 110a generates a message that it wishes to send to another node, such as server 110d. It creates the data payload, or application layer payload. In many embodiments, TCP/IP is used as the transport protocol. The message implies a starting address and subsequent data content embodied in a single packet. This message, with its destination address, is sent to the switch 120a. It enters switch 120a through a first internal PCI Express bridge device 121 (see FIG. 4). Switch 120a, typically using its Base Address Register (BAR), determines the destination of this message. The switch 120a recognizes that the address of the message does not belong to any of the links supported by the switch via its respective PCI Express bridges. It then transmits the message out of switch 120a via a second internal PCI Express bridge device 121 in connection with the upstream link 124. Switch 120b receives the incoming message via another internal PCI Express bridge device 121, may store the incoming message, and determines which link is connected to server 110d. It then sends this message to server 110d using yet another internal PCI Express bridge device 121. In this case, the message is delivered at wire speed; however, latency is incurred because the message may be stored at the intermediate switches 120a, 120b. In this example, the message also passed through four different PCI Express bridge devices 121 between server 110a and server 110d.
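The bridge count in this walkthrough can be checked with a small model of the FIG. 3 topology. The topology dictionary, the function name, and the assumption that a same-switch transfer crosses two bridges (one ingress, one egress) are illustrative constructions, not part of the original description.

```python
# Sketch of the path described above, counting the internal PCI Express
# bridge devices 121 a message traverses (topology names illustrative).

topology = {
    "switch_120a": {"servers": ["110a", "110b", "110c"]},
    "switch_120b": {"servers": ["110d", "110e", "110f"]},
}

def bridges_traversed(src_server, dst_server, topology):
    src_switch = next(s for s, v in topology.items()
                      if src_server in v["servers"])
    if dst_server in topology[src_switch]["servers"]:
        # Assumed same-switch case: ingress bridge plus egress bridge.
        return 2
    # Cross-switch case from the walkthrough: ingress bridge on the
    # first switch, its upstream-link bridge, the receiving bridge on
    # the second switch, and the egress bridge toward the destination.
    return 4

print(bridges_traversed("110a", "110d", topology))  # -> 4
```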
However, if multiple transactions occur simultaneously that involve servers connected to different switches 120a, 120b, more latency may be incurred, as all of this traffic must pass through the single upstream connection in the switches 120a, 120b. This causes data to be held in a FIFO and subjected to some arbitration scheme before it is allowed to use the path between the two switches. In some embodiments, this congestion may be alleviated by increasing the bandwidth of this link. For example, in this embodiment, if the upstream link had a bandwidth of at least three times that of each downstream link, all communication would appear to be non-blocking when three servers connected to switch 120a communicate simultaneously with three servers connected to switch 120b under a round-robin arbitration scheme.
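The three-times figure follows from simple aggregation, which can be checked with an illustrative per-link bandwidth (the 8 GB/s value is an assumption chosen for the arithmetic, not a figure from the description):

```python
# Worked check of the claim above: with three servers on switch 120a
# transmitting simultaneously to three servers on switch 120b, the
# upstream link must carry the sum of the three downstream flows to
# appear non-blocking under round-robin arbitration.

downstream_bw = 8.0      # GB/s per downstream link (illustrative figure)
simultaneous_flows = 3   # three cross-switch conversations at once

required_upstream_bw = simultaneous_flows * downstream_bw
print(required_upstream_bw)  # -> 24.0, i.e. 3x each downstream link
```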
However, there are limits to the speeds that can be achieved on this upstream link 124. In addition, the storage requirements of the switch 120 may be tremendous, depending on the number of connected compute and storage servers and the traffic generated by each. Therefore, it would be beneficial if there were an improved network switch and method of moving data between servers.