Data center environments place tremendous demand on computing systems (e.g., servers, blades, etc.) to provide significant amounts of I/O bandwidth. A server typically provides the required I/O bandwidth by supporting provisions for adding multiple I/O cards/devices (also referred to as “adapters”) and/or directly incorporating embedded devices within the server. The embedded devices and/or add-in adapter interfaces are typically, but not limited to, PCI Express, PCI/PCI-X, and HyperTransport. The adapters represent a variety of device classes, including storage (SCSI, SATA, SAS, RAID, backup, etc.), networking (Ethernet, ATM), clustering (InfiniBand, ServerNet), multimedia (video, audio), and others.
It is oftentimes impractical (e.g., for reasons of reliability, cost, and component yield) to connect many I/O devices directly to a compute node or other processing element because of the large number of component pins that would be required. Processor, chipset, and component vendors have addressed these issues by partitioning the various functions and interfaces (e.g., computing, memory, and I/O interfaces) into multiple devices. The architecture and partitioning scheme provides a generic and simple way to construct multiple platforms that range from small, simple systems that have one or two components to large systems with one or more instances of each component.
Larger systems (e.g., Opteron-based systems) may include multiple processor cores/sockets, multiple chipset components, and many I/O expansion slots. These systems are designed to optimize the CPU-CPU and CPU-memory bandwidth. Accordingly, most of the compute node's (or processor's) buses/interconnects are dedicated to connect memory, memory controllers, and/or other processors. Depending upon the complexity of the system, one or more processors may either have no additional interfaces available, or a very limited/restricted (perhaps in bandwidth) interface available to connect to the I/O subsystem (or other parts of the compute grid in a multi-compute node environment). This scenario can force the I/O or expansion chipsets (“chipset”) to the “corners” or periphery of the processing elements within the compute node.
Another side effect of reduced/limited connectivity between the chipset and the processor/memory elements is that there may be a large disparity between the amount of bandwidth on either side of the protocol translator (or “chipset”). For example, a system configuration may have a chipset component that supports over thirty-two lanes of PCI Express (PCIe), while the chipset-to-processor/memory interface has at most eight lanes. Chipset vendors, on behalf of system vendors, have opted to include additional interfaces (e.g., HyperTransport) between the chipset and processor/memory components. The additional interfaces not only provide additional bandwidth, but also provide better balance between the various interfaces (chipsets, protocols, etc.). The inclusion of additional interfaces to the chipset can reduce the number of chipset components required for a given design, resulting in cost savings.
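As a rough illustration of the disparity described above, the ratio of downstream to upstream lanes can be computed directly. This is a minimal sketch that assumes every lane carries the same bandwidth on both sides of the chipset, which the text does not state; the function name is hypothetical.

```python
def oversubscription_ratio(downstream_lanes: int, upstream_lanes: int) -> float:
    """Ratio of downstream (I/O) lanes to upstream (processor/memory) lanes.

    Assumes equal per-lane bandwidth on both sides (an illustrative assumption).
    """
    if upstream_lanes <= 0:
        raise ValueError("upstream_lanes must be positive")
    return downstream_lanes / upstream_lanes


# Figures from the example above: thirty-two PCIe lanes behind an eight-lane link.
ratio = oversubscription_ratio(32, 8)
print(ratio)
```

Under that assumption, the I/O side can offer four times the traffic that the upstream interface can carry, and more if the chipset exposes more than thirty-two lanes.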
Chipsets may have a very different “view” of the nodes (e.g., the processor and memory components). As mentioned previously, the optimization of the CPU-CPU and CPU-memory interconnect may not allow the chipset to be connected directly to each node. Chipset transactions to/from nodes must then traverse from one node to another until the destination node is reached. Each link between nodes and/or the chipset represents one “hop”. From the chipset perspective, different nodes within the compute environment may be a different number of hops away. Nodes that are fewer hops from the chipset are more “near,” whereas nodes that are a higher number of hops from the chipset are more “far.” System performance is directly related to the amount of active chipset (e.g., I/O) bandwidth and the number of hops along the chipset-to-target-node path. The chipset transactions are replicated at each node along the chipset-to-target-node path. The chipset transactions consume bandwidth from each local node's available bandwidth (e.g., memory) and thereby limit the amount of bandwidth available to the processor(s) and other devices within that node.
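The notion of “near” and “far” nodes can be made concrete by counting links along the shortest path from the chipset. The following is a minimal sketch using breadth-first search; the four-node topology and all names in it are hypothetical and chosen only for illustration.

```python
from collections import deque


def hop_counts(links, source):
    """Return the number of links (hops) from source to each reachable element."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


# Hypothetical system: the chipset attaches only to node0, and the nodes
# are connected to one another in a square.
links = {
    "chipset": ["node0"],
    "node0": ["chipset", "node1", "node2"],
    "node1": ["node0", "node3"],
    "node2": ["node0", "node3"],
    "node3": ["node1", "node2"],
}

hops = hop_counts(links, "chipset")
for node in ("node0", "node1", "node2", "node3"):
    print(node, hops[node])
```

In this assumed topology node0 is one hop from the chipset and node3 is three hops away, so a transaction targeting node3 also consumes bandwidth at the intermediate nodes it passes through.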
When the chipset supports multiple links into the compute node environment, additional chipset bandwidth is available. Currently planned chipset architectures provide either soft or hard partitioning between the upstream (compute node) interfaces and downstream (compute node, fabric, or I/O) interfaces. Traffic (DMA, interrupts, messages, management, etc.) is pinned from a downstream interface to only one upstream interface. This pinning (via software and/or hardware configuration/strapping) of a downstream interface to a single upstream interface may not provide optimal system performance because of the number of hops particular traffic encounters between the chipset and the target node.
The problem is very apparent when the operating system scheduler moves tasks/processes (e.g., drivers, applications) from one node to another within the compute environment. The dynamic movement of these processes can either improve or hinder system performance, depending upon the traffic profile (e.g., number of hops) of the chipset.
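The interaction between pinning and task migration can be sketched with a small hop table. This is an illustration only: the two upstream links, the four nodes, and the hop counts are hypothetical values, not figures from any particular chipset.

```python
# Hypothetical hop counts from each chipset upstream link to each node.
# up0 is assumed to attach at node0 and up1 at node3.
HOPS = {
    "up0": {"node0": 1, "node1": 2, "node2": 2, "node3": 3},
    "up1": {"node0": 3, "node1": 2, "node2": 2, "node3": 1},
}


def pinned_hops(pinned_link, target_node):
    """Hops seen by traffic that must use the single pinned upstream link."""
    return HOPS[pinned_link][target_node]


def nearest_link(target_node):
    """Upstream link with the fewest hops to the target node."""
    return min(HOPS, key=lambda link: HOPS[link][target_node])


pinned = "up0"                          # downstream device pinned to up0
before = pinned_hops(pinned, "node0")   # task initially runs on node0
after = pinned_hops(pinned, "node3")    # scheduler migrates the task to node3
best_after = nearest_link("node3")      # link an unpinned design could pick

print(before, after, best_after)
```

With these assumed values, the pinned traffic goes from one hop to three hops after the migration, even though the other upstream link would reach the new node in a single hop.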