1. Field of the Invention
This invention relates to high-performance computing network systems and, more particularly, to reducing power consumption during data transport across multiple processors when link utilization is low.
2. Description of the Relevant Art
The performance of computing systems depends on both hardware and software. To increase the throughput of computing systems, tasks are parallelized as much as possible. To this end, compilers may extract parallelized tasks from program code, and many modern processor core designs have deep pipelines configured to perform chip multi-threading (CMT). In hardware-level multi-threading, a simultaneous multi-threaded processor core executes hardware instructions from different software processes at the same time. In contrast, single-threaded processors operate on a single thread at a time.
In order to utilize the benefits of CMT on larger workloads, the computing system may be expanded from a single-socket system to a multi-socket system. For example, scientific computing clusters utilize multiple sockets. Each one of the multiple sockets includes a processor with one or more cores. The multiple sockets may be located on a single motherboard, which is also referred to as a printed circuit board. Alternatively, the multiple sockets may be located on multiple motherboards connected through a backplane in a server box, a desktop, a laptop, or other chassis.
In a symmetric multi-processing system, each of the processors shares one common store of memory. In contrast, each processor in a multi-socket computing system includes its own dedicated store of memory. In a multi-socket computing system, each processor is capable of accessing a memory store corresponding to another processor in a manner that is transparent to the software programmer. A dedicated cache coherence link may be used between two processors within the multi-socket system for accessing data stored in the caches or the dynamic random access memory (DRAM) of another processor. Systems with CMT use an appreciable amount of memory bandwidth. The dedicated cache coherence links in a multi-socket system provide near-linear scaling of performance with thread count.
The power consumption of modern integrated circuits (ICs) has become an increasingly important design issue with each generation of semiconductor chips. As power consumption increases, more costly cooling systems are utilized to remove excess heat and prevent IC failure. The IC power dissipation constraint is an issue not only for portable computers and mobile communication devices, but also for high-performance stationary computing systems. In order to manage power consumption, a chip-level power management system typically disables portions of the chip that experience no utilization for a given time period. Sleep modes and clock disabling are used in these portions. However, when a chip includes multiple similar logic structures, none of these structures is disabled as long as each experiences some utilization, even low utilization.
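The limitation described above can be illustrated with a minimal sketch of an idle-timeout power policy. The sketch is illustrative only and does not represent any particular power management implementation: the names `Link`, `step`, and `simulate`, the threshold of four intervals, and the traffic pattern are all assumptions chosen for the example. It shows that a structure carrying only occasional traffic never reaches the idle threshold and therefore remains enabled.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of one of several similar logic structures."""
    name: str
    idle_intervals: int = 0
    enabled: bool = True


def step(links, traffic, idle_threshold):
    """Advance one interval; traffic maps a link name to packets observed."""
    for link in links:
        if traffic.get(link.name, 0) > 0:
            # Any traffic at all resets the idle counter and keeps the link on.
            link.idle_intervals = 0
            link.enabled = True
        else:
            link.idle_intervals += 1
            if link.idle_intervals >= idle_threshold:
                # Disabled only after a full period with no utilization.
                link.enabled = False


def simulate():
    links = [Link("link0"), Link("link1"), Link("link2")]
    for t in range(12):
        traffic = {
            "link0": 100,                     # heavily utilized
            "link1": 1 if t % 3 == 0 else 0,  # low but nonzero utilization
            # link2 carries no traffic at all
        }
        step(links, traffic, idle_threshold=4)
    return {link.name: link.enabled for link in links}


if __name__ == "__main__":
    print(simulate())
```

In this sketch, only the structure with no traffic at all is disabled. The lightly used structure stays powered for the entire run even though it is idle for most intervals, which is the inefficiency noted above.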
In view of the above, methods and mechanisms for reducing power consumption during data transport across multiple processors when link utilization is low are desired.