An interconnection network implements communication and synchronization among processors and is critical for connecting processors, memories, and I/O devices. It is therefore important to the performance and scalability of a high-performance computer system. How to provide low-latency, efficient communication among processors using existing technology is a key problem that urgently needs to be solved.
Advances in microprocessor technology have increased the computational power of a single processor, which in turn places higher demands on the performance of the interconnection network. In fact, there is a 30% gap between the growth of interconnection network bandwidth and the growth of microprocessor performance. As a result, the latency and bandwidth of the interconnection network have become bottlenecks in improving the performance of a high-performance computer system.
With the development of semiconductor technology and progress in circuit technology, serial communication has become an efficient signal transmission mode. The use of high-speed serial channels can significantly improve pin bandwidth and thereby reduce the number of pins. In the early 1990s, router pin bandwidth was limited to 10 Gbps. In the 21st century, pin bandwidth can reach 10 to 20 Tbps. These technological advances make high-radix routers possible.
The high-radix router has become a development trend in interconnection networks. An interconnection network composed of high-radix routers can connect tens of thousands of routers within a few hops. In addition, high-radix networks can reduce network diameter, enable efficient communication between processors, reduce message latency and design cost, and improve system performance.
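The relationship between router radix and hop count can be illustrated with a minimal sketch. It assumes an idealized folded-Clos (fat-tree) model in which n levels of radix-k routers connect up to 2(k/2)^n terminals and the worst-case path crosses 2n - 1 routers. The function names are hypothetical, and the model is a simplification for illustration only; it does not describe the topology or reproduce the hop counts of any particular machine.

```python
def clos_levels(num_terminals: int, radix: int) -> int:
    """Smallest number of levels n with 2 * (radix / 2) ** n >= num_terminals.

    Assumes an idealized folded Clos: every router below the top level
    uses half of its ports downward and half upward.
    """
    if radix < 4 or radix % 2 != 0:
        raise ValueError("radix must be an even number >= 4")
    if num_terminals < 1:
        raise ValueError("num_terminals must be positive")
    half = radix // 2
    levels = 1
    capacity = radix  # a single router serves up to `radix` terminals
    while capacity < num_terminals:
        levels += 1
        capacity = 2 * half ** levels
    return levels


def worst_case_router_hops(num_terminals: int, radix: int) -> int:
    """Routers crossed on the longest path: up n levels, then down n - 1."""
    return 2 * clos_levels(num_terminals, radix) - 1


if __name__ == "__main__":
    for radix in (8, 16, 64):
        hops = worst_case_router_hops(32000, radix)
        print(f"radix {radix:2d}: {hops} router hops for 32000 terminals")
```

Under these assumptions, connecting 32000 terminals takes 13 router hops with radix-8 routers but only 5 with radix-64 routers, which shows why a higher radix reduces network diameter.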
In 2006, the Cray Blackwidow computer was the first to use a 64-radix router chip; it interconnects 32000 processors through a Clos network and ensures that the distance between any two processors is no more than 7 hops. In 2012, the Cray Cascade system used a 48-radix router chip. It interconnects 370216 processors through a Dragonfly network and ensures that the distance between any two processors is no more than 5 hops. However, four virtual channels are required to provide deadlock-free adaptive routing.
Nevertheless, how to design an efficient interconnection topology and an efficient routing algorithm based on existing chip technology and high-radix router technology remains a key research subject that urgently needs to be solved. Although the Blackwidow Clos network uses high-radix routers, its cost is high and its diameter is large, which affects system performance. The Cascade Dragonfly network also uses high-radix routers and offers high scalability, low cost, and a small diameter, but it needs four virtual channels to avoid deadlock, which increases design complexity and routing latency and limits system performance.
In addition, communication locality is a highly desirable feature of an interconnection network, because it can effectively improve performance and save energy. The communication-locality-oriented high-radix network and the new routing algorithm provide this feature.