1. Field of the Invention
The present invention generally relates to computer systems and architectures, and more particularly, to an ultrascalable, low power, and massively parallel supercomputer.
2. Description of the Prior Art
Massively parallel computing structures (also referred to as “ultra-scale or “supercomputers”) interconnect large numbers of compute nodes, generally, in the form of very regular structures, such as mesh, lattices, or torus configurations. The conventional approach for the most cost/effective ultrascalable computers has been to use standard processors configured in uni-processors or symmetric multiprocessor (SMP) configurations, wherein the SMPs are interconnected with a network to support message passing communications. Today, these supercomputing machines exhibit computing performance achieving hundreds of teraflops (see IBM Journal of Research and Development, Special Double Issue on Blue Gene, Vol. 49, Numbers 2/3, March/May 2005 incorporated by reference as if fully set forth herein). However, there are two long standing problems in the computer industry with the current cluster of SMPs approach to building ultra-scale computers: (1) the increasing distance, measured in clock cycles, between the processors and the memory (the memory wall problem) and (2) the high power density of parallel computers built of mainstream uni-processors or symmetric multi-processors (SMPs').
In the first problem, the distance to memory problem (as measured by both latency and bandwidth metrics) is a key issue facing computer architects, as it addresses the problem of microprocessors increasing in performance at a rate far beyond the rate at which memory speeds increase and communication bandwidth increases per year. While memory hierarchy (caches) and latency hiding techniques provide exemplary solutions, these methods necessitate the applications programmer to utilize very regular program and memory reference patterns to attain good efficiency (i.e. minimizing instruction pipeline bubbles and maximizing memory locality). This technique is thus not suited for modern applications techniques (e.g., using complicated data structures for unstructured meshes and object oriented programming).
In the second problem, high power density relates to the high cost of facility requirements (power, cooling and floor space) for such tera-scale computers.
It would be highly desirable to provide an ultra-scale supercomputing architecture that will reduce latency to memory, as measured in processor cycles, by at least an order of magnitude, and optimize massively parallel computing at petaOPS-scale at decreased cost, power, and footprint.
It would be further highly desirable to provide an ultra-scale supercomputing architecture that exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single ASIC.
It would be highly desirable to provide an ultra-scale supercomputing architecture that comprises a unique interconnection of processing nodes for optimally achieving various levels of scalability.
It would be highly desirable to provide an ultra-scale supercomputing architecture that comprises a unique interconnection of processing nodes for efficiently and reliably computing global reductions, distribute data, synchronize, and share limited resources.