For the past several decades, the high-performance computing (HPC) community has relied on rapidly increasing clock frequencies and ever-larger systems to meet the escalating performance demands of applications. However, this approach is becoming less feasible because of power bottlenecks. Clock-frequency growth has stalled because of the power and cooling limits of integrated circuits. Meanwhile, growing system sizes is becoming economically infeasible, since energy and cooling costs are emerging as the dominant factor in the total cost of ownership.
To deal with power bottlenecks, HPC systems have begun trending toward increased heterogeneity, with existing systems integrating specialized microprocessor cores, graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). Such hybrid systems tend to be more energy efficient than general-purpose microprocessors because specialization significantly reduces power requirements while also improving performance. As a motivating example, the NOVO-G supercomputer, which uses 192 FPGAs in 24 nodes, has achieved speedups of more than 100,000× compared to a 2.4 GHz Opteron for computational biology applications. This speedup provides performance comparable to Roadrunner and Jaguar, two of the top supercomputers, even when assuming perfect linear scaling of performance for additional cores on those machines. Furthermore, such traditional supercomputers typically require between 2 and 7 megawatts of power, whereas NOVO-G consumes a maximum of 8 kilowatts.
Although hybrid systems provide significant advantages over traditional HPC systems, their effective use is currently limited by a large increase in application design complexity, which results in unacceptably low productivity. While parallel programming has received much recent attention, the inclusion of heterogeneous resources adds further complexity that limits usage to device experts. For example, on an FPGA system, application designers often must be experts in digital design, hardware description languages, and synthesis tools. GPU systems, although commonly programmed in high-level languages, pose similar challenges because architecture-specific considerations significantly affect performance.