The exponential growth in the number of on-chip transistors, so reliably predicted by Moore's Law, has been a powerful driver of computing performance over the past 40 years, although limitations associated with wire delay, power consumption, and heat generation have recently become significant challenges to traditional transistor scaling. The desire to maintain the industry's historic rate of advancement while avoiding the roadblocks associated with power consumption and wire delay has led to the consideration of several disruptive design strategies for next-generation devices, including many-core processors and 3D vertical integration.
Pollack's and Amdahl's scaling laws indicate that, for power-constrained chip designs, architectures that implement many simple, low-power cores should maximize the system's overall performance per watt, as long as the code is massively parallelizable. To avoid limitations in computation speed due to the serial portions of the code, asymmetric core architectures can be implemented in which a few higher-power serial cores augment the low-power cores to provide additional throughput. Architectures that vertically integrate the cores in a 3D multi-tier package offer a number of additional design advantages, including shorter wire lengths, increased packaging density, and heterogeneous technology integration, which translate into a range of potential performance benefits such as decreases in noise, capacitance, and power consumption.
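The interplay between the two scaling laws can be illustrated with a small numerical sketch. It assumes one common textbook-style formulation, not necessarily the one used in this work: a chip budget of n base-core equivalents (taken here as a rough proxy for the power budget), a core built from r of those units delivering sqrt(r) times the base performance (Pollack's rule), and a parallelizable code fraction f (Amdahl's law). The function names and the parameter values are illustrative assumptions.

```python
import math


def core_perf(r: float) -> float:
    """Pollack's rule (assumed form): performance of a core built from
    r base-core equivalents scales roughly as sqrt(r)."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of n/r identical cores, each of size r, over one base core.

    f is the parallelizable fraction of the code (Amdahl's law).
    """
    serial_time = (1.0 - f) / core_perf(r)
    parallel_time = f * r / (core_perf(r) * n)
    return 1.0 / (serial_time + parallel_time)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup of one large core of size r plus (n - r) base cores.

    The serial portion runs on the large core; the parallel portion
    runs on all cores together.
    """
    serial_time = (1.0 - f) / core_perf(r)
    parallel_time = f / (core_perf(r) + (n - r))
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    f, n = 0.99, 256  # illustrative values, not taken from the paper
    print("many simple cores :", symmetric_speedup(f, n, 1))
    print("one big + simple  :", asymmetric_speedup(f, n, 16))
```

Under these assumptions, even a 1% serial fraction caps the speedup of the all-simple-core design well below n, while spending a small share of the budget on one larger serial core raises the achievable speedup, which is the motivation for the asymmetric architecture described above.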
In a many-core system, the thermal profile across the chip can be leveled by actively migrating computations from hotter to cooler areas of the chip, mitigating the localized hotspots that have become problematic in modern architectures. While this Dynamic Core Migration (DCM) scheme can mitigate hotspots for most cores, serial cores, with their potentially higher power densities, larger size, and smaller number, may still experience hotspots. To compensate for the higher power densities, the serial cores will either experience more throttling events during the time slice between migrations or require higher migration frequencies; otherwise, a dedicated local hotspot cooling solution would be required to handle the additional thermal overhead.
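The leveling effect of migration can be sketched with a toy model. The example below is an illustration only, not the DCM scheme evaluated in this work: it assumes each core is an independent first-order thermal RC node with no lateral coupling, a single hot task, and a policy that moves the task to the coolest core at fixed intervals. The function name and all parameter values (power, thermal resistance, time constant, ambient temperature) are hypothetical.

```python
def simulate_peak(n_cores, steps, migrate_every,
                  power_w=10.0, r_th=5.0, tau=20.0, t_amb=45.0, dt=1.0):
    """Return the peak core temperature (deg C) seen during the run.

    Assumed toy model: each core is an isolated first-order RC node.
    migrate_every=None keeps the task pinned to core 0; otherwise the
    task moves to the coolest core every `migrate_every` steps.
    """
    temps = [t_amb] * n_cores
    active = 0
    peak = t_amb
    for step in range(steps):
        if migrate_every and step > 0 and step % migrate_every == 0:
            # Migrate the computation to the currently coolest core.
            active = min(range(n_cores), key=lambda i: temps[i])
        for i in range(n_cores):
            p = power_w if i == active else 0.0
            # Explicit Euler step toward the steady state t_amb + p * r_th.
            temps[i] += dt / tau * (t_amb + p * r_th - temps[i])
        peak = max(peak, max(temps))
    return peak


if __name__ == "__main__":
    print("pinned task  :", simulate_peak(4, 200, None))
    print("with migration:", simulate_peak(4, 200, 5))
```

In this simplified setting the pinned task drives its core toward the full steady-state temperature rise, whereas migration spreads the heating over all cores and lowers the peak. A core with a higher power density would need shorter intervals to stay under the same limit, which is the trade-off described above for serial cores.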
There is a significant amount of research in the area of hotspot cooling. However, these solutions add complexity to the overall system and may become difficult to implement in a 3D stack, where both inter- and intra-layer fluidic routing would be required. In DCM schemes, there is a parasitic computational cost associated with each throttling event, and this cost can become significant over time when the cycling is too rapid. Furthermore, rapid thermal cycling can reduce the lifetime reliability of the chip. To minimize the performance losses associated with these gating and throttling events, an optimized system should be designed to operate for longer periods without requiring an idle period for cool-down and to keep each idle period as short as possible.