The challenge of modern computing is to build economically efficient chips that incorporate more transistors to meet the goal of achieving Moore's law of doubling performance every two years. The limits of semiconductor technology are affecting this ability to grow in the next few years, as transistors become smaller and chips become bigger and hotter. The semiconductor industry has developed the system on a chip (SoC) as a way to continue high performance chip evolution.
So far, there have been four main ways to construct a high performance semiconductor. First, chips have multiple cores. Second, chips optimize software scheduling. Third, chips utilize efficient memory management. Fourth, chips employ polymorphic computing. To some degree, all of these models evolve from the Von Neumann computer architecture developed after WWII in which a microprocessor's logic component fetches instructions from memory.
The simplest model for increasing chip performance employs multiple processing cores. By multiplying the number of cores by eighty, Intel has created a prototype teraflop chip design. In essence, this architecture uses a parallel computing approach similar to supercomputing parallel computing models. Like some supercomputing applications, this approach is limited to optimizing arithmetic-intensive applications such as modeling.
The Tera-op, Reliable, Intelligently Adaptive Processing System (TRIPS), developed at the University of Texas with funding from DARPA, focuses on software scheduling optimization to produce high performance computing. This model's “push” system uses data availability to fetch instructions, thereby putting additional pressure on the compiler to organize the parallelism in the high speed operating system. There are three levels of concurrency in the TRIPS architecture, including instruction-level parallelism (ILP), thread-level parallelism (TLP) and data-level parallelism (DLP). The TRIPS processor will process numerous instructions simultaneously and map them onto a grid for execution in specific nodes. The grid of execution nodes is reconfigurable to optimize specific applications. Unlike the multi-core model, TRIPS is a uniprocessor model, yet it includes numerous components for parallelization.
The third model is represented by the Cell microprocessor architecture developed jointly by the Sony, Toshiba and IBM (STI) consortium. The Cell architecture uses a novel memory “coherence” architecture in which latency is overcome with a bandwidth priority and in which power usage is balanced with peak computational usage. This model integrates a microprocessor design with coprocessor elements; these eight elements are called “synergistic processor elements” (SPEs). The Cell uses an interconnection bus with four unidirectional data flow rings to connect each of four processors with their SPEs, thereby meeting a teraflop performance objective. Each SPE is capable of producing 32 GFLOPS of power in the 65 nm version, which was introduced in 2007.
The MOrphable Networked Micro-ARCHitecture (MONARCH) uses six reduced instruction set computing (RISC) microprocessors, twelve arithmetic clusters and thirty-one memory clusters to achieve a 64 GFLOPS performance with 60 gigabytes per second of memory. Designed by Raytheon and USC/ISI from DARPA funding, the MONARCH differs distinctly from other high performance SoCs in that it uses evolvable hardware (EHW) components such as field programmable compute array (FPCA) and smart memory architectures to produce an efficient polymorphic computing platform.
MONARCH combines key elements in the high performance processing system (HPPS) with Data Intensive Architecture (DIVA) Processor in Memory (PIM) technologies to create a unified, flexible, very large scale integrated (VLSI) system. The advantage of this model is that reprogrammability of hardware from one application-specific integrated circuit (ASIC) position to another produces faster response to uncertain changes in the environment. The chip is optimized to be flexible to changing conditions and to maximize power efficiency (3-6 GFLOPS per watt). Specific applications of MONARCH involve embedded computing, such as sensor networks.
These four main high performance SoC models have specific applications for which they are suited. For instance, the multi-core model is optimized for arithmetic applications, while MONARCH is optimized for sensor data analysis. However, all four also have limits.
The multi-core architecture has a problem of synchronization of the parallel micro-processors that conform to a single clocking model. This problem limits their responsiveness to specific types of applications, particularly those that require rapid environmental change. Further, the multi-core architecture requires “thread-aware” software to exploit its parallelism, which is cumbersome and produces quality of service (QoS) problems and inefficiencies.
By emphasizing its compiler, the TRIPS architecture has the problem of optimizing the coordination of scheduling. This bottleneck prevents peak performance over a prolonged period.
The Cell architecture requires constant optimization of its memory management system, which leads to QoS problems.
Finally, MONARCH depends on static intellectual property (IP) cores that are limited to combinations of specified pre-determined ASICs to program its evolvable hardware components. This restriction limits the extent of its flexibility, which was precisely its chief design advantage.
In addition to SoC models, there is a network on a chip (NoC) model, introduced by Arteris in 2007. Targeted to the communications industry, the 45 nm NoC is a form of SoC that uses IP cores in FPGAs for reprogrammable functions and that features low power consumption for embedded computing applications. The chip is optimized for on-chip communications processing. Though targeted at the communications industry, particularly wireless communications, the chip has limits of flexibility that it was designed to overcome, primarily in its deterministic IP core application software.
Various implementations of FPGAs represent reconfigurable computing. The most prominent examples are the Xilinx Virtex-II Pro and Virtex-4 devices that combine one or more microprocessor cores in an FPGA logic fabric. Similarly, the Atmel FPSLTC processor combines an AVR processor with programmable logic architecture. The Atmel microcontroller has the FPGA fabric on the same die to produce a fine-grained reconfigurable device. These hybrid FPGAs and embedded microprocessors represent a generation of system on a programmable chip (SOPC). While these hybrids are architecturally interesting, they possess the limits of each type of design paradigm, with restricted microprocessor performance and restricted deterministic IP core application software. Though they have higher performance than a typical single core microprocessor, they are less flexible than a pure FPGA model.
All of these chip types are two dimensional planar micro system devices. A new generation of three dimensional integrated circuits and components is emerging that is noteworthy as well. The idea to stack two dimensional chips by sandwiching two or more ICs using a fabrication process required a solution to the problem of creating vertical connections between the layers. IBM solved this problem by developing “through silicon vias” (TSVs) which are vertical connections “etched through the silicon wafer and filled with metal.” This approach of using TSVs to create 3D connections allows the addition of many more pathways between 2D layers. However, this 3D chip approach of stacking existing 2D planar IC layers is generally limited to three or four layers. While TSVs substantially limit the distance that information traverses, this stacking approach merely evolves the 2D approach to create a static 3D model.
In U.S. Pat. No. 5,111,278, Echelberger describes a 3D multi-chip module system in which layers in an integrated circuit are stacked by using aligned TSVs. This early 3D circuit model represents a simple stacking approach. U.S. Pat. No. 5,426,072 provides a method to manufacture a 3D IC from stacked silicon on insulation (SOI) wafers. U.S. Pat. No. 5,657,537 presents a method of stacking two dimensional circuit modules and U.S. Pat. No. 6,355,501 describes a 3D IC stacking assembly technique.
Recently, 3D stacking models have been developed on chip in which several layers are constructed on a single complementary metal oxide semiconductor (CMOS) die. Some models have combined eight or nine contiguous layers in a single CMOS chip, though this model lacks integrated vertical planes. MIT's microsystems group has created 3D ICs that contain multiple layers and TSVs on a single chip.
3D FPGAs have been created at the University of Minnesota by stacking layers of single planar FPGAs. However, these chips have only adjacent layer connectivity.
3D memory has been developed by Samsung and by BeSang. The Samsung approach stacks eight 2-Gb wafer level processed stack packages (WSPs) using TSVs in order to minimize interconnects between layers and increase information access efficiency. The Samsung TSV method uses tiny lasers to create etching that is later filled in with copper. BeSang combines 3D package level stacking of memory with a logic layer of a chip device using metal bonding.
See also U.S. Pat. No. 5,915,167 for a description of a 3D DRAM stacking technique, U.S. Pat. No. 6,717,222 for a description of a 3D memory IC, U.S. Pat. No. 7,160,761 for a description of a vertically stacked field programmable nonvolatile memory and U.S. Pat. No. 6,501,111 for a description of a 3D programmable memory device.
Finally, in the supercomputing sphere, the Cray T3D developed a three dimensional supercomputer consisting of 2048 DEC Alpha chips in a torus networking configuration.
In general, all of the 3D chip models merely combine two or more 2D layers. They all represent a simple bonding of current technologies. While planar design chips are easier to make, they are not generally high performance.
Prior systems demonstrate performance limits, programmability limits, multi-functionality limits and logic and memory bottlenecks. There are typically trade-offs of performance and power.
The present invention views the system on a chip as an ecosystem consisting of significant intelligent components. The prior art for intelligence in computing consists of two main paradigms. On the one hand, the view of evolvable hardware (EHW) uses FPGAs as examples. On the other hand, software elements consist of intelligent software agents that exhibit collective behaviors. Both of these hardware and software aspects take inspiration from biological domains.
First, the intelligent SoC borrows from biological concepts of post-initialized reprogrammability that resembles a protein network that responds to its changing environmental conditions. The interoperation of protein networks in cells is a key behavioral paradigm for the iSoC. The slowly evolving DNA root structure produces the protein network elements, yet the dynamics of the protein network are interactive with both itself and its environment.
Second, the elements of the iSoC resemble the subsystems of a human body. The circulatory system represents the routers, the endocrine system is the memory, the skeletal system is comparable to the interconnects, the nervous system is the autonomic process, the immune system provides defense and security as it does in a body, the eyes and ears are the sensor network and the muscular system is the bandwidth. In this analogy, the brain is the central controller.
For the most part, SoCs require three dimensionality in order to achieve high performance objectives. In addition, SoCs require multiple cores that are reprogrammable so as to maintain flexibility for multiple applications. Such reprogrammability allows the chip to be implemented cost effectively. Reprogrammability, moreover, allows the chip to be updatable and future proof. In some versions, SoCs need to be power efficient for use in embedded mobile devices. Because they will be prominent in embedded devices, they also need to be fault tolerant. By combining the best aspects of deterministic microprocessor elements with indeterministic EHW elements, an intelligent SoC efficiently delivers superior performance.
While the design criteria are necessary, economic efficiency is also required. Computational economics reveals a comparative cost analysis that includes efficiency maximization of (a) power, (b) interconnect metrics, (c) transistor per memory metrics and (d) transistor per logic metrics.
Problems that the System Solves
Optimization problems that the system solves can be divided into two classes: bi-objective optimization problems (BOOPs) and multi-objective optimization problems (MOOPs).
BOOPs consist of trade-offs in semiconductor factors such as (a) energy consumption versus performance, (b) number of transistors versus heat dissipation, (c) interconnect area versus performance and (d) high performance versus low cost.
Regarding MOOPs, the multiple factors include: (a) thermal performance (energy/heat dissipation), (b) energy optimization (low power use), (c) timing performance (various metrics), (d) reconfiguration time (for FPGAs and CPLDs), (e) interconnect length optimization (for energy delay), (f) use of space, (g) bandwidth optimization and (h) cost (manufacture and usability) efficiency. The combination of solutions to trade-offs of multiple problems determines the design of specific semiconductors. The present system presents a set of solutions to these complex optimization problems.
One of the chief problems is to identify ways to limit latency. Latency represents a bottleneck in an integrated circuit when the wait to complete a task slows down the efficiency of the system. Examples of causes of latency include interconnect routing architectures, memory configuration and interface design. Limiting latency problems requires the development of methods for scheduling, anticipation, parallelization, pipeline efficiency and locality-priority processing.