As the semiconductor industry moves into nanometer technologies, interconnect latency is increasing and the shrinking wires impose substantial design constraints. Conversely, gate delays are decreasing, enabling exponential growth in computing power. Because of the interconnect latency and corresponding design constraints, the input/output interconnect rate cannot keep pace with the computing power available at a given technology node.
In addition, the energy available for both off-chip and on-chip interconnect systems is decreasing. Accordingly, multiple lower-performance cores arranged in parallel are replacing the single high-performance core. This arrangement is referred to as the many-core architecture. By utilizing the many-core architecture, the latency budget is increased for the intra-core interconnects and the wiring length distributions are significantly modified. Accordingly, numerous processor designs are scaling toward many-core systems.
Indeed, the number of processing cores comprising an MPU (microprocessor unit) or ASIC (application-specific integrated circuit) is rapidly increasing. Some recent examples include the IBM Cell 9-core chip (an IBM 64-bit Power Architecture™ core with eight specialized co-processors), the NVIDIA GT200 30-core chip (three streaming multiprocessors (SMs) each having eight streaming processors (SPs) and two special function units (SFUs)), the Tilera TILE64 64-core prototype chip, and the Intel TeraFlops 80-core research chip. This trend has been driven by the performance limitations of single-core MPUs imposed by device operating frequency and MPU die size. The rapid expansion of processing cores has also spurred the implementation of network-on-chip (NoC) architectures.
In addition, the overall increase in the number of devices on a chip has dramatically increased the demands on intra-core and inter-core interconnect performance. In particular, demands for reduced interconnect signal latency, reduced energy per bit, increased bandwidth (BW) per link, and increased bisection BW (biBW) are growing rapidly.
By way of example, the Tilera TILE64 64-core prototype chip utilizes 32 Tbps of on-chip interconnect BW and 2 Tbps of biBW. The Intel TeraFlops research chip (80 cores) requires a biBW of about 1 Tbps with BW/link of hundreds of Gbps. The interconnect demands for such systems will significantly increase as three-dimensional integrated circuits (3D ICs) rapidly expand the number of cores and transistors within an MPU, ASIC, or mixed-signal system. A 3D IC is a single integrated circuit built by stacking silicon wafers and/or dies and interconnecting them vertically so that they behave as a single device. The move to 3D ICs aims to achieve smaller footprints for devices and to reduce the distances that a signal must travel to reach particular circuits on the chip. FIGS. 1A and 1B illustrate the two-dimensional (2D) NoC and 3D IC NoC arrangements, respectively. Referring to FIGS. 1A and 1B, it can be seen that as the NoC is implemented in 3D, multiple tiers of processing elements (PEs) and various higher cache levels (L2, L3, and L4) are stacked vertically. Therefore, each node switching element is integrated with both horizontal and vertical links. The vertical links are provided by through silicon vias (TSVs).
Many-core 3D ICs designed for different applications place various demands on vertical BWs. For example, as shown in FIG. 2, the average vertical memory access BW for a stacked memory-on-logic 3D IC is estimated at 2 Tbps based on the 2D planar multi-core processor. In contrast, the average vertical BW for computation applications, at 5 Tbps, is roughly 2.5 times the average memory access BW. Moreover, assuming that 3D ICs for multimedia applications conform to conventional digital video frame rates of 25-30 frames/s, the average vertical BW would exceed 100 Tbps.
Furthermore, different NoC topologies for many-core 3D ICs have different in-plane and vertical BW requirements. Expansion from a 2D many-core IC to a 3D many-core IC as illustrated by FIGS. 1A and 1B will increase the biBW for the same number of cores. Furthermore, the biBW will also increase dramatically if the application requires an increase in the number of links for each NoC node or core. This is shown schematically in FIG. 3 and provided in Table 1, where n is the number of nodes (or cores).
TABLE 1

               2D biBW    3D biBW    Enhanced One More    Max Enhanced
2D Mesh        n          n²/2       Link BW n²           BW n⁴/4
Nodes          (Tbps)     (Tbps)     (Tbps)               (Tbps)
8 × 8          2          8          16                   128
10 × 10        2.5        12.5       25                   312.5
12 × 12        3          18         36                   648
The increases required for biBW going from 2D to 3D are shown in Table 1 for three different 2D n×n meshes. Also shown in Table 1 are the increases required for biBW when additional vertical links are required per node (referred to as 3D enhanced). The increase in biBW for 2D, 3D, and 3D-enhanced systems as a function of the number of cores is shown in FIG. 4. From the plot in FIG. 4, it is clear that the biBW increases rapidly both as a function of the number of cores and, especially, as a function of NoC topology (i.e., links per node).
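As an illustrative sketch (not part of the original disclosure), the first three columns of Table 1 can be reproduced from the stated formulas. The per-link rate of 0.25 Tbps is an assumption chosen so that the formulas match the tabulated values; it is not stated in the text.

```python
# Illustrative biBW scaling for an n x n 2D mesh and its 3D variants.
# LINK_RATE_TBPS is an assumed per-link rate that reproduces Table 1.
LINK_RATE_TBPS = 0.25

def bibw_2d(n):
    """2D mesh biBW: n links cross the bisection."""
    return n * LINK_RATE_TBPS

def bibw_3d(n):
    """Stacked 2D mesh (3D biBW): n^2/2 vertical links cross the bisection."""
    return n**2 / 2 * LINK_RATE_TBPS

def bibw_one_more_link(n):
    """One additional vertical link per node: n^2 links cross the bisection."""
    return n**2 * LINK_RATE_TBPS

for n in (8, 10, 12):
    print(n, bibw_2d(n), bibw_3d(n), bibw_one_more_link(n))
```

Under these assumptions the 3D biBW column spans 8-18 Tbps and the enhanced column spans 16-36 Tbps, matching the ranges discussed with respect to FIG. 5.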
Accordingly, the vertical bandwidth required for TSVs in a 3D IC is a strong function of the link topology and the chip architecture. Therefore, the estimated range of average vertical BW that TSV-based interconnects in a 3D IC must support must be revised to reflect this. This is illustrated in FIG. 5. For the simplest stacked 2D NoC mesh (3D biBW), the average vertical BW requirement ranges from 8 to 18 Tbps. If an NoC topology introduces only a single additional link per node ("Enhanced one more link"), the average vertical BW requirement doubles to 16-36 Tbps. If a maximal NoC link topology is used ((n²/2) − 1 links per core, "Max enhanced"), the vertical BW requirement jumps to 128-648 Tbps.
Conventional implementation of TSVs is insufficient to meet this vertical BW need at an acceptable energy per bit. As an example, for a given 3D IC footprint, each TSV has roughly the same geometry (e.g., size, aspect ratio) due to processing constraints, and each TSV (or TSV array) has the same frequency response for signal transmission. The number of TSVs per unit area on the 3D IC is fixed, occupying roughly 10% of the die area. In addition, only a fraction of the TSVs (˜10-20%) are available for communications. However, real-time vertical communication BW in many-core 3D ICs, particularly for NoC architectures, varies with workload. Thus, satisfying peak BW demand requires maintaining excess TSV capacity, which would exceed the TSV area allowance. Consequently, a many-core 3D IC based on TSVs with a uniform signal transmission speed cannot simultaneously satisfy the high BW requirements and the area constraints described above.
For example, a 1S4G TSV structure may be used to meet the average BW requirements of a given 3D IC. In particular, the 1S4G is a 5-TSV link (1 signal TSV and 4 ground TSVs) that enables a data rate (BW) of 8 Gbps. When the TSV structure is fabricated with a 5 μm diameter for each TSV and a 10 μm center-to-center spacing between adjacent TSVs, the area requirement of the 1S4G is 200-300 μm² per link. The 1S4G TSV link is used because it tends to be superior to isolated signal TSVs (referred to as 1S TSVs), which have data rates of ˜1 Gbps and area requirements of 100 μm²/TSV.
For a given 3D IC, 12,500 1S4G TSV links are needed to achieve an average vertical BW of 100 Tbps, compared to 100,000 1S TSVs. Likewise, 625 1S4G links are needed to achieve an average vertical BW of 5 Tbps, compared to 5,000 1S TSVs, and 250 1S4G links are needed to achieve an average vertical BW of 2 Tbps, compared to 2,000 1S TSVs. In each scenario, the 1S4G approach requires only about 38% of the area that would be utilized by the 1S approach.
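The TSV counts and the area ratio above can be sketched as follows. This is an illustrative calculation, not part of the original disclosure; it assumes 8 Gbps per 1S4G link, 1 Gbps per 1S TSV, 100 μm² per 1S TSV, and 300 μm² per 1S4G link (the upper end of the stated 200-300 μm² range, which reproduces the ~38% figure).

```python
# Assumed per-link data rates and areas (see lead-in for caveats).
GBPS_1S4G = 8        # Gbps per 1S4G link
GBPS_1S = 1          # Gbps per isolated 1S TSV
AREA_1S4G_UM2 = 300  # um^2 per 1S4G link (upper end of 200-300 range)
AREA_1S_UM2 = 100    # um^2 per 1S TSV

def links_needed(target_tbps, gbps_per_link):
    """Number of links required to deliver target_tbps of vertical BW."""
    return int(target_tbps * 1000 / gbps_per_link)

for target_tbps in (100, 5, 2):
    n_1s4g = links_needed(target_tbps, GBPS_1S4G)
    n_1s = links_needed(target_tbps, GBPS_1S)
    area_ratio = (n_1s4g * AREA_1S4G_UM2) / (n_1s * AREA_1S_UM2)
    print(target_tbps, n_1s4g, n_1s, round(area_ratio, 3))
```

Because the 8:1 data-rate advantage and the 3:1 area penalty are both fixed, the area ratio is the same (0.375, i.e., about 38%) for every target BW.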
Existing 3D ICs presently have die areas between 1 mm² and 20 mm². Considering a case where the total in-plane die area is 20 mm², the 1S4G TSV links can satisfy estimated vertical BW requirements based on a simple 3D biBW model (8-18 Tbps). In particular, given that the number of TSVs per unit area on the 3D IC is fixed at roughly 10% of the die area and only about 10% of the TSVs are available for communications, 1,000 1S4G TSV links can be used to satisfy the vertical BW requirements (20 mm² × 10% × 10% / 200 μm² = 1,000) by providing 8 Tbps.
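The area-budget arithmetic above can be sketched directly, with the fractions and per-link figures taken from the text (this is illustrative only):

```python
# Area-budget arithmetic for 1S4G links on a 20 mm^2 die (units: um^2).
# Assumptions follow the text: 10% of the die allotted to TSVs, 10% of
# those available for communications, 200 um^2 per 1S4G link, 8 Gbps/link.
DIE_AREA_UM2 = 20e6      # 20 mm^2 expressed in um^2
TSV_AREA_FRACTION = 0.10  # fraction of die area allotted to TSVs
COMM_FRACTION = 0.10      # fraction of TSVs usable for communications
AREA_PER_LINK_UM2 = 200   # um^2 per 1S4G link
GBPS_PER_LINK = 8         # Gbps per 1S4G link

links = DIE_AREA_UM2 * TSV_AREA_FRACTION * COMM_FRACTION / AREA_PER_LINK_UM2
vertical_bw_tbps = links * GBPS_PER_LINK / 1000
print(round(links), round(vertical_bw_tbps, 3))
```

The 1,000 resulting links at 8 Gbps each yield 8 Tbps, which just reaches the lower end of the 8-18 Tbps requirement for the simple 3D biBW model.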
However, for the case of the 3D-enhanced NoC topology (16-36 Tbps), the 1S4G TSV links, while capable of satisfying the data rate requirements, violate the area constraints. Moreover, for the max-enhanced NoC topology (128-648 Tbps), the 1S4G TSV links cannot satisfy either the data rate or area requirements.
Accordingly, there exists a need in the art for appropriate interconnect structures that can satisfy vertical link performance, power, and area requirements.