The present invention relates to computer and processor structure and architecture and, more specifically, to a high memory density, high input/output (I/O) bandwidth logic-memory architecture.
Logic-memory devices employing a 4D integration (4DI) structure allow a large number of memory cells to be located in close proximity to logic cores, thereby reducing signal delays, achieving an optimum logic-memory arrangement, and improving performance. The microelectronics industry has long sought better logic-memory architectures that improve system performance by reducing the so-called logic-memory bottleneck. The logic-memory bottleneck arises when memory is too slow or too small to keep up with faster logic, leaving the logic idle for long periods while it waits for data. This is particularly a problem for high-end systems, in which large numbers of chips are devoted to high counts of multi-core logic while demanding even more space for memory in close proximity.
Prior to 4DI implementations, multi-chip modules (MCM), precision aligned macros (PAM), and 3D integration (3DI) structures were widely used in logic-memory devices to reduce logic-memory delays. MCM uses separate logic and memory chips mounted on the same chip carrier. Chip-to-chip communication is through conventional flip-chip connections and wiring disposed within or atop the chip carrier. PAM, on the other hand, accurately aligns the logic and memory chips on a carrier wafer and then adds fine-pitch back-end-of-the-line (BEOL) wiring across the chips to make the connection. PAM allows more input/output (I/O) connections between the logic and memory chips than MCM.
There are two versions of 3DI, both of which provide improvements over MCM and PAM. 3DI-stacking stacks the memory chips into a nearly cubic form. There are no direct connections between the memory chips. Instead, all memory leads are wired to the chip edges and then wire bonded to a logic chip. In this format, the amount of memory in the stack for a given silicon footprint (also referred to as memory density) can be very high, but the I/O density is low.
3DI-TSV (through-silicon via) architecture involves stacking a number of memory layers onto a logic unit, with the memory layers oriented parallel to the logic unit and the TSVs providing the connections between the chip layers. 3DI-TSV architecture was considered to provide benefits over traditional 2D planar devices in that the 3DI architecture enabled more device memory layers with a very high I/O bandwidth. The amount of memory stackable in the 3DI-TSV approach is, however, less than what is possible in the memory cube configuration, due to the silicon area required for the TSV connections as well as the complexity of the layer-to-layer connections encountered with increased memory layer counts.
A 4DI structure, which is a combination of 3DI-stacking (with high memory density) and 3DI-TSV (with high I/O density), enables large memory density in close proximity to a large number of logic cores in a super-performance computing architecture. The 4DI logic-memory arrangement includes vertically arranging the memory slices of a memory stack below a logic unit such that the memory slices are perpendicular to the logic unit, thereby enabling a greater number of memory devices to reside in the combined structure. This 4DI logic-memory arrangement also provides both high bandwidth and high memory density in very close proximity to the processor cores, with much reduced process complexity.
In high-performance 4DI systems, however, a single logic core requires hundreds of I/O connections, and there are hundreds or thousands of cores per logic chip. These large numbers of logic I/Os need to be wired and connected to their respective memory stacks through fine-pitch, area-array connections, such as transfer-join (TJ) or micro-C4 (uC4) connections. Because the 4DI memory stacks are assembled from dozens of individual memory wafers, misalignment and/or distortion often occurs, such that the positions of device elements become skewed relative to their connective counterparts.
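To give a sense of scale, the I/O counts described above can be turned into a rough estimate. The following is a minimal sketch in Python and is not part of the described architecture. The specific numbers (1,000 cores, 200 I/Os per core, a 20 mm square connection area) and the function names are illustrative assumptions, and the pitch estimate assumes the connections are spread over a uniform square grid.

```python
import math


def total_io_connections(cores_per_chip: int, io_per_core: int) -> int:
    """Total logic I/O connections for one logic chip."""
    return cores_per_chip * io_per_core


def area_array_pitch_um(side_mm: float, n_connections: int) -> float:
    """Approximate pad pitch (micrometers) if n_connections are placed on a
    uniform square grid covering a side_mm x side_mm area (an assumption)."""
    pads_per_side = math.sqrt(n_connections)
    return side_mm * 1000.0 / pads_per_side


# Illustrative values only; they are not taken from the source text.
n_io = total_io_connections(cores_per_chip=1000, io_per_core=200)
pitch = area_array_pitch_um(side_mm=20.0, n_connections=n_io)
print(n_io)             # 200000
print(round(pitch, 1))  # 44.7
```

Under these assumed values, the chip needs on the order of hundreds of thousands of connections at a pitch of a few tens of micrometers, which is why fine-pitch area-array connections are called for.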
In another implementation, each of the 4DI slices in the vertical stack contains both logic cores and memory, as in a conventional 2D architecture. The top horizontal chip contains logic cores and/or crossbar switching elements, which direct data to and collect data from the vertical slices. The top logic/crossbar chip is provided with fine-pitch arrays (TJ or uC4) that are connected to the vertical memory/logic slices. Here again, misalignment and/or distortion often occurs in the 4DI stack, such that the positions of device elements become skewed relative to their connective counterparts.
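One way to picture how misalignment can grow in a stack assembled from many slices is to accumulate small per-slice position errors along the stacking direction. The sketch below is a simplified illustration under stated assumptions, not a model taken from the source: it assumes that errors simply add from slice to slice, that a single tolerance value decides whether a slice still lines up with its mating pads, and the function names and numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_offsets_um(per_slice_errors_um):
    """Running sum of per-slice position errors (micrometers).

    Assumes errors add linearly along the stacking direction."""
    return list(accumulate(per_slice_errors_um))


def misaligned_slices(per_slice_errors_um, tolerance_um):
    """Indices of slices whose accumulated offset exceeds the tolerance."""
    offsets = cumulative_offsets_um(per_slice_errors_um)
    return [i for i, off in enumerate(offsets) if abs(off) > tolerance_um]


# Illustrative values only: ten slices, each 0.5 um off in the same direction,
# checked against an assumed 2.0 um alignment tolerance.
errors = [0.5] * 10
print(cumulative_offsets_um(errors)[-1])  # 5.0
print(misaligned_slices(errors, 2.0))     # [4, 5, 6, 7, 8, 9]
```

In this toy case each individual error is well within tolerance, yet slices beyond the fourth no longer line up, which illustrates how a stack of dozens of slices can end up skewed relative to a fixed fine-pitch array.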