Driving bandwidth off a processor chip to L3 cache with low latency presents challenges, especially as processor technology continues to scale to future generations. In a few years a processor will need serial data rates of many multiples of 10 Gbps across many channels of communication to cache. The latency of this communication will continue to be limited by the inability of electrical signaling to drive off chip without penalty. For example, today's 10 Gbps serial links drive off chip to cache at a cost of several nanoseconds, or several hundreds of processor clock cycles. This can limit the ability of processors to exploit advances in computational power, clock frequency, transistor count, multi-core designs, and so forth.
Proximity communication as a means to signal between processor and memory could alleviate this issue.
As noted in the Sun Microsystems document published Sep. 20, 2004 (web reference http://research.sun.com/spotlight/2004-09-20.feature-proximity.html), on the printed circuit boards of many computers, information and electrical power often travel over copper wires between CPUs (Central Processing Units), memory and I/O (Input/Output) devices. The copper wires connect the devices using technologies such as pins, ball bonding and solder bumps, which involve macroscopic conductors that are massive compared to the submicron features on the chip itself.
In “proximity communication,” data is conveyed between chips via capacitive couplings. Because this communication between chips does not rely on wired or conductive connections, the number of connections between chips can be much higher than with ball bonds (about 100 times greater, for example). The chips can communicate at much higher speeds, with lower latency and significantly less energy, than they can over wires.
To form the capacitive couplings, microscopic metal pads are constructed out of standard top-layer metal structures during chip fabrication. These pads are then sealed with the rest of the chip components under a micron-thin layer of insulator to protect the chip from static electricity. Two chips, with receiver and transmitter pads, are then placed facing each other such that the pads are only a few microns apart. Each transmitter-receiver pad pair forms a capacitor, and voltage changes on the transmitter pad cause voltage changes on the receiver pad despite the lack of a conductive (e.g., wired) connection. This is akin to the physical effect that causes touch lamps to light when a human touches the conductive base of the lamp. Another analogy is the synaptic connection of biological nervous systems, where signals jump from one neuron to another.
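The pad-pair capacitor described above can be estimated to first order with the parallel-plate formula C = ε0·εr·A/d, and the signal it delivers with a capacitive divider, V_rx = V_tx·Cc/(Cc + Cp), where Cp is parasitic capacitance loading the receiver node. The sketch below is illustrative only: the pad size, gap, dielectric constant and parasitic load are hypothetical values, fringing fields are ignored, and the receiver node is idealized as floating.

```python
# First-order estimate of a proximity-communication pad pair.
# All geometry and parasitic values used below are hypothetical.

EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def coupling_capacitance(pad_side_m: float, gap_m: float, eps_r: float) -> float:
    """Parallel-plate capacitance C = eps0 * eps_r * A / d (fringing ignored)."""
    area = pad_side_m ** 2
    return EPSILON_0 * eps_r * area / gap_m


def received_swing(v_tx: float, c_couple: float, c_parasitic: float) -> float:
    """Voltage step on an idealized floating receiver pad for a step v_tx on the
    transmitter pad: a capacitive divider, v_rx = v_tx * Cc / (Cc + Cp)."""
    return v_tx * c_couple / (c_couple + c_parasitic)


if __name__ == "__main__":
    # Hypothetical: 20 um square pads, 2 um apart, SiO2-like dielectric (eps_r = 3.9).
    cc = coupling_capacitance(20e-6, 2e-6, 3.9)
    # Hypothetical 20 fF of parasitic load on the receiver node.
    v = received_swing(1.0, cc, 20e-15)
    print(f"coupling capacitance: {cc * 1e15:.2f} fF")
    print(f"receiver swing for a 1 V step: {v * 1000:.0f} mV")
```

With these assumed numbers the coupling is roughly 6.9 fF and the receiver sees about a quarter of the transmitted step, which illustrates why the receiver must amplify small, variable signals.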
Actual details can be more complex. For example, chip logic is needed to drive and amplify the signals, and the receiver circuit must tolerate far more variation than it would with a conductive connection. The voltages involved can vary widely, so Proximity communication technology is often engineered to work over about a factor of ten in voltage variation. Because mechanical misalignment can and will occur, it is desirable to compensate dynamically for effects such as vibration and unequal thermal expansion, and to provide mechanisms that permit large voltage tolerance and dynamic reconfiguration to overcome misalignment, so that Proximity communication can continue to function.
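One way to picture dynamic reconfiguration against misalignment is to build each transmitter pad from a row of smaller micro-pads and drive only those that currently sit under the receiver pad. The sketch below is a hypothetical one-dimensional model of that idea, not the mechanism used by any particular product; the function name, geometry and the assumption that the offset has already been measured are all illustrative.

```python
from typing import List


def select_micropads(n_micro: int, pitch_um: float, rx_width_um: float,
                     offset_um: float) -> List[int]:
    """Indices of transmitter micro-pads lying entirely under the receiver pad.

    Hypothetical 1-D model: micro-pad i spans [i*pitch, (i+1)*pitch), so the
    array spans [0, n_micro*pitch). With zero misalignment the receiver pad is
    centred over the array; offset_um shifts it (positive = to the right).
    The offset is assumed to come from some separate alignment measurement.
    """
    rx_lo = (n_micro * pitch_um - rx_width_um) / 2 + offset_um
    rx_hi = rx_lo + rx_width_um
    # Drive only micro-pads whose full extent overlaps the receiver pad.
    return [i for i in range(n_micro)
            if i * pitch_um >= rx_lo and (i + 1) * pitch_um <= rx_hi]


if __name__ == "__main__":
    # Hypothetical geometry: 12 micro-pads at 5 um pitch under a 40 um receiver pad.
    for offset in (0.0, 5.0, -7.0):
        print(offset, select_micropads(12, 5.0, 40.0, offset))
```

As the chips drift relative to each other, re-running the selection with the new offset shifts the set of driven micro-pads, so the effective transmitter follows the receiver without any mechanical adjustment.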
Proximity communication can provide an order-of-magnitude improvement in each of several dimensions: density, cost, speed, latency, and power demand. Because Proximity communication reduces the space taken up by the communication path, as well as the power and cost per bit transmitted, it may be possible to move tens of terabytes per second into and out of a single VLSI chip, whereas technologies in 2004 were limited to a few hundred gigabytes per second. Taking all dimensions into consideration, Proximity communication promises to improve overall capability by as much as two orders of magnitude.
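The link between connection density and aggregate bandwidth is simple arithmetic: bandwidth equals the number of channels times the per-channel bit rate. The sketch below assumes, hypothetically, one channel per pad site on a square grid, a 10 mm by 10 mm interface region and 5 Gbps per channel; none of these figures come from the text above.

```python
def channel_count(region_side_mm: float, pad_pitch_um: float) -> int:
    """Pad sites in a square region, assuming one channel per site on a square grid."""
    per_side = int(region_side_mm * 1000 // pad_pitch_um)
    return per_side * per_side


def aggregate_bytes_per_s(region_side_mm: float, pad_pitch_um: float,
                          gbps_per_channel: float) -> float:
    """Total bandwidth in bytes/s: channels x per-channel bit rate / 8 bits per byte."""
    return channel_count(region_side_mm, pad_pitch_um) * gbps_per_channel * 1e9 / 8


if __name__ == "__main__":
    # Hypothetical: 10 mm x 10 mm region, 5 Gbps per channel, two pad pitches.
    for pitch in (500.0, 50.0):
        tb = aggregate_bytes_per_s(10.0, pitch, 5.0) / 1e12
        print(f"pitch {pitch:5.0f} um: {channel_count(10.0, pitch):6d} channels, {tb:.2f} TB/s")
```

Because channel count scales with the square of the linear density, a tenfold finer pitch yields a hundredfold more channels, the same order as the "about 100 times" connection advantage cited earlier.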
Proximity communication also permits “Wafer scale integration”. Instead of trying to make processor chips ever larger, with ever lower yields, Proximity communication can allow one to lay out a “checkerboard” of chips that all behave as a single integrated circuit. Wafer scale integration has historically failed because the yield (i.e., the fraction of known good die) drops toward zero as the silicon area of a chip increases. With Proximity communication, one can get the same performance advantage as wafer scale integration but with excellent yield. When a flaw in a chip is discovered, Proximity communication allows one to simply lift out the chip and drop in a new one (under suitable clean room conditions). This can be very expensive or impossible with prior art methods that connect chips to multi-chip modules, which force replacement of the entire circuit module instead of just the defective part.
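Why yield collapses with area can be illustrated with the simplest defect model, the Poisson yield Y = exp(-A·D0), where A is die area and D0 is defect density. The sketch below uses that model (chosen here for illustration; the text does not specify one) with hypothetical values of 16 cm² of logic and 0.5 defects/cm², and compares one monolithic die against the same logic split into tested, known good tiles. It ignores test, assembly and interconnect overheads.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_working_system(total_area_cm2: float, n_tiles: int,
                               defects_per_cm2: float) -> float:
    """Expected silicon area fabricated per working system when the design is
    split into n_tiles equal dies and only tested-good dies are assembled.
    n_tiles == 1 is the monolithic (wafer-scale) case: the whole die must be perfect."""
    tile_yield = poisson_yield(total_area_cm2 / n_tiles, defects_per_cm2)
    return total_area_cm2 / tile_yield


if __name__ == "__main__":
    # Hypothetical numbers: 16 cm^2 of logic, 0.5 defects per cm^2.
    for tiles in (1, 4, 16):
        area = silicon_per_working_system(16.0, tiles, 0.5)
        print(f"{tiles:2d} tile(s): {area:,.1f} cm^2 of silicon per working system")
```

Under these assumptions the monolithic die consumes roughly 47,700 cm² of silicon per working part, while sixteen 1 cm² tiles consume about 26 cm², which is the advantage a checkerboard of known good die aims to capture.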
Proximity communication also promises increased technological versatility, so that different technologies can be mixed. “Processor in Memory” has been discussed as a way to put a complete computer system on a single chip, but the process technologies used to build CPUs are very different from the process technologies that are optimal for building dense memory such as DRAM. Because Proximity communication lets each part be manufactured separately and then integrated using Proximity communication as the universal interface, the constraint of using a single manufacturing technology vanishes. It is even possible to mix, say, gallium arsenide and silicon chips in a single array. This is made possible by the fact that Proximity communication is inherently tolerant of the different voltage levels needed for different semiconductor materials, and also by the fact that Sun Microsystems, Inc.'s approach includes asynchronous logic to remove the need for a common clock between two circuit chips.
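To show how two chips can exchange data without a common clock, the sketch below simulates a four-phase (return-to-zero) request/acknowledge handshake with bundled data, a standard asynchronous protocol. It is a generic illustration, not Sun Microsystems' circuit; the class and function names and the `receiver_slowdown` parameter are hypothetical, and Python generators stand in for two independently timed circuits.

```python
from typing import Iterator, List, Optional


class Channel:
    """Wires shared by two chips: request, acknowledge, and a data bundle.
    No clock is shared between sender and receiver."""

    def __init__(self) -> None:
        self.req = False
        self.ack = False
        self.data: Optional[int] = None


def sender(ch: Channel, words: List[int]) -> Iterator[None]:
    """Four-phase bundled-data sender; each yield means 'wait for the other side'."""
    for w in words:
        ch.data = w           # data made stable before request rises
        ch.req = True         # phase 1: raise request
        while not ch.ack:     # phase 2: wait for acknowledge
            yield
        ch.req = False        # phase 3: release request
        while ch.ack:         # phase 4: wait for acknowledge to release
            yield


def receiver(ch: Channel, out: List[int]) -> Iterator[None]:
    """Matching receiver; runs forever, latching one word per handshake."""
    while True:
        while not ch.req:
            yield
        out.append(ch.data)   # latch data while request is high
        ch.ack = True
        while ch.req:
            yield
        ch.ack = False


def run(words: List[int], receiver_slowdown: int = 3) -> List[int]:
    """Interleave both sides; the receiver steps only every receiver_slowdown
    ticks, standing in for an unrelated local timing."""
    ch = Channel()
    out: List[int] = []
    tx = sender(ch, words)
    rx = receiver(ch, out)
    tick = 0
    done = False
    while not done:
        try:
            next(tx)
        except StopIteration:
            done = True
        if tick % receiver_slowdown == 0:
            next(rx)
        tick += 1
    return out


if __name__ == "__main__":
    print(run([7, 1, 4]))
```

Running it prints `[7, 1, 4]` regardless of the slowdown chosen: correctness depends only on the order of request and acknowledge transitions, not on either side's timing.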
Proximity communication also promises dramatic cost savings. Sockets, pins, and circuit boards add cost to a system, but Proximity communication eliminates them. And with Proximity communication, chips can be smaller than they are now, thereby increasing yield and decreasing the cost of each component chip.
Although proximity communication as a means to signal between processor and memory could alleviate the challenge of driving bandwidth off a processor chip to L3 cache with low latency, a problem is that proximity signaling near the processor cannot be packaged as a low-cost solution on industry-standard packaging platforms. Proximity signaling needs tolerances much smaller than 10 microns to achieve high-fidelity signaling, whereas low-cost manufacturing cannot achieve tolerances better than a few mils (thousandths of an inch, about 25 microns each).
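The size of the tolerance gap follows from a unit conversion: one mil is 0.001 inch, exactly 25.4 micrometres. The sketch below takes "a few mils" as 2 and 3 mils (an assumption for illustration) and compares them with the 10 micron bound; since the text says the required tolerance is much smaller than 10 microns, the real shortfall is larger than these ratios.

```python
MICRONS_PER_MIL = 25.4  # 1 mil = 0.001 inch = 25.4 micrometres (exact by definition)


def mils_to_microns(mils: float) -> float:
    """Convert a tolerance in mils to micrometres."""
    return mils * MICRONS_PER_MIL


def tolerance_gap(manufacturing_mils: float, required_um: float) -> float:
    """How many times looser the manufacturing tolerance is than the requirement."""
    return mils_to_microns(manufacturing_mils) / required_um


if __name__ == "__main__":
    # "A few mils" taken here as 2 and 3 mils; 10 um used as the upper bound.
    for mils in (2.0, 3.0):
        print(f"{mils} mil = {mils_to_microns(mils):.1f} um, "
              f"{tolerance_gap(mils, 10.0):.1f}x looser than 10 um")
```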
One of the issues with packaging proximity communication is overcoming the flatness tolerances of first-level packages. There is also significant interest in being able to rework multi-chip packages for the commercial processor market. There are additional manufacturability issues associated with the “known good die” problem. Solutions such as Multichip Modules (MCM) have had only a limited impact on packaging technology today owing to the known good die problem and the inability to test chips thoroughly until they are packaged into the full assembly. As applied to the processor cache unit, many of these problems can be overcome by example embodiments of the invention described herein.
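The known good die problem can be quantified: if dies cannot be fully tested before assembly, a module works only when every die in it is good, so the module yield is the product of the die yields. The sketch below uses hypothetical yields (one processor die at 90% and four cache dies at 95%) and also computes the expected number of defective dies per module, which is the expected rework if a bad die can be replaced individually, assuming each replacement die is good.

```python
import math
from typing import Sequence


def module_yield(die_yields: Sequence[float]) -> float:
    """Chance a multi-chip module works when dies cannot be fully tested
    before assembly: every die must be good, so the yields multiply."""
    return math.prod(die_yields)


def expected_bad_dies(die_yields: Sequence[float]) -> float:
    """Expected number of defective dies per assembled module. With per-die
    rework this is the expected number of replacements (assuming replacement
    dies are good); without rework, any defect scraps the whole module."""
    return sum(1.0 - y for y in die_yields)


if __name__ == "__main__":
    # Hypothetical: one processor die at 90% yield plus four cache dies at 95%.
    dies = [0.90, 0.95, 0.95, 0.95, 0.95]
    print(f"module yield without rework: {module_yield(dies):.3f}")
    print(f"expected dies to replace per module: {expected_bad_dies(dies):.2f}")
```

With these assumed numbers only about 73% of modules work as first assembled; replacing individual dies turns the remainder into roughly 0.3 die swaps per module on average, rather than scrapping the whole assembly.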