The present invention relates to the areas of digital circuits and digital logic. In particular, the present invention provides a method and device for high performance SRAM ("Static Random Access Memory") offering significantly higher read speed for the same cell area and technology relative to standard techniques.
Memory access speeds present a significant bottleneck in computer system performance. Modern computer systems typically utilize a hierarchical cached architecture in order to improve performance. In this hierarchy, fast but more expensive SRAM is located close to the CPU, providing a cache for data and instructions, while main memory is constructed from DRAM ("Dynamic Random Access Memory"), which provides density but is typically much slower than SRAM. SRAMs are integrated circuits comprising memory arrays with access ports that allow reading or writing. SRAMs have a fixed access time to any datum, though the read and write access characteristics often differ.
Even when instructions and data are cached using faster SRAM arrays, memory access instructions present a bottleneck. Depending upon the sequence of instructions, these bottlenecks may result in pipeline stalls and will generally degrade performance significantly, especially as modern applications rely on frequent memory access.
The SRAM cell power-delay-area product has not scaled commensurately with that of logic. This non-scaling problem presents a significant issue to the SRAM designer. Over the last few generations, the combined effects of scaling device horizontal and vertical dimensions, together with the associated adjustments of device engineering and power-supply levels, have resulted in faster devices but not in higher saturation currents. SRAM speed depends on small-signal slew rate (i.e., the saturation current of the device relative to the bitline capacitance). Bitline capacitances have benefited from scaling, but the wire capacitance component in particular (generally at least half of the bitline capacitance load) has not scaled at the same rate as logic speeds. The voltage that must be developed on the bitlines also has not scaled at the same rate as logic; these requisite voltages are generally not scaling, or are in fact increasing, in present and future technologies due to tolerances and parameter matching, which tend not to scale.
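The relationship described above can be sketched numerically. To first order, a cell driving a bitline at constant saturation current must slew the bitline capacitance through the requisite sense voltage, so the signal-development time is roughly t = C_bl * dV / I_sat. The values below (C_BL, DV, I_SAT) are purely hypothetical illustrations, not figures from this disclosure:

```python
def bitline_develop_time(c_bl_farads, dv_volts, i_sat_amps):
    """First-order time for a cell at constant saturation current to slew
    the bitline capacitance through the required sense voltage."""
    return c_bl_farads * dv_volts / i_sat_amps

# Hypothetical example values (not from this disclosure):
C_BL = 200e-15   # 200 fF bitline load; wire is commonly half or more of this
DV = 100e-3      # 100 mV differential required by the sense amplifier
I_SAT = 50e-6    # 50 uA cell read current

t = bitline_develop_time(C_BL, DV, I_SAT)
print(f"{t * 1e12:.0f} ps")  # -> 400 ps
```

Because I_sat has stopped improving while the wire component of C_bl and the required dV scale poorly, this term shrinks far more slowly than logic gate delay, which is the non-scaling problem the text describes.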
The non-scaling problem has been addressed via architectural work-arounds for cache and register macros. However, the extra logic and bypass paths needed to support these architectures are expensive, and adding ever more layers of logic becomes an intractable problem. Another attempt to deal with the non-scaling problem has been to reduce the number of cells per bitline pair. This approach, however, is also very expensive in terms of increased area and power for the peripheral circuitry, and exploitation of this strategy has reached practical limits. Other strategies for dealing with the non-scaling problem are also reaching their limits, such as providing increased sense-amplifier performance (through increasingly complex sense-amplifier design), local amplification, and aggressive clocking and pipelining methods.
In addition, leakage and other power concerns are becoming a severe problem due to the large increases in on-chip cache size predicted for future high performance microprocessors. There is reason to anticipate that the SRAM non-scaling problem will continue to plague SRAM design for the foreseeable future. Thus, new techniques for addressing the non-scaling problem are necessary.