1. Field of the Invention
The present invention relates to a high-performance semiconductor memory architecture and, in particular, to a high-performance architecture for a Static Random Access Memory (SRAM) cell.
2. Description of the Related Art
In recent years, the operating speed and integration density of integrated circuits have improved significantly along with the miniaturization of semiconductor elements. In particular, BiCMOS LSIs, which combine CMOS FETs with bipolar transistors to achieve high-speed operation and low power dissipation, are being developed.
This trend reflects the growing demand for higher speed and greater functionality in electronic instruments.
Semiconductor elements have been miniaturized to meet this demand. Fine-pattern processing, however, requires corresponding facilities and equipment, so it cannot be developed rapidly. Attempts have therefore been made to increase operating speed through circuit-design techniques. For example, high-speed memory circuits utilizing pipeline systems have been proposed and fabricated. A pipeline system divides the operation of the memory circuit along the signal flow and operates the respective circuits independently, so that information can be read and written at a shorter interval than the information read time of the memory circuit (the address access time, i.e. the time from the input of an information read signal to the output of the recorded information from a memory cell, hereinafter referred to as the access time).
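The benefit of dividing the read operation can be illustrated with a minimal Python sketch. It assumes the read path splits into independent stages separated by latches; the function names, the stage delays and the `latch_overhead_ps` parameter are hypothetical and are not taken from any particular design.

```python
from typing import Sequence


def access_time_ps(stage_delays_ps: Sequence[float]) -> float:
    """Access time: a single read must traverse every stage in turn."""
    return sum(stage_delays_ps)


def pipelined_cycle_ps(stage_delays_ps: Sequence[float],
                       latch_overhead_ps: float = 0.0) -> float:
    """Cycle time of a latched pipeline (hypothetical model).

    A new read can enter once the slowest stage, plus the delay of the
    latch separating stages, has finished; the other stages work on
    earlier reads in parallel.
    """
    return max(stage_delays_ps) + latch_overhead_ps


# Hypothetical three-stage split: address decode, cell access, output.
stages = [400.0, 700.0, 500.0]
print(access_time_ps(stages))            # 1600.0
print(pipelined_cycle_ps(stages, 50.0))  # 750.0
```

The access time of each individual read is not reduced (the latches may even lengthen it slightly); only the interval between successive reads shrinks.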
Later, an improved pipeline technique known as the "wave-pipeline technique" was proposed to raise the operating frequency beyond that of the conventional pipeline technique. In the wave-pipeline technique, a plurality of signals propagate along the data path as wave signals. This technique realizes an operation equivalent to a conventional two-stage pipeline without interference between the signals, while reducing power dissipation and chip area.
In the wave-pipeline technique, the operating speed of the system is improved without intermediate registers or latch circuits. Instead, a plurality of coherent data waves travel in sequence through the combinational circuit, because the clock signal is fed to the flip-flops with a period shorter than the propagation delay of the combinational circuit. If all the signal paths that the components of a wave signal follow from the input to the output of the combinational circuit have substantially equal delays, the individual wave signals propagate toward the output section without interfering with one another.
If address signals are applied to the data path with a cycle time that exceeds the access time, no read-out data is output during the intrinsic delay of the memory core. In a memory system using the wave-pipeline technique, by contrast, address input signals are applied with a period shorter than the critical-path delay of the memory core section.
The key to implementing the wave-pipeline technique in a semiconductor memory system lies in reducing the difference in signal delay time caused by the different locations of the memory cells being accessed or by differences between data path lengths.
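Why the delay difference matters can be seen from a simplified form of the usual wave-pipelining timing condition: the next wave may be launched only when its fastest data can no longer overtake the slowest data of the previous wave. The Python sketch below assumes the condition T >= (D_max - D_min) + t_overhead, where t_overhead lumps together register setup/hold time and clock skew; the function names and delay values are illustrative, and a real design must satisfy further constraints as well.

```python
import math
from typing import Sequence


def min_wave_period_ps(path_delays_ps: Sequence[float],
                       overhead_ps: float = 0.0) -> float:
    """Shortest launch period that keeps consecutive waves apart.

    Wave n+1 starts T after wave n; its fastest data arrives at T + D_min,
    which must not precede the slowest data of wave n at D_max plus the
    sampling overhead (an assumed setup/hold/skew lump).
    """
    return (max(path_delays_ps) - min(path_delays_ps)) + overhead_ps


def waves_in_flight(path_delays_ps: Sequence[float], period_ps: float) -> int:
    """Approximate number of waves inside the logic at the same time."""
    if period_ps <= 0:
        raise ValueError("period must be positive")
    return math.ceil(max(path_delays_ps) / period_ps)


# Hypothetical paths: a 500 ps spread allows a 600 ps period ...
wide = [1500.0, 1800.0, 2000.0]
print(min_wave_period_ps(wide, 100.0))    # 600.0
# ... while equalizing the paths to within 200 ps allows 300 ps.
narrow = [1800.0, 1900.0, 2000.0]
print(min_wave_period_ps(narrow, 100.0))  # 300.0
print(waves_in_flight(narrow, 300.0))     # 7
```

Both path sets have the same 2000 ps worst-case delay, so the shorter period comes entirely from shrinking D_max - D_min, the quantity identified above as the key.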
Note that in a large-capacity memory system, process refinement reduces the metal film thickness, and a reduction in memory cell size also reduces the metal line width.
As capacity increases, the metal signal wiring and bit lines also tend to become longer. The resistance presented by the signal wiring and bit lines therefore increases, which lengthens the signal delay they cause and widens the difference in signal delay between data paths.
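This scaling can be sketched with a first-order distributed-RC wire model. The sketch assumes the common approximation that the 50% delay of a distributed RC line is about 0.38 R C, and treats capacitance per unit length as fixed (in practice it also depends on width, thickness and spacing); the resistivity and capacitance defaults are assumed round numbers, not values for any specific process.

```python
def wire_delay_ps(length_um: float, width_um: float, thickness_um: float,
                  resistivity_ohm_um: float = 0.022,
                  cap_ff_per_um: float = 0.2) -> float:
    """Approximate 50% delay (ps) of a distributed RC wire.

    R grows with length and with thinner or narrower metal; C is modeled
    as growing with length only (a simplifying assumption).
    """
    r_ohm = resistivity_ohm_um * length_um / (width_um * thickness_um)
    c_ff = cap_ff_per_um * length_um
    # ohm * fF = 1e-15 s = 1e-3 ps
    return 0.38 * r_ohm * c_ff * 1e-3


base = wire_delay_ps(1000.0, 0.5, 0.5)
# Doubling the length roughly quadruples the delay ...
print(wire_delay_ps(2000.0, 0.5, 0.5) / base)
# ... and halving the metal thickness roughly doubles it.
print(wire_delay_ps(1000.0, 0.5, 0.25) / base)
```

Because delay grows with the square of length, the longest paths in a large array are penalized disproportionately, which widens the delay difference between near and far data paths.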
Static Random Access Memory (SRAM) devices consist of a rectangular matrix of memory cells. Individual memory cells are accessed at the intersection of decoded row and column addresses. Because the SRAM receives these addresses at only one input location, some memory elements are close to the input while others are farther away. In terms that matter for high-performance memories, there are fast memory elements and slow memory elements within the SRAM. Normally this is not an issue, because the speed of the SRAM is dictated by the slowest cells and the fastest cells meet the specifications with margin. As SRAM densities and performance increase, however, the speed difference between the fast and slow elements can become a significant percentage of the memory cycle time and start to limit performance. This becomes evident from reviewing the operation of a conventional SRAM and comparing an access to a slow memory element with an access to a fast memory element in the typical prior-art SRAM design shown in FIG. 1.
FIG. 1 illustrates one quadrant of a conventional 16 Mb SRAM whose design has been optimized for performance. The shaded blocks, labeled SUB0UL (upper-left) through SUB15LR (lower-right), represent 64 of the SRAM's 256 subarrays. Each subarray is a small, independent memory structure that contains all the sensing, precharge and timing circuitry needed to access its memory elements. SRAM designs use this subarray structure to minimize the number of cells activated in any given cycle, thereby reducing the chip's active power; in this design, only 2 of the 64 subarrays in a quadrant are active in a cycle. The subarrays are designed using the standard dummy wordline and dummy bitline technique, described more fully in U.S. Pat. Nos. 5,268,869 and 4,425,633, examples of a technique that has been used for the past 15 years to time the sensing circuitry precisely. One benefit of this sensing method is an almost constant access time across the subarray, which leaves only the subarray selection and the global data buses shared between subarrays as sources of an access delta between memory elements. Here, we compare the access delta between a slow subarray 11 and a fast subarray 19.
In the existing architecture, subarray 11 is accessed by an address signal 1 that travels from the center of the chip through three wire sections 2, 3 and 4, each having a delay RC1, and the two rebuffers 5 and 6 before reaching the global wordline driver 7. The global wordline driver circuits decode the address and drive a global wordline signal across the array on a wire 9 with delay RC2. Because the array is large, the global wordline passes through rebuffer 10 before driving subarray 11 across another wire 12 having a delay RC12. Note that, for simplicity, this diagram illustrates only the selection of the subarray through the global wordline. In reality, the global wordline selects the subarray in conjunction with several column selection signals, and the column signals are wired and buffered in a manner similar to the global wordline. Once selected, subarray 11 accesses its local memory cells and then drives data along a data bus 13 with delay RC3, through a data rebuffer circuit 14, along a second data bus 15 with delay RC4, through a second data rebuffer circuit 16, and finally along a third data bus 17 with delay RC5 before reaching the SRAM output drivers 18.
The fast subarray 19 is selected similarly, except that the address needs to travel through only one wire section of delay RC1 and the two rebuffers 5 and 8 before reaching the global wordline driver circuit 20. The global wordline drives through wire 21 with delay RC2 and selects subarray 19 without passing through the global wordline rebuffer 23. After the subarray's local memory elements are accessed, the data drives directly into the first data rebuffer circuit 22 and then into the second data rebuffer 16 without additional wire delays. After the second-stage rebuffer, the data travels along the data path of delay RC5 before reaching the output drivers 18.
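The two paths can be written down as lists of the delay elements named above and compared. This Python sketch is hypothetical bookkeeping rather than part of the original description: it assumes every rebuffer (address, wordline or data) has the same delay `BUF`, that the global wordline drivers 7 and 20 match (`GWL_DRV`), and that the local access time (`SUBARRAY`) is the same for subarrays 11 and 19, as the dummy-wordline timing suggests.

```python
from typing import Dict, List

# Delay elements on each access path of FIG. 1, named after the text.
SLOW_PATH: List[str] = [
    "RC1", "RC1", "RC1",   # wire sections 2, 3 and 4
    "BUF", "BUF",          # rebuffers 5 and 6
    "GWL_DRV",             # global wordline driver 7
    "RC2", "BUF", "RC12",  # wire 9, rebuffer 10, wire 12
    "SUBARRAY",            # local access inside subarray 11
    "RC3", "BUF",          # data bus 13, data rebuffer 14
    "RC4", "BUF",          # data bus 15, data rebuffer 16
    "RC5",                 # data bus 17 to output drivers 18
]
FAST_PATH: List[str] = [
    "RC1",                 # single wire section
    "BUF", "BUF",          # rebuffers 5 and 8
    "GWL_DRV",             # global wordline driver 20
    "RC2",                 # wire 21 (rebuffer 23 bypassed)
    "SUBARRAY",            # local access inside subarray 19
    "BUF", "BUF",          # data rebuffers 22 and 16, no bus wire between
    "RC5",                 # data path to output drivers 18
]


def path_delay(path: List[str], delays: Dict[str, float]) -> float:
    """Total delay of a path: the sum of its element delays."""
    return sum(delays[name] for name in path)


def access_delta(delays: Dict[str, float]) -> float:
    """Slow-path delay minus fast-path delay."""
    return path_delay(SLOW_PATH, delays) - path_delay(FAST_PATH, delays)


# Placeholder values only; these are not the Table I figures.
example = {"RC1": 10, "RC2": 20, "RC12": 30, "RC3": 40, "RC4": 50,
           "RC5": 60, "BUF": 5, "GWL_DRV": 7, "SUBARRAY": 100}
print(access_delta(example))  # 145
```

Under these assumptions the shared elements cancel, and the access delta reduces to 2 RC1 + BUF + RC12 + RC3 + RC4: the two extra address wire sections, the global wordline rebuffer 10 with wire 12, and the data buses 13 and 15.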
To better appreciate the timing differences between the fast and slow subarrays, Table I translates the delays discussed above into specific values based on the design parameters of a typical 16 Mb SRAM.
As shown in Table I, the total difference between accessing a fast subarray and a slow subarray is 471 ps, nearly 24% of the product's 2 ns cycle. This timing difference (access delta) limits performance and complicates the design of a high-performance SRAM device.
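The proportion above can be worked out directly. The second relation additionally assumes the simplified wave-pipelining condition T >= (D_max - D_min) + t_overhead with t_overhead >= 0, and takes subarrays 11 and 19 as the slowest and fastest paths; under those assumptions the access delta alone sets a floor on the cycle time of a wave-pipelined version of this design:

```latex
\frac{\Delta t_{\mathrm{access}}}{t_{\mathrm{cycle}}}
  = \frac{471\ \mathrm{ps}}{2\ \mathrm{ns}}
  = \frac{471\ \mathrm{ps}}{2000\ \mathrm{ps}}
  \approx 0.236,
\qquad
T_{\min} \ge D_{\max} - D_{\min} + t_{\mathrm{overhead}} \ge 471\ \mathrm{ps}.
```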
Accordingly, it is an object of the present invention to provide an architecture that minimizes the differences in access delay among the subarrays, thereby allowing the cycle time to be reduced without fast subarray accesses colliding with slower data from the more remote subarrays.
In accordance with the present invention, the array is laid out to equalize access to all memory elements. To the greatest extent possible, the memory subarrays are located around the periphery, as if on the rim of a wheel. With this arrangement, the address signal is fed through the center of the array and propagates radially to the selected subarray. The data from the subarray then follows a radial path back through the center of the array to the output drivers. In this way, the delays to the fastest and slowest subarrays are approximately equal, so the access delta between them is small.
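The effect of the wheel-like layout on the access delta can be illustrated with a toy geometric model. It is a sketch under stated assumptions, not the layout of the invention itself: address input and data output are both at the array center, wire delay is taken as proportional to wire length (as for a well-rebuffered line), the conventional array is an n x n grid routed with Manhattan wires, and the proposed array places subarrays evenly on a rim reached by straight radial spokes. All function names are hypothetical.

```python
import math
from typing import List


def grid_round_trips(n: int, pitch: float = 1.0) -> List[float]:
    """Center-to-subarray-and-back Manhattan lengths for an n x n grid."""
    c = (n - 1) / 2.0  # grid coordinate of the array center
    return [2.0 * pitch * (abs(i - c) + abs(j - c))
            for i in range(n) for j in range(n)]


def rim_round_trips(count: int, radius: float) -> List[float]:
    """Center-to-subarray-and-back radial lengths for subarrays on a rim."""
    lengths = []
    for k in range(count):
        theta = 2.0 * math.pi * k / count
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        lengths.append(2.0 * math.hypot(x, y))  # straight spoke out and back
    return lengths


def spread(lengths: List[float]) -> float:
    """Longest minus shortest path length (a proxy for the access delta)."""
    return max(lengths) - min(lengths)


# 64 subarrays in each case, matching the FIG. 1 quadrant count
# (the 8 x 8 grid arrangement itself is an assumption).
print(spread(grid_round_trips(8)))                 # 12.0
print(round(spread(rim_round_trips(64, 5.0)), 9))  # 0.0
```

In the grid model the nearest and farthest subarrays differ by 12 pitches of round-trip wiring, whereas on the rim every round trip is twice the radius. Subarrays that cannot be placed on the rim, and differences in local routing, would still leave some residual delta in a real layout.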