1. Field of the Invention
The present invention relates to effective circuits, system designs, and methods for a self-timed SRAM for high-speed and low-power applications.
2. Description of the Related Art
Today's SRAMs are commonly used as caches in CPUs with ultra-high clock rates. A CPU clock rate of 1 GHz can be achieved easily in today's nanometer devices, such as 65 nm CMOS and beyond. For an SRAM to match such a high clock rate, the access time needs to be less than 1 ns. In such a short period of time, it is very hard to generate control signals that keep the propagation delay short while keeping inactive blocks idle, so as to meet both the high-speed and the low-power requirements. It is the objective of this invention to achieve high speed, low power, and small area to meet the very demanding SRAM requirements in today's applications.
A conventional SRAM memory cell is shown in FIG. 1. The cell 10 consists of a cross-coupled latch constructed from inverters 11 and 12, and two pass transistors 13 and 14. The input NB of the inverter 11 is coupled to the output of the inverter 12, and the input N of the inverter 12 is coupled to the output of the inverter 11. The MOS 13 and 14 have their sources coupled to the nodes N and NB, their gates coupled to a wordline (WL), and their drains coupled to a bitline (BL) and a bitline bar (BLB), respectively. The SRAM cells can be organized as a two-dimensional array, with all BLs and BLBs of the cells in the same column coupled together in the vertical direction and all wordlines of the cells in the same row coupled together in the horizontal direction. The SRAM cell can be read by pre-charging the BL and BLB to a high voltage (i.e. VDD) and then leaving them floating. When the WL is turned on, the BL or BLB voltage can be pulled low by the inverter 11 or 12 in the cell, depending on whether the data stored is 0 or 1. Similarly, the SRAM cell can be written with data 0 or 1 by pulling BL or BLB low, respectively, when the WL is turned on.
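The read and write behavior described above can be illustrated with a minimal behavioral sketch. It is a logic-level model only, not a circuit simulation. The class name `SramCell`, its methods, and the convention that the stored data equals the logic level of node N are assumptions made for illustration and do not appear in FIG. 1.

```python
class SramCell:
    """Logic-level sketch of the cell 10 in FIG. 1 (illustrative only)."""

    def __init__(self, data=0):
        # Assumption: the stored data equals the level of node N;
        # node NB always holds the complement.
        self.n = data
        self.nb = 1 - data

    def read(self, wl):
        # BL and BLB are pre-charged high (VDD) and then left floating.
        bl, blb = 1, 1
        if wl:
            # With WL on, the latch side holding 0 pulls its bitline low:
            # BL for stored data 0, BLB for stored data 1.
            if self.n == 0:
                bl = 0
            else:
                blb = 0
        return bl, blb

    def write(self, wl, bl, blb):
        # Pulling exactly one bitline low while WL is on overwrites the latch:
        # BL low writes 0, BLB low writes 1.
        if wl and bl != blb:
            self.n, self.nb = bl, blb


cell = SramCell(data=0)
print(cell.read(wl=True))    # BL is pulled low for stored data 0
cell.write(wl=True, bl=1, blb=0)
print(cell.read(wl=True))    # BLB is pulled low after writing 1
```

With the WL off, `read` returns both bitlines at their pre-charged level, which corresponds to an unselected row in the array.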
A conventional sense amplifier (SA) of an SRAM is shown in FIG. 2. The latch-type SA 20 has a structure very similar to the SRAM cell shown in FIG. 1. The SA is based on the positive feedback of a latch to amplify the input signals so that the gain can be high and the power consumption can be low. The SA should be activated only after the input signals reach a sufficient differential voltage; otherwise incorrect data may be sensed instead. The SA 20 has PMOS 21-1 and NMOS 21-2 constructed as an inverter with their gates coupled to QB, and their sources coupled to VDD and to ground through an NMOS 25, respectively. The drain of PMOS 21-1 is coupled to the drain of NMOS 21-2 and to a node Q. The SA 20 also has PMOS 22-1 and NMOS 22-2 constructed as an inverter with their gates coupled to Q, and their sources coupled to VDD and to ground through the NMOS 25, respectively. The drain of PMOS 22-1 is coupled to the drain of NMOS 22-2 and to a node QB. The sources of NMOS 21-2 and 22-2 are coupled to the drain of the same NMOS 25, whose gate is coupled to an SA enable (SE) and whose source is coupled to ground. The nodes Q and QB are coupled to the sources of PMOS 23 and 24, respectively. The drains of PMOS 23 and 24 are coupled to DI and DIB, respectively. The gates of PMOS 23 and 24 are coupled to an SA input enable (SIB). Two PMOS 26-1 and 26-2 are pull-up devices for nodes QB and Q, respectively, with their gates coupled to SE.
The SA shown in FIG. 2 works as follows. Before sensing, SE is low and the nodes Q and QB are pulled high to VDD by a pair of PMOS pull-ups 26-2 and 26-1, respectively. When the signals at DI and DIB reach a sufficient voltage difference (e.g. a 100 mV split), SIB can be pulled low to allow the signals to pass into the nodes Q and QB, respectively. After the differential voltages are passed from DI/DIB to Q/QB, SE can be turned on to pull the drain of NMOS 25 low while disabling PMOS 26-1 and 26-2. In the meantime, SIB can be set high to turn off PMOS 23 and 24, isolating the input signals from the internal nodes Q and QB in the SA. Subsequently, the cross-coupled latch consisting of PMOS 21-1 and 22-1 can be activated to make Q and QB split wider and eventually reach rail-to-rail levels. The cross-coupled latch consisting of NMOS 21-2 and 22-2 can also be activated when the NMOS 25 is more heavily turned on. The timing of SE is crucial: SE must be turned on only after substantial signal splits have developed at nodes Q and QB; otherwise wrong data can be sensed and latched instead.
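The sensitivity of the SA to the timing of SE can be illustrated with a simple behavioral sketch of the sensing sequence. The function names, the 30 mV input-referred offset, the 1000 mV supply, and the convention that Q above QB represents data 1 are hypothetical values chosen only for illustration; the 100 mV threshold is the example split quoted above.

```python
MIN_SPLIT_MV = 100.0  # example split quoted in the text


def resolve_latch(q_mv, qb_mv, offset_mv):
    # Positive feedback drives the latch toward whichever side is higher.
    # offset_mv is a hypothetical mismatch that the split must overcome.
    return 1 if (q_mv - qb_mv) > offset_mv else 0


def sense_sequence(di_mv, dib_mv, vdd_mv=1000.0, offset_mv=30.0):
    # Step 1: SE is low, so the pull-ups hold Q and QB at VDD.
    q_mv = qb_mv = vdd_mv
    # Step 2: PMOS 23 and 24 are turned on and pass DI/DIB onto Q/QB.
    q_mv, qb_mv = di_mv, dib_mv
    reliable = abs(q_mv - qb_mv) >= MIN_SPLIT_MV
    # Step 3: SE is turned on and PMOS 23 and 24 are turned off;
    # the latch regenerates to rail-to-rail levels.
    bit = resolve_latch(q_mv, qb_mv, offset_mv)
    rails = (vdd_mv, 0.0) if bit else (0.0, vdd_mv)
    return bit, reliable, rails


# SE fired after a full 100 mV split: the intended data 1 is latched.
print(sense_sequence(1000.0, 900.0))
# SE fired with only a 20 mV split: the offset wins and 0 is latched instead.
print(sense_sequence(1000.0, 980.0))
```

In the second call the inputs still favor data 1, yet the modeled latch resolves to 0, which is the wrong-data case that a premature SE can cause.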
FIG. 3(a) shows a block diagram 30 of a portion of a self-timed circuit in the prior art. A memory cell array 31 has a reference column 32 that consists of three reference cells 33-1, 33-2, and 33-3, and some dummy cells 35 to fill up the column. The reference cells can be modified slightly from the normal cells so that the reference BL is pulled low earlier and can then be used as a control signal to trigger at least one SA.
FIG. 3(b) shows a schematic of a reference cell 40, corresponding to the reference cells 33-1 through 33-3 in FIG. 3(a), in the prior art. The reference cell 40 has a pair of cross-coupled inverters 41 and 42 constructed as a latch, and two pass transistors 43 and 44, similar to a normal SRAM cell. However, the gates of the pass transistors 43 and 44 are coupled to BLin (equivalent to BL in the normal cell), and the drain of the NMOS 43 is coupled to high (i.e. VDD). When BLin is set high, RBL can be pulled low. If the three reference cells 33-1, 33-2, and 33-3 in FIG. 3(a) have their BLin coupled together and share one RBL, RBL can be pulled down faster than any normal cell can pull down BL/BLB. As a result, RBL can be used to trigger an SA. The BLin signal can be generated from a control signal, such as a clock CLK, through a multi-tap delay chain 45. The turn-on time of BLin with respect to CLK can be adjusted by setting a plurality of delay control signals or a plurality of registers.
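The reason a shared RBL falls faster than a normal bitline can be sketched with a first-order estimate, t = C × ΔV / I, in which the pull-down currents of the parallel reference cells add. The capacitance, swing, cell current, and delay-tap values below are hypothetical numbers chosen for illustration, and the function names are not taken from the figures.

```python
# Hypothetical tap delays (ns) for the multi-tap delay chain 45.
DELAY_TAPS_NS = (0.1, 0.2, 0.3, 0.4)


def discharge_time_ns(cap_ff, swing_mv, cell_current_ua, n_cells=1):
    # First-order estimate t = C * dV / I with n_cells pulling in parallel.
    # Units: fF * mV / uA gives picoseconds.
    t_ps = cap_ff * swing_mv / (cell_current_ua * n_cells)
    return t_ps / 1000.0


def blin_time_ns(clk_edge_ns, tap_select):
    # BLin turns on a selectable delay after the CLK edge.
    return clk_edge_ns + DELAY_TAPS_NS[tap_select]


# Assumed values: 100 fF bitline, 100 mV swing, 20 uA per cell.
normal_bl = discharge_time_ns(100.0, 100.0, 20.0, n_cells=1)
shared_rbl = discharge_time_ns(100.0, 100.0, 20.0, n_cells=3)
print(normal_bl, shared_rbl)           # three cells discharge about 3x faster
print(blin_time_ns(0.0, tap_select=1)) # BLin turn-on time for the chosen tap
```

Under these assumptions the RBL reaches the same swing in one third of the time, but the moment at which it starts falling is still set by the delay tap rather than by the actual WL.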
To achieve a reliable SRAM function with low power consumption, it is desirable to turn on the selected WL and turn off the BL pull-ups only while the memory cells are ready for access. FIG. 4(a) shows a portion of a schematic 50 of a self-timed circuit in the prior art. The schematic 50 has a plurality of address buffers 57, pre-decoders 58, and decoders 59. The address buffer 57 consists of inverters 51-1, 51-2, and 51-3 to generate true and complement addresses, respectively. The pre-decoder 58 has a multi-input NAND 52-1 followed by a buffer 52-2. The decoder 59 has a multi-input NAND 53-1 followed by a wordline driver 53-2 to drive a WL. The addresses generated from the address buffers 57 are input to a plurality of pre-decoders 58 to generate pre-decoded signals, which are then input to a plurality of decoders 59 to fully decode the available address space. A WL can be turned on by applying a wordline enable (WLEN) to one of the inputs of one of the pre-decoders. WLEN can be generated from a control signal, such as a clock CLK, through a multi-tap delay line 54. The delay line 54 can be controlled by a plurality of delay control signals or a plurality of registers.
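The decoding path and the gating role of WLEN can be illustrated with a small logic-level sketch. The 4-bit address, the split into two 2-to-4 pre-decoders, and all function names are assumptions made for illustration; FIG. 4(a) does not specify these sizes.

```python
def address_buffer(bit):
    # Returns the (true, complement) pair for one address bit.
    return bit, 1 - bit


def predecode(a1, a0, enable=1):
    # 2-to-4 pre-decoder: each output is the AND of one true/complement
    # combination (a NAND followed by an inverting buffer), gated by enable.
    t1, c1 = address_buffer(a1)
    t0, c0 = address_buffer(a0)
    return [enable & x & y for x, y in ((c1, c0), (c1, t0), (t1, c0), (t1, t0))]


def decode_wordlines(addr, wlen):
    # Sketch of a 16-row decoder: WLEN gates only the low pre-decoder,
    # which is enough to hold every WL off until WLEN rises.
    bits = [(addr >> i) & 1 for i in range(4)]
    low = predecode(bits[1], bits[0], enable=wlen)
    high = predecode(bits[3], bits[2])
    return [h & l for h in high for l in low]


print(decode_wordlines(addr=5, wlen=1))  # only WL 5 is on
print(decode_wordlines(addr=5, wlen=0))  # all WLs off while WLEN is low
```

Gating a single pre-decoder input keeps the added load on the address path small while still guaranteeing that no WL is on before WLEN is asserted.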
The relative timing between CLK and WLEN is shown in FIG. 4(b). A parameter Twld specifies the delay from the CLK transition that delivers a new address to the turning on of WLEN. If a WL is turned on too early, excess power may be wasted. If a WL is turned on too late, the speed may be penalized. Another parameter Twlp specifies the WL pulse width. If Twlp is too narrow, a sufficient BL/BLB split may not be developed before the WL is turned off again. If Twlp is too wide, excess power may be wasted. These two parameters illustrate the tradeoff between speed and power.
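The tradeoff governed by Twld and Twlp can be made concrete with a rough numerical sketch. It assumes that the BL/BLB split grows linearly while the WL is on; the slew rate of 200 mV/ns, the required split of 100 mV, and the function name are hypothetical and serve only to illustrate the relationship.

```python
def wl_timing_report(twld_ns, twlp_ns, slew_mv_per_ns=200.0,
                     required_split_mv=100.0):
    # Assumption: the bitline split grows linearly while the WL is on.
    split_mv = slew_mv_per_ns * twlp_ns
    # Narrowest pulse that still develops the required split.
    min_twlp_ns = required_split_mv / slew_mv_per_ns
    return {
        "split_mv": split_mv,
        "split_ok": split_mv >= required_split_mv,
        # WL on-time beyond the minimum is a proxy for wasted power.
        "excess_on_time_ns": max(0.0, twlp_ns - min_twlp_ns),
        # Time after the CLK edge at which the WL turns off again.
        "wl_off_ns": twld_ns + twlp_ns,
    }


print(wl_timing_report(twld_ns=0.2, twlp_ns=0.3))  # too narrow: split not reached
print(wl_timing_report(twld_ns=0.2, twlp_ns=0.5))  # just wide enough
print(wl_timing_report(twld_ns=0.2, twlp_ns=1.0))  # too wide: excess on-time
```

A larger Twld shifts the whole pulse later and so adds directly to the access time, whereas a larger Twlp trades extra on-time for a wider split.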
The self-timed circuits shown in FIGS. 3(a), 3(b), 4(a), and 4(b) are not precisely self-timed. The turn-on of RBL does not track the WL turn-on time. The turn-off of a WL does not track the enabling or disabling of the SA either. As a result, RBL may turn on an SA when the input signals are not split wide enough or, at worst, even before the selected WL is turned on. The selected WL may also be turned off prematurely, before sufficient signal splits reach the inputs of an SA and are sensed. Consequently, wide timing margins are needed to build a robust SRAM; otherwise yield and reliable operation may suffer. Therefore, it is still very desirable to invent precise self-timed circuits that reduce timing margins to meet today's very demanding SRAM requirements.
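The failure modes listed above can be expressed as simple checks on independently generated event times. This is only a sketch: the linear split model, the 200 mV/ns slew rate, the 100 mV requirement, and the function name are hypothetical, and the bitlines are assumed to hold their split once the WL turns off.

```python
def check_self_timing(wl_on_ns, wl_off_ns, se_on_ns,
                      slew_mv_per_ns=200.0, required_split_mv=100.0):
    # Returns a list of timing problems for one access; empty means safe.
    problems = []
    if se_on_ns <= wl_on_ns:
        # Worst case: the SA fires before any cell drives the bitlines.
        problems.append("SA enabled before the selected WL is turned on")
        return problems
    # The split develops only while the WL is on and until SE fires.
    develop_ns = min(se_on_ns, wl_off_ns) - wl_on_ns
    if slew_mv_per_ns * develop_ns < required_split_mv:
        if wl_off_ns < se_on_ns:
            problems.append("WL turned off before a sufficient split developed")
        else:
            problems.append("SA enabled before a sufficient split developed")
    return problems


print(check_self_timing(wl_on_ns=0.2, wl_off_ns=0.9, se_on_ns=0.8))  # safe
print(check_self_timing(wl_on_ns=0.2, wl_off_ns=0.9, se_on_ns=0.1))  # SA too early
print(check_self_timing(wl_on_ns=0.2, wl_off_ns=0.4, se_on_ns=0.8))  # WL too short
```

Because the three event times come from separate delay paths, each check needs its own margin, which is the overhead that a precisely tracking self-timed circuit aims to remove.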