A dual-port static random access memory (SRAM) cell (bitcell) requires at least eight transistors. In contrast, a traditional single-port SRAM bitcell requires only six transistors. As compared to a single-port SRAM cell, a dual-port SRAM cell thus requires two extra access transistors to accommodate the additional port. Single-port SRAMs are therefore substantially denser than dual-port SRAMs. To maintain the density advantage of single-port SRAMs, “pseudo-dual-port” (PDP) SRAMs have been developed in which the single port of traditional SRAM is time multiplexed to represent separate read and write ports.
Although pseudo-dual-port SRAM has higher density, this improved density comes at the cost of slower operation in that a single clock cycle must accommodate two access cycles (both a read operation and a write operation) to simulate the two ports of actual dual-port SRAM. The resulting multiplexing of the single access port places timing demands on the PDP memory operation that may be better appreciated through a consideration of the PDP waveforms of FIG. 1. A clock signal (clk) pulses high to trigger a read and write operation. To complete the read operation, a word line (WL) is pulsed high. Prior to the pulsing of the word line for the read operation, the bit lines are precharged as controlled by a global active-high precharge signal (bl_pre_global). This global precharge signal is released prior to the pulsing of the word line for the read operation. At approximately the same time, an active-low read multiplexer (readmux) signal is asserted low so that the appropriate bit lines are coupled to the sense amplifier (not illustrated) during the pulsing of the word line for the read operation.
With the read operation completed, the word line is released followed by the release of the read multiplexer signal. The global precharge signal is then pulsed high to the power supply voltage to begin the precharge of the bit lines for the write operation. After the write precharge is completed, the global precharge signal is released, whereupon the write multiplexer (writemux) signal is asserted to couple the appropriate bit lines to the write driver. The word line can then be pulsed high for the write operation. During the write operation, one of the bit lines coupled to the memory cell being written is discharged to ground, as indicated by the “BL discharge” designation for the pulsing of the write multiplexer signal. With the write operation completed, both the word line and the write multiplexer signal are released.
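The ordering of transitions described for FIG. 1 can be sketched as a list of events together with a simple ordering check. This is only an illustrative sketch: the time values are arbitrary units invented for the example and are not taken from FIG. 1, and the helper names (`EVENTS`, `when`, `ordering_ok`) are hypothetical. Only the relative order of the transitions follows the description.

```python
# Illustrative only: times are in arbitrary units and are NOT from FIG. 1.
# Only the ordering of the transitions follows the description.
EVENTS = [
    (0, "clk", "assert"),
    (1, "bl_pre_global", "release"),  # read precharge ends
    (1, "readmux", "assert"),         # active-low: driven low
    (2, "WL", "assert"),              # read word line pulse
    (4, "WL", "release"),
    (5, "readmux", "release"),
    (6, "bl_pre_global", "assert"),   # write precharge begins
    (8, "bl_pre_global", "release"),
    (9, "writemux", "assert"),
    (10, "WL", "assert"),             # write word line pulse
    (12, "WL", "release"),
    (12, "writemux", "release"),
]


def when(signal, action, occurrence=0):
    """Return the time of the n-th matching transition (0-based)."""
    times = [t for t, s, a in EVENTS if s == signal and a == action]
    return times[occurrence]


def ordering_ok():
    """Check the transition order described for one PDP cycle."""
    return all([
        when("bl_pre_global", "release", 0) < when("WL", "assert", 0),
        when("WL", "release", 0) < when("readmux", "release"),
        when("readmux", "release") < when("bl_pre_global", "assert"),
        when("bl_pre_global", "release", 1) < when("writemux", "assert"),
        when("writemux", "assert") < when("WL", "assert", 1),
    ])


# Gap between the read and write word line pulses, and the width of the
# write precharge pulse that has to fit inside that gap.
rd_wr_isolation = when("WL", "assert", 1) - when("WL", "release", 0)
write_precharge_width = (when("bl_pre_global", "release", 1)
                         - when("bl_pre_global", "assert"))
```

With these assumed times, the write precharge pulse is necessarily narrower than the gap between the two word line pulses, since the read multiplexer release precedes it and the write multiplexer assertion follows it.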
The separation (Rd/Wr isolation) between the pulsing of the word line for the read operation and the write operation defines the maximum amount of time that can be taken for the write operation precharge. But since the global precharge signal is the last signal to be asserted following the completion of the read operation and the first signal to be released prior to the pulsing of the word line for the write operation, the pulse width for the pulsing of the global precharge signal for the write operation is narrower than the separation between the pulsing of the word line for the read and write operations. Achieving such a narrow pulse width is problematic, however, in that the global precharge signal (being a global signal) is carried on a lead or trace that extends across all the bit lines in the array and thus has a substantial parasitic resistance and capacitance (RC) load. This RC loading produces an RC delay when the global precharge signal is pulsed high (producing a rising edge) and produces yet another RC delay when the global precharge signal is released (producing a falling edge). There is thus a 2*RC delay that must be completed during the pulsing of the global precharge signal.
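The growth of this 2*RC burden with array width can be illustrated with a first-order estimate. The sketch below assumes, purely for illustration, that the global precharge trace behaves as a uniform RC ladder with one segment per column, and uses the standard Elmore delay for the far end of such a ladder. The per-column resistance and capacitance values and the function names are hypothetical and are not taken from the description.

```python
def elmore_delay(n_columns, r_per_col, c_per_col):
    """Elmore delay (seconds) at the far end of a uniform RC ladder.

    Each of the n segments contributes series resistance r and shunt
    capacitance c, giving r * c * n * (n + 1) / 2.
    """
    return r_per_col * c_per_col * n_columns * (n_columns + 1) / 2


def min_precharge_pulse(n_columns, r_per_col, c_per_col):
    """One RC delay for the rising edge plus one for the falling edge."""
    return 2 * elmore_delay(n_columns, r_per_col, c_per_col)


# Hypothetical per-column parasitics (assumptions, not from the source).
R_PER_COL = 2.0      # ohms
C_PER_COL = 1e-15    # farads

narrow = min_precharge_pulse(64, R_PER_COL, C_PER_COL)
wide = min_precharge_pulse(256, R_PER_COL, C_PER_COL)
growth = wide / narrow  # roughly quadratic in the number of columns
```

Under this assumed model, a fourfold increase in the number of columns increases the 2*RC term by nearly sixteen times, while the Rd/Wr isolation window that must contain the pulse does not grow with it.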
One of the RC delays is modeled by a precharge tracking circuit 200 as shown in FIG. 2. A clock generator 205 responds to the clock signal (clk) by asserting an internal clock signal (iclk) that is received by a signal generator 220. During a read operation, signal generator 220 receives an asserted read signal. Signal generator 220 responds to the assertion of iclk (in conjunction with the assertion of the read signal) by asserting the global precharge signal (bl_pre_global) prior to the word line assertion for the read operation. During the word line assertion for the read operation, signal generator 220 also asserts the active-low read multiplexer (readmux) signal by discharging it to ground. Following the release of the word line for the read operation and the release of the read multiplexer signal, clock generator 205 again pulses the internal clock signal high to trigger signal generator 220 to pulse the global precharge signal high to begin the bit line precharge for the write operation. To track one of the RC delays required to pulse the global precharge signal, clock generator 205 charges a dummy word line (dwl) in conjunction with asserting the internal clock signal. The dummy word line has the same electrical characteristics (RC delay) as the actual word line (not illustrated). The charging of the dummy word line then triggers an inverter 215 after the dummy word line charges according to its RC charging delay.
The triggering of inverter 215 causes it to discharge a dummy bit line (Dummy BL load) that has the same electrical characteristics (RC delay) as the actual bit line. The delay through the discharge of the dummy bit line thus models the actual delay required to charge the global precharge signal high and to charge the bit line. But the RC delay is particularly pronounced at the high voltage process corners. So to ensure a sufficient pulse width for the high voltage process corners, a fixed delay circuit 210 adds another delay on top of the RC delay for the dummy word line assertion and the RC delay for the dummy bit line discharge. After the desired delay, fixed delay circuit 210 discharges (or asserts) a reset signal to trigger clock generator 205 to de-assert the internal clock. In response, signal generator 220 discharges the global precharge signal, which requires another RC delay.
The global precharge signal drives an inverter 225 to form an active-low local precharge signal (pre_n) that drives a gate of a PMOS transistor P1 having its drain tied to the bit line (bl) and its source tied to a memory power supply node providing the memory power supply voltage VDD. Thus, once the global precharge signal is charged high, transistor P1 switches on to precharge the bit line to VDD. But despite the modeling performed by tracking circuit 200, it is difficult to ensure that a sufficient pulse width is maintained across all corners (process, voltage, and temperature). For example, to ensure signal integrity at the high voltage process corners, the modeled delay must be longer than necessary at the low voltage process corners, which lowers the memory operating speed. In addition, the signal loading for the global precharge signal trace that carries the global precharge signal across the columns may be quite different from the word line loading such that the power margin closure for the precharge tracking becomes challenging. In that regard, increasing the number of columns (increasing the array width) increases these precharge tracking problems. This is particularly problematic in that such wide memories are very area efficient as compared to eight-transistor (8T) dual-port register files. It would thus be advantageous to replace such 8T dual-port register files with a corresponding PDP memory, were it not for the performance issues with regard to ensuring a sufficient pulse width for the global precharge signal across all the process corners.
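The corner tradeoff noted above can be illustrated numerically. The sketch below is hedged throughout: the corner names, the delay values in picoseconds, and the assumption that the fixed delay scales with the same gate-delay factor as the rest of the tracking path are all hypothetical and chosen only to show the effect. The fixed delay is sized to cover the worst corner, and the resulting excess pulse width at the other corner is then computed.

```python
# Hypothetical corner data (assumptions, not from the source):
# name -> (tracked_delay_ps, required_pulse_ps, gate_delay_scale)
CORNERS = {
    "low_voltage": (300.0, 280.0, 2.0),
    "high_voltage": (150.0, 240.0, 1.0),
}


def size_fixed_delay(corners):
    """Smallest nominal fixed delay (at scale 1.0) covering every corner."""
    return max(
        max(0.0, (required - tracked) / scale)
        for tracked, required, scale in corners.values()
    )


def excess_pulse_width(corners, nominal_fixed_delay):
    """Pulse width beyond what each corner actually requires, in ps."""
    return {
        name: tracked + nominal_fixed_delay * scale - required
        for name, (tracked, required, scale) in corners.items()
    }


fixed = size_fixed_delay(CORNERS)
excess = excess_pulse_width(CORNERS, fixed)
```

With these assumed numbers, the fixed delay needed to close the high voltage corner leaves the low voltage corner with a pulse that is considerably wider than it requires, which is time taken directly out of the clock cycle.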
Accordingly, there is a need in the art for improved precharge schemes for PDP memories.