Field
Aspects of the present innovations relate to or involve output latching and/or memory operation, such as pipelined output latching circuitry/schemes for high transaction rate synchronous memory.
Description of Related Information
A requirement of certain systems and environments, such as current networking equipment, is the use of high speed memory that accepts address input at high frequency while allowing a comparatively slower access time. For example, a conventional SRAM running at 714 MHz with two addresses accepts one read address and one write address on every clock cycle. However, the read output may be required only on the third clock, which is referred to as a read latency (RL) of 3, or RL=3. Once the RL is satisfied on the first access, continuous output occurs for subsequent cycles in response to address inputs. The conventional memory may also employ a double data rate (DDR) data scheme, such that every clock cycle will have two pieces of data, with one aligned on the positive clock edge and the other aligned on the negative clock edge. Higher clock frequency provides a faster address rate, thereby allowing RL to be increased in value.
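The figures in the example above can be worked through numerically. The following is a minimal sketch only; the function name `read_timing` and its parameters are illustrative assumptions and do not appear in FIG. 1 or elsewhere in this description.

```python
def read_timing(clock_hz, read_latency, addresses_per_cycle=2, data_per_cycle=2):
    """Derive basic timing figures for a synchronous memory.

    Assumptions (illustrative, not taken from any figure):
      - addresses_per_cycle=2 models one read plus one write address per clock
      - data_per_cycle=2 models DDR output (one datum per clock edge)
    """
    period_ns = 1e9 / clock_hz                 # one clock cycle, in nanoseconds
    return {
        "period_ns": period_ns,
        "latency_ns": read_latency * period_ns,  # time from address to first output
        "addresses_per_s": clock_hz * addresses_per_cycle,
        "data_per_s": clock_hz * data_per_cycle,
    }


timing = read_timing(714_000_000, 3)
print(round(timing["period_ns"], 3), round(timing["latency_ns"], 3))
```

At 714 MHz the cycle time is roughly 1.4 ns, so RL=3 leaves roughly 4.2 ns between the address input and the first read output, while new addresses continue to arrive every cycle.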
A conventional design is shown in FIG. 1 where the output path includes two data paths. The two data paths receive the data B1 and B2 from sense amplifier SA at the same time to shift the data to the output. An address request is received on every external clock cycle. Therefore, the sense amplifier SA produces B1 and B2 data every clock cycle in response to the external address. The output Q generated from B1 and B2 data can be produced at any of 1, 1.5, 2, 2.5 or 3 clock cycles later, according to the read latency RL, in response to the external address. However, the data needs to be changed every clock cycle. The output Q is provided in DDR format, with the first half cycle including data B1 and the second half cycle including data B2. Data B1 is shifted through a register Reg clocked by clock K and then by clock Kb if RL is 2.5 or directly to the final latch if the RL is 2.0 or lower.
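The DDR ordering of output Q described above may be illustrated with a short behavioral sketch. It models only the ordering of B1 and B2 within each clock cycle, not the registers or latches of FIG. 1; the function name `ddr_output` and the sample data labels are hypothetical.

```python
def ddr_output(pairs):
    """Interleave per-cycle (B1, B2) pairs into a DDR stream on Q.

    Each clock cycle carries B1 in its first half and B2 in its second half.
    Cycle numbers are counted from the first output cycle, i.e. after the
    read latency has already been satisfied (an assumption of this sketch).
    """
    q = []
    for cycle, (b1, b2) in enumerate(pairs):
        q.append((cycle, "first half", b1))
        q.append((cycle, "second half", b2))
    return q


# Two consecutive reads, each producing a B1/B2 pair from the sense amplifier.
for entry in ddr_output([("A.B1", "A.B2"), ("B.B1", "B.B2")]):
    print(entry)
```

Because a new B1/B2 pair arrives every clock cycle, each pair occupies Q for exactly one cycle before being replaced by the next.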
With regard to some of the signals illustrated in FIG. 1, KDS and complementary KDS (/KDS) are data strobe signals generated by K and Kb, respectively. RE is the read enable signal that drives the sense amplifier output during certain times. CKout1 is a pulse signal to the output clock buffer in the first data path to enable B1 outputs, and CKout2 is a pulse signal to the output clock buffer in the second data path to enable B2 outputs. SEL is the select signal that is set low for SQ1, SQ2 and SQ2+ (read latency of 2 clocks), and high for SQ2+ (read latency of 2.5 clocks).
The clock K is generated from an external clock CLK and clock Kb is the inverse of clock K. Data B2 of output Q is a half cycle later than B1, so one additional register Reg that is clocked by the next half clock is needed to account for the half cycle shift. The final stage is formed by a pass gate for each of the B1 and B2 data paths, and is clocked by DLL (Delay Locked Loop) or PLL (Phase Locked Loop) clocks and then wired “OR” with a latch to be multiplexed to the output Q. The DLL or PLL clocks CKout1 and CKout2 are generated by a DLL or a PLL circuit to align the output Q to the external clock's high and low edges. For example, when RL=2.0, CKout1 aligns data B1 to clock CLK on the second CLK high edge after the address is received, and CKout2 aligns the data B2 a half clock later on the second CLK low edge. As RL is increased, the number of registers provided in series increases in the read data path and the registers are strobed by the clock edges.
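The edge alignment in the RL=2.0 example can be expressed as a small calculation. This is a sketch under stated assumptions: the address is taken to be received on a CLK high edge at time 0, times are measured in clock cycles, and the extension to half-integer latencies (B1 landing on a low edge) is an assumption of the sketch rather than a statement about FIG. 1. The function name `output_edges` is hypothetical.

```python
from fractions import Fraction


def output_edges(rl):
    """Return the CLK edge on which B1 and B2 appear at Q for a read latency.

    Assumes the address arrives on a CLK high edge at t = 0 cycles, B1 is
    aligned rl cycles later, and B2 follows one half cycle after B1.
    """
    rl = Fraction(rl)
    if rl.denominator not in (1, 2):
        raise ValueError("read latency must be a multiple of half a cycle")

    def edge(t):
        # Whole-cycle times fall on high edges, half-cycle times on low edges.
        return "high" if t.denominator == 1 else "low"

    b1 = rl
    b2 = rl + Fraction(1, 2)
    return {"B1": (b1, edge(b1)), "B2": (b2, edge(b2))}


print(output_edges(2))
```

For RL=2.0 this places B1 on the second high edge after the address and B2 on the low edge a half cycle later, matching the alignment attributed to CKout1 and CKout2 above.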
Such conventional schemes have several drawbacks, however. First, the memory access cycle time is limited by the clock K in the first output register Reg. The added read latency does not improve the clock frequency. Second, if the clock K in the first output register is delayed to improve the clock frequency, then clock Kb of the second register also needs to be delayed. Consequently, the delay of Kb can delay output Q and the delay itself is difficult to optimize. Third, any additional series register(s) undesirably increases the overall access delay.
In sum, there is a need for systems and methods that provide higher transaction rate synchronous memory, utilize fewer registers and less delay in the data paths, and/or otherwise overcome existing drawbacks such as clock delays in output register chains as well as limitations regarding access delay time and/or memory access cycle time.