The present invention relates in general to integrated circuits that process data in a pipelined fashion, and in particular to an improved data transmission circuit that enhances speed and throughput.
Pipelining techniques have been used in synchronous circuits such as microprocessors and synchronous memories to improve data throughput. There is usually latency associated with pipelined operations. Latency refers to the number of system clock cycles it takes for the first bit of data to propagate to the output of the circuit, after which subsequent bits of data typically arrive within one clock cycle. For example, a synchronous memory circuit such as a synchronous dynamic random access memory (SDRAM) may provide for a latency of one, two, three, or higher depending on the system requirements. In the context of SDRAMs, latency is measured by the number of clock cycles and is commonly referred to as column access strobe latency, CAS latency, or CL.
An improved method of pipelining is known as wave pipelining wherein data is serially pipelined to the output, stored in parallel output registers, and then clocked out serially in the sequence received. This type of wave pipelining has been employed in SDRAMs that provide for programmable latency of, e.g., 1, 2 and 3. A common implementation of a wave pipelined SDRAM with a maximum latency of N provides N output data registers (QREG) located near each output terminal (DQ). The N registers store N bits of output data before serially clocking the data out to the output terminal.
FIG. 1 illustrates another implementation of data pipelining wherein a data transmission output circuit 100 utilizes N−1 registers per DQ terminal rather than N registers per DQ terminal. Output circuit 100 includes N−1 output data registers QREG0 110, QREG1 111, QREG2 112, QREG N−1 113. The input of each register is coupled to an internal data bus 120. Additionally, the output of each register is coupled to an output terminal DQ 190. Data is serially provided on the bus 120 and sequentially loaded into each of the N−1 registers in accordance with individual input enable signals EN_QR_IN0, EN_QR_IN1, EN_QR_IN2, and EN_QR_IN_N−1. Data is transmitted from each of the N−1 registers to the output terminal DQ in accordance with individual output enable signals EN_QR_OUT0, EN_QR_OUT1, EN_QR_OUT2, and EN_QR_OUT_N−1. Employing the technique of FIG. 1, a CAS latency value L=N may be implemented using only N−1 output registers.
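The register steering described above can be illustrated with a small behavioral sketch. It is not the circuit of FIG. 1: it ignores all analog timing and simply models each clock slot as "drive DQ from one register, then load the next bus bit into a register", with registers selected round-robin. The function name simulate_output_registers and the lag parameter (the number of slots between loading a bit and driving it onto DQ) are assumptions introduced only for this illustration.

```python
def simulate_output_registers(bits, n_regs, lag=1):
    """Behavioral sketch of round-robin output registers feeding one DQ terminal.

    bits   -- serial data arriving on the internal bus, one bit per slot
    n_regs -- number of output registers (N-1 in the scheme described above)
    lag    -- hypothetical delay, in slots, between loading a bit and driving it out
    """
    # A register is overwritten n_regs slots after it was loaded, so a bit
    # must be driven out no later than that or it is lost.
    if not 1 <= lag <= n_regs:
        raise ValueError("lag must be between 1 and n_regs slots")

    regs = [None] * n_regs
    dq = []
    for slot in range(len(bits) + lag):
        out_index = slot - lag
        if out_index >= 0:
            # Output enable: couple the register holding bit out_index to DQ.
            dq.append(regs[out_index % n_regs])
        if slot < len(bits):
            # Input enable: load the next bus bit into the next register in turn.
            regs[slot % n_regs] = bits[slot]
    return dq


# Maximum latency N = 3, so N-1 = 2 registers per DQ terminal.
print(simulate_output_registers([1, 0, 1, 1, 0], n_regs=2))  # [1, 0, 1, 1, 0]
```

In this sketch the output is read before the load in the same slot, which mirrors the requirement that a register's contents reach DQ before the next bit overwrites them.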
FIG. 2 is a timing diagram illustrating the operation of a data transmission output circuit for the case of a maximum CAS latency of 3 (L=N=3) using N−1=2 registers per DQ terminal. Generally, it is desirable to minimize the clock period and thereby increase the frequency of the system. However, as shown in FIG. 2, the minimum clock period for the case of L=3 is constrained by at least two factors. First, the period tAA represents the time between the receipt of the read request and the time the data is available at the output of an output data register (e.g. QREG0). Second, the period tRQ represents the time between the receipt of an output enable signal (e.g. EN_QR_OUT0) and the time the data signal has propagated to the output terminal DQ and is available for reading. In other words, tAA is the address access time, and tRQ is the propagation time from QREG to output terminal DQ. Accordingly, for L=3, the sum of these two periods must be less than 3 clock cycles. However, tAA is primarily determined by the fabrication process and the inherent delays in accessing and transferring data from the memory array. Furthermore, tRQ is based on the electrical properties of the output circuit (e.g. layout and circuit architecture). For L=3, both tAA and tRQ are therefore effectively constant constraints, and the relation 3*tCLK > tAA+tRQ must be satisfied. Equivalently, the minimum clock period is given by tCLK,min=(tAA+tRQ)/3. However, for the case of L=3, there is a two clock cycle margin. Therefore, the address access time tAA is typically not a limiting factor for a read request (i.e. two clock cycles plus the time it takes for the first output enable pulse EN_QR_OUT to be removed (tP2) is greater than tAA).
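As a worked illustration of the relation tCLK,min=(tAA+tRQ)/L for L=3, the following sketch evaluates the bound numerically. The function name min_clock_period and the timing values (18 ns and 3 ns) are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any device.

```python
def min_clock_period(t_aa, t_rq, latency):
    """Lower bound on the clock period from latency * tCLK > tAA + tRQ.

    t_aa    -- address access time (same time unit as the result)
    t_rq    -- propagation time from output register to the DQ terminal
    latency -- programmed CAS latency in clock cycles
    """
    if latency < 1:
        raise ValueError("latency must be at least one clock cycle")
    return (t_aa + t_rq) / latency


# Hypothetical values in nanoseconds, for illustration only.
print(min_clock_period(t_aa=18.0, t_rq=3.0, latency=3))  # 7.0
```

The actual clock period must be strictly greater than the returned value, since the relation is a strict inequality.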
One further critical timing constraint on the circuit of FIG. 1 is that the output enable signal EN_QR_OUT must be disabled before the arrival of the next data bit from the data bus into the output register (e.g. QREG0). For example, referring to FIG. 1, EN_QR_OUT must be disabled before time marker T1 (i.e. the arrival of Q2 at QREG0). If EN_QR_OUT is not disabled before T1, then the new data bit (e.g. Q2) will be passed through the output register (e.g. QREG0) to the output, and thereby lead to a possible read error. Therefore, the system timing must be constrained such that tP2, the point at which the output enable signal is disabled, is less than t1, the time between the last prior clock pulse and marker T1, the point at which the next data bit arrives from the bus into the output register. Note that t1 is the access time of data bit Q2, and therefore, t1=tAA. Accordingly, typical pipelined systems have employed pulsed output enable signals (e.g. EN_QR_OUT<1:0>) with timing control to serialize the output data such that proper data is transmitted to the output terminal DQ before new data is loaded into the output registers.
However, an N−1 output register implementation of a data transmission output circuit presents a different set of timing requirements when the SDRAM is programmed for a latency less than the maximum latency N (i.e. L<N). Specifically, if the circuit is programmed for L=N−1=2, there is only one clock cycle margin provided for the QREG0 enable pulse EN_QR_OUT<0>. FIG. 3 is a timing diagram illustrating the operation of a data transmission output circuit for the case of CAS latency of two. Similar to the case of L=N=3 above, there is a timing constraint of 2*tCLK > tAA+tRQ. Accordingly, the minimum clock cycle is tCLK,min=(tAA+tRQ)/2. However, for the case of L=2, the address access time tAA may become a limiting factor. Therefore, in addition to the first constraint, tAA must also not exceed one clock cycle tCLK plus tP2. In other words, the data retrieved in response to a read access must be in the output register before the output enable signal is disabled. If tAA is greater than this time period, EN_QR_OUT will be disabled before the data arrives in QREG, and the data will not be passed to the output terminal DQ. Thus, in the case of L=2, there is a second limitation that tCLK,min=tAA−tP2. Therefore, in the case of L=2 the clock period of the system may need to be lengthened beyond the minimum defined by tCLK,min=(tAA+tRQ)/2, reducing the clock frequency, to ensure that the output enable pulse (i.e. EN_QR_OUT) remains active until after valid data has arrived (i.e., after tAA).
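The two L=2 limits discussed above can be combined by taking the larger of the two bounds. The sketch below does this for hypothetical timing values; the function name min_clock_period_l2 and all numbers are assumptions made for illustration and do not describe any particular device.

```python
def min_clock_period_l2(t_aa, t_rq, t_p2):
    """Clock period bound for CAS latency 2 with N-1 output registers.

    Combines the two limits described in the text:
      pipeline limit: 2 * tCLK > tAA + tRQ   ->  tCLK > (tAA + tRQ) / 2
      enable limit:   tCLK + tP2 >= tAA      ->  tCLK >= tAA - tP2
    All arguments share one time unit (nanoseconds in the examples below).
    """
    pipeline_limit = (t_aa + t_rq) / 2
    enable_limit = t_aa - t_p2
    return max(pipeline_limit, enable_limit)


# Hypothetical values in nanoseconds, for illustration only.
# With a short enable pulse, the enable limit dominates (speed penalty).
print(min_clock_period_l2(t_aa=18.0, t_rq=3.0, t_p2=4.0))  # 14.0
# With a longer enable pulse, only the pipeline limit remains.
print(min_clock_period_l2(t_aa=18.0, t_rq=3.0, t_p2=9.0))  # 10.5
```

With these assumed numbers, the enable limit sets the clock period only when tP2 is small relative to tAA, which is the speed penalty the text describes.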
Accordingly, operating a pipelined circuit with a latency value L that is less than the maximum latency N in the N−1 register implementation results in a speed penalty. What is needed is a circuit and method for processing output data in a pipelined circuit that does not impose timing restrictions that adversely affect the speed of the system.
In accordance with one embodiment of the present invention, a memory circuit includes an output terminal, a plurality of data registers each coupled between the output terminal and a data bus, each storing successive data bits received serially from the data bus, a plurality of enable signals each coupled to a corresponding data register, wherein when one of the plurality of enable signals is active a data bit in the corresponding data register is coupled to the output terminal and when one of the plurality of enable signals is inactive a data bit in the corresponding data register is not coupled to the output terminal, and a mode select circuit to program the plurality of enable signals to operate in one of a plurality of modes corresponding to a programmable latency period, wherein in a first mode the enable signals have a first pulse width and in a second mode the enable signals have a second pulse width greater than the first pulse width.
In one embodiment, the memory circuit has a maximum programmable latency period of N and the number of data registers and corresponding enable signals is N−1.
In another embodiment, when the latency period is programmed for N, the enable signals operate in the first mode, and when the latency period is programmed for less than N, the enable signals operate in the second mode.
In accordance with another embodiment of the present invention, a data transmission circuit having a maximum programmable latency of N includes an output terminal, N−1 output registers configured to store N−1 bits of data, each output register having an output coupled to the output terminal, and a parallel-to-serial converter coupled to the N−1 output registers and configured to serialize the N−1 bits of data in response to an output enable signal, wherein, when the circuit operates with a latency of N, the output enable signal has a first pulse width, and when the circuit operates with a latency less than N, the output enable signal has a second pulse width.
In accordance with another embodiment, the present invention includes a method of transmitting data to an output terminal of a memory system comprising programming a latency period in the memory system, programming a plurality of output enable signals to operate in one of a plurality of modes corresponding to the latency period, wherein in a first mode the output enable signals have a first pulse width and in a second mode the output enable signals have a second pulse width, sequentially storing output data in a plurality of output registers, wherein each output register is coupled to the output terminal, generating the plurality of output enable signals, and coupling each of the plurality of output enable signals to a corresponding one of the plurality of output registers, wherein each output enable signal selectively couples a data bit in a corresponding output register to the output terminal.
In accordance with another embodiment, the present invention includes a method of operating a pipelined circuit having a maximum latency of N, the method comprising converting data from a serial bit stream to N−1 parallel bits of data, steering the N−1 parallel bits of data into N−1 output registers, and converting the N−1 parallel bits of data into serial data, wherein when the circuit operates with a latency of N, the conversion utilizes output enable signals having a first pulse width, and when the circuit operates with a latency of less than N, the conversion utilizes output enable signals having a second pulse width.