The present invention generally relates to systems for processing data. Specifically, the invention relates to a method and apparatus for producing output data from a circuit a selected number of clocks after a read cycle is initiated in the circuit.
Semiconductor memory systems have evolved rapidly in recent years. Memory system sizes have roughly doubled every three years, while the number of bits per memory chip has increased by a factor of four over the same period. Therefore, with each new generation of DRAMs, only half as many individual chips are needed in a memory system. With a reduction in the number of memory chips in a system, there is a reduction in the total number of data output pins. With fewer output pins, the bandwidth of the memory system decreases. However, as microprocessor and multiprocessor systems advance, the demands on memory systems continue to increase. Most critically, computer systems require ever greater data bandwidths. That is, the systems demand that more information be presented at the output pads of the memories in any given time interval. There is therefore a need to increase the bandwidth of each memory chip. Achievement of these greater bandwidths is complicated by the need to preserve precious gains in bit density and substrate space.
This need to increase bandwidth has led to the development of new types of memory systems. One promising new memory chip architecture is the Synchronous Dynamic or Static Random Access Memory (SDRAM or SSRAM). These chips use a clock to control data flow and thereby provide significant increases in output data bandwidth over that provided by previous memory chips. In these synchronous designs, pipelining is used to increase the bandwidth of data output. In this discussion, it will be assumed that the address access time through an exemplary SDRAM (from column address to output) is 15 ns. Without pipelining, read cycles can occur only every 15 ns. In a synchronous DRAM, a latency (or pipeline depth) of three may be used to increase the overall data rate by a factor of three. That is, for a 15 ns address-access-time SDRAM, read requests and data outputs may be made every 5 ns. A first request may be made at time T0. The data from that request will be valid on the output of the DRAM 15 ns (3 clocks) later. A second read request is made at T0+5 ns, and a third read request occurs at T0+10 ns. The clock occurring at T0+10 ns also commands the data resulting from the first read cycle to appear on the outputs. Coincident with the fourth read request at T0+15 ns, data from the first request is available at the outputs. This data is followed by new data every 5 ns from subsequent read cycles. The result is a system having a cycle time much less than the address access time, dramatically increasing the bandwidth.
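The timing just described may be illustrated by the following minimal sketch. It is offered only as an aid to the arithmetic and is not part of any described circuit. The function name pipeline_schedule and its parameters are hypothetical, and the sketch assumes that the cycle time equals the address access time divided by the latency, as in the 15 ns example.

```python
def pipeline_schedule(access_time_ns, latency, num_requests):
    """Return (request_time, data_valid_time) pairs in ns, measured from T0.

    Assumption for illustration: cycle time = access time / latency,
    and one read request is issued on every clock.
    """
    cycle_ns = access_time_ns / latency  # one clock period
    schedule = []
    for i in range(num_requests):
        request_ns = i * cycle_ns
        # Data becomes valid `latency` clocks after its request.
        valid_ns = request_ns + latency * cycle_ns
        schedule.append((request_ns, valid_ns))
    return schedule


if __name__ == "__main__":
    # The exemplary 15 ns part operated with a latency of three.
    for request_ns, valid_ns in pipeline_schedule(15, 3, 4):
        print(f"request at T0+{request_ns:g} ns, data valid at T0+{valid_ns:g} ns")
```

For the exemplary values, requests fall at 0, 5, 10 and 15 ns after T0 and the corresponding data is valid at 15, 20, 25 and 30 ns, so that the first data appears coincident with the fourth request and new data follows every 5 ns.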
This same 15 ns address-access-time part, if operated with a 7.5 ns cycle time, can output data with only a one-clock delay. That is, a first read cycle starts at T0. A second read cycle starts at T0+7.5 ns, at which time the data resulting from the first read cycle is commanded to be output. A third cycle starts at T0+15 ns as the first data is valid on the outputs and is read out. This operation is referred to as a latency of two. This exemplary 15 ns address-access-time part is too slow to operate correctly with a latency of two at a 5 ns cycle time, because two 5 ns clocks span only 10 ns. DRAMs operating at higher frequencies must use greater latencies. Thus, it is desirable to provide an ability to program the latency of a particular memory part, allowing optimized use at a number of different operating frequencies. It is through the use of pipelining, e.g., starting a second and third cycle before completing an access of the first cycle, that synchronous memories are able to provide a greatly increased bandwidth over previous memory designs.
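The relationship between cycle time and usable latency in the foregoing example may be sketched as follows. This is a simplified illustration only. The function names supports_latency and min_latency are hypothetical, and the sketch assumes that a latency is usable whenever the latency multiplied by the cycle time is at least the address access time, ignoring setup times, output delays and other margins that a real part would require.

```python
import math


def supports_latency(access_time_ns, cycle_time_ns, latency):
    """True if `latency` clocks span at least the address access time.

    Simplifying assumption: no setup, hold, or output-buffer margins.
    """
    return latency * cycle_time_ns >= access_time_ns


def min_latency(access_time_ns, cycle_time_ns):
    """Smallest whole number of clocks that covers the access time."""
    return math.ceil(access_time_ns / cycle_time_ns)


if __name__ == "__main__":
    for cycle_ns in (15, 7.5, 5):
        print(f"cycle {cycle_ns:g} ns: minimum latency {min_latency(15, cycle_ns)}")
```

Under this assumption the exemplary 15 ns part may use a latency of two at a 7.5 ns cycle time but requires a latency of three at a 5 ns cycle time, which is consistent with the desirability of a programmable latency.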
Unfortunately, this increased bandwidth has not yet been achieved without some cost. Current pipeline implementations require the use of a great number of transmission gates or latches to cycle or "step" data through the pipe. A typical pipeline circuit is formed from a plurality of transmission gates made of NMOS and PMOS transistors. Data is clocked through the pipeline circuit by allowing it to proceed sequentially through stages defined by the transmission gates. Thus, for a latency of two, two sets of transmission gates may be used to step the data through the pipe. A first set of gates is enabled to advance the data received at the input buffers through the memory. A second set of gates, positioned later in the data path, is enabled to advance the data to the output as new data starts at the input. Latencies of three, similarly, require three sets of transmission gates, disposed at several locations through each parallel data path.
Although this approach achieves the general goals of pipelining output data from a synchronous circuit, it is unsatisfactory for several reasons. The large number of transmission gates required by such an approach adversely affects several important RAM design characteristics. The intermediate transmission gates in an SDRAM may, for example, be optimally placed at the output of each column decoder of the memory. These memories have a large number of column decoder outputs. Therefore, a very large number of transmission gates is required to generate, e.g., a latency of three. Each of these transmission gates consumes power, takes up valuable substrate area, and adds both resistance and capacitance in the signal path, thereby adding delay to the address access time. As a result, synchronous DRAM circuits utilizing common pipelining techniques occupy substantially greater substrate space than asynchronous circuits of similar capacity. Further, use of these common techniques creates memory devices having increased power requirements and increased address access times, although they do achieve the objective of decreased cycle time.
The large number of transmission gates is increased even further when a programmable latency is used. For example, if a selectable latency of either two or three is implemented using transmission gates, the optimum placement of the gates in the overall data path is different for the different latencies. Substrate space is consumed rapidly using this approach.
Accordingly, a high speed pipelining technique is needed which reduces or eliminates the need for multiple transmission gates in the data path. It is further desirable that the technique support a programmable latency of any desired value. These needs should be satisfied without significantly compromising gains in bit density, substrate area, and power consumption.