High-speed memories, especially high-speed static random access memories (SRAMs), are important in desktop computing and communications applications. A typical use for such memories is as a cache for a data processor. A cache is a relatively high-speed memory which contains a local copy of data located in a larger but slower main memory. The cache improves system performance because once the data processor accesses a data element at a particular address, there is a high probability it will access data elements at adjacent addresses. Because data processors are themselves capable of operating at very high speed, making the cache memory as fast as possible improves system performance.
One technique to speed up cache accesses is the use of burst cycles. During a burst cycle, the data processor fetches data from a series of memory locations which are either consecutive or are clustered about the access address in modulo fashion. During the initial access of the burst, the data processor presents the burst address to the memory. The memory activates a word line selected by the burst address and keeps the word line active throughout the burst. All memory cells located along the activated word line provide differential voltages to corresponding bit line pairs. A column decoder selects a subset of the bit line pairs corresponding to the data element selected in that portion of the burst. Differential voltages developed between the selected bit line pairs are then sensed and amplified before final output. In subsequent cycles, other subsets of the bit line pairs corresponding to other data elements in the burst are selected. Since the address decoding, word line selection and driving, and bit line differentiation have already taken place, the subsequent cycles of the burst are faster.
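The two burst orderings described above, consecutive and modulo, can be illustrated with a minimal sketch. The function name `burst_addresses`, the default burst length of four, and the alignment of the wrap-around block to the burst length are assumptions made for illustration only; they are not taken from any particular memory device.

```python
def burst_addresses(start, burst_length=4, wrap=True):
    """Return the addresses visited by one burst beginning at `start`.

    wrap=False: strictly consecutive addresses.
    wrap=True:  addresses stay inside the aligned block of `burst_length`
                locations containing `start`, wrapping around in modulo
                fashion (an assumed, illustrative ordering).
    """
    if not wrap:
        return [start + i for i in range(burst_length)]
    base = start - (start % burst_length)  # first address of the aligned block
    return [base + (start + i) % burst_length for i in range(burst_length)]


print(burst_addresses(6))              # [6, 7, 4, 5]
print(burst_addresses(6, wrap=False))  # [6, 7, 8, 9]
```

In the wrap-around case all four addresses fall within one aligned block, which is consistent with the idea that a single word line can remain active while only the column selection changes from cycle to cycle.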
A second technique that has become popular is to make these memories synchronous with the data processor's clock signal. Since the data processor accesses data from the bus synchronously, the memory can take advantage of the available clock signals to control its internal operation.
A third technique, which is applicable to synchronous memories, is pipelining. Pipelining breaks down a complex task into a series of smaller sub-tasks. Each sub-task is performed by an asynchronous circuit. Between each pair of adjacent asynchronous circuits is a pipeline register which captures the output of the previous pipeline stage for presentation to the next stage in synchronism with a periodic clock signal. Pipelining allows different sub-tasks of several operations to be performed in parallel, increasing performance.
For example, in the data processor field, which uses pipelining extensively, the execution of a program instruction can be implemented in a five-stage pipeline which includes instruction fetch, instruction decode, operand fetch, execution, and writeback stages. Performance is increased in this five-stage pipeline example because while one instruction is being written back, a second instruction can be executed, a third instruction can perform operand fetch, and so on.
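The overlap in the five-stage example can be made concrete with a small sketch that lists which instruction occupies which stage on each clock cycle. It assumes an idealized pipeline in which one instruction enters per cycle and no stage ever stalls; the names `STAGES` and `pipeline_schedule` are hypothetical.

```python
STAGES = ["fetch", "decode", "operand fetch", "execute", "writeback"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Idealized model: instruction k enters the first stage on cycle k and
    advances one stage per cycle, with no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupied = {}
        for position, name in enumerate(stages):
            instruction = cycle - position
            if 0 <= instruction < num_instructions:
                occupied[name] = instruction
        schedule.append(occupied)
    return schedule


for cycle, occupied in enumerate(pipeline_schedule(5)):
    print(cycle, occupied)
```

On cycle 4 of this model, instruction 0 is in writeback while instruction 1 executes and instruction 2 fetches operands, which matches the overlap described in the text.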
Pipelining has also been applied to synchronous burst memory devices because a burst access can be conveniently broken down into overlapping sub-tasks. For example, a known synchronous memory pipeline includes an address input stage, an address predecoding stage, an array access stage, and a data output stage. In conformity with pipelining rules, this memory includes a register between each stage for a total of three registers. When such a memory receives a burst access which requests four data elements, the first access takes four cycles between address input and data output, but due to the pipelining feature subsequent accesses take one cycle each. Hence this memory is designated a "4-1-1-1" memory.
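The "4-1-1-1" designation follows directly from the pipeline depth and the burst length, as the following minimal sketch shows. It assumes that the first access incurs one cycle per pipeline stage and that every later access in the burst completes one cycle after the previous one; the function name `burst_timing` is hypothetical.

```python
def burst_timing(pipeline_stages=4, burst_length=4):
    """Return (designation, total_cycles) for one pipelined burst.

    Assumes the first data element appears after `pipeline_stages` cycles
    and each subsequent element follows one cycle later.
    """
    cycles_per_access = [pipeline_stages] + [1] * (burst_length - 1)
    designation = "-".join(str(c) for c in cycles_per_access)
    return designation, sum(cycles_per_access)


print(burst_timing())  # ('4-1-1-1', 7)
```

Under these assumptions, adding a pipeline stage lengthens only the first entry of the designation, for example "5-1-1-1" for a five-stage pipeline.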
Data processors, however, are being clocked at ever higher frequencies, making it more difficult for conventional pipelined memories to propagate all signals through each stage of the pipeline within one clock cycle without breaking up the circuitry further and adding more pipeline stages. What is needed, then, is a synchronous pipelined burst memory which is able to operate with faster clock speeds without adding extra depth to the pipeline. Such a memory is provided by the present invention, whose features and advantages will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.