In recent years, advances in design approaches and manufacturing processes have led to central processing units (CPUs) of increased speeds. At the same time, the speed of the most common form of main system storage, dynamic random access memory (DRAM), has increased at a much slower rate. As a result, DRAMs may not always be capable of providing read data to, or receiving write data from, a CPU at a fast enough rate. One way to address the speed difference between CPUs and slower memory devices (such as DRAMs) is to utilize a cache memory.
A cache memory is typically considerably smaller than main memory, but can operate at a much faster speed. A cache memory can be a portion of a single CPU integrated circuit (chip), or a separate device having an external connection to a CPU. A typical cache memory will hold a copy of a portion of the data stored within the main memory, to allow rapid access thereto. The copy is usually arranged into pages, each of which occupies a contiguous range of addresses.
In the operation of a typical system that employs a cache memory, the CPU will routinely make read accesses to the cache memory. If the desired data are present within the cache memory, the data are used by the CPU. If the desired data are not present within the cache memory, the desired data are copied anew (in page form) from the main memory into the cache memory. Furthermore, in the event a page of data is altered within a cache memory, the page may have to be written back into the main memory. Thus, the overall system speed can depend upon the rate at which data can be written from the main memory into the cache memory. Accordingly, it is desirable for the devices that form a main memory to be capable of high-speed transfers of data strings, corresponding to consecutive address values, to and from a cache memory.
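The cache behavior described above can be illustrated with a minimal sketch. The page size, the dictionary of cached pages, and the dirty-page write-back policy shown here are illustrative assumptions for clarity, not features of any particular device.

```python
# Minimal sketch of the cache read behavior described above: on a miss,
# a whole page is copied anew from main memory; any page altered in the
# cache is written back first. PAGE_SIZE and the structures below are
# hypothetical choices made for illustration only.
PAGE_SIZE = 4

class Cache:
    def __init__(self, main_memory):
        self.main = main_memory          # backing store, a list of words
        self.pages = {}                  # page number -> list of words
        self.dirty = set()               # page numbers modified in cache

    def read(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.pages:       # miss: copy the page anew
            self._fill(page)
        return self.pages[page][offset]  # hit: rapid access

    def _fill(self, page):
        for p in self.dirty:             # write altered pages back
            base = p * PAGE_SIZE
            self.main[base:base + PAGE_SIZE] = self.pages[p]
        self.dirty.clear()
        base = page * PAGE_SIZE
        self.pages[page] = self.main[base:base + PAGE_SIZE]

main = list(range(16))
cache = Cache(main)
print(cache.read(5))   # miss: fills page 1 (words 4..7), returns 5
print(cache.read(6))   # hit within the already-cached page, returns 6
```

The speed advantage comes from the second read: it is served from the cached page without touching main memory.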
A preferred method of passing data between a main memory and a cache memory is that of "burst transfer." In a burst transfer, an initial (base) address within the main memory is specified, and then the data string is output (or input in the case of a write operation) in synchronism with a reference clock. The length of the data string is referred to as a "burst length." One example of a type of memory that is capable of providing burst transfers is a synchronous DRAM (SDRAM). SDRAMs operate in synchronism with a reference clock, latching addresses and providing data accesses in synchronism with the reference clock.
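The addressing behavior of a burst transfer can be sketched as follows; the function name and hexadecimal base address are illustrative assumptions only.

```python
def burst_addresses(base, burst_length):
    """Return the consecutive addresses covered by one burst transfer,
    given the initial (base) address and the burst length."""
    return [base + offset for offset in range(burst_length)]

# A burst of length 4 starting at base address 0x1000 covers the
# consecutive addresses 0x1000, 0x1001, 0x1002, 0x1003, with one data
# set transferred per reference clock cycle.
print([hex(a) for a in burst_addresses(0x1000, 4)])
```

Only the base address is supplied externally; the remaining addresses of the data string are generated internally, in synchronism with the reference clock.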
Burst transfers are preferred in SDRAMs because such transfers can provide faster overall data throughput than other DRAM approaches. For example, one type of general use (i.e., non-synchronous) DRAM is the "fast page mode" DRAM. Fast page mode DRAMs receive an address, and in response thereto, provide output data. The time between the application of the address and the presence of data at the output of the fast page mode DRAM is often referred to as the "address access time." High-speed fast page mode DRAMs can have an address access time of 20 nanoseconds (ns) (an operating speed of 50 MHz).
In the case of burst SDRAMs, the access of an initial set of data is accomplished in the same general fashion as in a general use DRAM. As a result, when accessing a single set of data (the data resulting from one address), SDRAMs provide no significant speed advantages over general use DRAMs. However, once a SDRAM has accessed an initial set of data, each subsequent data set in a particular address order can be accessed at a faster speed than in a general use DRAM. This is accomplished by simultaneously processing multiple sets of data within the burst SDRAM, so that consecutive sets of data can be input or output at a higher sustained rate. As a result, the frequency at which data can be input to or output from a burst SDRAM can be 100 MHz or higher. This sustained rate is referred to as the "burst transfer frequency." Maximum access speeds are achieved by running the reference clock of a SDRAM at the maximum burst transfer frequency.
While burst accesses can provide faster speeds for a sequential group of data sets, as noted above, accesses to an initial data set (i.e., the first data set in a burst sequence) provide no significant speed advantages over general use DRAM accesses. As a result, when a command input is applied to a burst SDRAM on one reference clock period, a number of clock periods will go by before the data set is available at the output of the SDRAM. The number of clock periods between the application of a command input and the presence of output data is often referred to as "CAS latency." The term CAS latency is used, as it is usually a column address strobe (CAS) signal that is used to initiate a data access operation.
SDRAMs are typically capable of providing a programmable CAS latency. That is, while an SDRAM may have a minimum CAS latency, the CAS latency can be increased or decreased by one or more reference clock periods, if desired. CAS latency values, as well as various other operating parameters, are conventionally set by applying one or more predetermined commands to a "mode register" within the SDRAM.
One reason programmable CAS latencies exist is to accommodate a range of operating speeds. Within a SDRAM, the time required to generate an internal address following the activation of a CAS signal can be considered an address access time. The address access time represents the speed at which the decoder and related circuits within the SDRAM operate, and can be independent of the reference clock signal. As a result, variations in the reference clock frequency may require changes in the CAS latency in order to ensure the most efficient operation of the SDRAM. For example, some buses may not be capable of operating at the maximum burst transfer frequency. In such a case, the reference clock for the SDRAM will be relatively slow, and the minimum address access time may fall within one reference clock period. However, in the event the reference clock is running at the maximum burst transfer frequency, the minimum address access time may span two or more clock periods. Consequently, the SDRAM CAS latency may have to be increased.
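The relationship between address access time, clock period, and required CAS latency reduces to a ceiling division, sketched below. The 20 ns access time and the particular clock frequencies are illustrative values, not parameters of any specific device.

```python
import math

def required_cas_latency(address_access_time_ns, clock_period_ns):
    """Smallest whole number of reference clock periods that covers the
    (clock-independent) address access time."""
    return math.ceil(address_access_time_ns / clock_period_ns)

# At a relatively slow 50 MHz clock (20 ns period), a 20 ns address
# access fits within one reference clock period.
print(required_cas_latency(20, 20))   # 1
# At 100 MHz (10 ns period), the same access spans two periods, so the
# programmed CAS latency must be increased.
print(required_cas_latency(20, 10))   # 2
```

This is why a mode register with a programmable CAS latency lets one device serve systems with different reference clock frequencies.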
A number of approaches have been developed to allow SDRAMs to be capable of operating in a burst mode. Two common approaches are "pipeline" systems and "prefetch" systems. Pipeline systems typically include a series of circuit stages, each of which functions in synchronism with the reference clock. In this manner, address information and data are shifted along the various circuit stages, eventually resulting in data being output on consecutive reference clock cycles from an output stage. Prefetch systems initially "prefetch" multiple data sets in an essentially parallel fashion. The multiple data sets are then subsequently output in a serial fashion on consecutive reference clock cycles.
Referring now to FIG. 13, a timing diagram is set forth illustrating read operations in a conventional pipeline system. The timing diagram sets forth two consecutive burst read operations, each having a burst length and CAS latency of four. The pipeline system includes four stages: a first stage that receives an applied address (ADD) and generates a sequence of internal addresses representing consecutive addresses in a burst (YADD); a second stage that predecodes the internal addresses to generate predecoded addresses (PYADD); a third stage that applies the predecoded addresses to an array within the SDRAM, resulting in input/output signals (IOBUS) being driven on an input/output (IO) bus internal to the SDRAM; and a fourth stage that drives data signals (DQ) on a system data bus external to the SDRAM.
The first read access begins with the application of a first base address (Aa0) in synchronism with an active read command (Read) at time T1. In the particular example of FIG. 13, the read command is generated by a combination of four signals, including a row address strobe signal (/RAS), a column address strobe signal (/CAS), a write enable signal (/WE), and a chip select signal (/CS).
Between times T1 and T2, the applied address is processed by the first stage resulting in the generation of an internal base address (YADD=Aa0).
Between times T2 and T3, the predecoder produces a predecoded base address (PYADD=Aa0). At about the same time, the first stage generates a second burst address (YADD=Aa1) of the four address burst sequence.
Between times T3 and T4, the third stage results in the data set corresponding to the base address Aa0 being output on internal IO lines (IOBUS=Da0). At the same general time, the second stage predecoder produces a second predecoded burst address (PYADD=Aa1), and the first stage generates an internal third burst address (YADD=Aa2).
Between times T4 and T5, the fourth stage results in a base output data set being driven on the data pins of the SDRAM (DQ=Da0). The last internal burst address is generated by the first stage (YADD=Aa3), a third predecoded address is generated by the second stage (PYADD=Aa2), and the third stage results in a second data set being placed on the IO lines (IOBUS=Da1).
In this fashion, applied address information is processed in a pipelined fashion, so that multiple addresses and data sets pass through the device simultaneously, with no two ever occupying the same stage at the same time. By controlling each stage in synchronism with the reference clock, address/data collisions are avoided, and data sets are output in synchronism with the reference clock.
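The stage-by-stage shifting of FIG. 13 can be modeled with a minimal sketch: each reference clock period, every item advances one stage, so once the pipeline fills, one data set emerges per period. The list-based model below is an illustrative abstraction, not the actual circuit.

```python
def simulate_pipeline(burst, stages=4):
    """Shift a burst of data sets through a simple pipeline, one stage
    per reference clock period, and collect what the last stage emits."""
    pipeline = [None] * stages          # [YADD, PYADD, IOBUS, DQ]
    output = []
    # Feed the burst in, then keep clocking to drain the pipeline.
    for item in burst + [None] * stages:
        pipeline = [item] + pipeline[:-1]   # everything advances one stage
        if pipeline[-1] is not None:
            output.append(pipeline[-1])     # DQ stage drives the data bus
    return output

# A burst of four: the first data set appears only after the pipeline
# fills (a CAS latency of four periods), then one set per clock.
print(simulate_pipeline(["Da0", "Da1", "Da2", "Da3"]))
```

Note how collisions are impossible by construction: the single shift per clock period guarantees each stage holds at most one item.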
Referring now to FIG. 14, a timing diagram is set forth illustrating read operations in a conventional prefetch system. The timing diagram sets forth two consecutive prefetch read operations, each having a burst length of four, a CAS latency of three, and a prefetch number of two. The prefetch system receives an applied address (ADD), and in response thereto, generates internal address pairs (YADD), equal to the prefetch number (two, in this particular case). The internal addresses are then essentially processed in parallel. In response to the internal addresses, two predecoded addresses are generated (PYADD(E)) and (PYADD(O)) in parallel. The parallel predecoded addresses result in corresponding output data sets being placed on parallel IO Buses (IOBUS(E) and IOBUS(O)). Thus, a prefetch circuit includes parallel address and/or data processing circuits allowing multiple access operations to occur in parallel. Finally, having accessed multiple data sets in parallel, the parallel data sets are then output sequentially at SDRAM output pins (DQ).
The first read access of FIG. 14 begins with the application of a first base address (Aa0) with an active read command (Read) at time T1. In the particular example of FIG. 14, read commands are generated in the same fashion as described in conjunction with FIG. 13.
Between times T1 and T2, the applied first base address (Aa0) is processed to generate internal address pair Aa0/Aa1. This address pair (Aa0/Aa1) is then processed in parallel to generate parallel predecoded addresses at about time T2 (PYADD(E)=Aa0 and PYADD(O)=Aa1).
Between times T2 and T3, the parallel predecoded addresses result in output data sets on parallel IO buses (IOBUS(E)=Da0 and IOBUS(O)=Da1). The parallel data sets are then output in an essentially serial fashion, with data set Da0 being available at time T4 and data set Da1 being available at time T5.
In this fashion, in response to applied address information, a prefetch system will access data sets in parallel. The parallel data sets will then be output in a serial fashion in synchronism with the reference clock.
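The parallel-then-serial pattern of FIG. 14 can be sketched as follows. The dictionary-backed memory and the function names are illustrative assumptions; the point is that each "parallel" access fills a number of data sets equal to the prefetch number, which are then output one per clock.

```python
def prefetch_read(memory, base, burst_length, prefetch=2):
    """Model a prefetch burst read: fetch `prefetch` data sets per
    parallel access, then serialize them onto the output."""
    output = []
    for start in range(base, base + burst_length, prefetch):
        # One parallel access: even/odd internal addresses are decoded
        # simultaneously, filling `prefetch` data sets at once...
        parallel_sets = [memory[start + i] for i in range(prefetch)]
        # ...which are then driven out one per reference clock cycle.
        output.extend(parallel_sets)
    return output

memory = {addr: f"Da{addr}" for addr in range(8)}
# Burst length 4 with a prefetch number of 2: two parallel accesses
# produce four serially-output data sets.
print(prefetch_read(memory, 0, 4))
```

Doubling the prefetch number halves the number of array accesses per burst, which is exactly the trade against peripheral area discussed below.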
While the pipeline and prefetch architectures described above can provide memory devices with increased burst transfer frequencies, it is still desirable to achieve even faster burst transfer frequencies.
In a pipeline system, burst transfer frequencies can be maximized by increasing the number of stages within the device. At the same time, the amount of processing done by each stage should also be reduced, as the slowest stage will determine the maximum speed of the pipeline system. Unfortunately, it can be difficult to reduce the processing done by stages any further. In addition, an increase in the number of stages can result in undesirable increases in the size of circuits used to connect the various stages. Consequently, the number of stages can have a practical limit of three to four.
In a prefetch system, burst transfer frequencies can be increased by increasing the number of address/data sets that are processed in parallel. Such an approach results in an increase in the number of parallel stages. This can increase the peripheral area of the device, which is undesirable, as it is a common design goal to manufacture devices with as small a chip size as possible. A further drawback to processing larger numbers of address/data sets is that accesses to a smaller number of data sets are not possible. Thus, as the size of parallel accesses increases, the degree of freedom with which a system accesses the memory is reduced. This can adversely impact system performance. For these reasons, the degree of parallel processing is generally limited to two or three.
In light of the increasing speeds of CPUs and other system devices, it would be desirable to arrive at some way of overcoming the system limitations described above, and thereby provide faster burst transfer frequencies in a storage device.