Computer system performance depends upon processor performance and memory performance. Various techniques (e.g., pipelining) are known to improve processor performance. Usually, the processor runs faster than the system's Dynamic Random Access Memory (DRAM), so the DRAM normally limits system performance: increase DRAM performance and system performance improves. High performance DRAM is therefore always at a premium, and performance is a primary concern of memory chip designers. Because DRAM performance still does not match processor performance, high performance system designers have resorted to other approaches to reduce memory access time.
One such approach, known as caching, is to place fast Static Random Access Memory (SRAM) between the processor and the system memory, DRAM. Large blocks of data, typically 64K-512K bytes, are transferred from the DRAM to the faster SRAM cache. Provided that processor operation can be confined to data and instructions in the cache, the SRAM cache can match or nearly match processor speed. However, adding the cache complicates the system and increases system cost.
In an attempt to eliminate the SRAM cache and its associated system complication and increased cost, several prior art approaches have been used to improve the DRAM's performance to allow coupling the system memory directly to the processor. One such prior art approach is known as Page Mode operation. In Page Mode, column addresses are toggled and data from a single row is transferred into or out of the DRAM with the Column Address Strobe (CAS) clock signal. One data bit for each active data output is transferred during each CAS cycle.
A variation on Page Mode is Nibble Mode. In Nibble Mode, addresses are not changed on every CAS cycle, but one data bit is still provided for every CAS cycle. The RAM is provided with a special input clock signal, Toggle or TOG, and data is transferred on every rising as well as every falling transition of TOG, approximately doubling the data rate over Page Mode. Addresses change only for every other bit of data.
A Synchronous DRAM (SDRAM) is yet another approach to improving DRAM data transfer rate. In an SDRAM, a block of data is serially transferred at the rising edge of a continuously running clock signal. A starting column address is passed to the SDRAM and then, one or more clock periods later, a string of 4, 8 or more data bits is transferred starting from that particular addressed column. The data transfer length is called the Burst Length. An SDRAM must accept an address every other clock cycle. This requirement is known as the 2N rule. Typical SDRAMs provide bursts of data of 4 bits, 8 bits or a full page.
Typical prior art SDRAM architectures are either a pre-fetch architecture or a pipeline architecture. Data blocks are provided in bursts that may be ordered in what is termed an "interleaved" scramble or in a "sequential" scramble. Table 1 represents sequential scramble for a 4-bit block with each bit indicated by a numeral, 1-4, indicating its position relative to the block starting address. There are four possible combinations (cases) of bit order for 4 bits of sequentially scrambled data. As the name implies, sequentially scrambled data is scrambled only in that the starting bit for the burst may be any bit in the block and is not restricted to one bit in particular.
TABLE 1
______________________________________
CASE #     BIT ORDER
______________________________________
1)         1 2 3 4
2)         2 3 4 1
3)         3 4 1 2
4)         4 1 2 3
______________________________________

TABLE 2
______________________________________
CASE #     BIT ORDER
______________________________________
1)         1 2 3 4
2)         2 1 4 3
3)         3 4 1 2
4)         4 3 2 1
______________________________________
Table 2 represents four possible interleave scramble bit combinations. An interleaved scramble may also start the burst on any bit in the block. However, unlike the sequential scramble, the bit order for the burst is not necessarily sequential. For either scramble, the data must be scrambled (re-ordered) on the SDRAM before it is passed off-chip.
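The two burst orderings can be illustrated with a short sketch. It assumes the conventional definitions, which the text above does not spell out: a sequential burst counts up from the starting position and wraps within the block, and an interleaved burst takes the exclusive-OR of the starting position with the beat count. The function names are hypothetical, and positions are 0-based inside the functions and printed 1-based to match the numerals used for the bit positions.

```python
def sequential_burst(start, length=4):
    """Sequential order: count up from `start`, wrapping within the block.

    Assumption: positions are 0-based and wrap modulo the burst length.
    """
    return [(start + beat) % length for beat in range(length)]


def interleaved_burst(start, length=4):
    """Interleaved order: XOR the start position with the beat count.

    Assumption: `length` is a power of two, so the XOR stays in the block.
    """
    return [start ^ beat for beat in range(length)]


if __name__ == "__main__":
    for start in range(4):
        seq = [pos + 1 for pos in sequential_burst(start)]   # 1-based
        ilv = [pos + 1 for pos in interleaved_burst(start)]  # 1-based
        print(f"start bit {start + 1}: sequential {seq}, interleaved {ilv}")
```

Under these assumptions, a burst starting on the second bit comes out as 2 3 4 1 in sequential order and as 2 1 4 3 in interleaved order, so only the sequential burst preserves consecutive order after the starting bit.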
For a pipelined SDRAM, in order to provide the correct internal selection, its internal address bus must switch on each clock cycle. So, the scramble decode logic must operate at clock speed. Since this decode logic switches so frequently, it accounts for an appreciable percentage of chip transient power. Further, the clock must be routed to pipeline registers throughout the SDRAM that synchronize on-chip circuit operations. There are variations in clock driver loads and in wiring paths that introduce some on-chip clock skew. This clock skew reduces the time available for logic operations between pipeline registers. Therefore, it is difficult to scramble the burst data on chip at clock speed.
By contrast, on a pre-fetch architecture SDRAM, addresses are decoded every other clock cycle, so two clock cycles instead of one are available for address decoding. Switching power is also lower because addresses switch only every other cycle. Thus, a pre-fetch SDRAM has a less stringent timing requirement, by about a factor of two, for data scrambling logic as compared to a pipelined SDRAM. Relaxation of this speed requirement is a consequence of the 2N rule. Further, the clock is required at fewer points on a pre-fetch chip than on a pipelined chip because there are fewer pipeline stages, if any, on a pre-fetch chip. Thus, a pre-fetch SDRAM uses less transient power than a pipelined SDRAM.
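The timing relationship can be made concrete with a small arithmetic sketch. The 10 ns clock period and 1 ns clock skew are purely hypothetical values chosen for illustration, and the same skew is assumed for both architectures for simplicity; `decode_budget_ns` is likewise a hypothetical helper, not part of any SDRAM specification.

```python
def decode_budget_ns(clock_period_ns, cycles_per_address, skew_ns=0.0):
    """Time available for address decode between address changes.

    Assumption: the budget is the number of clock cycles between address
    changes times the clock period, less any on-chip clock skew.
    """
    return cycles_per_address * clock_period_ns - skew_ns


# Hypothetical figures: 10 ns clock period, 1 ns on-chip clock skew.
pipelined = decode_budget_ns(10.0, cycles_per_address=1, skew_ns=1.0)
prefetch = decode_budget_ns(10.0, cycles_per_address=2, skew_ns=1.0)

print(f"pipelined decode budget: {pipelined} ns")
print(f"pre-fetch decode budget: {prefetch} ns")
```

With these assumed figures the pre-fetch decode logic has roughly twice the time of the pipelined decode logic, which is the factor of about two described above.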
However, pipelined SDRAMs have an advantage over pre-fetch SDRAMs for a full page-length burst block. A full page burst requires that a data transfer start at any address and continue until an entire page of data (usually about 1024 bits) is transferred. Pipelined SDRAMs queue data into the pipeline and so simply start the data transfer from the queue at the starting address.
By contrast, pre-fetch boundaries complicate the pre-fetching during full page bursts. This complication arises because the burst block start location, in all likelihood, will not align with a pre-fetch boundary. Thus, if a user requests a page of sequential bits that is not aligned to a pre-fetch boundary, multiple accesses are required. These multiple accesses cause delays, or gaps, between accesses, i.e., during the block transfer.
While this complication is most pronounced for full page burst operation of a pre-fetch SDRAM, it may be equally troublesome for smaller burst lengths. For example, in a chip organized for an eight bit burst block length, to transfer 8 bits, starting at bit 6 of one block, first requires accessing a first block, bits 0-7, to transfer bits 6 and 7 and, then, accessing a second block to transfer bits 0-5 of that second block. There is a delay, or gap, in between blocks because of the second block access.
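The eight bit example above can be reproduced with a minimal sketch that splits a burst into the block accesses it needs. The function name `block_accesses` and the flat bit-address model are assumptions made for illustration only; they are not taken from any particular SDRAM design.

```python
def block_accesses(start_bit, burst_length, block_size=8):
    """Split a burst into per-block accesses.

    Returns a list of (block_index, bit_offsets) pairs, one per block that
    must be accessed. Assumption: bits are addressed linearly and each
    block holds `block_size` consecutive bits.
    """
    accesses = []
    bit = start_bit
    remaining = burst_length
    while remaining > 0:
        block, offset = divmod(bit, block_size)
        # Take bits up to the end of this block or the end of the burst.
        count = min(block_size - offset, remaining)
        accesses.append((block, list(range(offset, offset + count))))
        bit += count
        remaining -= count
    return accesses


if __name__ == "__main__":
    # 8-bit burst starting at bit 6 of the first block (block 0).
    for block, offsets in block_accesses(start_bit=6, burst_length=8):
        print(f"access block {block}: bits {offsets}")
```

In this model the unaligned burst needs two accesses (bits 6 and 7 of the first block, then bits 0-5 of the second), whereas an aligned burst starting at bit 0 needs only one; each additional access corresponds to a gap in the transfer.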