1. Field of the Invention
This invention relates generally to digital electronic systems. More particularly, this invention relates to techniques for efficiently transferring information in digital electronic systems.
2. Description of the Related Art
In a generalized multi-device digital electronic system, there can be multiple master and slave devices which are connected by an interconnect structure, as shown in FIG. 1. Wires between the components form the interconnect. Transport of information over the interconnect occurs from transmitter to receiver, where the master or the slave components can act as either transmitter or receiver.
One particularly interesting case is when the slave is a memory device and there is a single master, as shown in FIG. 2. Because of the high occurrence of read operations in typical memory reference traffic, an important case is the transmission of control information from master to slave and the return transmission of read data from slave to master. The round trip delay forms the read latency.
In a pipelined system, total delay to perform an operation is divided into clock cycles by dividing the entire datapath into separate pipe stages. In a pipelined memory system, total read latency is also divided into clock cycles. As operating frequency increases, delay variations from both the interconnect and components are exposed. These delay variations can cause logical device-to-device conflicts which make the operation pipeline less efficient. It is thus desirable to compensate for these timing variations, which can occur depending on the position of the memory parts on the channel and internal delays in the memory devices.
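As a rough illustration of why delay variation becomes visible at higher operating frequencies, the sketch below rounds a round-trip delay up to a whole number of clock cycles. The function name and all delay and clock values are hypothetical, chosen only for illustration.

```python
import math

def read_latency_cycles(round_trip_ns: float, clock_period_ns: float) -> int:
    """Round a round-trip delay up to a whole number of clock cycles."""
    return math.ceil(round_trip_ns / clock_period_ns)

# Two hypothetical devices whose round-trip delays differ by 3 ns.
NEAR_NS, FAR_NS = 36.0, 39.0

# With a slow clock (10 ns period) both devices need the same cycle count.
slow_near = read_latency_cycles(NEAR_NS, 10.0)
slow_far = read_latency_cycles(FAR_NS, 10.0)

# With a fast clock (2.5 ns period) the same 3 ns difference spans a cycle boundary.
fast_near = read_latency_cycles(NEAR_NS, 2.5)
fast_far = read_latency_cycles(FAR_NS, 2.5)
```

With these assumed numbers the two devices are indistinguishable at the slow clock but differ by one cycle at the fast clock, which is the kind of device-to-device mismatch that can disturb a shared pipeline.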
Before discussing the sources of timing variation in a memory system, some background information on the structure and operation of memory cores is provided.
Memory Structure and Operation
In this section memory operations are defined. FIG. 3 illustrates a memory with a memory core and a memory interface. The memory interface interacts with an interconnect structure. The following discussion expands upon the generic memory elements of FIG. 3 to identify separate structural elements and to discuss the memory operations and memory interactions with the interconnect.
General Memory Core
In this subsection the structure of memory cores into rows and columns is illustrated and the primitive operations of sense, precharge, read, and write are introduced.
A simple memory core typically consists of a storage array, column decoder, row decoder, and sense amplifiers, as shown in FIG. 4. The interface 100 to a memory core generally consists of a row address 101, column address 103, and data path 102. The storage array, shown in FIG. 6, is organized into rows and columns of storage cells, each of which stores one bit of information. Accessing the information in the storage array is a two step process. First, the information is transferred between the storage array and the sense amplifiers. Second, the information is transferred between the sense amplifiers and the interface via connection 100.
The first major step, transferring information between the storage array and the sense amplifiers, is called a "row access" and is broken down into the minor steps of precharge and sense. The precharge step prepares the sense amplifiers and bit lines for sensing, typically by equilibrating them to a midpoint reference voltage. During the sense operation, the row address is decoded, a single word line is asserted, the contents of each storage cell in the row are placed on the bit lines, and the sense amplifiers amplify the values to a full rail state, completing the movement of the information from the storage array to the sense amplifiers. An important observation is that the sense amplifiers can also serve as a local cache which stores a "page" of data that can be accessed more quickly with column read or write accesses.
The second major step, transferring information between the sense amplifiers and the interface, is called a "column access" and is typically performed in one step. However, variations are possible in which this major step is broken up into two minor steps, e.g. putting a pipeline stage at the output of the column decoder. In this case the pipeline timing has to be adjusted.
From these two major steps, four primary memory operations result: precharge, sense, read, and write. (Read and write are column access operations.) All memory cores support these four primary operations or some subset of them. As later sections describe, some memory types require additional operations specific to their core type.
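The four primary operations can be sketched with a toy single-bank model. This is a minimal illustration, not a description of any real core: the class name, method names, and the choice to write through to the array while the row is open are all assumptions made for the example.

```python
class MemoryCore:
    """Toy single-bank core: a storage array plus one row of sense amplifiers."""

    def __init__(self, rows: int, cols: int) -> None:
        self.array = [[0] * cols for _ in range(rows)]
        self.sense_amps = None   # copy of the open row (the "page"), if any
        self.open_row = None     # index of the row held in the sense amplifiers
        self.precharged = True   # bit lines start out ready for sensing

    def precharge(self) -> None:
        """Close the open row and prepare the bit lines for the next sense."""
        self.sense_amps = None
        self.open_row = None
        self.precharged = True

    def sense(self, row: int) -> None:
        """Row access: move one row from the storage array to the sense amplifiers."""
        if not self.precharged:
            raise RuntimeError("precharge is required before sense")
        self.sense_amps = list(self.array[row])
        self.open_row = row
        self.precharged = False

    def read(self, col: int) -> int:
        """Column access: return one bit from the sensed page."""
        if self.sense_amps is None:
            raise RuntimeError("no row has been sensed")
        return self.sense_amps[col]

    def write(self, col: int, bit: int) -> None:
        """Column access: update the page and, in this model, the open row too."""
        if self.sense_amps is None:
            raise RuntimeError("no row has been sensed")
        self.sense_amps[col] = bit
        self.array[self.open_row][col] = bit


core = MemoryCore(rows=4, cols=8)
core.sense(2)               # row access
core.write(5, 1)            # column write into the open page
page_hit = core.read(5)     # column read served from the sense amplifiers
core.precharge()            # close the page
core.sense(2)               # reopen the same row
after_reopen = core.read(5)
```

The second `read` shows that the value survives closing and reopening the row, while the first `read` needs no new row access because the page is already held in the sense amplifiers.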
As shown in FIG. 5, memory cores can also have multiple banks, which allow simultaneous row operations within a given core. Multiple banks improve memory performance through increased bank concurrency and reduced bank conflicts. Each bank has its own storage array and can have its own set of sense amplifiers to allow for independent row operations. The column decoder and datapath are typically shared between banks.
FIG. 6 shows the generic storage array structure. As shown, the word line (106) accesses a row of storage cells, which in turn transfers the stored data on to the bit lines (107). While the figure shows a pair of bit lines connected to each storage cell, some core organizations may require only one bit line per cell, depending on the memory cell type and sensing circuits.
The general memory core just described provides the basic framework for memory core structure and operations. However, there are a variety of core types, each with slight differences in structure and function. The following three sub-sections describe these differences for each major memory type.
Dynamic RAM (DRAM)
This section describes the structure and primitive operations for the conventional DRAM core. The structure of a conventional DRAM core is shown in FIG. 7. Like the generic memory core in FIG. 4, the conventional DRAM structure has a row and column storage array organization and uses sense amplifiers to perform row access. As a result, the four primary memory operations, sense, precharge, read and write, are supported. The figure shows an additional "column amplifier" block, which is commonly used to speed column access.
The core interface 100 consists of the following signals: row address 101, column address 103, data I/O bus 106, row control signals 107 (these signals are defined in detail further in this section), and column control signals 108 (these signals are defined in detail further in this section).
FIG. 8 shows a conventional DRAM core with multiple banks. In this figure, the row decoder, column decoder, and column amplifiers are shared among the banks. Alternative organizations can allow for these elements to be replicated for each bank, but replication typically requires larger die area and thus greater cost. Cheap core designs with multiple banks typically share row decoders, column decoders and column datapaths between banks to minimize die area.
Conventional DRAM cores use a single transistor (1T) cell. The single transistor accesses a data value stored on a capacitor, as shown in FIG. 9. This simple storage cell achieves high storage density, and hence a low cost per bit, but has two detrimental side effects. First, it has a relatively slow access time, which arises because the passive storage capacitor can only store a limited amount of charge. Row sensing for conventional DRAM therefore takes longer than for other memory types with actively-driven cells, such as SRAM. Hence, cheap DRAM cores generally result in slow row access and cycle times. Second, cell refresh is required. Since the bit value is stored on a passive capacitor, the leakage current in the capacitor and access transistor results in degradation of the stored value. As a result, the cell value must be "refreshed" periodically. The refresh operation consists of reading the cell value and rewriting the value back to the cell. These two additional memory operations are named refresh sense and refresh precharge, respectively. In traditional cores, refresh sense and refresh precharge were the same as regular sense and precharge operations. However, with multiple bank cores, special refresh operations are advantageous to enable dedicated refresh circuits and logic to support multibank refresh.
FIG. 10 shows details of a bit slice of a typical row datapath, and FIG. 11 shows the timing diagram of a precharge and sense operation. To perform a row access, the bit lines and sense amplifiers must first be precharged, typically to the Vdd/2 midpoint. The row precharge time, tRP, is shown in FIG. 11.
To perform a sense operation, the row decoder drives a single word line to turn on the access transistors of a row of memory cells. The charge on each storage capacitor transfers to its bit line, slightly changing the bit line voltage. The sense amplifier detects this small voltage change and drives the bit lines to full rail (Vdd and Gnd). The word line must be held high for a significant portion of the period tRAS,min to complete the sensing operation. At some time before the bit lines reach full rail, a column read or write access can begin. The time between the start of the sense operation and the earliest allowable column access is tRCD, the row to column access delay.
The total time to perform both precharge and sense is tRC, the row cycle time, and is a primary metric for core performance. Table 1 shows typical DRAM row timing values.
TABLE 1 Typical DRAM Row Timing Parameters
Symbol     Description              Value  Units
tRP        Row precharge time       20     ns
tRCD       Row to column delay      26     ns
tRC        Row cycle time           80     ns
tRAS,min   Minimum row active time  60     ns
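The Table 1 values can be combined in a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical. The observation that tRP plus tRAS,min equals tRC holds for these table values and is treated here as an assumption about how the parameters relate, not as a statement from the text.

```python
# Row timing values from Table 1, in nanoseconds.
ROW_TIMING_NS = {"tRP": 20, "tRCD": 26, "tRC": 80, "tRAS_min": 60}

def precharge_to_column_ns(t: dict) -> int:
    """Delay from the start of precharge to the earliest column access."""
    return t["tRP"] + t["tRCD"]

def row_cycles_per_second(t: dict) -> float:
    """Upper bound on back-to-back row cycles in a single bank."""
    return 1e9 / t["tRC"]

def row_cycle_is_consistent(t: dict) -> bool:
    """Check the assumed relation tRC = tRP + tRAS,min against the values."""
    return t["tRP"] + t["tRAS_min"] == t["tRC"]

first_column_access_ns = precharge_to_column_ns(ROW_TIMING_NS)
max_row_rate = row_cycles_per_second(ROW_TIMING_NS)
consistent = row_cycle_is_consistent(ROW_TIMING_NS)
```

With the table values, a column access to a row that must first be precharged and sensed can start no earlier than 46 ns after the precharge begins, and a single bank can complete at most 12.5 million row cycles per second.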
It is important to note that memory device timing parameters can vary widely across various device designs, manufacturing processes, supply voltage, operating temperature, and process generations. In order for the memory architecture to be widely usable, it is very important for the protocol to be able to support these variable row and column timings.
FIG. 10 shows a common cell organization which alternates cell connections between wordlines. This leads to a dense packing of cells and also allows the sense amplifier to use the voltage on the unused bitline as a reference for differential bit line sensing.
Separate PRECH and SENSE control can be used at the core interface. Traditional cores use a single control signal, commonly called RAS, and use the rising and falling edges to distinguish between sense and precharge. Separated PRECH and SENSE signals, together with a separate bank address for sense and precharge, support cores with pipelined precharge and sense operations occurring in multiple banks.
The row sensing power includes the power to decode the row address, drive the wordline high, and turn on the sense amplifiers, which must drive the bit lines from Vdd/2 to Vdd and Gnd. Thus, a significant portion of row sense power is proportional to the number of sense amplifiers that are turned on (i.e., the page size).
FIG. 12 shows an example row access timing diagram for DRAMs with multiple banks. The period tSS specifies the minimum delay between sense operations to different banks. Similarly, the period tPP specifies the minimum delay between precharge operations to different banks.
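A controller that issues row operations to several banks has to respect these spacings. The following is a minimal sketch of such a check, assuming a command is a hypothetical (time, operation, bank) tuple. The 20 ns values used for tSS and tPP are placeholders, since the text gives no numbers for them, and same-bank constraints such as tRC are deliberately not checked.

```python
from typing import Dict, List, Tuple

Command = Tuple[int, str, int]  # (issue time in ns, "SENSE" or "PRECH", bank)

def check_bank_spacing(commands: List[Command], t_ss_ns: int, t_pp_ns: int) -> List[str]:
    """Report same-type operations to different banks that are issued too close together."""
    min_gap = {"SENSE": t_ss_ns, "PRECH": t_pp_ns}
    last: Dict[str, Tuple[int, int]] = {}  # operation -> (time, bank) of the previous one
    violations = []
    for time_ns, op, bank in sorted(commands):
        if op in last:
            prev_time, prev_bank = last[op]
            gap = time_ns - prev_time
            if bank != prev_bank and gap < min_gap[op]:
                violations.append(
                    f"{op} to bank {bank} at {time_ns} ns is only {gap} ns after bank {prev_bank}"
                )
        last[op] = (time_ns, bank)
    return violations

# Hypothetical schedule: the second SENSE follows the first by only 10 ns.
schedule = [(0, "SENSE", 0), (10, "SENSE", 1), (40, "PRECH", 0), (60, "PRECH", 1)]
problems = check_bank_spacing(schedule, t_ss_ns=20, t_pp_ns=20)
```

Here only the sense to bank 1 is flagged; the two precharge operations are exactly 20 ns apart and therefore pass.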
FIG. 13 is a more detailed diagram of a typical DRAM column datapath. The output of the column decoder, which may be placed in a register for pipelined designs, drives a single column select line, which selects some fraction of outputs from the sense amplifiers. The selected sense amplifiers then drive the data on to the column I/O wires. To speed column access time, the column I/O lines are typically differential and sensed using differential column amplifiers, which amplify small voltage differences on the column I/O wires and drive the data I/O bus to the interface. The width of the column I/O bus sets the data granularity of each column access, also known as CAS block granularity.
The data I/O can either be bidirectional, in which write and read data are multiplexed on the same bus, or unidirectional, in which write and read data have separate buses. FIG. 13 shows unidirectional data I/O.
Column access power consists of the power to decode the column address, drive the column select line, turn on the column amplifiers, and drive the column I/O wires. Column power is roughly proportional to the column cycle frequency and the width of the column I/O datapath.
Some DRAM cores also include the ability to mask write data, so that some bits or bytes of the datapath are not written depending on the mask pattern. Typically, the mask pattern is delivered to the column amplifier write circuit, which inhibits the write data appropriately.
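Byte-level write masking can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the mask polarity (True means the byte is written) is chosen for the example, since the text does not specify one.

```python
from typing import Sequence

def masked_write(stored: bytes, write_data: bytes, write_enable: Sequence[bool]) -> bytes:
    """Return the new contents: byte i is replaced only where write_enable[i] is True."""
    if not (len(stored) == len(write_data) == len(write_enable)):
        raise ValueError("stored, write_data and write_enable must have equal length")
    return bytes(
        new if enabled else old
        for old, new, enabled in zip(stored, write_data, write_enable)
    )

# Only the first and last bytes of the datapath are written.
result = masked_write(
    stored=b"\x11\x22\x33\x44",
    write_data=b"\xaa\xbb\xcc\xdd",
    write_enable=[True, False, False, True],
)
```

The two middle bytes keep their stored values because the mask inhibits the write for those positions.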
A timing diagram for a column read operation is shown in FIG. 14. The key timing parameters of the column read access are:
tPC, column cycle time: the minimum cycle time of a column access. This parameter determines how fast data can be cycled to and from the memory core. The CAS block granularity divided by tPC equals the core data bandwidth.
tCLS, COLLAT setup to COLCYC: the minimum set-up time of latching the column address to the rising edge of COLCYC, when data access from the sense amplifiers starts.
tDAC, column read access delay: the delay from the rising edge of COLCYC to when READDATA is valid at the interface.
tCAS: the minimum time that COLCYC stays high. This parameter sets the maximum time it takes to transfer data from the sense amplifiers to the column amplifiers and determines when column precharge can start.
tCP, column precharge: the minimum time that COLCYC stays low. This parameter sets the maximum time it takes to precharge the column I/O wires.
tCPS, COLCYC low setup to row precharge: the minimum set up time that COLCYC stays low before row precharge begins. This parameter is important since tCAS+tCPS determines when a row precharge operation can begin relative to the start of a column operation.
tDOH, data output hold time: the minimum hold time of READDATA after the next COLCYC rising edge. Note: tPC-tDAC+tDOH determines the READDATA minimum valid window at the core interface.
tASC, column address setup: the minimum column address set up time before COLLAT rising edge.
tCAH, column address hold: the minimum column address hold time after COLLAT rising edge. Note: tASC+tCAH determines the minimum column address valid window that must be observed to perform a column operation to the core.
tCLL, COLLAT low: the minimum time that COLLAT stays low.
tCLH, COLLAT high: the minimum time that COLLAT stays high.

A timing diagram for a column write operation is shown in FIG. 15. Many timing parameters, which include tPC, tCAS, tCP, tCLS, tCPS, tCLL, tCLH, tASC and tCAH, are the same as those for column read. Additional key timing parameters of the column write access are:
tDS, WRITEDATA setup: the minimum WRITEDATA setup time before the rising edge of COLCYC.
tDH, WRITEDATA hold: the minimum WRITEDATA hold time after the falling edge of COLCYC. Note: tDS+tCAS+tDH determines the minimum WRITEDATA valid window that must be observed to perform a write operation to the core.
tWES, WMASK setup: the minimum set up time for a write mask before the rising edge of COLCYC.
tWEH, WMASK hold: the minimum hold time for a write mask after the falling edge of COLCYC. Note: tWES+tCAS+tWEH determines the minimum WMASK valid window that must be observed to perform a write mask operation to the core.
Table 2 shows typical DRAM column timing values.
TABLE 2 Typical DRAM Column Timing Values
Symbol  Description                             Value  Units
tPC     Column cycle time                       10     ns
tCAS    COLCYC high                             4      ns
tCP     COLCYC low                              4      ns
tCLS    COLLAT to COLCYC setup                  2      ns
tDAC    READDATA valid from COLCYC rising       7      ns
tCPS    COLCYC low setup time to row precharge  1      ns
tASC    COLADDR setup to COLLAT rising          0      ns
tCAH    COLADDR hold from COLLAT rising         5      ns
tDOH    READDATA hold from next COLCYC rising   3      ns
tDS     WRITEDATA setup to COLCYC rising        1      ns
tDH     WRITEDATA hold from COLCYC falling      1      ns
tWES    WMASK setup to COLCYC rising            2      ns
tWEH    WMASK hold from COLCYC falling          0      ns
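The valid-window expressions given in the parameter notes can be evaluated directly with the Table 2 values. The sketch below does this arithmetic; the dictionary and function names are hypothetical, and the 64-bit CAS block granularity used for the bandwidth figure is an assumed example value, not one taken from the text.

```python
# Column timing values from Table 2, in nanoseconds.
COLUMN_TIMING_NS = {
    "tPC": 10, "tCAS": 4, "tCP": 4, "tCLS": 2, "tDAC": 7, "tCPS": 1,
    "tASC": 0, "tCAH": 5, "tDOH": 3, "tDS": 1, "tDH": 1, "tWES": 2, "tWEH": 0,
}

def read_data_window_ns(t: dict) -> int:
    """READDATA minimum valid window: tPC - tDAC + tDOH."""
    return t["tPC"] - t["tDAC"] + t["tDOH"]

def column_address_window_ns(t: dict) -> int:
    """Minimum column address valid window: tASC + tCAH."""
    return t["tASC"] + t["tCAH"]

def write_data_window_ns(t: dict) -> int:
    """Minimum WRITEDATA valid window: tDS + tCAS + tDH."""
    return t["tDS"] + t["tCAS"] + t["tDH"]

def write_mask_window_ns(t: dict) -> int:
    """Minimum WMASK valid window: tWES + tCAS + tWEH."""
    return t["tWES"] + t["tCAS"] + t["tWEH"]

def core_bandwidth_bits_per_ns(cas_block_bits: int, t: dict) -> float:
    """Core data bandwidth: CAS block granularity divided by tPC."""
    return cas_block_bits / t["tPC"]

read_window = read_data_window_ns(COLUMN_TIMING_NS)
address_window = column_address_window_ns(COLUMN_TIMING_NS)
write_window = write_data_window_ns(COLUMN_TIMING_NS)
mask_window = write_mask_window_ns(COLUMN_TIMING_NS)
bandwidth = core_bandwidth_bits_per_ns(64, COLUMN_TIMING_NS)  # 64 bits is assumed
```

For these values the read data, write data, and write mask windows each come to 6 ns, and the column address window to 5 ns. An assumed 64-bit column access every 10 ns corresponds to 6.4 bits per nanosecond.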
It is important to note that DRAM timing parameters can vary widely across various manufacturing processes, supply voltage, operating temperature, and process generations. In order for the memory architecture to be widely usable, it is very important for the DRAM protocol to be able to support these variable row and column timings.
Typical column cycle times and access times greatly depend on the type of sense amplifier circuit, since the sense amplifier actually drives the data on to the column I/O wires. Increased speeds can be achieved by using more transistors in the sense amplifier circuit to improve drive capability, but this greatly increases the die area and cost since the sense amplifier circuit is heavily replicated. Thus, the desire to minimize die area for commodity DRAMs inhibits the further reduction of column access speeds.
Static RAM (SRAM)
SRAM shares a similar core structure and functional blocks with DRAM. As with DRAM, access is performed in a two step process. First, in the sense operation, the information is transferred between the storage array and the sense amplifiers. Second, in the column access operation, the information is transferred between the sense amplifiers and the interface. Also, similar to DRAM, the bit lines must be precharged before sensing occurs, although a typical precharge value is Vdd, not Vdd/2.
The key difference lies in the storage cell. In an SRAM, data is stored statically, typically using a circuit of several transistors. A typical SRAM cell is shown in FIG. 16. The SRAM of FIG. 16 uses cross-coupled CMOS inverters to store a single data bit. A word line turns on access transistors, which connect the cell circuit to differential bit lines. Unlike a DRAM cell, the SRAM cell circuit actively drives the stored value on to the bit lines, thus resulting in faster access time. The static nature of the SRAM cell eliminates the need for cell refresh. However, the static cell also uses more transistors and takes up much more area than a DRAM cell. The four primitive operations of an SRAM are sense, precharge, read and write.
Read-Only Memory
Read-only memory cores store information according to an electrical connection at each cell site which joins rows to columns. Typically, a single transistor forms the electrical connection at each cell site. A simple ROM array is shown in FIG. 17.
There are a variety of ROM cell types, including erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash ROM, and mask programmable ROM. Their differences lie in the type of transistor used at the cell site. However, all ROM types share the common 2-D storage array organization, which requires a row and column decode of the address for each data access.
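The row and column decode shared by all ROM types can be sketched as below. This is an illustration under assumptions: the class and function names are hypothetical, and splitting a flat address so that the quotient selects the row and the remainder selects the column is one common convention rather than something the text specifies.

```python
from typing import List, Tuple

def decode_address(address: int, num_cols: int) -> Tuple[int, int]:
    """Split a flat address into (row, column) for a 2-D storage array."""
    return divmod(address, num_cols)

class RomArray:
    """Toy ROM: a fixed 2-D array of bits that supports only the read operation."""

    def __init__(self, contents: List[List[int]]) -> None:
        self.contents = contents
        self.num_cols = len(contents[0])

    def read(self, address: int) -> int:
        row, col = decode_address(address, self.num_cols)
        return self.contents[row][col]

rom = RomArray([[0, 1, 1, 0],
                [1, 0, 0, 1]])
row_col = decode_address(5, 4)   # address 5 in a 4-column array
bit_at_4 = rom.read(4)
bit_at_5 = rom.read(5)
```

Address 5 decodes to row 1, column 1 in this layout, so each read involves both a row selection and a column selection even though the stored data never changes.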
Unlike SRAMs or DRAMs, not all ROMs have sense amplifier circuits. Sense amplifiers are only used in some ROMs which require fast access times. For these ROMs, the primitive operations are sense, precharge and read.
For slower ROMs that do not use sense amplifiers, the data values are driven directly from the cell to output amplifiers, which drive the interface. For these ROMs, the single primitive operation is read.