The present invention relates generally to semiconductor memory devices, and more particularly to circuits for reading data from, and writing data into, the memory cells of a memory device.
Computing systems typically include a computing device (a microprocessor, for example) for manipulating data, and a storage device for storing data for use by the computing device. A common type of storage device is a semiconductor random access memory (RAM). In order to provide the best system performance in a computing device, it is desirable to allow the computing device to operate as fast as possible, and never be forced into an idle state while waiting to receive or store data. To achieve this result, it is important to provide a data storage device that will read and write data as quickly as possible. This gives rise to an important aspect of semiconductor memory device performance: the rate at which data can be read from, or written into, the device (often referred to as “bandwidth”).
A typical RAM includes one or more arrays having memory cells arranged in rows and columns. The memory cells are accessed in read and write operations by way of a data bus. While large data buses can increase the bandwidth of a RAM, such an approach incurs the penalty of increasing the physical size of the RAM. For this reason, in RAMs which include multiple arrays, a data bus is typically a “global” bus. That is, the data bus is commonly connected to a number of arrays. Further, to reduce the area of a RAM, the data bus is often “shared.” That is, the same set of lines within the data bus that are used to write data to the arrays are also used to read data from the arrays. Thus, if a write operation is sending data into a memory array by way of the data bus, the write operation must be completed before a subsequent read operation can retrieve data from a memory array. Otherwise, the input and output data would both be on the data bus simultaneously, resulting in erroneous operation of the RAM. Any delay incurred between write and read operations is undesirable, because the computing device of the system may have to wait during such a delay in order to complete its computing function. The time period in which the computing system must wait for a data access operation of a storage device is often referred to as a wait state. Wait states are to be avoided, if possible, because they reduce the efficiency of system data bus timing, and hence reduce bandwidth.
To more clearly illustrate the occurrence of wait states in a RAM operation, a block schematic diagram of a RAM is set forth in FIG. 1. The RAM is designated by the general reference character 100, and is shown to include a number of memory banks, beginning with a first memory bank 102a, a second memory bank 102b, and terminating in a last memory bank 102n. Each memory bank (102a-102n) can include more than one memory cell array. The storage locations within each memory bank (102a-102n) are accessed by corresponding row decoders (104a-104n) and column decoders (106a-106n). The row decoders (104a-104n) are each coupled to a row address buffer 108 by a row address bus 110. In a similar fashion, the column decoders (106a-106n) are each coupled to a column address buffer 112 by a column address bus 114. The RAM 100 further includes an address latch 116 for receiving and latching address information from a “multiplexed” address bus 118. The multiplexed address bus 118 is “multiplexed” in the sense that it receives either row address or column address information. The column address buffer 112 receives column address information from both the address latch 116 and the multiplexed address bus 118.
The various functions of the RAM 100 are initiated by a command decoder 120. In response to information provided on a command bus 122 and/or the multiplexed address bus 118, the command decoder 120 activates a collection of control signals. Five control signals are illustrated in FIG. 1: a STORE signal, a READ signal, a WRITE signal, a COLINIT signal, and an ICLK signal. The STORE signal results in a column address being latched in the address latch 116. The READ signal initiates the internal read operation. The WRITE signal indicates an internal write function. It is noted that, for the purposes of this discussion, the distinction between a write operation and an internal write function should be kept in mind. The internal write function is the final step in a write operation, and includes the act of physically writing data into the memory cells of the array.
Referring once again to the control signals provided by the command decoder 120, it is noted that the COLINIT signal pulses high at the start of a column access. The ICLK signal pulses high for each bit in a pre-fetch operation. Pre-fetch operations will be described below. The particular RAM 100 disclosed is a synchronous RAM, and so the RAM 100 operations are synchronous with an externally applied clock, shown as CLK.
Referring once again to FIG. 1, it is shown that the column decoders (106a-106n) are coupled to a write circuit 124 and a read circuit 126 by a shared data bus 128. The data bus is “shared” in that it is used for both read and write operations. The operation of the write and read circuits (124 and 126) is controlled by a shift clock circuit 130 that generates a SHFTCLK signal. In response to the SHFTCLK signal, the write circuit 124 couples data from an I/O bus 132 to the shared data bus 128, or the read circuit 126 couples data from the shared data bus 128 to the I/O bus 132. Data is placed on the I/O bus 132 at a number of data I/Os 134.
The architecture of the RAM 100 in FIG. 1 is referred to as a “pre-fetch” architecture. A pre-fetch architecture is one in which multiple data bit sets are read from an array at one time, and can be sequentially output, one set after the other. For example, in an eight bit pre-fetch architecture, for each data output, eight bits are read from a memory bank, and will be available to be output. In other words, in the case of FIG. 1 (which includes an eight bit pre-fetch), the read operation will initially retrieve 128 bits of data. This data can then be output in eight sets of 16 bits. Pre-fetch architectures can be particularly advantageous for “burst” mode RAMs. In a burst mode RAM, a sequence of addresses is accessed by the application of a single address. By utilizing a pre-fetch architecture, all bits required for the burst sequence are available with one read operation, obviating the need to address a memory bank multiple times.
Because the RAM 100 of FIG. 1 is a pre-fetch architecture, the shared data bus 128 is larger than the I/O bus 132 by a multiple equivalent to the size of the pre-fetch. For example, if the I/O bus 132 were 16 bits wide, and the RAM 100 allowed for an eight bit pre-fetch, the shared data bus 128 would be 128 bits wide. In addition, there would be an eight bit latch circuit associated with each data I/O to store the eight pre-fetched bits. Data would be sequentially output from the latches in response to a number of SHFTCLK signals.
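The width relationship described above can be checked with a short calculation. The following Python sketch is illustrative only: the function names `shared_bus_width` and `split_into_output_sets` are hypothetical, and the set-by-set ordering of bits on the shared data bus is an assumption, since the actual bit ordering is not specified.

```python
def shared_bus_width(io_width_bits: int, prefetch_depth: int) -> int:
    """Width of the shared data bus for a given I/O width and pre-fetch depth."""
    return io_width_bits * prefetch_depth


def split_into_output_sets(prefetched_bits: list, io_width_bits: int) -> list:
    """Split one pre-fetched word into sets, one set per SHFTCLK pulse.

    Assumption: bits are grouped set by set; the real ordering on the
    shared data bus is not given in the text.
    """
    return [
        prefetched_bits[i:i + io_width_bits]
        for i in range(0, len(prefetched_bits), io_width_bits)
    ]


# 16-bit I/O bus with an eight bit pre-fetch, as in the example above.
width = shared_bus_width(io_width_bits=16, prefetch_depth=8)
sets = split_into_output_sets(list(range(width)), io_width_bits=16)
print(width, len(sets), len(sets[0]))  # prints: 128 8 16
```

With a 16 bit I/O bus and an eight bit pre-fetch, the sketch yields a 128 bit shared data bus whose contents are output as eight sets of 16 bits.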
Pre-fetch architecture can also be used to increase the speed and efficiency with which data is written into a memory bank. For example, each data I/O could include eight data input latches. In a write operation, for each data I/O, eight data bits could then be sequentially entered. Once all of the data input latches contain data, a single internal write function can simultaneously write all latched data bits. For example, in the RAM 100 of FIG. 1, the write circuit 124 could include 128 latches. Eight sets of 16 bits could be sequentially entered into the latches, and then written along the 128 shared data lines into memory banks.
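The latch-then-write behavior described above can be sketched in Python as follows. The class `WriteLatches` and its method names are hypothetical and serve only to illustrate the sequence; the sketch assumes that every latch is filled before the internal write function occurs and does not model partial writes.

```python
class WriteLatches:
    """Hypothetical model of the input latches of a pre-fetch write circuit."""

    def __init__(self, io_width_bits: int, prefetch_depth: int):
        self.io_width_bits = io_width_bits
        self.prefetch_depth = prefetch_depth
        self.sets = []  # one latched set of bits per clock cycle of input

    def is_full(self) -> bool:
        return len(self.sets) == self.prefetch_depth

    def clock_in(self, data_set: list) -> bool:
        """Latch one set of input bits; return True once every latch is full."""
        if len(data_set) != self.io_width_bits:
            raise ValueError("data set must match the I/O bus width")
        if self.is_full():
            raise RuntimeError("latches are full; internal write is pending")
        self.sets.append(list(data_set))
        return self.is_full()

    def internal_write(self) -> list:
        """Place all latched bits on the shared data lines in a single step."""
        if not self.is_full():
            raise RuntimeError("not all input latches contain data")
        word = [bit for data_set in self.sets for bit in data_set]
        self.sets = []
        return word


latches = WriteLatches(io_width_bits=16, prefetch_depth=8)
for cycle in range(8):
    ready = latches.clock_in([cycle % 2] * 16)  # arbitrary test pattern
word = latches.internal_write()
print(ready, len(word))  # prints: True 128
```

Eight sequential entries of 16 bits fill the 128 latches, after which one internal write function transfers all 128 bits at once.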
An example of a write operation for one variation of the RAM 100 is illustrated in FIG. 2. FIG. 2 illustrates a conventional “non-posted” write operation followed by a read operation. The term non-posted is used to distinguish the write operation from a “posted” write, which will be described in more detail below. FIG. 2 includes the clock signal CLK, a column address strobe signal CAS_, a write signal W_, a bank select indication BANK SEL, and a description of the type of data on the data I/Os (DATA).
Referring now to FIG. 1 in conjunction with FIG. 2, it is noted that the CAS_ and W_ signals are received at the command decoder 120 on the command bus 122. It is further noted that the example of FIG. 2 illustrates a case of a four bit pre-fetch (as opposed to an eight bit pre-fetch) in order to avoid an overly large illustration. That is, in the case of the write operation, on each data I/O, a different data bit will be entered on four successive clock cycles, after which all four bits will be written in parallel to a memory bank. Similarly, in the case of the read operation, four data bits will be read in parallel for each I/O, and then sequentially output, one by one, on successive clock cycles. It is understood that an eight bit pre-fetch architecture could include eight clock cycles to enter data and read data.
It is also important to note that not all of the data read by a pre-fetch operation has to be output. If only one set of the pre-fetched data bits is to be read, the SHFTCLK signals will couple the appropriate set of bits from the multiple sets of bits provided by the pre-fetch. This same aspect of pre-fetch functions also applies to write operations.
At time t0, the CAS_ signal goes low on the rising edge of a CLK signal, initiating a memory bank column access operation. It is understood that prior to time t0, a row address strobe signal RAS_ (also received on the command bus 122) will have transitioned low, resulting in the row address buffer 108 receiving a row address on the multiplexed address bus 118. In response to the row address, row address information is provided to the row decoders (104a-104n). The row decoders (104a-104n), in turn, will select a row within at least one of the memory banks (102a-102n).
Also at time t0, the W_ signal will go low, indicating that the column access operation is a write operation. At the start of the write operation, the STORE signal will be activated, and a column address on the multiplexed address bus 118 will be latched within the address latch 116. Following the start of the write operation, the system in which the RAM 100 is operating will provide input data at each I/O on the four successive clock cycles following time t0.
At time t1, the last of the input data is provided at the data I/Os 134. At this point, the “internal” write function takes place. That is, while the input data may be stored in latches on the periphery of the RAM 100, the data still needs to be written into at least one of the memory banks (102a-102n). Thus, at time t1, the WRITE signal will be activated. With the WRITE signal active, the latched column address from the address latch 116 is coupled to the column decoders (106a-106n), which will provide a path between the shared data bus 128 and one of the memory banks (102a-102n). In the particular example of FIG. 2, bank 0 102a receives the input data.
The non-posted write operation in FIG. 2 is immediately followed by a read operation. However, because the shared data bus 128 is needed to write latched data from the write circuit 124 to memory bank 0 102a, the read operation may not occur while the internal write function occurs. Thus, in order for a read operation to take place, the shared data bus 128 must be cleared of the input data being written into a memory bank, so that output data may flow from the memory banks (102a-102n) to the read circuit 126.
At time t2, data have been successfully written into memory bank 0 102a, and the subsequent read operation is initiated by the CAS_ signal transitioning low. At the same time, a second column address is provided on the multiplexed address bus 118. In the particular example of FIG. 2, the second column address accesses memory bank 1 102b. Because the W_ signal is high at time t2, the CAS_ signal results in a read operation. At the start of the read operation, the READ signal issued by the command decoder 120 goes high, and the column address from the multiplexed address bus 118 (as opposed to the latched address stored within the address latch 116) is coupled to the column decoders (106a-106n) by the column address buffer 112. There is some delay (referred to as “latency”) between the initiation of the read operation and the actual appearance of data at the data I/Os 134. Thus, the data accessed by the read operation started at time t2 will begin to appear at the data I/Os 134 at time t3.
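The choice of column address source described above, with the latched address used for the internal write function and the address on the multiplexed address bus used for a read, can be summarized in a short Python sketch. The function name `select_column_address` and its arguments are hypothetical, and the sketch assumes that the READ and WRITE signals are never active at the same time.

```python
def select_column_address(read: bool, write: bool,
                          latched_address: int, bus_address: int) -> int:
    """Return the column address passed on to the column decoders.

    WRITE active: use the address stored earlier in the address latch.
    READ active:  use the address currently on the multiplexed address bus.
    Assumption: READ and WRITE are mutually exclusive in this sketch.
    """
    if read and write:
        raise ValueError("READ and WRITE are not active together in this sketch")
    if write:
        return latched_address
    if read:
        return bus_address
    raise ValueError("no column access is in progress")


# Internal write function: the latched column address is used.
print(hex(select_column_address(False, True, 0x2A, 0x7F)))  # prints: 0x2a
# Read operation: the address on the multiplexed address bus is used.
print(hex(select_column_address(True, False, 0x2A, 0x7F)))  # prints: 0x7f
```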
At time t4, the internal read operation is completed. At time t5, the last of the pre-fetched data bits are output at the data I/Os.
It is noted that in the non-posted write/read combination of FIG. 2, between times t0 and t1, the system bus is active as input data bits are being provided to the RAM 100. Along these same lines, the system bus is also active between times t3 and t5, as output data bits are being provided by the RAM 100. However, the system bus is idle between times t1 and t2, as it must wait for the RAM 100 to execute the internal write function before starting the following read operation. This introduces a timing “gap” in the back-to-back non-posted write/read combination, reducing the bandwidth of the RAM 100.
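The timing gap of the non-posted write/read combination can be illustrated with a simple cycle count. The following Python sketch is a simplified model rather than a reproduction of FIG. 2: the function name is hypothetical, the four cycle burst follows the four bit pre-fetch example, and the cycle counts chosen for the internal write function and for the read latency are assumptions. Time t4 (completion of the internal read operation) is not modeled.

```python
def non_posted_timeline(burst_len: int, write_cycles: int, read_latency: int) -> dict:
    """Clock-cycle positions of the main events of a non-posted write then read."""
    t0 = 0                      # write command; input data begins
    t1 = t0 + burst_len         # last input data entered; internal write starts
    t2 = t1 + write_cycles      # internal write done; read command issued
    t3 = t2 + read_latency      # first output data appears
    t5 = t3 + burst_len         # last output data appears
    return {"t0": t0, "t1": t1, "t2": t2, "t3": t3, "t5": t5, "gap": t2 - t1}


# Assumed values: 2 cycles for the internal write, 2 cycles of read latency.
timeline = non_posted_timeline(burst_len=4, write_cycles=2, read_latency=2)
print(timeline["gap"], timeline["t5"])  # prints: 2 12
```

Under these assumed values, the read command cannot be issued until two cycles after the last input data, and that interval is the gap between times t1 and t2 during which the system bus is idle.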
Referring now to FIG. 3, a second type of write/read operation is illustrated. FIG. 3 illustrates a “posted” write operation followed by a read operation. A posted write operation receives and stores input data and, rather than immediately writing the data to memory banks, allows the internal write function to be executed at a later, more convenient time. FIG. 3 includes the same signals as FIG. 2: the CLK signal, the CAS_ signal, the W_ signal, the BANK SEL indication, and the response of the data I/Os (DATA). FIG. 3 also illustrates a four bit pre-fetch operation.
Referring now to FIG. 3 in conjunction with FIG. 1, between times t0 and t1, the posted write operation takes place in the same manner as the non-posted write operation described in conjunction with FIG. 2. A column address is stored in the address latch 116 by an active STORE signal.
The posted write operation of FIG. 3 deviates from the non-posted example of FIG. 2 in that the internal write function does not occur at time t1. Instead, the RAM 100 is available for immediately executing the subsequent read operation. Thus, at time t1, the WRITE signal is not active, and the internal write function does not take place. Further, with the input data now stored in latches located within the write circuit 124, the shared data bus 128 is free, and the CAS_ signal transitions low to immediately initiate the read operation. As in the case of FIG. 2, in the read operation, a second column address is coupled to the column decoders (106a-106n) by the column address buffer 112, and due to latency, the output data begins to appear on the data I/Os at time t2.
At time t3, the internal read operation is completed. At time t4, the last of the pre-fetched data bits are output at the data I/Os.
At time t4, it is assumed that the system bus is not active. Absent any further command bus 122 activity, the WRITE signal goes active, initiating the internal write function. The address stored within the address latch 116 is coupled to the column decoders (106a-106n), and the input data stored within the write circuit 124 is coupled to the appropriate memory bank (102a-102n). The internal write function is complete at time t5.
It is noted that the posted write/read combination illustrated by FIG. 3 results in the possibility of “gapless” read operations following a write operation. That is, by utilizing a posted write, the timing gap required for the internal write function in non-posted writes (shown between times t1 and t2 in FIG. 2) can be eliminated. This increases the bandwidth of the RAM 100 over that of the non-posted write case of FIG. 2. Of course, the internal write function itself is not eliminated, but simply postponed to a more advantageous time. Thus, it must be kept in mind that in the posted write/read case of FIG. 3, some time subsequent to the read operation (shown as time t4-t5 in FIG. 3) must be provided to complete the write operation.
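The difference between the two schedules can be made concrete with a small comparison. The following Python sketch is a simplified model under stated assumptions: the `Schedule` type and both function names are hypothetical, the burst length of four follows the four bit pre-fetch example, and the two cycle internal write and two cycle read latency are assumed values that are not taken from FIG. 2 or FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    read_command: int       # cycle at which the read command is issued
    first_read_data: int    # cycle at which output data first appears
    last_read_data: int     # cycle at which the last output data appears
    internal_write: tuple   # (start, end) cycles of the internal write function


def non_posted(burst_len: int, write_cycles: int, read_latency: int) -> Schedule:
    """Internal write runs immediately after the input data; the read waits."""
    write_start = burst_len
    read_command = write_start + write_cycles
    first = read_command + read_latency
    return Schedule(read_command, first, first + burst_len,
                    (write_start, read_command))


def posted(burst_len: int, write_cycles: int, read_latency: int) -> Schedule:
    """Read starts right after the input data; the internal write is deferred."""
    read_command = burst_len
    first = read_command + read_latency
    last = first + burst_len
    return Schedule(read_command, first, last, (last, last + write_cycles))


np_sched = non_posted(burst_len=4, write_cycles=2, read_latency=2)
p_sched = posted(burst_len=4, write_cycles=2, read_latency=2)
saved = np_sched.first_read_data - p_sched.first_read_data
print(saved, p_sched.internal_write)  # prints: 2 (10, 12)
```

In this model the read data arrives earlier by exactly the duration of the internal write function, while the deferred internal write still occupies a window after the read, corresponding to the interval t4-t5 of FIG. 3.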
Despite the advantages provided by the posted write/read operations described above, the overall desire to allow computing devices to operate at as fast a speed as possible continues to be a primary motivating factor in the design of computing system components. Accordingly, any further increase in RAM bandwidth, above and beyond the examples set forth above, would further advance the art toward this important goal.
According to the preferred embodiment, a random access memory (RAM) includes a plurality of memory banks. Data within each memory bank is coupled to an associated local read/write circuit by a shared local input/output (I/O) bus. The shared local I/O bus is coupled to a global bus that is separated into a global read bus and a global write bus. The read/write circuits include input data latches for storing data provided on the global write bus.
When a write operation to one memory bank is followed by a read operation to another memory bank, the local read/write circuits enable the internal write function to take place along the global write bus at the same time data is being read from the global read bus. This capability increases the bandwidth of the RAM, as most write-followed-by-read operations are not only gapless, but also do not require a later period of idle system bus time to complete the posted write operation.
According to another aspect of the preferred embodiment, when a write operation is followed by a read operation to the same memory address in the same memory bank, the write function and read operation can be performed simultaneously by the associated local read/write circuit.
According to another aspect of the preferred embodiment, the RAM is a synchronous RAM.
According to another aspect of the preferred embodiment, the RAM has a pre-fetch architecture.
According to another aspect of the preferred embodiment, the RAM is a dynamic RAM, and includes a multiplexed address bus.
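The conditions under which the internal write function and a following read operation can proceed at the same time, as summarized above, can be sketched in Python as follows. The function name `can_overlap` and the string labels for the bus arrangement are hypothetical. The summary does not state how a write followed by a read to a different address in the same memory bank is handled, so the sketch assumes, conservatively, that such a pair does not overlap.

```python
def can_overlap(bus: str, write_bank: int, write_addr: int,
                read_bank: int, read_addr: int) -> bool:
    """Return True if the internal write and the read may occur simultaneously.

    bus: "shared" for a single shared global data bus, or "split" for
    separate global read and global write buses.
    """
    if bus == "shared":
        # One set of lines carries both directions, so the write must finish first.
        return False
    if bus != "split":
        raise ValueError("bus must be 'shared' or 'split'")
    if write_bank != read_bank:
        # Write uses the global write bus while the read uses the global read bus.
        return True
    # Same bank: simultaneous only for the same address (handled by the local
    # read/write circuit). A different address is ASSUMED not to overlap.
    return write_addr == read_addr


print(can_overlap("shared", 0, 5, 1, 9))  # prints: False
print(can_overlap("split", 0, 5, 1, 9))   # prints: True
print(can_overlap("split", 0, 5, 0, 5))   # prints: True
```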