1. Field of the Invention
The present invention relates to a DRAM structure that reduces row latency for irregular row accesses and improves the effective bandwidth by modifying the DRAM cell core structure. Specifically, the invention relates to a memory pipeline structure for a fast row cycle. This structure differs from the one used in a conventional fast cycle RAM (FCRAM). It is obtained by modifying the cell core access method in the channel structure of a virtual channel memory (VCM), which is known to improve performance when applied to a conventional structure, and by introducing a row buffer and a latch into the decoder. The result secures sufficient bandwidth for sequential accesses and, at the same time, shortens the time required for the row path through a new technique that yields short latency, thereby realizing a fast random row cycle.
2. Description of the Related Art
In general, the performance of a DRAM can be characterized by its bandwidth and its latency. Bandwidth is the amount of data that can be transferred per unit time. It is proportional to the frequency of the transmitted signal and to the number of signal lines, and it is determined by the column path of the DRAM. A DRAM should therefore have as high a bandwidth as possible, and a variety of methods for shortening the DRAM column path have been proposed to this end. Latency, on the other hand, is the time from the input of a specific address for accessing the DRAM to the output of data from the DRAM. It is determined mostly by the row path of the DRAM. Since data should be output as quickly as possible, a DRAM with smaller latency performs better.
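The proportionality between bandwidth, signaling frequency and the number of signal lines can be made concrete with a short calculation. The function name and the example figures (a 100 MHz clock, 64 data lines, one transfer per clock) are hypothetical and chosen only for illustration:

```python
def peak_bandwidth_bytes_per_s(clock_hz: float, data_lines: int,
                               transfers_per_clock: int = 1) -> float:
    """Peak bandwidth: each data line carries one bit per transfer."""
    bits_per_second = clock_hz * transfers_per_clock * data_lines
    return bits_per_second / 8  # convert bits/s to bytes/s


# Hypothetical part: 100 MHz, single data rate, 64 data lines.
bw = peak_bandwidth_bytes_per_s(100e6, 64)
print(f"{bw / 1e6:.0f} MB/s")  # prints "800 MB/s"
```

Doubling either the clock frequency or the number of data lines doubles the peak figure. Latency does not appear in this formula at all, which is why it has to be considered separately.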
When the DRAM in a system is accessed sequentially, the structural characteristics of the DRAM allow its maximum bandwidth to be fully used. In recent computer systems, however, a unified memory architecture (UMA) has been proposed that uses the system memory as a frame buffer to reduce cost, and a single system contains many masters that access the main memory directly, such as a cache controller, a PCI controller and a graphics controller.
Furthermore, memory access patterns become irregular as more software is written in languages with a modular structure, such as C++. In this case, if the latency of the DRAM is long, each access takes longer to deliver its data, which markedly reduces the effective bandwidth. Latency is therefore, together with bandwidth, a very important measure of DRAM performance.
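The effect of latency on effective bandwidth can be sketched with a simple model in which every access first waits for the full latency and then transfers one burst at the peak rate, with no overlap between successive accesses. The 800 MB/s peak rate, the 32-byte burst and the latency values are hypothetical:

```python
def effective_bandwidth(burst_bytes: int, peak_bytes_per_s: float,
                        latency_s: float) -> float:
    """Bytes delivered per second when each access pays the full latency
    before its burst and accesses do not overlap."""
    transfer_time_s = burst_bytes / peak_bytes_per_s
    return burst_bytes / (latency_s + transfer_time_s)


PEAK = 800e6   # bytes/s (hypothetical)
BURST = 32     # bytes per access (hypothetical)
for latency_ns in (0, 40, 120):
    bw = effective_bandwidth(BURST, PEAK, latency_ns * 1e-9)
    print(f"latency {latency_ns:>3} ns -> {bw / 1e6:.0f} MB/s")
```

With these figures the effective bandwidth falls from 800 MB/s to 400 MB/s and then to 200 MB/s as the latency grows, although the peak bandwidth never changes.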
As described above, the latency of the DRAM is determined by the row path. Unlike the column path, which governs the bandwidth, the row path is limited by physical factors such as the RC time constant of the lines, so it is difficult to shorten. Accordingly, various techniques have been proposed for preventing an increase in latency or for reducing it. These include a multi-bank and access sequence controlling method, a method using a temporary buffer, an address non-multiplexing and row path pipeline method, and a method of integrating an SRAM in a DRAM.
In the multi-bank and access sequence controlling method, when consecutive accesses go to different banks among multiple banks, the page miss penalty is reduced through bank interleaving, and the DRAM access sequence is controlled to improve the effective bandwidth. However, this method does not reduce the latency by shortening the row path; instead, it places many banks inside the DRAM and overlaps their access operations to obtain an effect equivalent to a reduction in latency. Accordingly, although the latency is reduced when accesses target different banks, the conventional latency still appears at the output when sequential accesses target the same bank. In addition, the multiple internal banks degrade the noise characteristics.
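The difference between interleaved and same-bank accesses can be illustrated with a small timing model. It assumes, hypothetically, that a new access can be issued every `t_issue_ns`, that each access occupies its bank for a full row cycle `t_rc_ns`, and that a bank cannot start a new access until its previous row cycle ends. The function name and parameter values are illustrative only:

```python
def total_time_ns(bank_sequence, t_rc_ns, t_issue_ns):
    """Completion time of the last access in a sequence of bank numbers."""
    bank_free_at = {}    # bank -> time its current row cycle ends
    next_issue = 0.0     # earliest time the next access may be issued
    finish = 0.0
    for bank in bank_sequence:
        # Wait for both the issue slot and the target bank to be free.
        start = max(next_issue, bank_free_at.get(bank, 0.0))
        bank_free_at[bank] = start + t_rc_ns
        next_issue = start + t_issue_ns
        finish = max(finish, start + t_rc_ns)
    return finish


print(total_time_ns([0, 1, 2, 3], 60, 15))  # 105.0 (interleaved banks)
print(total_time_ns([0, 0, 0, 0], 60, 15))  # 240.0 (same bank)
```

Interleaving across four banks hides most of each row cycle, but four accesses to one bank still take four full row cycles: the latency of an individual access is overlapped, not reduced.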
The method using a temporary buffer, by contrast, changes the structure of the cell core to directly reduce the time required for the row path and thereby decrease the latency. This method adds a temporary data buffer to the sense amplifier. Specifically, the temporary data buffer is used to minimize the signal amplitude of the bit line, which reduces the pre-charge time and the data detection time, and a pipeline concept is introduced into the row path to realize a cycle time of 10 ns for irregular row accesses. However, this method is still only a concept, and techniques for operating it have not yet been developed. In addition, it cannot achieve fast row latency in an actual system because of the problem of restoring the data held in the temporary buffer to the cell core and the complexity of the controller that manages this restoration.
The address non-multiplexing and row path pipeline method introduces the pipeline concept into the row path and adopts address non-multiplexing to realize a row cycle of 20 ns. The fast cycle RAM (FCRAM) employs this method. It adopts a sub-wordline structure, reduces the sub-block size of the cell core to decrease the load on the drivers that drive the cell core, and shortens the time required for the row path by using a direct sensing method. However, to reduce the cell core access time, this technique makes the sub-blocks very small and adds many circuits, increasing the area by 30-40% compared with other DRAM structures of the same density. Furthermore, because it uses non-multiplexed address input, it is not interface-compatible with existing systems, for example in the number of address input pins. Moreover, since its data output method differs from that of the conventional DRAM structure, an additional interface circuit is required at the data output port to apply it to systems currently in use.
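The pin-count issue follows from how the address is delivered. With multiplexing, the row and column addresses share the same pins in successive phases; without it, both are presented at once on separate pins. A minimal sketch that ignores bank-select and control pins, with a hypothetical function name and hypothetical 13-bit row and 10-bit column addresses:

```python
def address_pins(row_bits: int, col_bits: int, multiplexed: bool) -> int:
    """Number of address input pins needed (bank/control pins ignored)."""
    if multiplexed:
        # Row address first, column address later, on the same pins.
        return max(row_bits, col_bits)
    # Row and column addresses presented simultaneously on separate pins.
    return row_bits + col_bits


print(address_pins(13, 10, multiplexed=True))   # 13
print(address_pins(13, 10, multiplexed=False))  # 23
```

The ten extra pins in the non-multiplexed case are what prevent a drop-in replacement on a board designed for a multiplexed interface.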
Finally, the method of integrating an SRAM in a DRAM differs from the three methods above: it integrates an SRAM in the DRAM to secure sufficient bandwidth even for irregular row accesses. It relies on the temporal and spatial locality of memory access patterns. Specifically, the SRAM integrated with the DRAM is used as a data buffer, and the operations of the SRAM and the DRAM cell core are separated so that they can proceed simultaneously, which reduces the latency due to page misses and improves the effective bandwidth. The enhanced synchronous DRAM (ESDRAM), cache DRAM (CDRAM), wide CDRAM and virtual channel memory (VCM) employ this method. The technical background and features of the VCM structure, which is known to improve performance when applied to the conventional structure, are described below, since the present invention builds on it to secure sufficient bandwidth for sequential accesses and to shorten the row path time for short latency.
The VCM, proposed by NEC Co. of Japan, integrates an SRAM buffer in a DRAM, similar to the ESDRAM and CDRAM structures, and uses this buffer to increase the effective bandwidth. The integrated SRAM buffer is called a "channel", and, as in the ESDRAM and CDRAM, the channel separates the operation of the DRAM cell core from that of the SRAM buffer so as to conceal the pre-charge time due to page misses. However, whereas the ESDRAM and CDRAM control data transfer and consistency between the SRAM and DRAM cores with a controller integrated in the DRAM together with the SRAM, the VCM controls data transfer between the SRAM and DRAM externally. Because no control logic is included in the DRAM, the VCM has a relatively simple structure, and the area increase caused by integrating the SRAM is known to be approximately 3%. When the external control is appropriate, the VCM can optimize the operations of the SRAM and DRAM and the data transfer between them to conceal most of the latency due to page misses. Moreover, since each memory master is assigned its own row data buffer and controls it independently, the VCM is well suited to systems with multiple memory masters.
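The benefit of the channels depends on how often an access finds its row data already buffered. A minimal sketch, assuming the channels are managed with a least-recently-used (LRU) policy (the VCM leaves this policy to the external controller, so LRU is only an illustration) and using a hypothetical function name, trace and hit/miss times, counts channel hits over a trace of row addresses and reports the average access time:

```python
from collections import OrderedDict


def simulate_channels(trace, n_channels, t_hit_ns, t_miss_ns):
    """Return (hits, average access time in ns) for a trace of row
    addresses served through n_channels LRU-managed row buffers."""
    if not trace:
        return 0, 0.0
    channels = OrderedDict()  # row address -> present; order = recency
    hits = 0
    total_ns = 0.0
    for row in trace:
        if row in channels:
            hits += 1
            total_ns += t_hit_ns
            channels.move_to_end(row)         # mark as most recently used
        else:
            total_ns += t_miss_ns             # row must come from the cell core
            channels[row] = True
            if len(channels) > n_channels:
                channels.popitem(last=False)  # evict least recently used
    return hits, total_ns / len(trace)


trace = [5, 5, 9, 9, 5]                      # hypothetical row addresses
print(simulate_channels(trace, 2, 10, 60))   # (3, 30.0)
print(simulate_channels(trace, 1, 10, 60))   # (2, 40.0)
```

Locality in the trace (rows 5 and 9 reused) is what lets the buffered design hide the cell core time; with only one channel, the two rows evict each other and the average access time rises.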
FIG. 1 shows the configuration of a conventional VCM. The VCM (like the ESDRAM and CDRAM) does not fundamentally shorten the row path time. Instead, it uses the channel (SRAM) to separate the operation of the cell core from that of the channel so that the two overlap, and it lets the channel process multiple accesses quickly, which gives an effect equivalent to a reduction in latency. When a write miss occurs in the channels (that is, when the desired data is in none of the sixteen channels), one option is to select one of the sixteen channels, evict its data and read in new data; the VCM shown in FIG. 1, however, avoids using the channels as far as possible and instead uses a dummy channel to transfer the data to the cell core immediately. The dummy channel operates in a "read-modify-write" mode: the required segment is read from the cell core, the data to be written is merged into it, and the segment is then written back to the cell core. When consecutive write misses occur for the same segment, the segment is read into the dummy channel on the first write miss, the data of the second write miss is written into it, and the modified segment is then written back to the cell core. When write misses occur for different segments, a segment must be read and written back separately for each write miss.
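The write-miss path can be illustrated with a small model that counts cell core accesses. In this sketch the class name, the list-of-words segment representation and the choice to defer the write-back until a miss to a different segment (or an explicit flush) are assumptions made for illustration; channel hits are written only to the channel, leaving coherence to the external controller as in the VCM:

```python
class DummyChannelModel:
    """Hypothetical model of the VCM write path: channel hits stay in the
    channel, write misses go through a single dummy channel using
    read-modify-write, and every cell core read or write is counted."""

    def __init__(self, core, channels):
        self.core = core            # segment id -> list of words (cell core)
        self.channels = channels    # segment id -> list of words (channels)
        self.dummy = None           # (segment id, words being modified)
        self.core_accesses = 0

    def _write_back(self):
        if self.dummy is not None:
            seg, words = self.dummy
            self.core[seg] = words  # write modified segment to the cell core
            self.core_accesses += 1
            self.dummy = None

    def write(self, seg, offset, value):
        if seg in self.channels:                 # channel hit
            self.channels[seg][offset] = value
            return
        if self.dummy is None or self.dummy[0] != seg:
            self._write_back()                   # finish the previous miss
            self.dummy = (seg, list(self.core[seg]))  # read segment
            self.core_accesses += 1
        self.dummy[1][offset] = value            # modify

    def flush(self):
        self._write_back()


# Two write misses to the same segment: one read and one write-back.
same = DummyChannelModel({0: [0] * 4, 1: [0] * 4}, channels={})
same.write(1, 0, 7)
same.write(1, 2, 9)
same.flush()
print(same.core_accesses, same.core[1])  # 2 [7, 0, 9, 0]

# Two write misses to different segments: a read and a write-back each.
diff = DummyChannelModel({0: [0] * 4, 1: [0] * 4}, channels={})
diff.write(0, 0, 7)
diff.write(1, 0, 9)
diff.flush()
print(diff.core_accesses)  # 4
```

In the second case every write miss costs two cell core accesses, while the first case amortizes one read and one write-back over both misses.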
Accordingly, because the VCM only overlaps the cell core and channel (SRAM) operations and processes multiple accesses quickly in the channel to obtain an effect equivalent to a reduction in latency, the conventional long latency appears at the output when continuous channel misses require sequential accesses to the DRAM cell core. Furthermore, because data is output through the channel rather than directly from the cell core, the latency may become longer than in conventional structures. That is, overall performance is limited by the background operation that accesses the cell core.
In particular, when write misses occur continuously for different segments, the VCM processes them through the dummy channel and therefore must access the cell core twice for each write miss. Moreover, because of its structure, the VCM cannot perform other background operations while the dummy channel is being accessed. The latency thus becomes considerably longer when continuous write misses keep the background operation busy. In addition, the VCM activates an entire row to process a single segment. This is harmless if the other three segments in the row are used by subsequent accesses, but because of the characteristics of memory access, the segments activated together with the one being used are typically not used, which wastes power.
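The power cost of activating a whole row for one segment can be estimated by counting segments sensed versus segments actually used. A minimal sketch, assuming four segments per row (as the reference to the other three segments implies), an open-row policy in which a row stays active until an access to a different row arrives, and activation energy proportional to the number of segments sensed; the function name and traces are hypothetical:

```python
def activation_efficiency(trace, segments_per_row=4):
    """Return (segments used, segments activated) for a trace of
    (row, segment) accesses, assuming each activation senses the whole
    row and the row stays open until a different row is accessed."""
    used = 0
    activated = 0
    open_row = None
    touched = set()
    for row, seg in trace:
        if row != open_row:
            used += len(touched)           # close the previous row
            activated += segments_per_row  # activate the whole new row
            open_row, touched = row, set()
        touched.add(seg)
    used += len(touched)                   # account for the last open row
    return used, activated


random_trace = [(3, 1), (8, 0), (5, 2), (3, 3)]      # hypothetical
sequential_trace = [(3, 0), (3, 1), (3, 2), (3, 3)]  # hypothetical
print(activation_efficiency(random_trace))      # (4, 16)
print(activation_efficiency(sequential_trace))  # (4, 4)
```

Under the stated energy assumption, the irregular trace spends three quarters of its activation energy on segments that are never read.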
It is, therefore, an object of the present invention to provide a memory pipeline structure for a fast row cycle that differs from the pipeline structure used in the conventional FCRAM and is obtained by modifying the cell core access method of the conventional VCM channel structure and by introducing a row buffer and a latch into the decoder, so that even a random row cycle is fast.
To accomplish the object of the present invention, there is provided a memory having a pipeline structure in a row path, the memory having a memory cell array in which a plurality of memory cell cores capable of storing logic states of electric signals are arranged in N columns and M rows, and being able to read or write the data stored in a corresponding cell core by enabling the address lines and bit lines of the memory cell cores arranged in columns or rows, wherein the address lines of the memory cell array are grouped by a predetermined number, each address line group is represented by a main address line, and the address lines forming each group are sub-address lines of the corresponding main address line; address data is received from a specific control system to access a main address line; and the sub-address lines included in the accessed main address line are selected.
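The grouping of address lines amounts to splitting a row address into a main address line index and a sub-address line index. A minimal sketch, assuming that each group consists of consecutive rows and that the group size divides the number of rows M evenly; the function name and the example sizes (M = 1024 rows, groups of 4) are hypothetical:

```python
def select_lines(row_address: int, num_rows: int, lines_per_group: int):
    """Map a row address to (main address line, sub-address line)."""
    if num_rows % lines_per_group != 0:
        raise ValueError("group size must divide the number of rows")
    if not 0 <= row_address < num_rows:
        raise ValueError("row address out of range")
    # Consecutive rows share a main line; the remainder picks the sub-line.
    main_line, sub_line = divmod(row_address, lines_per_group)
    return main_line, sub_line


M, GROUP = 1024, 4                   # 256 main address lines
print(select_lines(517, M, GROUP))   # (129, 1)
```

When the group size is a power of two, this is the same as taking the upper and lower bits of the row address, so only one of the 256 main address line drivers and one of its four sub-address lines need to be enabled.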
To accomplish the object of the invention, there is also provided a memory having a pipeline structure in a row path, the memory having a memory cell array in which a plurality of memory cell cores capable of storing logic states of electric signals are arranged in N columns and M rows, and being able to read or write the data stored in a corresponding cell core by enabling the address lines and bit lines of the memory cell cores arranged in columns or rows, wherein the address lines of the memory cell array are grouped by a predetermined number, each address line group is represented by a main address line, and the address lines forming each group are sub-address lines of the corresponding main address line, the memory including: a buffer for receiving address decoding data from a specific control system and temporarily storing it; row detection means for detecting the address row requested by the control system from the address decoding data output from the buffer; a main address line driver for driving the main address line corresponding to the address row requested by the control system; and a latch, located on the signal transmission path between the row detection means and the main address line driver, for separating the address decoding and cell core access operations from each other.
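The benefit of the latch can be sketched with a two-stage timing model: once the decoded row is held in the latch, the decoder can accept the next address while the main address line driver and the cell core complete the current access. The function name and stage times (8 ns decode, 20 ns cell core access) are hypothetical, and the latch's own delay is ignored:

```python
def row_path_time_ns(n_accesses: int, t_decode_ns: float,
                     t_core_ns: float, latched: bool) -> float:
    """Time to complete n row accesses with or without a latch between
    address decoding and cell core access (latch delay ignored)."""
    if n_accesses <= 0:
        return 0.0
    if not latched:
        # Decoding and cell core access run back to back for each access.
        return n_accesses * (t_decode_ns + t_core_ns)
    # The latch holds the decoded row, so the next address is decoded
    # while the current cell core access is still in progress.
    return (t_decode_ns + t_core_ns
            + (n_accesses - 1) * max(t_decode_ns, t_core_ns))


print(row_path_time_ns(4, 8, 20, latched=False))  # 112
print(row_path_time_ns(4, 8, 20, latched=True))   # 88
```

With these figures the steady-state row cycle drops from 28 ns to 20 ns, the cell core access time, while the latency of a single access is unchanged.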
To accomplish the object of the invention, there is further provided a memory having a pipeline structure in a row path, the memory having a memory cell array in which a plurality of memory cell cores capable of storing logic states of electric signals are arranged in N columns and M rows, and being able to read or write the data stored in a corresponding cell core by enabling the address lines and bit lines of the memory cell cores arranged in columns or rows, wherein the address lines of the memory cell array are grouped by a predetermined number, each address line group is represented by a main address line, and the address lines forming each group are sub-address lines of the corresponding main address line, the memory including: a buffer for receiving address decoding data from a specific control system and temporarily storing it; row detection means for detecting the address row requested by the control system from the address decoding data output from the buffer; a main address line driver for driving the main address line corresponding to the address row requested by the control system; a latch, located on the signal transmission path between the row detection means and the main address line driver, for separating the address decoding and cell core access operations from each other; and a row buffer, placed between a sense amplifier connected to the data bit line of each cell core of the memory cell array and a data input/output channel, for separating the data detection and data transmission operations from each other.
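Adding the row buffer splits the former cell core access stage into data detection (sensing) and data transmission, giving a three-stage row path. A minimal sketch using a simple self-timed pipeline model, with a hypothetical function name and hypothetical stage times of 8 ns for decoding, 20 ns for cell core access and sensing, and 10 ns for transfer to the data input/output channel:

```python
def pipeline_finish_ns(stage_ns, n_accesses):
    """Finish time of the last access through a pipeline whose stages are
    separated by storage elements (latch, row buffer). Each stage starts an
    access when the access has left the previous stage and the stage has
    finished the previous access; handoff delays and blocking are ignored."""
    if n_accesses <= 0:
        return 0.0
    finish = [0.0] * len(stage_ns)  # finish time of latest access per stage
    for _ in range(n_accesses):
        ready = 0.0                 # all addresses assumed available at t = 0
        for k, t in enumerate(stage_ns):
            start = max(ready, finish[k])
            finish[k] = start + t
            ready = finish[k]
    return finish[-1]


# Latch only: sensing and transfer form one 30 ns stage.
print(pipeline_finish_ns([8, 30], 4))      # 128.0
# Latch and row buffer: sensing and transfer are separate stages.
print(pipeline_finish_ns([8, 20, 10], 4))  # 98.0
```

Without the row buffer the sense amplifiers stay busy until the data has been transferred, so the steady-state cycle is 30 ns; with it, the sense amplifiers are released after 20 ns and the cycle is set by the 20 ns stage.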