1. Field of the Invention
The present invention relates to a DRAM-based electronic computer memory system utilizing an ultra-high-density DRAM technique and more specifically to upgrading the data transfer rate in a DRAM.
2. Background Art
In a system using an ultra-high-density DRAM, it is expected that the whole memory system will be integrated onto a single or a few (on the order of two or three) chips. This is because, in relatively small-sized computers such as personal computers, increases in density resulting from advances in DRAM integration techniques are ahead of increases in the capacity of main memory. For example, a main memory capacity of around 8 MB is needed for a personal computer but 64 MB of DRAM can be implemented with a single chip. In addition, there is a high possibility of 256 MB DRAMs being put to practical use in the near future.
Under such circumstances, cost performance would be improved by (2) integrating the DRAM control logic together with the DRAM array and connecting the integrated chip directly to a CPU bus, rather than by (1) incorporating a DRAM interface into the CPU. When a great number of DRAM chips are needed to configure a main memory, a scheme for integrating DRAM control logic together with the DRAM array onto a single chip is extremely expensive, since a large integrated area is used for DRAM control logic. The reason for this is that the control logic must be integrated onto each DRAM chip, and consequently the total number of DRAM control logic blocks in the memory system increases. However, if the main memory can be composed of a single DRAM chip as mentioned above, it is only necessary to integrate a single DRAM control logic onto that same chip, so that the increase in chip area is not as critical. In brief, at present, scheme (1) for incorporating a DRAM interface into the CPU is employed because scheme (2) for integrating DRAM control logic together with each of a great number of DRAM arrays leads to an increase in chip size and package cost and moreover complicates the testing of products. However, as the number of DRAMs used in relatively small-sized personal computers decreases as a consequence of the higher density of recent DRAM integration, this cost problem is being solved. What is also important is that employing scheme (2), in which the DRAM control logic is integrated together with the DRAM array on a single chip, brings a great advantage in memory performance from the following two points of view.
Firstly, in techniques such as synchronous DRAM and burst EDO (Extended Data Out) for the interface of conventional DRAMs, stress is laid on speeding-up the clock. Accordingly, the scheme (1) cannot always be satisfactory in making use of bus cycles without waste.
Secondly, optimization of a critical path is more easily achieved between the DRAM array and the DRAM control logic than between the CPU and the DRAM control logic. In other words, it takes more time and more elaborate design to optimize a critical path between the CPU and the DRAM control logic. For example, overhead due to the multiplexing of addresses cannot be avoided. Moreover, the faster the operation is, the more difficult it becomes to control skew in clock signals transferred between the chips.
Conventionally, a scheme providing an external cache has been employed to upgrade the data transfer rate of DRAMs. However, the bus utilization ratio of DRAM has improved to equal the bus utilization ratio of SRAM in a cache memory, and moreover, if the lead-off cycle (period of time from the initiation of RAS to the initiation of CAS) is shortened by integrating the DRAM control logic together with the DRAM array, a transfer rate comparable to that obtained when an external cache is provided can be implemented. Thus, attaching an off-chip external cache memory loses its merit. What is worse, since connection of an external cache necessarily generates an overhead due to the data transfer between the external cache and the DRAM, performance may actually be lowered. This deterioration of performance becomes conspicuous in multimedia-type applications, in which very great quantities of data must be transferred at high speed. A solution using an external cache therefore carries the danger of worsening performance, in addition to the increase in cost originating from the external cache.
A critical problem in the operational performance of DRAM in the conventional technique is the low data transfer rate in the row access path. Speeding-up of the data transfer rate in column access has been fully studied, e.g., in techniques such as synchronous DRAMs. However, research on the data transfer rate for row access has not advanced as far as research on the data transfer rate for column access. That is, with the increasing data transfer rate in column access, the relatively slower row access is becoming the critical path in 4-beat burst mode operation. In fact, since the page miss ratio between consecutive burst actions (probability of necessary data not being present in the same row in the next access) can be as much as 50%, row access takes place fairly frequently.
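The effect of the page miss ratio on the average access time can be illustrated with a small calculation. The following Python sketch is only an illustration: the function name and the timing values (20 ns for an access served from the already open row, 60 ns for an access that needs a new row access) are hypothetical and are not taken from any particular device.

```python
def average_access_time(page_miss_ratio, t_same_row, t_new_row):
    """Expected time per access when a fraction of accesses needs a new row.

    page_miss_ratio: fraction of accesses whose data is not in the open row.
    t_same_row:      time for an access served from the open row (ns, assumed).
    t_new_row:       time for an access that requires a row access (ns, assumed).
    """
    page_hit_ratio = 1.0 - page_miss_ratio
    return page_hit_ratio * t_same_row + page_miss_ratio * t_new_row


# Hypothetical timings: 20 ns when the row is already open, 60 ns otherwise.
print(average_access_time(0.5, 20, 60))   # 40.0
print(average_access_time(0.25, 20, 60))  # 30.0
```

With a page miss ratio of 50% under these assumed timings, the slower row access accounts for three quarters of the average access time, which is why further speeding-up of column access alone gives diminishing returns.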
As a solution to this problem, it is possible to raise the data transfer rate by an appropriate pipelined operation in the interface between the CPU and the DRAM or in the row access process for the DRAM (e.g., the address pipeline of the CPU). In a DRAM, the individual steps of sensing, writing back and precharging must be accomplished in that order. Accordingly, the array time constant (the total time taken for sensing, writing back and precharging) becomes the time required for the data transfer of the DRAM. In data transfer between DRAMs, not only the time for the operational steps of sensing, writing back and precharging, but also the time for selecting row addresses and column addresses is necessary. Pipelining enables the time taken for selection of these addresses to be concealed behind the operational steps of sensing, writing back and precharging. Thus, the period of time for accessing two successive row addresses (the time taken for selection of the row address + the time taken for sensing, writing back and precharging, referred to as the RAS cycle hereinafter) can be made close to the array time constant. Certain conventional techniques disclose pipelined DRAMs having a row access path. In none of these conventional techniques, however, does the RAS cycle reach the array time constant.
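The relation between the RAS cycle and the array time constant described above can be sketched numerically. The Python function below is a minimal model under stated assumptions: it treats row address selection and the array operation as two stages that either run back to back or fully overlap, and every name and timing value in it is hypothetical.

```python
def ras_cycle(t_row_select, t_sense, t_write_back, t_precharge, pipelined):
    """Time between two successive row accesses (the RAS cycle), in ns.

    All timings are illustrative assumptions, not device data.
    """
    # Array time constant: sensing, writing back and precharging, in that order.
    array_time_constant = t_sense + t_write_back + t_precharge
    if pipelined:
        # Address selection for the next access overlaps the current array
        # operation, so the slower of the two stages sets the cycle.
        return max(t_row_select, array_time_constant)
    # Without pipelining the two stages run one after the other.
    return t_row_select + array_time_constant


# Hypothetical timings: 10 ns row selection, 20 + 10 + 15 ns array operation.
print(ras_cycle(10, 20, 10, 15, pipelined=False))  # 55
print(ras_cycle(10, 20, 10, 15, pipelined=True))   # 45
```

In this idealized model the pipelined RAS cycle equals the array time constant whenever address selection is the shorter stage. A real design would add control and synchronization overhead that the model ignores.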
The points mentioned above, especially the high integration of DRAM chips owing to advances in LSI techniques and the flow of the background art, will be outlined with reference to FIGS. 1-3. FIG. 1 shows an ordinary memory system using DRAM. This memory system is divided into numerous DRAM chips 10. This is because the integration density of the current DRAM chip is too low for the capacity required of the main memory, so that a single DRAM chip cannot constitute the main memory. To control the plurality of DRAM chips, an off-chip DRAM controller 11 is required between the CPU (not shown) and the DRAM chips 10. As shown in FIG. 1, the DRAM controller 11 receives a memory access request from the CPU and supplies a row address and a column address. Data outputted from a DRAM chip 10 is transferred through the DRAM controller 11 to the CPU. Data inputted to a DRAM chip 10 is processed similarly. However, according to such an off-chip scheme, the signal path between the controller 11 and a DRAM chip 10 lengthens, and accordingly it is difficult to synchronize the addressing with the other operations of the DRAM while controlling the delay of control signals such as RAS and CAS. This difficulty becomes conspicuous especially in the case of high-speed data transfer.
Advances in the high-density integration techniques for DRAM enable the memory capacity required for a relatively small-sized computer to be almost satisfied with one or at most a few DRAM chips. As a result, the DRAM array 10 (corresponding to the DRAM chips in FIG. 1) can be integrated onto the same chip 1 as that of the DRAM controller 11, as shown in FIG. 2. In other respects, FIG. 2 is similar to FIG. 1. The technique relevant to FIG. 2, in which both are integrated on one and the same chip 1, is expected to upgrade operational performance and cost-saving somewhat, as compared with that of FIG. 1, in that one level of packaging is omitted. However, the upgrade of performance does not differ greatly from that obtained when the DRAM controller 11 is incorporated into the CPU and consequently has no technically significant effect.
FIG. 3 shows a DRAM configuration using pipelining of synchronous DRAMs or the like. A DRAM chip 10 is characterized by having a DRAM pipeline 12 formed inside it. However, this DRAM pipeline 12 is controlled by an external RAS (row address strobe) and CAS (column address strobe). Thus, the control of pipelines inside individual DRAMs is restricted.
Several techniques other than pipelining have so far been proposed to upgrade the operational speed of DRAM.
Page Hit Scheme
The page hit scheme is a scheme that also utilizes the sense amplifier in a DRAM as a cache. The locality of data or instructions in a page (here designating a row in the DRAM) is effectively utilized under the normal page mode of the DRAM. However, when an on-chip cache is connected to the CPU, the page hit ratio between two consecutive cache line transfers is not very high, and the larger the capacity of the on-chip cache is, the lower the page hit ratio becomes. Thus, considering the delay for comparison of tags at the time of a page miss, the precharging time and the like, a significant upgrade in the operational speed of a DRAM cannot be expected even if the page hit scheme is employed.
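How the page hit ratio depends on the access pattern can be sketched with a simple open-row model. The Python code below is an illustration only: it assumes a single open row held in the sense amplifiers, counts the first access as a miss, and uses hypothetical addresses and a hypothetical row size of 1024 bytes.

```python
def page_hit_ratio(addresses, row_size):
    """Fraction of accesses that fall in the row left open by the previous access.

    Assumes one open row at a time; the first access always counts as a miss.
    """
    if not addresses:
        return 0.0
    open_row = None
    hits = 0
    for addr in addresses:
        row = addr // row_size
        if row == open_row:
            hits += 1
        open_row = row  # the accessed row stays in the sense amplifiers
    return hits / len(addresses)


# Sequential cache line transfers stay inside one row.
print(page_hit_ratio([0, 32, 64, 96], row_size=1024))      # 0.75
# Scattered transfers, as seen behind a large on-chip cache, change row each time.
print(page_hit_ratio([0, 4096, 32, 8192], row_size=1024))  # 0.0
```

The second trace stands in for the situation described above, where the on-chip cache has already absorbed most of the locality and the remaining transfers rarely land in the open row.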
Interleave Scheme
The interleave scheme is a scheme for dividing a memory into a plurality of banks and accessing these banks in sequence. This scheme is also often employed to raise the data transfer rate of DRAM. In the case of off-chip interleaving for a relatively small-sized computer, however, the granularity of memory (designating the minimum unit of additive memory installation) increases. For example, when a memory is divided into two banks, granularity doubles, which is a serious problem in an ultra-high-density DRAM. On the other hand, in the case of on-chip interleaving, the bandwidth can be increased without a great effect on the granularity of the memory. However, generally speaking, this technique is effective in upgrading the operational speed of DRAM only during operations in which different memory banks are alternately accessed. Thus, it lacks generality as a scheme for upgrading the data transfer rate.
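The dependence of interleaving on the access pattern can be illustrated with a small timing model. The Python sketch below is a simplification under stated assumptions: at most one access is issued per cycle, a bank stays busy for a fixed number of cycles after each access, and the bank sequences and the busy time of 2 cycles are hypothetical.

```python
def total_cycles(bank_sequence, num_banks, bank_busy):
    """Cycles until all accesses complete in a simple interleaved-bank model.

    bank_sequence: bank index of each access, in issue order.
    bank_busy:     cycles a bank is occupied by one access (assumed constant).
    """
    free_at = [0] * num_banks  # cycle at which each bank can accept an access
    now = 0
    for bank in bank_sequence:
        start = max(now, free_at[bank])  # wait if the bank is still busy
        free_at[bank] = start + bank_busy
        now = start + 1  # at most one access is issued per cycle
    return max(free_at)


# Alternating banks overlap their busy times.
print(total_cycles([0, 1, 0, 1], num_banks=2, bank_busy=2))  # 5
# Repeated access to one bank gains nothing from interleaving.
print(total_cycles([0, 0, 0, 0], num_banks=2, bank_busy=2))  # 8
```

Under these assumptions the alternating sequence finishes in 5 cycles while the same number of accesses to a single bank takes 8, which matches the observation that the benefit appears only when different banks are accessed alternately.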