1. Field of the Invention
The present invention relates to a semiconductor memory device, and more specifically to a semiconductor memory device containing a cache, in which a dynamic random access memory (DRAM) having a large storage capacity and serving as a main memory and a static random access memory (SRAM) having a small storage capacity and serving as a cache memory are integrated on the same semiconductor chip.
2. Description of the Background Art
The operating speed of recent 16-bit and 32-bit microprocessing units (MPUs) has increased so much that their operating clock frequencies reach 25 MHz or higher. In data processing systems, a standard DRAM (Dynamic Random Access Memory) is often used as the main memory having a large storage capacity, since its cost per bit is low. Although the access time of the standard DRAM has been reduced, the operating speed of the MPU has increased much faster than that of the standard DRAM. Consequently, in a data processing system using a standard DRAM as the main memory, an increase in wait states is inevitable. The gap in operating speed between the MPU and the standard DRAM is inherent to the standard DRAM, which has the following characteristics.
(1) A row address and a column address are time-division multiplexed and applied to the same address pin terminals. The row address is taken into the device at a falling edge of a row address strobe signal /RAS. The column address is taken into the device at a falling edge of a column address strobe signal /CAS. The row address strobe signal /RAS defines the start of a memory cycle and activates the row selecting system. The column address strobe signal /CAS activates the column selecting system. Since a prescribed time period called the "RAS-CAS delay time" (tRCD) is necessary from the time the signal /RAS is set to the active state to the time the signal /CAS is set to the active state, there is a limit to how far the access time can be reduced; namely, there is a limit derived from address multiplexing.
(2) Once the row address strobe signal /RAS is raised to set the DRAM to the standby state, /RAS cannot fall to "L" again until a time period called the RAS precharge time (tRP) has elapsed. The RAS precharge time is necessary to ensure that the various signal lines in the DRAM are precharged to prescribed potentials. Because of the RAS precharge time tRP, the cycle time of the DRAM cannot be reduced below a certain limit. In addition, when the cycle time of the DRAM is reduced, the signal lines in the DRAM are charged and discharged more frequently, which increases current consumption.
(3) A higher operating speed of the DRAM can be realized by circuit techniques such as improved layout, a higher degree of circuit integration and advances in process technology, and by applicational improvements such as improved driving methods. However, the operating speed of the MPU increases at a much faster rate than that of the DRAM. The operating speeds of semiconductor memories form a hierarchy. For example, there are high speed bipolar RAMs using bipolar transistors, such as ECLRAMs (Emitter Coupled Logic RAMs), and static RAMs, and there are comparatively low speed DRAMs using MOS transistors (insulated gate type field effect transistors). It is very difficult to expect an operating speed (cycle time) as short as several tens of ns (nanoseconds) from a standard DRAM formed of MOS transistors.
There have been various applicational improvements to reduce the gap between the operating speeds of the MPU and the standard DRAM. Such improvements mainly comprise the following two approaches.
(1) Use of high speed mode of the DRAM and interleave method
(2) External provision of a high speed cache memory (SRAM).
The first approach (1) includes a method of using a high speed mode such as a static column mode or a page mode, and a method of combining the high speed mode with the interleave method. In the static column mode, one word line (one row) is selected, and thereafter only the column address is changed successively, to successively access memory cells of this row. In the page mode, one word line is selected, and then column addresses are successively taken in by toggling the signal /CAS, to successively access memory cells connected to the selected word line. In either of these modes, memory cells can be accessed without toggling the signal /RAS, enabling higher speed than a normal access using the signals /RAS and /CAS.
In the interleave method, a plurality of memories are connected in parallel to a data bus, and by alternately or successively accessing the plurality of memories, the effective access time is reduced. The use of the high speed mode of the DRAM, and the combination of the high speed mode with the interleave method, are known as simple and relatively effective ways of using the standard DRAM as a high speed DRAM.
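A rough timing model makes the benefit of these modes concrete. The sketch below is illustrative only: it assumes that a normal access costs one full random /RAS-/CAS cycle (`t_rc_ns`) and that each further access to the already selected row in page mode costs one /CAS cycle (`t_pc_ns`). Both parameters and the example values are hypothetical, not datasheet figures.

```python
def normal_mode_time(n_accesses: int, t_rc_ns: float) -> float:
    """Every access toggles both /RAS and /CAS: one full random cycle each."""
    return n_accesses * t_rc_ns


def page_mode_time(n_accesses: int, t_rc_ns: float, t_pc_ns: float) -> float:
    """The first access selects the word line with a full cycle; later
    accesses to the same row only toggle /CAS (one page cycle each)."""
    if n_accesses == 0:
        return 0.0
    return t_rc_ns + (n_accesses - 1) * t_pc_ns


# Hypothetical timings: 150 ns random cycle, 50 ns page cycle, 8 accesses.
print(normal_mode_time(8, 150.0))      # 1200.0
print(page_mode_time(8, 150.0, 50.0))  # 500.0
```

The saving holds only while every access stays within the selected row; an access to another row falls back to a full cycle.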
The second approach (2) has been widely used in mainframes. A high speed cache memory is expensive. However, in the field of personal computers, in which both high performance and low cost are desired, this approach is employed in some systems at the sacrifice of cost. There are three possible ways to provide the high speed cache memory, namely:
(a) the high speed cache memory is contained in the MPU itself;
(b) the high speed cache memory is provided outside the MPU; and
(c) the high speed cache memory is not separately provided, but the high speed mode of the standard DRAM is used as a cache (the high speed mode is used as a pseudo cache memory). When a cache hit occurs, the standard DRAM is accessed in the high speed mode, and at the time of a cache miss, the standard DRAM is accessed in the normal mode.
The above mentioned three ways (a) to (c) have been employed in data processing systems in one form or another.
In most MPU systems, in view of cost, the memories are given a bank structure and interleaving is carried out on a bank-by-bank basis in order to conceal the RAS precharge time (tRP), which is inevitable in the DRAM. By this method, the effective cycle time of the DRAM can be reduced to substantially one half of its specified value. However, the interleave method is effective only when the memories are accessed sequentially; when the same memory bank is to be accessed continuously, it is ineffective. Further, the access time of the DRAM itself is not substantially improved. In addition, the memory must consist of at least 2 banks.
When a high speed mode such as the page mode or the static column mode is used, the access time can be effectively reduced only when the MPU successively accesses a certain page (data of one designated row). This method is effective to some extent when the number of banks is comparatively large, for example 2 to 4, since different rows can be accessed in different banks. When the data requested by the MPU does not exist in the given page, it is called a "miss hit". Normally, a group of data is stored at adjacent or sequential addresses. In the high speed mode, the row address, which constitutes one half of the address, has already been designated, and therefore the probability of a "miss hit" is high. When the number of banks becomes as large as 30 to 40, data of different pages can be stored in different banks, and therefore the "miss hit" rate is remarkably reduced. However, it is not practical to provide 30 to 40 banks in a data processing system. In addition, if a "miss hit" occurs, the signal /RAS must be raised and the DRAM returned to the precharge cycle in order to re-select the row address, which sacrifices the advantage of the bank structure.
In the above described second approach (2), a high speed cache memory is provided between the MPU and the standard DRAM. In this case, the standard DRAM may have a relatively low operating speed. Standard DRAMs having storage capacities as large as 4M bits or 16M bits have come into use. In a small system such as a personal computer, the main memory can be formed by one or several standard DRAM chips. External provision of the high speed cache memory is not very effective in such a small system, in which the main memory can be formed of a single standard DRAM. If the standard DRAM is used as the main memory, the data transfer speed between the high speed cache memory and the main memory is limited by the number of data input/output terminals of the standard DRAM, which constitutes a bottleneck in increasing the speed of the system.
When the high speed mode is used as a pseudo cache memory, the operating speed is lower than that of a high speed cache memory, and it is difficult to realize the desired system performance.
Provision of the high speed cache memory (SRAM) within the DRAM has been proposed as a method of forming a relatively inexpensive, small system that avoids the sacrifice of system performance incurred when the interleave method or the high speed operation mode is used. More specifically, a single chip memory having a hierarchical structure of a DRAM serving as a main memory and an SRAM serving as a cache memory has been conceived. A one-chip memory having such a hierarchical structure is called a cache DRAM (CDRAM). The CDRAM will be described below.
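A minimal sketch of low-order interleaving, assuming (hypothetically) that consecutive addresses are assigned to banks in rotation. Sequential accesses alternate between banks, so one bank can precharge while the other is accessed; accesses that keep landing in the same bank gain nothing, as stated above. The function names and the address-to-bank mapping are illustrative assumptions.

```python
def bank_of(address: int, n_banks: int) -> int:
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    return address % n_banks


def back_to_back_same_bank(addresses: list[int], n_banks: int) -> int:
    """Count accesses that go to the same bank as the access just before them;
    these must wait for that bank's RAS precharge time tRP to elapse."""
    return sum(
        bank_of(prev, n_banks) == bank_of(cur, n_banks)
        for prev, cur in zip(addresses, addresses[1:])
    )


sequential = list(range(8))  # 0, 1, ..., 7
strided = [0, 2, 4, 6, 8]    # all land in bank 0 when there are 2 banks
print([bank_of(a, 2) for a in sequential])    # [0, 1, 0, 1, 0, 1, 0, 1]
print(back_to_back_same_bank(sequential, 2))  # 0
print(back_to_back_same_bank(strided, 2))     # 4
```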
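The page-mode "miss hit" behavior with several banks can be sketched by tracking one open row per bank. This is a hypothetical model: the row-to-bank mapping (`row % n_banks`), the 1024 columns per row and the names are assumptions for illustration, and timing is ignored; the model only counts page hits.

```python
class OpenRowBanks:
    """One open page (row) per bank, as in page mode with a bank structure.

    An access is a page hit if it targets the row already open in its bank;
    otherwise it is a "miss hit": /RAS must be raised, the bank precharged,
    and the new row selected (modeled here as simply opening the new row).
    """

    def __init__(self, n_banks: int, cols_per_row: int):
        self.n_banks = n_banks
        self.cols_per_row = cols_per_row
        self.open_row = [None] * n_banks  # no page open after reset

    def access(self, address: int) -> bool:
        row = address // self.cols_per_row
        bank = row % self.n_banks  # hypothetical row-to-bank mapping
        hit = self.open_row[bank] == row
        self.open_row[bank] = row
        return hit


def count_page_hits(n_banks: int, addresses: list[int]) -> int:
    banks = OpenRowBanks(n_banks, cols_per_row=1024)
    return sum(banks.access(a) for a in addresses)


# Alternate between two pages (rows 0 and 1 with 1024 columns per row).
pattern = [0, 1024, 1, 1025, 2, 1026]
print(count_page_hits(1, pattern))  # 0: every access closes the other page
print(count_page_hits(2, pattern))  # 4: each page stays open in its own bank
```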
FIG. 1 shows the structure of a main portion of a conventional standard 1M bit DRAM. As shown in FIG. 1, the DRAM comprises a memory cell array 500 including a plurality of memory cells MC arranged in a matrix of rows and columns. A row of memory cells is connected to one word line WL. A column of memory cells MC is connected to one column line CL. Normally, the column line CL is formed by a pair of bit lines. A memory cell MC is positioned at the crossing of one bit line of the pair and one word line WL. In a 1M DRAM, the memory cells MC are arranged in a matrix of 1024 rows by 1024 columns. Namely, the memory cell array 500 includes 1024 word lines WL and 1024 column lines CL (1024 bit line pairs).
The DRAM further comprises a row decoder 502, which decodes an externally applied row address (not shown) to select a corresponding row of the memory cell array 500; sense amplifiers, which detect and amplify data of the memory cells connected to the word line selected by the row decoder 502; and a column decoder, which decodes an externally applied column address (not shown) to select a corresponding column of the memory cell array 500. In FIG. 1, the sense amplifiers and the column decoder are denoted by one block 504. If the DRAM has an x1 bit structure, in which data are input and output one bit at a time, one column line CL (bit line pair) is selected by the column decoder. If the DRAM has an x4 bit structure, in which data are input and output 4 bits at a time, 4 column lines CL are selected by the column decoder. One sense amplifier is provided for each column line (bit line pair) CL in the block 504.
In a memory access for writing data to or reading data from a memory cell MC in the DRAM, the following operation is carried out. First, a row address is applied to the row decoder 502. The row decoder 502 decodes the row address and raises the potential of one word line WL in the memory cell array 500 to "H". Data of the 1024 memory cells MC connected to the selected word line WL are transmitted to the corresponding column lines CL. The data on the column lines CL are amplified by the sense amplifiers included in the block 504. The memory cell to which data is written or from which data is read, among the memory cells connected to the selected word line WL, is selected by a column selection signal from the column decoder included in the block 504.
In the above described high speed modes, column addresses are successively applied to the column decoder included in the block 504. In the static column mode, column addresses applied at every prescribed time interval are decoded as new column addresses by the column decoder, and the corresponding memory cell among the memory cells connected to the selected word line WL is selected through its column line CL. In the page mode, a new column address is applied at every toggling of the signal /CAS, and the column decoder decodes the column address to select the corresponding column line. In this manner, in the high speed modes, the row of memory cells MC connected to the selected word line WL can be accessed at high speed by keeping one word line WL in the selected state and changing only the column addresses.
FIG. 2 shows the general structure of a conventional 1M bit CDRAM. Referring to FIG. 2, the conventional CDRAM comprises, in addition to the elements of the standard DRAM shown in FIG. 1, an SRAM 506 and a transfer gate 508 for transferring data between one row of the memory cell array 500 of the DRAM and the SRAM 506. The SRAM 506 includes a cache register provided for each column line CL of the memory cell array 500, so that data of one row of the DRAM memory cell array 500 can be stored simultaneously. Therefore, 1024 cache registers are provided. Each cache register is formed by an SRAM cell. In the CDRAM structure shown in FIG. 2, when a signal indicating a cache hit is externally applied, the SRAM 506 is accessed, enabling high speed access to the memory. At the time of a cache miss (miss hit), the DRAM portion is accessed.
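The 1024-row by 1024-column organization implies a 20-bit cell address whose two 10-bit halves are multiplexed on the same pins, as described in characteristic (1). The sketch below assumes the row half is the upper 10 bits (an illustrative choice, not stated in the text) and models the row and column decoders as one-hot decoders; all names are hypothetical.

```python
ROW_BITS = 10  # 1024 word lines WL
COL_BITS = 10  # 1024 column lines CL (bit line pairs)


def split_address(addr: int) -> tuple[int, int]:
    """Split a 20-bit cell address into (row, column).

    Assumes the upper bits form the row address (sent first, with /RAS)
    and the lower bits the column address (sent second, with /CAS).
    """
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 1M-bit array")
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)


def decode(index: int, bits: int) -> list[int]:
    """One-hot decoder: exactly one output line is driven to "H" (1)."""
    return [1 if i == index else 0 for i in range(1 << bits)]


row, col = split_address(5 * 1024 + 7)
word_lines = decode(row, ROW_BITS)     # row decoder 502
column_select = decode(col, COL_BITS)  # column decoder in block 504
print(row, col, word_lines.index(1), column_select.index(1))  # 5 7 5 7
```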
A CDRAM as described above having a DRAM of a large storage capacity and a high speed SRAM integrated on the same chip is disclosed in, for example, Japanese Patent Laid Open (Kokai) Nos. 60-7690 and 62-38590.
In the above described conventional CDRAM structure, the column lines (bit line pairs) CL of the DRAM memory cell array 500 and the column lines (bit line pairs) of the SRAM (cache memory) 506 are connected in one-to-one correspondence through the transfer gate 508. More specifically, the data of the memory cells connected to one word line WL in the DRAM memory cell array 500 and the data of the same number of SRAM cells as one row of the memory cell array 500 are transferred bi-directionally and simultaneously through the transfer gate 508. In this structure, the SRAM 506 is used as a cache memory and the DRAM is used as a main memory.
The so-called block size of the cache is the number of bits (memory cells) whose contents are rewritten in one data transfer into the SRAM 506. Therefore, the block size is the same as the number of memory cells physically coupled to one word line WL of the DRAM memory cell array 500. As shown in FIGS. 1 and 2, when 1024 memory cells are physically connected to one word line WL, the block size is 1024.
Generally, the hit ratio increases as the block size becomes larger. However, for a cache memory of a given size, the number of sets decreases in inverse proportion to the block size, and therefore the hit ratio decreases. For example, when the cache size is 4K bits and the block size is 1024, the number of sets is 4; if the block size is 32, the number of sets is 128. Therefore, in the conventional CDRAM structure, the block size is too large, and the cache hit ratio cannot be significantly improved.
A structure enabling a reduction in block size is disclosed in, for example, Japanese Patent Laid Open (Kokai) No. 1-146187. In this prior art, the column lines (bit line pairs) of the DRAM array and the SRAM array are arranged in one-to-one correspondence, but they are divided into a plurality of blocks in the column direction. A block is selected by a block decoder. At the time of a cache miss (miss hit), one block is selected by the block decoder, and data are transferred only between the selected DRAM block and the corresponding SRAM block. With this structure, the block size of the cache memory can be reduced to an appropriate size. However, the following problem remains unsolved.
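The inverse relationship in the example above is a single division. A minimal sketch, using the text's own 4K-bit cache and block sizes of 1024 and 32 (a block size of 256 is added only as an intermediate illustration):

```python
def number_of_sets(cache_bits: int, block_bits: int) -> int:
    """Number of blocks (the "sets" counted above) that fit in a cache of fixed size."""
    if block_bits <= 0 or cache_bits % block_bits:
        raise ValueError("block size must be positive and divide the cache size")
    return cache_bits // block_bits


CACHE_BITS = 4 * 1024  # the 4K-bit cache of the example
for block in (1024, 256, 32):
    print(block, number_of_sets(CACHE_BITS, block))
# 1024 4
# 256 16
# 32 128
```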
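A minimal sketch of block-selective transfer, assuming a hypothetical block size of 32 columns (the cited prior art is only described as making the size "appropriate"). The block decoder is modeled as the upper column-address bits; because the bit lines are still in one-to-one correspondence, each transferred bit lands at the same column position in the SRAM. Names are illustrative.

```python
N_COLUMNS = 1024
BLOCK_SIZE = 32  # hypothetical block size


def block_of(column: int) -> int:
    """Block decoder: the upper column-address bits select one block."""
    return column // BLOCK_SIZE


def transfer_block(dram_row: list[int], sram_row: list[int], column: int) -> range:
    """On a cache miss, copy only the selected block of the sensed DRAM row
    into the SRAM row at the same column positions (one-to-one bit lines).
    Returns the columns that were transferred."""
    b = block_of(column)
    cols = range(b * BLOCK_SIZE, (b + 1) * BLOCK_SIZE)
    for c in cols:
        sram_row[c] = dram_row[c]
    return cols


dram_row = [1] * N_COLUMNS
sram_row = [0] * N_COLUMNS
moved = transfer_block(dram_row, sram_row, column=100)
print(moved.start, moved.stop - 1, sum(sram_row))  # 96 127 32
```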
FIG. 3 shows a standard array structure of a 1M bit DRAM array. In FIG. 3, the DRAM array is divided into 8 memory blocks DMB1 to DMB8. A row decoder 502 is provided in common for the memory blocks DMB1 to DMB8 on one side of the memory array in its longitudinal direction. For the memory blocks DMB1 to DMB8, (sense amplifier + column decoder) blocks 504-1 to 504-8 are provided, respectively.
Each of the memory blocks DMB1 to DMB8 has a capacity of 128K bits. In FIG. 3, one memory block DMB is shown to have 128 rows and 1024 columns, as an example. One column line CL includes a pair of bit lines BL and /BL.
As shown in FIG. 3, when the DRAM memory cell array is divided into a plurality of blocks, each bit line BL (and /BL) becomes shorter. In data reading, the charge stored in the capacitor of a memory cell (memory cell capacitor) is transmitted to the corresponding bit line BL (or /BL). The potential change generated on the bit line BL (or /BL) at this time is proportional to the ratio Cs/Cb of the capacitance Cs of the memory cell capacitor to the capacitance Cb of the bit line. If the bit line BL (and /BL) is made shorter, the bit line capacitance Cb is reduced, and therefore the potential change generated on the bit line is increased.
In operation, only the memory block including the word line WL selected by the row decoder 502 (memory block DMB2 in FIG. 3) performs the sensing operation, and the other blocks are kept in the standby state. Consequently, the power consumed in charging and discharging the bit lines during the sensing operation can be reduced.
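The dependence on Cs and Cb can be made concrete with the standard charge-sharing relation dV = (Vcell - Vpre) * Cs / (Cs + Cb), which reduces to the proportionality to Cs/Cb stated above when Cb is much larger than Cs. The sketch assumes precharge to Vcc/2, a bit line capacitance that scales with length, and hypothetical values (Vcc = 5 V, Cs = 30 fF, Cb = 300 fF, and 150 fF for a bit line of half the length); these are illustrative, not figures from the text.

```python
def bit_line_swing(v_cell: float, v_precharge: float, cs_ff: float, cb_ff: float) -> float:
    """Charge sharing between the cell capacitor Cs and the bit line Cb:
    dV = (Vcell - Vpre) * Cs / (Cs + Cb), roughly (Vcell - Vpre) * Cs / Cb
    when Cb >> Cs."""
    return (v_cell - v_precharge) * cs_ff / (cs_ff + cb_ff)


VCC = 5.0  # hypothetical supply voltage
long_bl = bit_line_swing(VCC, VCC / 2, 30.0, 300.0)   # undivided array
short_bl = bit_line_swing(VCC, VCC / 2, 30.0, 150.0)  # bit line half as long
print(round(long_bl, 3), round(short_bl, 3))  # 0.227 0.417
```

Halving Cb nearly doubles the read signal here, which is the effect the block division exploits.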
When the above described block-division type CDRAM structure is applied to the DRAM shown in FIG. 3, an SRAM register and a block decoder must be provided for each of the memory blocks DMB1 to DMB8, which significantly increases the chip area.
Further, as described above, the bit lines of the DRAM array and the SRAM array are in one-to-one correspondence. When the direct mapping method is employed for mapping memory between the main memory and the cache memory, the SRAM 506 is formed by 1024 cache registers arranged in one row, as shown in FIG. 2. In this case, the capacity of the SRAM cache is 1K bits.
When the 4-way set associative method is employed as the mapping method, the SRAM array 506 includes 4 rows of cache registers 506a to 506d, as shown in FIG. 4. One of the 4 rows of cache registers 506a to 506d is selected by a selector 510 in accordance with a way address. In this case, the capacity of the SRAM cache is 4K bits.
As described above, the method of mapping memory cells between the DRAM array and the cache memory is fixed by the on-chip structure. To change the mapping method, the cache size must also be changed.
In both of the CDRAM structures described above, the bit lines of the DRAM array and the SRAM array are in one-to-one correspondence. Therefore, the column address of the DRAM array is inevitably the same as the column address of the SRAM array. Consequently, the full associative method, in which memory cells of the DRAM array are mapped to arbitrary positions of the SRAM array, is impossible in principle.
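A minimal model of the 4-way organization of FIG. 4. The tag store and the `lookup` method are hypothetical additions (the text only says that the way address selects one row of registers). Note that `fill` takes a single column index: with one-to-one bit-line correspondence, a DRAM column can only be cached in the same SRAM column, which is why full associative mapping is impossible in this structure.

```python
N_WAYS = 4
N_COLUMNS = 1024


class FourWaySram:
    """Four rows of cache registers (506a to 506d). Column c of every way can
    only hold data from DRAM column c (one-to-one bit-line correspondence)."""

    def __init__(self):
        self.data = [[0] * N_COLUMNS for _ in range(N_WAYS)]
        # Hypothetical tag store: the DRAM row each register currently holds.
        self.tag = [[None] * N_COLUMNS for _ in range(N_WAYS)]

    def selector(self, way: int) -> list[int]:
        """Selector 510: choose one of the four register rows by way address."""
        return self.data[way]

    def lookup(self, dram_row: int, col: int):
        """Return the way holding (dram_row, col), or None on a cache miss."""
        for way in range(N_WAYS):
            if self.tag[way][col] == dram_row:
                return way
        return None

    def fill(self, way: int, dram_row: int, col: int, value: int) -> None:
        self.data[way][col] = value
        self.tag[way][col] = dram_row


sram = FourWaySram()
sram.fill(way=2, dram_row=17, col=5, value=1)
print(sram.lookup(17, 5), sram.lookup(17, 6), sram.selector(2)[5])  # 2 None 1
print(N_WAYS * N_COLUMNS)  # 4096, i.e. the 4K-bit cache
```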
Another structure of a semiconductor memory device in which a DRAM and an SRAM are integrated on the same chip is disclosed in Japanese Patent Laid Open (Kokai) No. 2-87392. In this prior art, the DRAM array and the SRAM array are connected through an internal common data bus, which is also connected to an input/output buffer for inputting and outputting data to and from the outside of the device. The selected positions in the DRAM array and in the SRAM array can be designated by separate addresses. However, in this prior art structure, data transfer between the DRAM array and the SRAM array is carried out over the internal common data bus, so the number of bits that can be transferred at one time is limited by the number of lines of the internal data bus, which prevents high speed rewriting of the contents of the cache memory. Therefore, as in the above described structure in which the SRAM cache is provided outside the standard DRAM, the speed of data transfer between the DRAM array and the SRAM array becomes a bottleneck, preventing provision of a high speed cache memory system.
In this prior art, data are transferred between the DRAM array and the SRAM array through the internal common data bus. Therefore, the operation generally called "copy back mode" cannot be carried out at high speed. The "copy back mode" includes the step of transferring, at the time of a cache miss, the data of the corresponding memory cell in the SRAM array back to its original memory cell position in the DRAM array, and the step of transferring the data of the DRAM memory cell to which access is requested to the corresponding memory cell of the SRAM array. Although the internal common data bus is bi-directional, data can be transferred in only one direction at a time, namely from the SRAM to the DRAM or from the DRAM to the SRAM. Therefore, in this prior art structure, a number of steps are necessary: selecting a word line in the DRAM array, transferring data from the SRAM array to the DRAM array, precharging the DRAM array (setting it to the standby state), selecting another word line of the DRAM array, and transferring data of the corresponding memory cell on the selected word line to the SRAM array. Consequently, high speed "copy back" is impossible.
Also, because data are transferred between the DRAM array and the SRAM array through the internal common data bus in this prior art, at the time of a cache miss the SRAM array cannot be accessed to read data until the data transfer from the DRAM array to the SRAM array is completed and the DRAM array is set to the standby state. Namely, at the time of a cache miss or the like, data cannot be read at high speed.
A general CDRAM requires the DRAM to be refreshed. In a CDRAM in which the DRAM array and the SRAM array cannot be accessed independently, the SRAM array cannot be accessed while the DRAM array is being refreshed. Namely, during this period the CPU cannot use the cache, and the benefit of the cache system is lost.
In a conventional CDRAM, the data output timing is determined by external control signals (/CAS and /WE). In this case, invalid data are output before the output data are established. Depending on the application, for example in a pipeline application, it is preferable that only valid data always be output. Since the data output timing of the conventional CDRAM cannot be changed depending on the application, its applications are limited. When it is to be applied to pipeline processing, separate latch means and the like must be externally provided, which inevitably increases the scale of the cache system. In addition, if such a latch is externally provided and the latch operation is driven by a system clock, the data output from the latch in a given cycle must be the data of the previous cycle, in order to prevent latching of invalid data. Data accessed in the present cycle cannot be read in that cycle, which limits the application.
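The bottleneck can be quantified: the number of transfers grows as the block size divided by the width of the transfer path. In this sketch the 16-bit internal common data bus is a hypothetical width chosen for illustration (the bus width of the cited prior art is not given here); the 1024-bit path stands for the one-gate-per-bit-line-pair transfer of FIG. 2.

```python
def transfers_needed(block_bits: int, path_width_bits: int) -> int:
    """One-way transfers needed to move a block over a path of the given width
    (ceiling division in integer arithmetic)."""
    return (block_bits + path_width_bits - 1) // path_width_bits


BLOCK = 1024  # one DRAM row, as in FIG. 2
print(transfers_needed(BLOCK, 1024))  # 1: one transfer gate per bit line pair
print(transfers_needed(BLOCK, 16))    # 64: hypothetical 16-bit internal bus
```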
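The sequential nature of copy back over such a bus can be illustrated with a toy model. It is a hypothetical sketch: the class and method names are illustrative, transfers are assumed to be a whole row at a time, and the bus width is ignored. It only records the order of operations and reproduces the five steps listed above.

```python
class SharedBusCdram:
    """Toy model of copy back over a shared internal bus that carries data
    in only one direction at a time. Only the step order is modeled."""

    def __init__(self, n_rows: int, row_bits: int):
        self.dram = [[0] * row_bits for _ in range(n_rows)]
        self.sram = [0] * row_bits
        self.sram_row = None    # DRAM row whose data the SRAM currently holds
        self.active_row = None  # currently selected DRAM word line
        self.log = []

    def _select_word_line(self, row: int) -> None:
        if self.active_row is not None:
            self.log.append("precharge DRAM (standby)")
        self.log.append(f"select word line {row}")
        self.active_row = row

    def copy_back_miss(self, requested_row: int) -> None:
        if self.sram_row is not None:
            # Write the cached data back to its original DRAM position.
            self._select_word_line(self.sram_row)
            self.dram[self.sram_row] = list(self.sram)
            self.log.append("transfer SRAM -> DRAM")
        # Only then can the requested data be fetched, on another word line.
        self._select_word_line(requested_row)
        self.sram = list(self.dram[requested_row])
        self.sram_row = requested_row
        self.log.append("transfer DRAM -> SRAM")


cdram = SharedBusCdram(n_rows=4, row_bits=8)
cdram.sram, cdram.sram_row = [1] * 8, 1  # SRAM holds modified data of row 1
cdram.copy_back_miss(requested_row=2)
for step in cdram.log:
    print(step)
```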
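A minimal sketch of the one-cycle lag, assuming an idealized external latch that captures the memory output at each system clock edge. The function name and the cycle model are illustrative assumptions.

```python
def latched_outputs(cycle_data: list[str]) -> list:
    """Values presented by an external output latch clocked by the system clock.

    Data accessed in cycle n become valid only late in cycle n, so the first
    clock edge that can safely capture them is the one ending cycle n; the
    latch therefore presents them during cycle n + 1.
    """
    latched = None  # nothing valid captured before the first edge
    presented = []
    for data in cycle_data:
        presented.append(latched)  # what the system sees during this cycle
        latched = data             # captured at the edge ending this cycle
    return presented


print(latched_outputs(["d0", "d1", "d2"]))  # [None, 'd0', 'd1']
```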