1. Field of the Invention
The present invention relates to a semiconductor memory device, and more specifically, to a semiconductor memory device having a main memory with a large storage capacity and a high speed cache memory with a small storage capacity integrated on the same chip. More specifically, the present invention relates to a semiconductor memory device containing a cache having a Dynamic Random Access Memory (DRAM) and a Static Random Access Memory (SRAM) integrated on the same chip.
2. Description of the Background Art
(i) Usage of standard DRAM as a main memory
The operating speed of recent microprocessing units (MPUs) has increased to the point that operating clock frequencies reach 25 MHz or higher. In a data processing system, a standard DRAM (Dynamic Random Access Memory) is often used as a main memory having large storage capacity, since its cost per bit is low. Although the access time of the standard DRAM has been reduced, the operating speed of the MPU has increased much faster than that of the standard DRAM. Consequently, in a data processing system using the standard DRAM as a main memory, an increase in wait states is inevitable. The gap in operating speed between the MPU and the standard DRAM is inevitable because the standard DRAM has the following characteristics.
(1) A row address and a column address are time divisionally multiplexed and applied to the same address pin terminals. The row address is taken into the device at a falling edge of a row address strobe signal /RAS. The column address is taken into the device at a falling edge of a column address strobe signal /CAS. The row address strobe signal /RAS defines the start of a memory cycle and activates row selecting circuitry. The column address strobe signal /CAS activates column selecting circuitry. Since a prescribed time period called the "RAS-CAS delay time (tRCD)" is necessary from the time the signal /RAS is set to an active state to the time the signal /CAS is set to the active state, there is a limit in reducing the access time, namely, a limit derived from address multiplexing.
(2) Once the row address strobe signal /RAS is raised to set the DRAM to a standby state, the row address strobe signal /RAS cannot fall to "L" again until a time period called the RAS precharge time (tRP) has elapsed. The precharge time is necessary for reliably precharging various signal lines in the DRAM to predetermined potentials. Due to the RAS precharge time tRP, the cycle time of the DRAM cannot be reduced. In addition, when the cycle time of the DRAM is reduced, the number of times the signal lines in the DRAM are charged and discharged is increased, which increases current consumption.
(3) Higher operating speed of the DRAM can be realized by circuit techniques such as improved layout and a higher degree of circuit integration, by developments in process technology, and by application-level improvements such as improved driving methods. However, the operating speed of the MPU is increasing at a much faster rate than that of the DRAM. The operating speed of semiconductor memories is hierarchical. For example, there are high speed bipolar RAMs using bipolar transistors, such as ECL RAMs (emitter coupled logic RAMs) and static RAMs, and relatively low speed DRAMs using MOS transistors (insulated gate type field effect transistors). It is very difficult to expect an operating speed (cycle time) as fast as several tens of ns (nanoseconds) in a standard DRAM formed of MOS transistors.
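As a rough illustration of characteristics (1) and (2), the following sketch adds up the two timing limits. All numeric values are assumed for illustration only and are not taken from any data sheet. The parameters `t_cac_ns` (access time from /CAS) and `t_ras_ns` (minimum /RAS active time) are not named in the text above and are introduced here only to complete the arithmetic.

```python
import math

def access_time_ns(t_rcd_ns, t_cac_ns):
    # Data cannot appear before /CAS falls, and /CAS cannot fall until
    # tRCD after /RAS falls (the limit derived from address multiplexing).
    return t_rcd_ns + t_cac_ns

def cycle_time_ns(t_ras_ns, t_rp_ns):
    # /RAS must stay high for tRP before the next memory cycle can start.
    return t_ras_ns + t_rp_ns

def clocks_per_cycle(cycle_ns, clock_mhz):
    # Number of whole MPU clock periods spanned by one memory cycle.
    period_ns = 1000.0 / clock_mhz
    return math.ceil(cycle_ns / period_ns)

# Assumed, illustrative values (ns).
T_RCD, T_CAC, T_RAS, T_RP = 25, 25, 80, 60

print(access_time_ns(T_RCD, T_CAC))                      # 50
print(cycle_time_ns(T_RAS, T_RP))                        # 140
print(clocks_per_cycle(cycle_time_ns(T_RAS, T_RP), 25))  # 4
```

Under these assumed values, one DRAM cycle spans four periods of a 25 MHz clock, which is the kind of mismatch that leads to wait states.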
There have been various application-level improvements intended to close the gap between the operating speeds of the MPU and the standard DRAM. Such improvements mainly comprise the following two approaches.
(1) Use of high speed mode of the DRAM and interleave method
(2) External provision of a high speed cache memory (SRAM).
The first approach (1) includes a method of using a high speed mode such as a static column mode or a page mode, and a method of combining the high speed mode and the interleave method. In the static column mode, one word line (one row) is selected, and thereafter only the column address is changed successively, to successively access memory cells of this row. In the page mode, one word line is selected, and then column addresses are successively taken in by toggling the signal /CAS to successively access memory cells connected to the selected one word line. In either of these modes, memory cells can be accessed without toggling the signal /RAS, enabling higher speed accessing than the normal access using the signals /RAS and /CAS.
In the interleave method, a plurality of memories are provided in parallel to a data bus, and by alternately or successively accessing the plurality of memories, the access time is reduced in effect. The use of high speed mode of the DRAM and combination of the high speed mode and the interleave method have been known as a method of using the standard DRAM as a high speed DRAM in a simple and relatively effective manner.
The second approach (2) has been widely used in the mainframe field. A high speed cache memory is expensive. However, in the field of personal computers, in which high performance as well as low cost are desired, this approach is employed in some parts of the field at a sacrifice of cost. There are three possible ways to provide the high speed cache memory. Namely,
(a) the high speed cache memory is contained in the MPU itself;
(b) the high speed cache memory is provided outside the MPU; and
(c) the high speed cache memory is not separately provided but the high speed mode supported in the standard DRAM is used as a cache (the high speed mode is used as a pseudo cache memory). When a cache hit occurs, the standard DRAM is accessed in the high speed mode, and at the time of a cache miss, the standard DRAM is accessed in the normal mode.
The above mentioned three ways (a) to (c) have been employed in data processing systems in one way or another. In most MPU systems, in view of cost, the memories are organized in a bank structure and interleaving is carried out on a bank by bank basis in order to conceal the RAS precharge time (tRP), which is inevitable in the DRAM. By this method, the effective cycle time of the DRAM can be reduced to substantially one half of the specification value.
The method of interleave is effective only when memories are sequentially accessed. When the same memory bank is to be continuously accessed, it is ineffective. Further, substantial improvement of the access time of the DRAM itself cannot be realized. The minimum unit of the memory must be at least 2 banks.
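The effect of bank interleaving, and its failure when one bank is accessed continuously, can be sketched with a simple timing model. This is only an illustration under stated assumptions: the function name, the mapping of low-order address bits to banks, and the timing values (active time equal to precharge time) are all hypothetical and chosen to make the arithmetic easy to follow.

```python
def total_time_ns(addresses, n_banks, t_active_ns, t_rp_ns):
    """Time to serve `addresses` in order, one access on the data bus at a time."""
    ready_at = [0] * n_banks  # earliest time each bank may start a new cycle
    now = 0                   # time at which the shared data bus becomes free
    for addr in addresses:
        bank = addr % n_banks             # assumed: low-order bits pick the bank
        start = max(now, ready_at[bank])  # wait for the bank to finish precharging
        now = start + t_active_ns
        ready_at[bank] = now + t_rp_ns    # the bank precharges in the background
    return now

# Two banks, assumed 80 ns active time and 80 ns precharge time.
sequential = [0, 1, 2, 3]  # alternates between bank 0 and bank 1
same_bank = [0, 2, 4, 6]   # every access falls on bank 0

print(total_time_ns(sequential, 2, 80, 80))  # 320
print(total_time_ns(same_bank, 2, 80, 80))   # 560
```

With sequential addresses the precharge of one bank is hidden behind the access to the other, so each access costs about 80 ns. When the same bank is hit repeatedly, every access after the first must wait out the precharge time and the benefit disappears.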
When the high speed mode such as the page mode or the static column mode is used, the access time can be reduced effectively only when the MPU successively accesses a certain page (data of a designated one row). This method is effective to some extent when the number of banks is comparatively large, for example 2 to 4, since different rows can be accessed in different banks. When the data of the memory requested by the MPU does not exist in the given page, it is called a "miss hit" (cache miss). Normally, a group of data are stored in adjacent addresses or sequential addresses. In the high speed mode, a row address, which is one half of the addresses, has been already designated, and therefore possibility of "miss hit" is high.
When the number of banks becomes as large as 30 to 40, data of different pages can be stored in different banks, and therefore the "miss hit" rate is remarkably reduced. However, it is not practical to provide 30 to 40 banks in a data processing system. In addition, if a "miss hit" occurs, the signal /RAS is raised and the DRAM must be returned to the precharge cycle in order to reselect the row address, which sacrifices the characteristic of the bank structure.
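The dependence of the "miss hit" rate on the number of banks can be illustrated with a small counting sketch. It assumes, purely for illustration, that each bank keeps exactly one row (page) open and that row numbers are distributed over banks by a modulo mapping; neither assumption comes from the text above.

```python
def page_hits(row_trace, n_banks):
    """Count accesses that find their row already open in its bank."""
    open_row = [None] * n_banks  # the row currently selected in each bank
    hits = 0
    for row in row_trace:
        bank = row % n_banks     # assumed mapping of rows to banks
        if open_row[bank] == row:
            hits += 1
        else:
            # "Miss hit": /RAS is raised, the bank is precharged,
            # and the new row is selected.
            open_row[bank] = row
    return hits

trace = [0, 1, 0, 1, 0, 1]  # the MPU alternates between two pages

print(page_hits(trace, 1))  # 0
print(page_hits(trace, 2))  # 4
```

With a single bank, alternating between two pages misses on every access. With two banks, each page stays open in its own bank and only the first access to each page misses.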
In the above described second method (2), a high speed cache memory is provided between the MPU and the standard DRAM. In this case, the standard DRAM may have relatively low speed of operation. Standard DRAMs having storage capacities as large as 4M bits or 16M bits have come to be used. In a small system such as a personal computer, the main memory thereof can be formed by one or several chips of standard DRAMs. External provision of the high speed cache memory is not so effective in such a small system in which the main memory can be formed of one standard DRAM. If the standard DRAM is used as the main memory, the data transfer speed between the high speed cache memory and the main memory is limited by the number of data input/output terminals of the standard DRAM, which constitutes a bottleneck in increasing the speed of the system.
When the high speed mode is used as a pseudo cache memory, the speed of operation thereof is slower than the high speed cache memory, and it is difficult to realize the desired system performance.
Provision of the high speed cache memory (SRAM) in the DRAM is proposed as a method of forming a relatively inexpensive and small system, which can solve the problem of sacrifice of system performance when the interleave method or the high speed operation mode is used. More specifically, a single chip memory having a hierarchical structure of a DRAM serving as a main memory and a SRAM serving as a cache memory has been conceived. The one-chip memory having such a hierarchical structure is called a cache DRAM (CDRAM).
Normally in a CDRAM, a DRAM and an SRAM are integrated on the same chip. At the time of a cache hit SRAM is accessed, while at the time of a cache miss, the DRAM is accessed. Namely, the SRAM operating at high speed is used as a cache memory and the DRAM having a large storage capacity is used as a main memory.
The so called block size of the cache is considered to be the number of bits the contents of which are rewritten in one data transfer in SRAM. Generally, when the block size becomes larger, the hit rate is increased. However, if the cache memory has the same size, the number of sets is reduced in inverse proportion to the block size, and therefore the hit rate is decreased. For example, when the cache size is 4K bits and the block size is 1024, the number of sets is 4. However, if the block size is 32, the number of sets is 128. Therefore, in the conventional CDRAM structure, the block size is made too large, and the cache hit rate cannot be very much improved. A structure enabling reduction in block size is disclosed in, for example, Japanese Patent Laying-Open No. 1-146187.
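The relation between cache size, block size and number of sets used in the example above can be checked with a short calculation. The function name is hypothetical.

```python
def number_of_sets(cache_bits, block_bits):
    """Number of sets when a cache of `cache_bits` is divided into equal blocks."""
    if cache_bits % block_bits != 0:
        raise ValueError("cache size must be a multiple of the block size")
    return cache_bits // block_bits

CACHE_BITS = 4 * 1024  # 4K bits

print(number_of_sets(CACHE_BITS, 1024))  # 4
print(number_of_sets(CACHE_BITS, 32))    # 128
```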
FIG. 217 shows the whole structure of the conventional CDRAM disclosed in the aforementioned laid-open application. Referring to FIG. 217, the conventional CDRAM includes a memory array 1 including a plurality of dynamic memory cells arranged in a matrix of rows and columns. Memory array 1 is divided into a plurality of memory blocks B#1 to B#4 each including a plurality of columns. Memory blocks B#1 to B#4 share word lines.
The conventional CDRAM further includes a row address buffer 2 taking externally applied address signals A0 to An as a row address signal RA in response to an external row address strobe signal /RAS and generating an internal row address signal; a column address buffer 4 taking address signals A0 to An as a column address signal CA in response to an external column address strobe signal /CAS and generating an internal column address signal; a row decoder 6 responsive to the internal row address signal from row address buffer 2 for generating a signal to select a corresponding row in memory cell array 1; a word driver 8 responsive to a row selecting signal from row decoder 6 for transmitting a driving signal to the selected row of memory cell array 1 to set a word line corresponding to the designated row to a selected state; a sense amplifier group 10 for sensing, amplifying and latching data of the memory cells connected to the selected row in memory cell array 1; a data register circuit 14 including a plurality of data registers provided corresponding to each column of the memory cell array 1; a transfer gate circuit 12 for transferring data between each column of memory cell array 1 and data register circuit 14; an IO gate 16 for decoding the internal column address signal from column address buffer 4 to select a corresponding column of memory cell array 1 or a corresponding data register in data register circuit 14; a block decoder 18 responsive to an externally applied cache hit/miss designating signal CH for selecting a corresponding block in memory cell array 1; an input buffer 24 and an output buffer 26 for inputting/outputting data from and to the outside of the device; a column decoder 20 for decoding the internal column address signal from column address buffer 4 for generating a signal for selecting and connecting the corresponding column of memory cell array 1 or the corresponding data register of data register circuit 14 through IO gate circuit 16 to input buffer 24 and output buffer 26; and a read/write control circuit 28 for controlling enabling/disabling of input buffer 24 and output buffer 26 in response to an externally applied write enable signal /WE and to the column address strobe signal /CAS.
Transfer gate circuit 12 and data register circuit 14 are divided into blocks, respectively, corresponding to the blocks B#1 to B#4 of the memory cell array.
The CDRAM further includes a gate circuit 22 responsive to an externally applied cache hit/miss signal CH for transmitting a column address signal, which is, for example, lower 2 bits from column address buffer 4, as a block selecting signal to block decoder 18. Block decoder 18 is activated when cache hit/miss signal CH indicates a cache miss of "L", decodes the applied block address signal to select a corresponding memory cell block in the memory cell array 1, and drives block by block the transfer gate circuit 12 for transferring data between the selected memory cell array blocks and the data register corresponding to the selected memory cell array block.
FIG. 218 shows a structure of a main portion of the semiconductor memory device shown in FIG. 217. FIG. 218 shows a structure at the boundary region between two memory blocks B#1 and B#2.
Referring to FIG. 218, sense amplifier group 10 includes sense amplifiers SA#1 each provided corresponding to each bit line pair BL, /BL of memory block B#1 and sense amplifiers SA#2 each provided corresponding to each bit line pair BL, /BL of memory block B#2. Sense amplifiers SA#1 and SA#2 differentially amplify and latch the signals on the corresponding bit line pair BL, /BL when they are activated.
Transfer gate circuit 12 includes transfer gates DT#1 each provided for each bit line pair BL, /BL of memory block B#1 and transfer gates DT#2 each provided corresponding to each bit line pair BL /BL of memory block B#2. Transfer gates DT#1 provided for memory block B#1 are driven independent from transfer gates DT#2 provided for memory block B#2. More specifically, transfer gates DT#1 provided corresponding to memory block B#1 are driven by a block decoder circuit BD#1 provided for memory block B#1, while transfer gates DT#2 provided for memory block B#2 are driven by a block decoder circuit BD#2 provided for memory block B#2. Block decoder circuits BD#1 and BD#2 decode a block address transmitted at a time of cache miss from gate circuit 22 shown in FIG. 217, and drive a related transfer gate DT (#1 or #2) when the block address indicates a corresponding memory block.
A data register circuit 14 includes a register DR#1 provided corresponding to each bit line pair BL, /BL of memory block B#1 for latching data applied through transfer gate DT#1, and a register DR#2 receiving and storing data on the bit line pair BL, /BL of memory block B#2 through transfer gate DT#2. Data registers DR (#1 and #2) have a structure of an inverter latch circuit.
IO gate circuit 16 includes an IO gate TG provided for each of the bit line pairs BL, /BL of the memory blocks B#1 and B#2, responsive to a column selecting signal from column decoder 20 for connecting the corresponding bit line pair BL, /BL to an internal data transmitting line pair IO. IO gate TG connects the bit line pair BL, /BL of memory blocks B#1 and B#2 to internal data transmitting line pair IO through transfer gate circuit 12 and data register circuit 14. Therefore, when transfer gate circuit 12 is off (cut off state), IO gate TG connects the data register included in data register circuit 14 to internal data transmitting line pair IO. The operation of the semiconductor memory device shown in FIGS. 217 and 218 will be described with reference to the diagram of waveforms of FIG. 219.
The semiconductor memory device shown in FIG. 217 is used in a system including a CPU as an external processing device and a controller for controlling access to the semiconductor memory device in accordance with a request from the CPU. The controller includes a tag memory for storing tag addresses of data stored in data register circuit 14, a comparing circuit for determining coincidence/noncoincidence between a tag address stored in the tag memory and a portion of the address from the CPU (CPU address) corresponding to the tag address for generating a signal CH indicative of a cache hit/cache miss in accordance with the result of determination, and a control circuit (a state machine and an address multiplexer) for controlling address supply and access to the semiconductor memory device in accordance with the result of determination of the comparing circuit.
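The hit/miss determination made by the external controller can be sketched as follows. This is a minimal illustration, not the actual controller design: it assumes that the tag is the row address, that one tag is kept per memory block, that the block is identified by the lower 2 bits of the column address (the example given for gate circuit 22), and that the CPU address splits into a row part and a 10-bit column part. The class and method names are hypothetical.

```python
class ExternalController:
    """Tag memory plus comparing circuit that generates the signal CH."""

    def __init__(self, n_blocks=4, column_bits=10):
        self.n_blocks = n_blocks
        self.column_bits = column_bits    # assumed width of the column address
        self.tags = [None] * n_blocks     # tag memory: one row address per block

    def split(self, cpu_address):
        # Address multiplexer: upper bits form the row address,
        # lower bits form the column address.
        row = cpu_address >> self.column_bits
        column = cpu_address & ((1 << self.column_bits) - 1)
        return row, column

    def lookup(self, cpu_address):
        """Return (CH, row, column); CH is True for a hit, False for a miss."""
        row, column = self.split(cpu_address)
        block = column % self.n_blocks    # lower 2 bits when n_blocks is 4
        ch = self.tags[block] == row      # comparing circuit
        if not ch:
            self.tags[block] = row        # the block is reloaded on a miss
        return ch, row, column

controller = ExternalController()
address = (3 << 10) | 5                   # row 3, column 5
print(controller.lookup(address))         # (False, 3, 5)
print(controller.lookup(address))         # (True, 3, 5)
```

The first access to a row misses and records the tag. A repeated access to the same row and block then reports a hit.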
An address is supplied from the CPU in synchronization with the system clock. When the CPU address designates data stored in data register circuit 14, the externally provided controller sets the cache hit signal CH to "H" which corresponds to the active state. At this time, if the row address strobe signal /RAS is at active "L", the external controller toggles the column address strobe signal /CAS and extracts a column address CA from the CPU address and applies the same to the semiconductor memory device.
In the semiconductor memory device, the applied column address signal CA is taken in by column address buffer 4, which generates an internal column address signal and applies the same to column decoder 20. Since the cache hit signal CH is at "H", the output from gate circuit 22 is at "L", the block decoder 18 is in a disabled state (or transmission of the block address is inhibited), and the block selecting operation is not carried out. In this case, the column selecting operation is effected by column decoder 20, the corresponding data register is connected to the internal data line pair IO, and writing of data to or reading of data from the selected data register is carried out. Whether data is to be written or read depends on the write enable signal /WE.
While the data requested by the CPU is stored in data register circuit 14, the cache hit signal CH is at "H", and the corresponding data register of data register circuit 14 is selected in accordance with the column address signal CA.
When the CPU address does not designate the data stored in data register circuit 14, the cache hit signal CH is at the "L" state. At the time of a cache miss, the external controller once raises the signals /RAS and /CAS to "H", then lowers the row address strobe signal /RAS to "L", extracts the row address signal RA from the CPU address and applies the same to the semiconductor memory device.
In the semiconductor memory device, row selecting operation in memory cell array 1 is carried out by row address buffer 2, row decoder 6 and word driver 8 in accordance with the applied row address signal RA, and the data of the memory cell connected to the selected row is detected, amplified and latched by sense amplifier group 10. In parallel with these operations, column address strobe signal /CAS is lowered to "L", and the column address signal CA is extracted from the CPU address and applied to the semiconductor memory device. In the semiconductor memory device, since the cache hit signal CH is at "L", block decoder 18 is activated and the block address signal of the applied column address signal is applied to the block decoder 18.
Block decoder 18 decodes the block address, and turns on all transfer gates provided corresponding to the memory block indicated by the block address. Consequently, in the selected memory block, data latched by the sense amplifier SA is transmitted to data register DR (#1 or #2). In parallel, column decoder 20 carries out column selecting operation, renders conductive the transfer gate TG included in IO gate circuit 16, and connects the data register DR to internal data transmission line pair IO. Thereafter, if cache hit is continued with the row kept at the selected state in the memory array 1, data register DR (#1 or #2) is selected by the column decoder 20 to be accessed.
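The hit and miss operations described above can be summarized in a behavioral sketch. It is not a timing model and makes several simplifying assumptions: the block is selected by the lower 2 bits of the column address (the example given for gate circuit 22), the signal CH is supplied by the caller as a boolean, and write-back from the data registers to the memory array is not modelled. The class and method names are hypothetical.

```python
class ConventionalCDRAM:
    """Behavioral sketch of the conventional CDRAM of FIG. 217."""

    N_BLOCKS = 4  # memory blocks B#1 to B#4

    def __init__(self, n_rows, n_columns):
        self.array = [[0] * n_columns for _ in range(n_rows)]  # memory array 1
        self.registers = [0] * n_columns                       # data register circuit 14

    def access(self, row, column, cache_hit, write_data=None):
        if not cache_hit:
            # Cache miss (CH at "L"): the row is selected and sensed, and the
            # block decoder turns on the transfer gates of one block only.
            block = column % self.N_BLOCKS  # lower 2 bits of the column address
            for c in range(block, len(self.registers), self.N_BLOCKS):
                self.registers[c] = self.array[row][c]
        # On a hit or a miss, the column decoder connects one data register
        # to the internal data line pair IO.
        if write_data is None:
            return self.registers[column]
        self.registers[column] = write_data
        return None

device = ConventionalCDRAM(n_rows=2, n_columns=8)
device.array[0] = list(range(10, 18))  # assumed initial contents of row 0
device.array[1] = list(range(20, 28))  # assumed initial contents of row 1

print(device.access(0, 1, cache_hit=False))  # 11
print(device.access(0, 5, cache_hit=True))   # 15
print(device.access(1, 2, cache_hit=False))  # 22
print(device.access(0, 1, cache_hit=True))   # 11
```

The last access shows that the registers of one block still hold data of row 0 after another block has been loaded from row 1, since only the transfer gates of the addressed block are driven on a miss.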
By dividing the memory array into blocks and driving the data registers block by block as described above, the data register can be used as a cache. In this case, as shown in FIG. 220, data registers TR#1 to TR#4 provided corresponding to the memory array blocks B#1 to B#4, respectively, can store data of different rows, thereby improving cache hit rate, and in addition, the block size of the cache can be made the same as the number of columns included in the memory block, realizing appropriate size of the cache block.
In the semiconductor memory device such as described above, the DRAM array is used as a main memory, and the data register circuit can be used as a cache. Since data transfer between the main memory and the cache is effected on block by block basis, data can be transferred at high speed.
An application of the semiconductor memory device described above, that is, a CDRAM, to graphic data processing will now be discussed.
FIG. 221 shows a structure of a general graphic data processing system. Referring to FIG. 221, the system includes a CPU 30 as a processing device, a CDRAM 32, a CRT 34 as a display, and a CRT controller 36 for controlling data transfer between CDRAM 32 and CRT 34. CPU 30, CDRAM 32 and CRT 34 are connected to an internal data bus 38. Data transfer is carried out through internal data bus 38.
CDRAM 32 stores both graphic data to be displayed and data utilized by CPU 30 which are not displayed. When the graphic data is to be displayed on CRT 34, data transfer between CDRAM 32 and CRT 34 is carried out under the control of CRT controller 36. Data read from CDRAM 32 is applied to CRT 34 through data bus 38, and is displayed on a display screen of a display, not shown.
When data stored in CDRAM 32 is to be processed, CPU 30 accesses CDRAM 32. At that time, CPU 30 can access CDRAM 32 at high speed in accordance with the result of determination of cache hit/cache miss, and therefore data can be processed at high speed. The data accessed by the CPU 30 should preferably be stored in the cache region of CDRAM 32. Assume that CRT controller 36 reads data in the memory array 1 of CDRAM 32 and transmits the same to CRT 34 for display.
In such a case, it is necessary in the CDRAM having the above described structure that row selecting operation and the column selecting operation are carried out under the control by the CRT controller 36. Data in the memory array 1 is read through data register circuit 14.
Therefore, in this case, the data stored in the data register circuit for use as a cache may be overwritten by the data to be displayed on CRT 34. Likewise, when image data generated from a video camera (not shown) or the like is to be written to CDRAM 32, the cache data stored in data register circuit 14 is overwritten by the image data applied for writing to the main memory of the CDRAM 32.
Therefore, in the above described CDRAM, writing and reading of data of the main memory cannot be carried out without changing the data held for the cache. Accordingly, it is difficult to store in the CDRAM both the graphic data and data which is not displayed, such as application programs.
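The conflict between display accesses and cached CPU data can be illustrated with a deliberately simplified sketch. It assumes a single block so that one array access reloads all data registers, and it keeps the cached row number inside the device object for brevity, whereas in the system described above that bookkeeping is done by the external tag memory. All names are hypothetical.

```python
class SinglePathCDRAM:
    """Every access to the memory array passes through the data registers."""

    def __init__(self, array):
        self.array = array
        self.registers = [None] * len(array[0])
        self.cached_row = None  # stands in for the external tag memory

    def array_read(self, row, column):
        # Reading the array overwrites the data registers with the new row.
        self.registers = list(self.array[row])
        self.cached_row = row
        return self.registers[column]

    def cache_read(self, row, column):
        # Returns the register contents on a hit, or None on a miss.
        if self.cached_row != row:
            return None
        return self.registers[column]

memory = SinglePathCDRAM([[r * 10 + c for c in range(4)] for r in range(4)])

memory.array_read(0, 0)          # CPU access loads row 0 into the registers
print(memory.cache_read(0, 1))   # 1 (cache hit for the CPU)

memory.array_read(3, 0)          # CRT controller reads row 3 for display
print(memory.cache_read(0, 1))   # None (the CPU's cached data was overwritten)
```

The display read itself succeeds, but it evicts the row the CPU was using, so the next CPU access to that row becomes a cache miss.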
In the conventional structure of the CDRAM, block division arrangement is employed when a DRAM main memory having large storage capacity is used. In that case, a block structure in which the memory array shown in FIG. 218 or 220 is used as one block is utilized. In the block division structure, only that block which includes a selected word line is activated, and other blocks are maintained at the inactive state. Accordingly, the number of available data registers is small correspondingly, which lowers the efficiency of use of the cache.
When there is only one row of data registers as in the structure of the CDRAM shown in FIG. 218, the only mapping method which can be implemented is the direct mapping method. In order to implement mapping by the set associative method, it is necessary to provide a plurality of rows of data registers. The direct mapping method and the set associative method cannot both be supported. Only one of these mapping methods can be implemented.
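The difference between the two mapping methods can be sketched by counting hits at one cache block position backed by one or more rows of data registers. The least-recently-used replacement used here is an assumption made for illustration; the text above does not specify a replacement policy.

```python
def count_hits(row_trace, n_ways):
    """Hits at one block position backed by `n_ways` rows of data registers."""
    ways = []  # rows currently held, least recently used first
    hits = 0
    for row in row_trace:
        if row in ways:
            hits += 1
            ways.remove(row)  # re-appended below as most recently used
        elif len(ways) == n_ways:
            ways.pop(0)       # evict the least recently used row (assumed policy)
        ways.append(row)
    return hits

trace = [0, 1, 0, 1, 0, 1]   # two rows competing for the same block position

print(count_hits(trace, 1))  # 0 (direct mapping: one row of data registers)
print(count_hits(trace, 2))  # 4 (2-way set associative: two rows of registers)
```

With a single row of data registers, two rows that map to the same block evict each other on every access. A second row of registers lets both be held at once.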
In the CDRAM having the above described structure, access to 1 bit of a data register can be carried out in parallel with data transfer from the DRAM array to the data register. However, unlike in a common dual port video RAM, the DRAM portion and the SRAM portion cannot be driven independently of each other, so the DRAM portion cannot be accessed in parallel with an access to the SRAM without affecting the access to the SRAM array.