1. Field of the Invention
The present invention relates to semiconductor memory devices and, specifically, to a clock synchronized type semiconductor memory device which operates in synchronization with externally applied clock signals. More specifically, the present invention relates to a structure of a semiconductor memory device containing a cache, in which a dynamic random access memory (DRAM) having a large storage capacity serving as a main memory, and a static random access memory (SRAM) having small storage capacity serving as a cache memory are integrated on the same semiconductor chip.
2. Description of the Background Art
Historical Review on Memory Environment in a Conventional Data Processing System
(i) Usage of standard DRAM as a main memory
Operation speed of recent 16-bit or 32-bit microprocessing unit (MPU) has been so much increased as to have operation clock frequency as high as 25 MHz or higher. In a data processing system, a standard DRAM (Dynamic Random Access Memory) is often used as a main memory having large storage capacity, since cost per bit is low. Although access time in the standard DRAM has been reduced, the speed of operation of the MPU has been increased much faster than that of the standard DRAM. Consequently, in a data processing system using the standard DRAM as a main memory, increase of wait state is inevitable. The gap in speed of operation between MPU and the standard DRAM is inevitable because the standard DRAM has the following characteristics.
(1) A row address and a column address are time divisionally multiplexed and applied to the same address pin terminals. The row address is taken in the device at a falling edge of a row address strobe signal/RAS. The column address is taken in the device at a falling edge of a column address strobe signal/CAS. The row address strobe signal/RAS defines start of a memory cycle and activates row selecting circuitry. The column address strobe signal/CAS activates column selecting circuitry. Since a prescribed time period called "RAS-CAS delay time (tRCD)" is necessary from the time the signal/RAS is set to an active state to the time the signal/CAS is set to the active state, there is a limit in reducing the access time, namely, there is a limit derived from address multiplexing.
(2) When the row address strobe signal/RAS is once raised to set the DRAM to a standby state, the row address strobe signal/RAS cannot fall to "L" again until a time period called a RAS precharge time (tRP) has lapsed. The RAS precharge time is necessary for surely precharging various signal lines in the DRAM to predetermined potentials. Due to the RAS precharge time TRP, the cycle time of DRAM cannot be reduced. In addition, when the cycle time of the DRAM is reduced, the number of charging/discharging of signal lines in the DRAM is increased, which increases current consumption.
(3) The higher speed of operation of the DRAM can be realized by circuit technique such as improvement of layout, increase of degree of integration of circuits, development in process technique and by applicational improvement such as improvement in the method of driving. However, the speed of operation of the MPU is increased at much faster rate than DRAM. The speed of operation of semiconductor memories is hierarchical. For example, there are high speed bipolar RAMs using bipolar transistors such as ECLRAMs (Emitter Coupled RAM) and Static RAM, and relatively low speed DRAMs using MOS transistors (insulated gate type field effect transistors). It is very difficult to expect the operation speed (cycle time) as fast as several tens ns (nano seconds) in a standard DRAM formed of MOS transistors.
There have been various applicational improvements to stop the gap between speed of operations of the MPU and the standard DRAM. Such improvements mainly comprise the following two approaches.
(1) Use of high speed mode of the DRAM and interleave method
(2) External provision of a high speed cache memory (SRAM).
The first approach (1) includes a method of using a high speed mode such as a static column mode or a page mode, and a method of combining the high speed mode and the interleave method. In the static column mode, one word line (one row) is selected, and thereafter only the column address is changed successively, to successively access memory cells of this row. In the page mode, one word line is selected, and then column addresses are successively taken by toggling the signal/CAS to successively access memory cells connected to the selected one word line. In either of these modes, memory cells can be accessed without toggling the signal/RAS, enabling higher speed accessing than the normal access using the signals/RAS and/CAS.
In the interleave method, a plurality of memories are provided in parallel to a data bus, and by alternately or successively accessing the plurality of memories, the access time is reduced in effect. The use of high speed mode of the DRAM and combination of the high speed mode and the interleave method have been known as a method of using the standard DRAM as a high speed DRAM in a simple and relatively effective manner.
The second approach (2) has been widely used in a main frame art. A high speed cache memory is expensive. However, in the field of personal computers in which high performance as well as low cost are desired, this approach is employed in some parts of the field with a sacrifice of cost. There are three possible ways to provide the high speed cache memory. Namely,
(a) the high speed cache memory is contained in the MPU itself;
(b) the high speed cache memory is provided outside the MPU; and
(c) the high speed cache memory is not separately provided but the high speed mode contained in the standard DRAM is used as a cache (the high speed mode is used as a pseudo cache memory). When a cache hit occurs, the standard DRAM is accessed in the high speed mode, and at the time of a cache miss, the standard DRAM is accessed in the normal mode.
The above mentioned three ways (a) to (c) have been employed in the data processing systems in some way or other. In most MPU systems, the memories are organized in a bank structure and interleaving is carried out on bank by bank basis in order to conceal the RAS precharge time (TRP) which is inevitable in the DRAM, in view of cost. By this method, the cycle time of the DRAM can be substantially one half that of the specification value.
The method of interleave is effective only when memories are sequentially accessed. When the same memory bank is to be continuously accessed, it is ineffective. Further, substantial improvement of the access time of the DRAM itself cannot be realized. The minimum unit of the memory must be at least 2 banks.
When the high speed mode such as the page mode or the static column mode is used, the access time can be reduced effectively only when the MPU successively accesses a certain page (data of a designated one row). This method is effective to some extent when the number of banks is comparatively large, for example 2 to 4, since different rows can be accessed in different banks. When the data of the memory requested by the MPU does not exist in the given page, it is called a "miss hit" (cache miss). Normally, a group of data are stored in adjacent addresses or sequential addresses. In the high speed mode, a row address, which is one half of the addresses, has been already designated, and therefore possibility of "miss hit" is high.
When the number of banks becomes as large as 30 to 40, data of different pages can be stored in different banks, and therefore the "miss hit" rate is remarkably reduced. However, it is not practical to provide 30 to 40 banks in a data processing system. In addition, if a "miss hit" occurs, the signal/RAS is raised and the DRAM must be returned to the precharge cycle in order to reselect the row address, which sacrifices the characteristic of the bank structure.
In the above described second method (2), a high speed cache memory is provided between the MPU and the standard DRAM. In this case, the standard DRAM may have relatively low speed of operation. Standard DRAMs having storage capacities as large as 4M bits or 16M bits have come to be used. In a small system such as a personal computer, the main memory thereof can be formed by one or several chips of standard DRAMs. External provision of the high speed cache memory is not so effective in such a small system in which the main memory can be formed of one standard DRAM. If the standard DRAM is used as the main memory, the data transfer speed between the high speed cache memory and the main memory is limited by the number of data input/output terminals of the standard DRAM, which constitutes a bottleneck in increasing the speed of the system.
When the high speed mode is used as a pseudo cache memory, the speed of operation thereof is slower than the high speed cache memory, and it is difficult to realize the desired system performance.
(ii) Consideration on a conventional cache containing DRAM
Provision of the high speed cache memory (SRAM) in the DRAM is proposed as a method of forming a relatively inexpensive and small system, which can solve the problem of sacrifice of system performance when the interleave method or the high speed operation mode is used. More specifically, a single chip memory having a hierarchical structure of a DRAM serving as a main memory and a SRAM serving as a cache memory has been conceived. The one-chip memory having such a hierarchical structure is called a cache DRAM (CDRAM). The CDRAM will be described with reference to FIGS. 1 through 4.
FIG. 1 shows a structure of a main portion of a conventional standard 1 megabit DRAM. As shown in FIG. 1, the DRAM comprises a memory cell array 500 including a plurality of memory cells MC arranged in a matrix of rows and columns. A row of memory cells are connected to one word line WL. A column of memory cells MC are connected to one column line CL. Normally, the column line CL is formed by a pair of bit lines. A memory cell MC is positioned at a crossing of one of the pair of bit lines and one word line WL. In a 1M DRAM, the memory cells MC are arranged in a matrix of 1024 rows.times.1024 columns. Namely, the memory cell array 500 includes 1024 word lines WLs and 1024 column lines CLs (1024 pairs of bit lines).
The DRAM further comprises a row decoder 502 which decodes an externally applied row address (not shown) for selecting a corresponding row of the memory cell array 500; a sense amplifier which detects and amplifies data of the memory cell connected to the word line selected by the row decoder 502; and a column decoder which decodes an externally applied column address (not shown) for selecting a corresponding column of the memory cell array 502. In FIG. 1, the sense amplifier and the column decoder are denoted by one block 504. If the DRAM has a x 1 bit structure in which input/output of data is effected bit by bit, one column line CL (a bit line pair) is selected by the column decoder.
If the DRAM has a x 4 bit structure in which input/output of data is effected 4 bits by 4 bits, 4 column lines CL are selected by the column decoder. One sense amplifier is provided for each column line (bit line pair) CL in the block 504.
In memory access for writing data to or reading data from the memory cell MC in the DRAM, the following operation is carried out. First, a row address is applied to the row decoder 502. The row decoder 502 decodes the row address and raises the potential of one word line WL in the memory cell array 500 to "H". Data of the 1024 bits of memory cells MC connected to the selected word line WL are transmitted to corresponding column lines CL. The data on the column lines CL are amplified by sense amplifiers included in the block 504. Selection of a memory cell to which the data is written or from which the data is read out of the memory cells connected to the selected word line WL is carried out by a column selection signal from the column decoder included in the block 504. The column decoder decodes column address signals (more accurately, internal column address signals), and generates a column selecting signal for selecting the corresponding column in the memory cell array 500.
In the above described high speed mode, column addresses are successively applied to the column decoder included in the block 504. In the static column mode operation, column addresses applied at predetermined time intervals are decoded as new column addresses by the column decoder, and the corresponding memory cell out of the memory cells connected to the selected word line WL is selected by the column line CL. In the page mode, new column address is applied at every toggling of the signal /CAS, and the column decoder decodes the column address to select the corresponding column line. In this manner, one row of memory cells MC connected to the selected word line WL can be accessed at high speed by setting one word line WL at a selected state and by changing the column addresses only.
FIG. 2 shows a general structure of a conventional 1M bit CDRAM. Referring to FIG. 2, the conventional CDRAM comprises, in addition to the components of the standard DRAM shown in FIG. 1, SRAM 506 and a transfer gate 508 for transferring data between one row of the memory cell array 500 of the DRAM and the SRAM 506. The SRAM includes a cache register provided corresponding to each column line CL of the memory cell array 500 so as to enable simultaneous storage of data of one row of the DRAM memory cell array 500. Therefore, 1024 cache registers are provided. The cache register is formed by a static memory cell (SRAM cell).
In the structure of the CDRAM shown in FIG. 2, when a signal representing a cache hit is externally applied, the SRAM 506 is accessed, enabling access to the memory at high speed. At the time of a cache miss (miss hit), the DRAM portion is accessed.
A CDRAM as described above having a DRAM of a large storage capacity and a high speed SRAM integrated on the same chip is disclosed in, for example, Japanese Patent Laying-Open Nos. 60-7690 and 62-38590.
In the above described conventional CDRAM structure, column lines (bit line pairs) CL of the DRAM memory cell array 500 and column lines (bit line pairs) of the SRAM (cache memory) 506 are connected in one to one correspondence through a transfer gate 508. More specifically, in the above described conventional CDRAM structure, data of the memory cells connected to one word line WL in the DRAM memory cell array 500 and the data of the same number of SRAM cells as memory cells of one row of the memory cell array 500 are transferred bi-directionally and simultaneously, through the transfer gate 508. In this structure, the SRAM 506 is used as a cache memory and the DRAM is used as a main memory.
The so called block size of the cache is considered to be the number of bits (memory cells) the contents of which are rewritten in one data transfer in SRAM 506. Therefore, the block size is the same as the number of memory cells which are physically coupled to one word line WL of DRAM memory cell array 500. As shown in FIGS. 1 and 2, when 1024 memory cells are physically connected to one word line WL, the block size is 1024.
Generally, when the block size becomes larger, the hit rate is increased. However, if the cache memory has the same size, the number of sets is reduced in inverse proportion to the block size, and therefore the hit rate is decreased. For example, when the cache size is 4K bits and the block size 1024, the number of sets is 4. However, if the block size is 32, the number of sets is 128. Therefore, in the conventional CDRAM structure, the block size is made too large, and the cache hit rate cannot be very much improved.
A structure enabling reduction in block size is disclosed in, for example, Japanese Patent Laying-Open No. 1-146187. In this prior art, column lines (bit line pairs) of the DRAM array and the SRAM array are arranged in one to one correspondence, but they are divided into a plurality of blocks in the column direction. Selection of the block is carried out by a block decoder. At the time of a cache miss (miss hit), one block is selected by the block decoder. Data are transferred only between the selected DRAM block and the associated SRAM block. By this structure, the block size of the cache memory can be reduced to an appropriate size. However, there remains the following problem unsolved.
FIG. 3 shows a standard array structure of a 1M bit DRAM array. In FIG. 3, the DRAM array is divided into 8 memory blocks DMB1 to DMB8. A row decoder 502 is commonly provided for the memory blocks DMB1 to DMB8 on one side in the longitudinal direction of the memory array. For each of the memory blocks DMB1 to DMB8, (sense amplifier+column decoder) blocks 504-1 to 504-8 are provided.
Each of the memory blocks DMB1 to DMB8 has the capacity of 128K bits. In FIG. 3, one memory block DMB is shown to have 128 rows and 1024 columns, as an example. One column line CL includes a pair of bit lines BL, /BL.
As shown in FIG. 3, when the DRAM memory cell array is divided into a plurality of blocks, one bit line BL (and/BL) becomes shorter. In data reading, charges stored in a capacitor (memory cell capacitor) in the memory cell are transmitted to a corresponding bit line BL (or/BL). At this time the amount of potential change generated on the bit line BL (or/BL) is proportional to the ratio Cs/Cb of the capacitance Cs of the memory cell capacitor to the capacitance Cb of the bit line BL (or/BL). If the bit line BL (or/BL) is made shorter, the bit line capacitance Cb can be reduced. Therefore, the amount of potential change generated on the bit line can be increased.
In operation, sensing operation in the memory block (memory block DMB2 in FIG. 3) including the word line WL selected by the row decoder 502 is carried out only, and other blocks are kept in a standby state. Consequently, power consumption associated with charging/discharging of the bit lines during sensing operation can be reduced.
When the above described partial activation type CDRAM is applied to the DRAM shown in FIG. 3, a SRAM register and a block decoder must be provided for each of the memory blocks DMB1 to DMB8, which significantly increases the chip area.
In this structure, only SRAM cache registers corresponding to the selected block operate, and therefore, efficiency in using the SRAM cache registers is low.
Further, the bit lines of the DRAM array and of the SRAM array are in one to one correspondence, as described above. When direct mapping method is employed as the method of mapping memories between the main memory and the cache memory, then the SRAM 506 is formed by 1024 cache registers arranged in one row, as shown in FIG. 2. In this case, the capacity of the SRAM cache is 1K bits.
When 4 way set associative method is employed as the mapping method, the SRAM array 506 includes 4 rows of cache registers 506a to 506d as shown in FIG. 4. One of the 4 rows of cache registers 506a to 506d is selected by the selector 510 in accordance with a way address. In this case, the capacity of the SRAM cache is 4K bits.
As described above, the method of memory cell mapping between the DRAM array and the cache memory is determined dependent on the internal structure on the chip. When the mapping method is to be changed, the cache size also must be changed.
In both of the CDRAM structures described above, the bit lines of the DRAM array and the SRAM array are in one to one correspondence. Therefore, the column address of the DRAM array is inevitably the same as the column address of the SRAM array. Therefore, full associative method in which memory cells of the DRAM array are mapped to an arbitrary position of the SRAM array is impossible in principle.
Another structure of a semiconductor memory device in which the DRAM and the SRAM are integrated on the same chip is disclosed in Japanese Patent Laying-Open No. 2-87392. In this prior art, the DRAM array and the SRAM array are connected through an internal common data bus. The internal common data bus is connected to an input/output buffer for inputting/outputting data to and from the outside of the device. Selected memory cells of the DRAM array and the SRAM array can be designated by separate addresses.
However, in this structure of the prior art, data transfer between the DRAM array and the SRAM array is carried out by an internal common data bus, and therefore the number of bits which can be transferred at one time is limited by the number of internal data bus lines, which prevents high speed rewriting of the contents of the cache memory. Therefore, as in the above described structure in which the SRAM cache is provided outside the standard DRAM, the speed of data transfer between the DRAM array and the SRAM array becomes a bottleneck, preventing provision of a high speed cache memory system.
(iii) Consideration on a general clock synchronized type semiconductor device for the problems of which the present invention includes the solution.
A semiconductor memory device of an application specific IC (ASIC) or for pipe line usage operates in synchronization with an external clock signal such as a system clock. Operation mode of a semiconductor memory device is determined dependent on states of external control signals at rising or falling edge of the external clock signal. The external clock signal is applied to the semiconductor memory device no matter whether the semiconductor memory device is being accessed or not. In this structure, in response to the external clock signal, input buffers or the like receiving the external control signals, address signals and data operate. In view of power consumption, it is preferred not to apply the external clock signal to the semiconductor memory device when the semiconductor memory device is not accessed, or to elongate period of the external clock signal.
Generally, a row address signal and the column address signal are applied multiplexed time divisionally to the DRAM. The row address signal and the column address signal are taken in the device in synchronization with the external clock signal. Therefore, when the conventional DRAM is operated in synchronization with the external clock signal, it takes long time to take the row address signal and the column address signal. Therefore, if low power consumption is given priority, the DRAM can not be operated at high speed.
If the conventional semiconductor memory device is operated in synchronization with the external clock signal, the speed of operation is determined solely by the external clock signal. If the semiconductor memory device is to be used where low power consumption is given priority over the high speed operation with the speed defined by the external clock signal, the conventional clock synchronized type semiconductor memory device can not be used for such application.
In a clock synchronized type semiconductor memory device, control signals and address signals are taken inside in synchronization with the clock signal. The control signals and address signals are taken inside by buffer circuits. Each buffer circuit is activated in synchronization with the clock signal and generates an internal signal corresponding to the applied external signal. In a standby state or the like, valid control signals and valid address signals are not applied. However, external clock signals are continuously applied, causing unnecessary operations of the buffer circuits. This prevents reduction in power consumption during standby state. If the cycle period of the external clock signal becomes shorter, the number of operations of the buffer circuits is increased, causing increase of power consumption during standby period. This is a serious problem in realizing low power consumption.
(iv) Consideration on the problems in refreshing operation in a conventional RAM
If the semiconductor memory device includes dynamic memory cells (DRAM cells), the DRAM cells must be periodically refreshed. The refresh mode of a DRAM generally includes an auto refresh mode and a self refresh mode, as shown in FIGS. 5 and 6.
FIG. 5 shows waveforms in the auto refresh operation. In the auto refresh mode, a chip select signal *CE is set to "H" and an external refresh designating signal *REF is set to "L". In response to a fall of the external refresh designating signal *REF, an internal control signal int. *RAS for driving row selecting circuitry falls to "L". In response to the internal control signal int. *RAS, a word line is selected in accordance with a refresh address generated from a built-in address counter, and memory cells connected to the selected word line are refreshed. In the auto refresh mode, the timing of refreshing the semiconductor memory device is determined by the externally applied refresh designating signal *REF. Therefore, whether or not refreshing is being carried out in the semiconductor memory device can be known outside the memory device.
FIG. 6 shows waveforms in the self refresh operation. In the self refresh mode, the chip select signal *CE is set to "H" and the external refresh designating signal *REF is set to "L". When the external refresh designating signal *REF falls to "L", the external control signal int. *RAS is generated, and a word line is selected in accordance with the refresh address from the built-in address counter. Thereafter, sensing operation and rewriting of the memory cells connected to the selected word line are carried out, and the memory cells connected to the word line WL are refreshed.
The first cycle of self refreshing is the same as that of auto refreshing. When the chip select signal *CE is at "H" and the refresh designating signal *REF is kept at "L" for a predetermined time period TF or longer, a refresh request signal is generated from a built-in timer. In response, the internal control signal int. *RAS is generated, the word line is selected and the memory cells connected to the selected word line are refreshed. This operation is repeated while the refresh designating signal *REF is at "L". In the refreshing operation in the self refresh mode, the timings of refreshing are determined by a timer contained in the semiconductor memory device. Therefore, timings of refreshing can not be known from the outside. Normally, data can not be externally accessed in the self refresh mode. Therefore, in the normal mode, self refreshing is not carried out. The self refresh mode is generally carried out at a standby for retaining the data.
Different semiconductor chips have different upper limits of refresh period necessary for retaining data (see NIKKEI ELECTRONICS, Apr. 6, 1987, p. 170, for example). Generally, a guaranteed value for retaining data is measured by testing the semiconductor memory device, and period of a timer defining the self refresh cycle is programmed in accordance with the guaranteed value, for carrying out self refreshing. When auto refresh mode and self refresh mode are selectively used, the guaranteed value for retaining data must be measured in order to determine the self refresh cycle. As shown in FIG. 6, in the self refresh mode, an operation similar to that in the auto refreshing is carried out in response to the external refresh designating signal *REF, and then refreshing operation in accordance with the timer is carried out. Therefore, in an accurate sense, the self refresh cycle means a cycle carried out after a lapse of a prescribed time period TF successive to the auto refreshing. In the self refresh cycle, the refresh timing is determined by the contained timer, as described above, and the timings of refreshing can not be known from the outside. Therefore, the self refresh cycle can not be used as a method of hidden refreshing, for example, in a normal operation mode.
(v) Consideration on Array Arrangement in CDRAM and data transfer between CDRAM and MPU (burst mode)
In a semiconductor memory device containing a DRAM array and a SRAM array, it is preferred to transfer data at high speed from the DRAM array to the SRAM array, so as to enable high speed operation. When data are transferred from the DRAM array to the SRAM array, a row (word line) is selected, data of the memory cells connected to the selected word line are detected and amplified, and then a column is selected in the DRAM array.
Generally, a row address signal and a column address signal are applied multiplexed to the DRAM. Therefore, increase of the speed of data transfer from the DRAM array to the SRAM array is limited by this address multiplexing. In this case, it is possible to apply the row address and the column address simply in accordance with a non-multiplex method to the DRAM. However, in that case, the number of terminals for inputting DRAM addresses are increased significantly. When the number of terminals is increased, the chip size and the package size are increased, which is not preferable.
In addition, data transfer from the DRAM array to the SRAM array must be done after detection and amplification of the memory cell data by the sense amplifiers. Therefore, data transfer from the DRAM array to the SRAM array can not be carried out at high speed.
Further, some external operational processing units such as a CPU (Central Processing Unit) include a data transfer mode called a burst mode for carrying out data transfer at high speed. In the burst mode, a group of data blocks are transferred successively. A block of data is stored at successively adjacent address positions. Since the burst mode is a high speed data transfer mode, the data blocks are stored in the cache memory in the semiconductor memory device containing a cache. A semiconductor memory device containing a cache which can be easily connected to an operational processing unit having burst mode function has not yet been provided.
In order to implement a CDRAM, DRAM array and SRAM array are integrated on the same semiconductor chip. The semiconductor chip is housed in a package. The layout of DRAM array and SRAM array as well as the geometrical figures thereof on the chip are determined by the geometrical figure and the physical dimensions of the housing package.
DRAM array and its associated circuitry occupy a major area of a chip in CDRAM because DRAM is employed as a large storage capacity memory. Thus, the size and figure of DRAM array are substantially determined by the size and shape of the housing package.
In order to efficiently use the chip area, SRAM array should be arranged or laid out on the chip efficiently. However, no consideration has made on the configuration of SRAM array for implementing efficient chip area utilization and for housing CDRAM in a package of an arbitrary shape and size.