1. Field of the Invention
The present invention relates generally to a semiconductor memory device, and more particularly, to the structure of a semiconductor memory device containing a cache memory that achieves an improved cache hit rate with a simple configuration, and to an operating method therefor.
2. Description of the Prior Art
A computer system generally comprises a central processing unit (CPU) for executing applied instructions and a main memory for storing the data, programs and the like that the CPU requires. To improve system performance, it is desirable to operate the CPU at high speed with no wait states. The access time to the main memory must therefore be made as short as possible, to match the operating speed of the CPU. As in recent years, when the CPU clock cycle becomes shorter, for example as the clock frequency rises from 16 MHz to 20 MHz, the access time to the main memory must necessarily be shortened as well. However, this requirement has exceeded the performance of the DRAM (dynamic random access memory) used as the main memory. Meeting it would require a high-speed memory, which is expensive and therefore undesirable in terms of cost performance. One approach to this problem is to structure the memory as a hierarchy, which is referred to as a cache memory system. In this system, a low-speed, large-capacity and thus low-cost DRAM is employed as the main memory, and a small-capacity but high-speed buffer memory is provided between the CPU and the low-speed main memory. Frequently used data in the main memory is stored in the high-speed buffer memory in response to requests from the CPU, and in response to an access from the CPU, the requested data is read from or written to the high-speed buffer memory instead of the main memory. This high-speed buffer memory is referred to as a cache memory.

The state in which the data at an address the CPU attempts to access exists in the cache memory is referred to as a "hit". In this case, the CPU accesses the high-speed cache memory. On the other hand, the state in which the data at an address the CPU attempts to access does not exist in the cache memory is referred to as a "miss hit". In this case, the CPU accesses the low-speed main memory and, at the same time, the block to which the requested data belongs is transferred from the main memory to the cache memory. The cache memory stores this transferred block in preparation for the next access from the CPU.
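The hit/miss behavior and block transfer described above can be illustrated with a minimal sketch. It assumes a hypothetical block size of four words, an unbounded cache with no replacement policy, and a Python list standing in for the main memory; `SimpleCache` and `BLOCK_SIZE` are illustrative names, not part of the described system.

```python
# Hypothetical parameter chosen only for illustration.
BLOCK_SIZE = 4  # words per block (assumption)


class SimpleCache:
    """Toy cache: a hit if the block holding addr is cached, otherwise fetch the block."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words standing in for the DRAM
        self.blocks = {}                # block number -> list of words (no capacity limit)
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block_no = addr // BLOCK_SIZE
        if block_no in self.blocks:
            # "Hit": the data is served from the fast buffer memory.
            self.hits += 1
        else:
            # "Miss hit": access main memory and transfer the whole block,
            # so that nearby addresses hit on subsequent accesses.
            self.misses += 1
            start = block_no * BLOCK_SIZE
            self.blocks[block_no] = self.main_memory[start:start + BLOCK_SIZE]
        return self.blocks[block_no][addr % BLOCK_SIZE]
```

Reading addresses 0 through 5 in order misses on 0 and 4 (the first word of each block) and hits on the rest, which is the locality effect the following paragraph relies on.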
As described above, the cache memory does not store fixed data: the region of the main memory held in the cache memory changes in response to requests from the CPU. However, the region of the main memory accessed by the CPU exhibits locality during data processing. There is therefore a high probability that data fetched from the main memory in response to a CPU request and stored in the cache memory will continue to be accessed for a certain period of time. Thus, once data in the main memory has been stored in the cache memory, the full benefit of the high-speed memory is obtained and the CPU can access the memory with no wait. In other words, the processing operation of the CPU is not delayed by the memory access time.
As described above, providing the high-speed cache memory as a buffer between the low-speed, large-capacity main memory and the high-speed CPU improves both system performance and cost performance. However, although its capacity is small, the cache memory system described above requires an expensive high-speed memory. Therefore, it could not be applied to small-sized systems in which cost is a primary concern.
In conventional small-sized systems, a simple cache system has been achieved by utilizing the fast access modes of a general-purpose DRAM, namely the page mode and the static column mode.
Referring now to FIG. 1, the structure of a DRAM with a fast access mode will be described. The DRAM includes a memory cell array 5 having a plurality of memory devices (memory cells) MC for storing information, arranged in a matrix of rows and columns. The rows of the memory cell array 5 are defined by word lines WL and the columns by bit lines BL. FIG. 1 representatively shows a single word line WL, a single bit line BL and a memory cell MC located at their intersection. A sense amplifier 6 is provided for each column of the memory cell array 5 to detect, amplify and latch the signal voltage that appears on the bit line when the word line is selected.
In order to select memory cells on one row in the memory cell array 5, there are provided a row address buffer 1, a row decoder 3 and a word driver 4. The row address buffer 1 accepts an externally applied row address in response to a control signal RAS, to generate an internal row address RA. The row decoder 3 decodes the internal row address RA from the row address buffer 1, to designate one word line. The word driver 4 activates the word line designated by a row address decode signal from the row decoder 3 in response to the decode signal.
In order to select memory cells on one column in the memory cell array 5, there are provided a column address buffer 2, a column decoder 8 and an I/O switch 7. The column address buffer 2 accepts an external column address in response to a control signal CAS, to generate an internal column address CA. The column decoder 8 decodes the internal column address CA from the column address buffer 2, to generate a signal for selecting a column designated by the column address. The I/O switch 7 connects a column (a bit line) designated by a column address decode signal from the column decoder 8 to an I/O bus 13 in response to the decode signal.
For data input/output, there are provided an input buffer 14, which receives externally applied input data D_IN and generates internal data to be applied onto the I/O bus 13, and an output buffer 15, which receives, through the I/O bus 13, the information in the memory cell selected by the row and column addresses and generates external output data D_OUT.
In order to control a data input/output operation of the memory, there is provided an R/W control 16 responsive to a write enable signal WE and the signal CAS for controlling operations of the data input buffer 14 and the data output buffer 15.
The external address is applied through the same pins by multiplexing the row address and the column address. The control signal RAS provides the operation timing of the circuits associated with the row address, and activation of the signal RAS also starts the memory cycle. The signal CAS provides the operation timing of the circuits associated with column selection. Furthermore, depending on the operation mode, the signal CAS provides the timing for reading and writing data. The operation of the DRAM will now be described with reference to the waveform diagrams of FIGS. 2 to 4.
Referring now to FIG. 2, a normal cycle of the DRAM will be described. When the signal RAS falls to the low level, the memory cycle starts. The externally applied row address is accepted into the DRAM chip at the falling edge of the signal RAS. The row address buffer 1 generates the internal row address RA and applies it to the row decoder 3. The row decoder 3 decodes the row address, so that a single word line is selected through the word driver 4. Consequently, the information in the memory cells on the row connected to the selected word line is transferred onto the respective bit lines (columns). The information on each bit line is detected, amplified and latched by the sense amplifier 6. Meanwhile, when the signal CAS falls, the externally applied column address is accepted into the column address buffer 2, which generates the internal column address CA. The column decoder 8 decodes the internal column address CA to select the column designated by the column address. The I/O switch 7 connects the column (bit line) selected by the decode signal from the column decoder 8 to the I/O bus 13. Consequently, the information in the selected memory cell, detected and latched by the sense amplifier 6, is output through the output buffer 15. In other words, in the normal cycle, the row address is accepted into the chip at the falling edge of the signal RAS, the column address is then accepted at the falling edge of the signal CAS, and thereafter the data stored in the memory cell selected by the row address RA and the column address CA is output. The access time (from the fall of the signal RAS until valid data is output) is therefore the RAS access time T_RAC shown in FIG. 2. The cycle time Tc is the sum of the time during which the DRAM is active (the signal RAS is at the low level) and the RAS precharge time T_RP (the time during which the signal RAS is at the high level and the device is in a standby state). Typically, Tc is approximately 200 ns for a DRAM with a T_RAC of 100 ns.
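The order of events in the normal cycle can be sketched as a toy model: the row is first sensed and latched as a whole, and only then is one column selected from the latch. This is a minimal sketch under stated assumptions; `DramModel`, `normal_read` and `latched_row` are hypothetical names, and timing (T_RAC, T_RP) is not modeled.

```python
class DramModel:
    """Toy DRAM: a whole row is latched by the sense amplifiers, then one column is selected."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.latched_row = None  # stands in for the data held by the sense amplifiers 6

    def normal_read(self, row_addr, col_addr):
        # RAS falls: the row address is accepted, one word line is selected,
        # and every bit line of that row is sensed and latched.
        self.latched_row = list(self.cells[row_addr])
        # CAS falls: the column address is accepted and the I/O switch
        # connects the selected bit line to the I/O bus.
        return self.latched_row[col_addr]
```

After `normal_read(2, 3)`, the whole of row 2 sits in `latched_row`, which is what the fast access modes below exploit.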
Referring now to FIG. 3, a page mode operation will be described. First, as in the normal operation cycle, the row address and the column address are applied, so that the information in the selected memory cell is read out through the output buffer 15. Then, the signal CAS is made high while the signal RAS is held at the low level. Consequently, the circuits associated with column selection, such as the column address buffer 2 and the column decoder 8, are reset. Meanwhile, since the signal RAS remains at the low level, the sense amplifier 6 continues to latch the information in the memory cells on the row selected by the row address. Then, when a new column address is applied and the signal CAS is made low, the column (bit line) corresponding to the new column address is selected by the column decoder 8 and the I/O switch 7, and its information is read out through the I/O bus 13 and the output buffer 15. This operation of accepting a new column address each time the signal CAS is toggled can be repeated many times, for as long as the signal RAS is allowed to remain at the low level. In other words, the page mode operation accesses memory cells connected to the same row by changing only the column address. Since the row address need not be accepted for each access, access is faster than in the normal operation cycle.
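The page mode sequence above can be sketched as a row that stays open while RAS is low, with each CAS cycle supplying only a column address. This is a hypothetical toy model (`PageModeDram`, `ras_low`, `ras_high` and `cas_cycle` are illustrative names); it counts row activations to show that a burst on one row needs the row address only once.

```python
class PageModeDram:
    """Toy DRAM that keeps the selected row latched while RAS stays low."""

    def __init__(self, cells):
        self.cells = cells
        self.open_row = None      # row latched by the sense amplifiers while RAS is low
        self.row_activations = 0  # how many times a row address had to be accepted

    def ras_low(self, row_addr):
        # Row address accepted at the falling edge of RAS.
        self.open_row = row_addr
        self.row_activations += 1

    def ras_high(self):
        # RAS precharge: the latched row is released.
        self.open_row = None

    def cas_cycle(self, col_addr):
        # Each CAS toggle accepts only a new column address on the open row.
        if self.open_row is None:
            raise RuntimeError("RAS must be low (a row must be open) before a CAS cycle")
        return self.cells[self.open_row][col_addr]
```

For example, after `ras_low(2)`, three calls to `cas_cycle` read three different columns of row 2 while `row_activations` stays at 1.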
Referring now to FIG. 4, the static column mode will be described. In the static column mode, the first access is the same as in the normal operation: the row address and the column address are accepted into the chip in response to the signals RAS and CAS, respectively, and the information in the selected memory cell is read out. After a predetermined time has elapsed from the output of valid data, the column address is changed while the signals RAS and CAS are held at the low level. Consequently, the information in the memory cell corresponding to the new column address, out of the memory cells on the same row, is read out. In this operation mode as well, the signal RAS is held at the low level, so the information in the memory cells on the row designated by the first row address remains latched by the sense amplifier. Thus, like the page mode, the static column mode accesses memory cells connected to the same row by changing only the column address. However, because the signal CAS is held at the low level (corresponding to the chip select signal CS of a static RAM), access is made, as in a static RAM, simply by changing the column address. Since the signal CAS need not be toggled, access can be made at a higher speed than in the page mode.
The access time in the page mode, T_CAC (from the fall of the signal CAS until valid data is output), and the access time in the static column mode, T_AA (from the change of the column address until valid data is output), are approximately one-half of the RAS access time T_RAC of the normal operation; for example, T_AA ≈ 50 ns for a device with T_RAC = 100 ns. The cycle time is also shortened. The cycle time in the page mode has the same value as that in the static column mode, i.e., approximately 50 ns, although it may vary depending on the CAS precharge time T_CP.
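The benefit of the fast modes for a burst of accesses to one row can be estimated with simple arithmetic using the approximate figures quoted above (Tc ≈ 200 ns in the normal cycle, about 50 ns per fast-mode cycle). This is a rough sketch, not a timing specification: it assumes the first access of a burst costs about one normal cycle and ignores setup/hold details and the variation with T_CP; `same_row_burst_time` is a hypothetical helper.

```python
# Approximate figures from the text for a DRAM with T_RAC = 100 ns.
NORMAL_CYCLE_NS = 200  # Tc in the normal cycle
FAST_CYCLE_NS = 50     # page / static column mode cycle (varies with T_CP)


def same_row_burst_time(n_accesses, fast_mode):
    """Rough total time in ns for n accesses that all fall on one row."""
    if n_accesses <= 0:
        return 0
    if not fast_mode:
        # Every access is a full normal cycle, including RAS precharge.
        return n_accesses * NORMAL_CYCLE_NS
    # Assumption: the first access costs about one normal cycle;
    # the remaining accesses change only the column address.
    return NORMAL_CYCLE_NS + (n_accesses - 1) * FAST_CYCLE_NS
```

Under these assumptions, eight accesses to one row take about 1600 ns with normal cycles but about 550 ns in a fast mode.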
A static column mode operation and a cache system employing a DRAM operable in a static column mode are described in an article by J. G. Goodman et al., entitled "The Use of Static Column RAM as a Memory Hierarchy", IEEE 11th Annual Symposium on Computer Architecture, 1984, pp. 167-174.
Page mode and ripple mode/static column mode operations, and a cache system employing a DRAM capable of performing them, are proposed in Application Note on 256K CMOS DRAM, Intel Corp., pp. 1-276 to 1-279.
Referring now to FIG. 5, the structure and operation of a simple cache memory system utilizing one of the fast serial access modes described above, such as the page mode or the static column mode, will be described.
Referring to FIG. 5, the main memory system comprises eight DRAMs 22-1 to 22-8 capable of fast serial access operation. Each of the DRAMs 22-1 to 22-8 has a 1M × 1-bit organization; that is, each has a capacity of 1 megabit (2^20 bits) and inputs or outputs data one bit at a time. The main memory system therefore has a 1M-byte organization. The same multiplexed address is applied to each of the DRAMs 22-1 to 22-8, so a 10-bit address is applied to each DRAM.
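The way eight 1M × 1-bit devices combine into a 1M-byte memory can be sketched as follows: one 20-bit byte address is split into a 10-bit row address and a 10-bit column address, and every chip supplies one bit of the byte from the same location. The text does not say which half of the address is the row address; this sketch assumes the upper 10 bits, and `split_address` and `read_byte` are hypothetical helpers.

```python
ADDR_BITS = 20
COL_BITS = 10  # the remaining 10 bits form the row address


def split_address(addr):
    """Split a 20-bit byte address into (row, col); upper bits as row is an assumption."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of 20-bit range")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def read_byte(chips, addr):
    """Each of the 8 chips (indexed chip[row][col]) contributes one bit of the byte."""
    row, col = split_address(addr)
    value = 0
    for bit, chip in enumerate(chips):
        value |= (chip[row][col] & 1) << bit
    return value
```

With this split, consecutive byte addresses stay on the same row until the column address wraps, which is what makes the row-address tag in the following paragraphs effective.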
In order to control access to the main memory, there are provided an address generator 17, a latch (tag) 18, a comparator 19, a state machine 20 and an address multiplexer 21.
The address generator 17 generates the address of the data requested by the CPU (not shown) in response to address information from the CPU. With the 1M-byte organization of the main memory system, a 20-bit address (a 10-bit row address and a 10-bit column address) is transferred simultaneously onto a 20-bit address bus 40.
The latch (tag) 18 stores the row address selected in the previous cycle, taken from the address supplied by the address generator 17. The row address stored in the latch (tag) 18 is not rewritten on a cache hit, but is rewritten with the row address newly generated by the address generator 17 on a cache miss.
The comparator 19 compares the row address from the address generator 17 with the row address stored in the latch (tag) 18 and generates a signal CH (cache hit) indicating the result of the comparison. The signal CH is applied to the latch (tag) 18 to control updating of its stored contents, and is also applied to the state machine 20.
The state machine 20 generates the control signals RAS, CAS and WE in response to the signal CH and applies them to each of the DRAMs 22-1 to 22-8. The signal WE designates input or output of data to or from the main memory system: it designates data reading when at the high level and data writing when at the low level. The signal WE is applied to the data input buffer and the data output buffer in each DRAM. Data is written at the timing of whichever of the signals CAS and WE falls later. When the signal CH from the comparator 19 indicates noncoincidence (a cache miss), the state machine 20 first raises the signals RAS and CAS to the high level and then lowers them in sequence to cause the DRAM to perform the normal operation, and it also applies a signal WAIT to the CPU to place the CPU in a waiting state. When the signal CH indicates coincidence (a cache hit), the state machine 20 toggles the signal CAS while the signal RAS remains at the low level, to cause the DRAM to perform the page mode operation.
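The cooperation of the latch (tag) 18, the comparator 19 and the state machine 20 can be summarized as a small decision procedure: compare the new row address with the single stored row address, choose a page mode cycle on a hit, and on a miss choose a normal cycle, assert WAIT and rewrite the tag. This is a minimal behavioral sketch; `TagController` and `CycleDecision` are hypothetical names, signal timing is not modeled, and the tag is assumed to start empty so the first access misses.

```python
from dataclasses import dataclass


@dataclass
class CycleDecision:
    cache_hit: bool  # level of the signal CH
    cycle: str       # "page": toggle CAS with RAS low; "normal": precharge, then RAS and CAS
    wait: bool       # whether the signal WAIT is applied to the CPU


class TagController:
    """Hypothetical model of the latch (tag) 18, comparator 19 and state machine 20."""

    def __init__(self):
        self.tag_row = None  # single entry: the row address of the previous cycle

    def access(self, row_addr):
        if self.tag_row == row_addr:
            # Coincidence: same row as the previous cycle, page mode access.
            return CycleDecision(cache_hit=True, cycle="page", wait=False)
        # Noncoincidence: the tag is rewritten with the new row address.
        self.tag_row = row_addr
        return CycleDecision(cache_hit=False, cycle="normal", wait=True)
```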
The address multiplexer 21, under the control of the state machine 20, multiplexes the address from the address generator 17 and transfers it onto a 10-bit address bus 41 for application to each of the DRAMs 22-1 to 22-8. When the signal CH indicates noncoincidence, the address multiplexer 21 multiplexes the 20-bit address applied from the address generator 17 to sequentially output the 10-bit row address and the 10-bit column address under the control of the state machine 20. When the signal CH indicates coincidence, the address multiplexer 21 outputs only the 10-bit column address of the applied address under the control of the state machine 20.
The operation of the cache memory system shown in FIG. 5 will now be described with reference to the waveform diagram of FIG. 6. The system clock shown in FIG. 6 provides the operation timing of the memory system and the CPU, one clock period defining one machine cycle.
As a program proceeds, the CPU generates address information for the data it requires. In response, the address generator 17 generates, at the fall of the system clock, an address indicating the location where the requested data is stored, and transfers it onto the 20-bit address bus 40. The comparator 19 compares the 10-bit row address of the generated address with the row address stored in the latch (tag) 18. When the two coincide, which indicates that the row being accessed is the same row as the one containing the memory cell accessed in the previous cycle, the comparator 19 generates the signal CH indicating a cache hit. In response to the signal CH from the comparator 19, the state machine 20 toggles the signal CAS while the signal RAS remains at the low level (the signal RAS has already been held low, so that each DRAM is enabled). Meanwhile, when the signal CH is generated, the address multiplexer 21 transfers the 10-bit column address onto the 10-bit address bus 41 under the control of the state machine 20. Consequently, each of the DRAMs 22-1 to 22-8 performs the page mode operation and outputs data to the CPU at high speed, within the access time T_CAC (input or output of data is designated by the signal WE, which the state machine 20 generates in accordance with a designation from the CPU).
On the other hand, when the row address stored in the latch (tag) 18 and the row address generated by the address generator 17 do not coincide, the comparator 19 does not generate the signal CH (the signal CH remains at the low level). In this case, since memory cells on a row different from the one accessed in the previous cycle are to be accessed, a new row address must be applied to each of the DRAMs 22-1 to 22-8. Because the signal CH is not generated, the state machine 20 first returns the signals RAS and CAS to the inactive high level and then causes each of the DRAMs 22-1 to 22-8 to perform the normal operation. The address multiplexer 21 multiplexes the 20-bit address from the address generator 17 and sequentially transfers the 10-bit row address and the 10-bit column address onto the address bus 41 under the control of the state machine 20. Each of the DRAMs 22-1 to 22-8 accepts the row address at the falling edge of the signal RAS to select one word line, accepts the column address at the falling edge of the signal CAS to select one column, and outputs the information in the selected memory cell.
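The behavior of the address multiplexer 21 in the hit and miss cases just described can be sketched as a function returning the sequence of 10-bit words driven onto the address bus 41. As in the rest of these sketches, treating the upper 10 bits of the 20-bit address as the row address is an assumption, and `multiplex` is a hypothetical name.

```python
COL_BITS = 10  # assumption: upper 10 bits = row address, lower 10 bits = column address


def multiplex(addr20, cache_hit):
    """Return the 10-bit words placed on address bus 41, in the order they are driven."""
    row = addr20 >> COL_BITS
    col = addr20 & ((1 << COL_BITS) - 1)
    if cache_hit:
        # Page mode: the row is already open, so only the column address is sent.
        return [col]
    # Normal cycle: row address for the fall of RAS, then column address for the fall of CAS.
    return [row, col]
```

On a hit, one bus transfer suffices; on a miss, two transfers are needed and, as the next paragraph explains, a RAS precharge must come first.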
Thus, on a cache miss, a normal operation cycle beginning with the RAS precharge is started. Since a minimum RAS precharge time is specified, the next operation cycle cannot start until that time has elapsed. In addition, the access time until valid data is output is the long RAS access time T_RAC. Since this time is longer than the operation cycle of the CPU, the state machine 20 applies the signal WAIT to the CPU, holding the CPU in the wait state until valid data is output. Furthermore, on a cache miss, the latch (tag) 18 stores and holds the new row address on the address bus 40; whether or not the latch (tag) 18 updates its stored contents is controlled by the signal CH.
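The cost of misses can be tallied over a trace of row addresses with a small simulation: a hit completes in one machine cycle in page mode, while a miss adds wait cycles for the RAS precharge and the longer T_RAC, and only a miss rewrites the tag. The number of wait cycles per miss is not given in the text; `WAIT_CYCLES_ON_MISS = 2` is a hypothetical value, and `run_trace` is an illustrative helper.

```python
# Hypothetical: the real value depends on T_RP, T_RAC and the CPU clock period.
WAIT_CYCLES_ON_MISS = 2


def run_trace(row_addresses):
    """Return (hits, misses, machine cycles) for a trace of row addresses."""
    tag = None  # single-entry latch (tag) 18, initially empty
    hits = misses = cycles = 0
    for row in row_addresses:
        if row == tag:
            hits += 1
            cycles += 1  # page mode access, no WAIT
        else:
            misses += 1
            cycles += 1 + WAIT_CYCLES_ON_MISS  # CPU held by WAIT during the normal cycle
            tag = row  # the tag is updated only on a miss
    return hits, misses, cycles
```

For the trace `[1, 1, 1, 2, 2, 1]`, this gives 3 hits, 3 misses and 12 machine cycles under the assumed wait count.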
In the structure described above, the latch (tag) 18 stores a row address, and it is determined whether the stored row address coincides with the row address to be newly accessed. In other words, in the conventional simple cache memory system, the data corresponding to one row of the DRAM (1024 bits in the case of a 1 Mb device) constitutes one block, and a cache hit or miss is determined in units of this data block.
However, the probability that the CPU successively accesses all the data in one block (1024 bits per DRAM, corresponding to one row in the conventional example described above) is low, so the block size of 1024 bits per DRAM is unnecessarily large.
Additionally, in the prior-art structure described above, which utilizes the page mode or the static column mode, the latch (tag) 18 holds only one block (entry), and this number cannot be increased, so the cache hit rate cannot be made significantly high.
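The limitation of a single entry shows up clearly when accesses alternate between two rows: each access evicts the row the next access needs, so no access ever hits. This minimal sketch (the helper `single_entry_hit_rate` is hypothetical) computes the hit rate of a one-entry row tag for a given trace.

```python
def single_entry_hit_rate(rows):
    """Hit rate of a tag that holds only one row address (the previous row)."""
    tag, hits = None, 0
    for r in rows:
        if r == tag:
            hits += 1
        else:
            tag = r  # the single entry is overwritten on every miss
    return hits / len(rows) if rows else 0.0
```

Alternating between rows 0 and 1 yields a hit rate of 0.0, whereas four accesses to a single row yield 0.75.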
A dynamic semiconductor memory device comprising a serial shift register having a number of stages equal to the number of columns in the memory cell array and connected to the columns by transfer gates is disclosed in U.S. Pat. No. 4,330,852 entitled "Semiconductor Read/Write Memory Array Having Serial Access", by D. I. Redwine et al., filed Nov. 23, 1979. In this device, the data of one row of cells are transferred in parallel between the shift register and an addressed row of cells, and for a read operation the data in the shift register are shifted serially out of the register to the exterior. Because the data register of this prior-art device is accessed only serially, the device cannot be applied to a cache memory, which requires random access to the columns on an addressed row.
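Why a serially accessed register is unsuitable as a cache can be sketched with a toy model: the row is loaded in parallel, but reaching column k requires k + 1 shifts, so there is no direct access to an arbitrary column. The class and function names are hypothetical, and the destructive `popleft` is a simplification of shifting data toward the output stage.

```python
from collections import deque


class SerialShiftRegister:
    """Toy register: one row loaded in parallel, then read out one stage at a time."""

    def __init__(self):
        self.stages = deque()

    def load_row(self, row_bits):
        # The transfer gates copy a whole row into the register in parallel.
        self.stages = deque(row_bits)

    def shift_out(self):
        # Only the output stage is visible; each shift advances by one column.
        return self.stages.popleft()


def read_column_serially(register, col):
    """Reaching column col costs col + 1 shifts: there is no random access."""
    for _ in range(col):
        register.shift_out()  # discard the columns in front of the target
    return register.shift_out()
```

A cache, by contrast, must be able to return any column of the held row in a single access, as the page and static column modes allow.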
The same device as discussed above is also set forth in a publication entitled "A High Speed Dual Port Memory with Simultaneous Serial and Mode Access for Video Application", by R. Pinkham et al., IEEE Journal of Solid-State Circuits Vol. SC-19, No. 6, Dec. 1984, pp. 999-1007.
A memory device with an on-chip cache is disclosed by Matick et al. in U.S. Pat. No. 4,577,293 entitled "Distributed On-Chip Cache", filed Jan. 1, 1984.
This prior-art on-chip cache comprises a cell array and a master-slave register. The cell array is accessed through a first port, while the slave register is accessed through a second port, and the master-slave register is employed as a cache. However, in this prior art, the master register receives data from the columns connected to an addressed row of the cell array. Therefore, this prior art also suffers from disadvantages such as an excessively large data block size and too few entries in the latch (tag).