1. Field of the Invention
The present invention generally relates to semiconductor memory devices and particularly to a construction and an operation method of a semiconductor memory device containing a cache memory of a simple structure in which a cache hit rate is improved without increasing the number of pin terminals.
2. Description of the Prior Art
A computer system generally comprises a central processing unit (CPU) for executing received instructions and the like, and a main memory for storing data, programs and the like necessary for the CPU. To improve the performance of the system, it is desirable to operate the CPU at high speed with no wait. For this purpose, it is necessary to reduce the access time to the main memory as far as possible so that it matches the operating speed of the CPU. These days, CPU clock frequencies have become as high as 16 MHz or 20 MHz, which unavoidably requires a reduction of the access time of the main memory. However, this requirement is coming to exceed the performance limits of the DRAM (Dynamic Random Access Memory) used in the main memory. To cope with this, a high-speed memory is required; however, it is expensive and not desirable from the viewpoint of cost performance. One method for solving this difficulty is a cache memory system in which memories are arranged in a hierarchical manner. In this system, a DRAM (or DRAMs), which has a large storage capacity and a low operating speed and is therefore inexpensive, is used as the main memory, and a small-capacity but high-speed buffer memory is provided between the CPU and the low-speed main memory. Frequently used data in the main memory is stored in the high-speed buffer memory in accordance with requests from the CPU. In response to an access from the CPU, the requested data is read from or written into the high-speed buffer memory in place of the main memory. This high-speed buffer memory is called a cache memory. A state in which data of an address to be accessed by the CPU exists in the cache memory is called a "hit", and in this case the CPU accesses the high-speed cache memory. On the other hand, a state in which data of an address to be accessed by the CPU does not exist in the cache memory is called a "miss". In this case, the CPU accesses the low-speed main memory and also transfers the block to which the requested data belongs from the main memory to the cache memory. The cache memory stores the transferred block of data to be ready for subsequent accesses from the CPU.
As described above, the cache memory does not store data in a fixed manner. The area of the main memory that is stored in the cache memory changes depending on requests from the CPU. However, the area of the main memory accessed by the CPU exhibits locality during data processing. Accordingly, data fetched from the main memory and stored in the cache memory is likely to be accessed again for a while thereafter. Consequently, once data of the main memory is stored in the cache memory, the function of the high-speed memory is fulfilled most effectively and there is no wait in memory accessing by the CPU. In other words, the processing operation of the CPU is not delayed by the memory access time as long as the accesses hit in the cache memory.
Thus, the high-speed cache memory is provided as a buffer between the low-speed, large-capacity main memory and the high-speed CPU, and accordingly it is made possible to improve the system performance and the cost performance. However, the above described cache memory system requires a high-speed memory which is of small capacity but is expensive. For this reason, the cache memory system cannot be applied to a small-sized system, which is desired to have a low cost.
Therefore, in a conventional small-sized system, a simplified cache system is formed by utilizing a page mode and a static column mode which are high-speed access modes of a general-purpose DRAM.
Referring first to FIG. 1, a construction of a DRAM having a high-speed access mode will be described. The DRAM comprises a memory cell array 5 where memory cells for storing information are arranged in a matrix of rows and columns. The rows of the memory cell array 5 are defined by word lines WL while the columns of the array 5 are defined by bit lines BL. FIG. 1 typically shows one word line WL, one bit line BL and a memory cell MC located at an intersection of those lines. A sense amplifier 6 is provided corresponding to columns of the memory cell array 5, to detect, amplify and latch a signal potential appearing on a bit line concerned when a word line is selected.
A row address buffer 1, a row decoder 3 and a word driver 4 are provided to select memory cells of one row of the memory cell array 5. The row address buffer 1 accepts an externally applied row address in response to a control signal RAS and generates an internal row address RA. The row decoder 3 decodes the internal row address RA from the row address buffer 1 and designates one word line. The word driver 4 responds to a row address decode signal from the row decoder 3 and activates the word line designated by the decode signal.
A column address buffer 2, a column decoder 8 and an input/output (I/O) switch 7 are provided to select memory cells of one column out of the memory cell array 5. The column address buffer 2 accepts an externally applied column address in response to a control signal CAS to generate an internal column address CA. The column decoder 8 decodes the internal column address CA from the column address buffer 2 and generates a signal for selecting a column designated by the column address. The I/O switch 7 responds to a column address decode signal from the column decoder 8 to connect the column (a bit line) designated by the decode signal to an I/O bus 13.
To input and output data, there are provided an input buffer 14 for generating internal data upon receipt of external input data D_IN and supplying the internal data to the I/O bus 13, and an output buffer 15 for generating external data D_OUT upon receipt of memory cell information selected by row and column addresses through the I/O bus 13.
In order to control data input/output operation of the memory, there is provided a read/write (R/W) control 16 for controlling the data input buffer 14 and the data output buffer 15 in response to a write enable signal WE and the signal CAS.
A row address and a column address as external addresses are multiplexed and supplied through the same pins to the DRAM. The control signal RAS provides operation timing to circuits related with the row address. When the signal RAS is activated, a memory cycle is started. The signal CAS provides operation timing to circuits related with selection of a column. This signal also provides timing for writing of data dependent on an operation mode selected. Referring now to FIGS. 2 to 4 which are waveform diagrams showing operation of the DRAM, the operation of the DRAM will be described.
Referring first to FIG. 2, a normal operation cycle of the DRAM will be described. When the signal RAS falls to "L" (low) level, a memory cycle is started. At a falling edge of the signal RAS, an externally applied row address is accepted in the DRAM chip and an internal row address RA is generated from the row address buffer 1 and is supplied to the row decoder 3. The row address is decoded in the row decoder 3, whereby one word line is selected to be activated through the word driver 4. As a result, information stored in one row of memory cells connected to the selected word line is transmitted onto the respective bit lines (columns). The information on the respective bit lines is detected, amplified and latched by the sense amplifiers 6. On the other hand, when the signal CAS falls, an externally applied column address is accepted by the column address buffer 2 and an internal column address CA is generated. The column decoder 8 decodes the internal column address CA and selects a column designated by the column address. The I/O switch 7 connects the column (the bit line) selected by the column decode signal from the column decoder 8 to the I/O bus 13. As a result, information stored in the selected memory cell and amplified and latched by the sense amplifier 6 is outputted through the output buffer 15 as external data D_OUT.
Thus, in the normal operation cycle, a row address is accepted in the chip at a falling edge of the signal RAS and then a column address is accepted in the chip at a falling edge of the signal CAS. After that, data of the memory cell selected by the row address RA and the column address CA is outputted. Accordingly, the RAS access time T_RAC shown in FIG. 2 is required as the access time (namely, a period from the fall of the signal RAS to the output of valid data). A cycle time Tc is the sum of a period in which the DRAM is active (namely, a period of "L" level of the signal RAS) and a RAS precharge period (namely, a period of "H" (high) level of the signal RAS, in which the device is in a standby state). A typical value of the cycle time Tc is about 200 ns in a DRAM with T_RAC = 100 ns.
Referring now to FIG. 3, page mode operation will be described. First, a row address and a column address are provided in the same manner as in the normal operation cycle, whereby information of a selected memory cell is read out through the output buffer 15. Then, the signal CAS is raised to "H" level with the signal RAS being maintained at "L" level. As a result, circuits related with column selection, such as the column address buffer 2 and the column decoder 8, are reset. On the other hand, the sense amplifiers 6 continue to latch the information of the memory cells of the one row selected by the row address RA because the signal RAS is at "L" level. Then, when a new column address is provided and the signal CAS falls to "L" level, a column (a bit line) corresponding to the newly supplied column address is selected by the column decoder 8 and the I/O switch 7, and information on the selected column is read out through the I/O bus 13 and the output buffer 15. The operation of accepting a new column address for each toggle of the signal CAS may be repeated any number of times within the period in which the signal RAS is allowed to be maintained at "L" level. In short, the page mode operation is an operation for accessing memory cells connected to the same row by changing only the column address. Since only the column address is changed, it is not necessary to accept a row address for each access and thus the accessing operation can be performed at a higher speed than that in the normal operation cycle.
Referring to FIG. 4, a static column mode will be described. In the static column mode, the first access is performed in the same manner as in the normal operation cycle. Thus, a row address and a column address are accepted in the chip in response to the signals RAS and CAS, respectively, and information of the selected memory cell is read out. Then, after valid data is read out and a predetermined period has elapsed, the column address is changed with the signals RAS and CAS being maintained at "L" level. As a result, information of a memory cell corresponding to the new column address out of the memory cells of the same row is read out. In this operation mode as well, the signal RAS is maintained at "L" level and information of the memory cells of one row designated by the initially supplied row address is latched by the sense amplifiers. Thus, the static column mode is also a mode in which the memory cells connected to the same row are accessed by changing only the column address, as in the page mode. However, in the same manner as in the case of a static RAM, the signal CAS is maintained at "L" level (corresponding to a signal CS in a static RAM) and access is made only by changing the column address. Accordingly, it is not necessary to toggle the signal CAS and thus access can generally be made at a higher speed than that in the page mode.
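The row-latching behaviour that both fast access modes rely on can be sketched as a small behavioural model. This is only an illustrative sketch under stated assumptions: the class name DramModel and its methods ras_fall, ras_rise and column_read are hypothetical names chosen here, the array size of 1024 rows by 1024 columns is assumed, and timing, refresh and write operations are ignored.

```python
class DramModel:
    """Toy behavioural model of a DRAM with a row latched by sense amplifiers.

    Hypothetical illustration only: names and sizes are assumptions,
    and no timing, refresh or write path is modelled.
    """

    def __init__(self, rows=1024, cols=1024):
        # Memory cell array: one bit per (row, column) intersection.
        self.cells = [[0] * cols for _ in range(rows)]
        # Contents held by the sense amplifiers while RAS is at "L".
        self.latched_row = None

    def ras_fall(self, row):
        """Fall of RAS: select one word line and latch the whole row."""
        self.latched_row = list(self.cells[row])

    def ras_rise(self):
        """Rise of RAS (precharge): the latched row is no longer available."""
        self.latched_row = None

    def column_read(self, col):
        """Column access (CAS toggle in page mode, address change in
        static column mode): pick one bit out of the latched row."""
        if self.latched_row is None:
            raise RuntimeError("no row is open: RAS must fall first")
        return self.latched_row[col]


dram = DramModel()
dram.cells[3][5] = 1
dram.ras_fall(3)                                   # one row access
bits = [dram.column_read(c) for c in (4, 5, 6)]    # several column accesses
```

In this sketch, the three column reads reuse the row latched by the single call to ras_fall, which is the property that makes the page mode and the static column mode faster than repeating the normal operation cycle.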
An access time T_CAC in the page mode (namely, a period from a fall of the signal CAS to an output of valid data) and an access time T_AA in the static column mode (namely, a period from a change of the column address to an output of valid data, that is, an address access time) are both about half of the RAS access time T_RAC in the normal operation cycle. For example, if T_RAC = 100 ns, both T_CAC and T_AA are about 50 ns. In addition, the cycle time is also shortened; in the page mode, the cycle time is about 50 ns, as in the static column mode, although it depends on the value of the CAS precharge time Tcp.
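As a rough worked example of these figures, the following sketch compares the total time for a run of accesses within one row against the same number of accesses to different rows. It is a deliberately simplified estimate: the function name total_time_ns is hypothetical, the first access of a same-row run is assumed to cost one full normal cycle of 200 ns, and every later access in that run is assumed to cost one fast cycle of 50 ns.

```python
# Approximate figures quoted in the text for a DRAM with a 100 ns RAS access time.
NORMAL_CYCLE_NS = 200   # normal operation cycle time Tc
FAST_CYCLE_NS = 50      # page mode / static column mode cycle time


def total_time_ns(n_accesses, same_row):
    """Estimate the total time for n_accesses consecutive accesses.

    Simplifying assumption: in a same-row run, the first access costs a
    full normal cycle and each later access costs one fast cycle.
    """
    if n_accesses <= 0:
        return 0
    if same_row:
        return NORMAL_CYCLE_NS + (n_accesses - 1) * FAST_CYCLE_NS
    return n_accesses * NORMAL_CYCLE_NS


print(total_time_ns(8, same_row=True))    # 550
print(total_time_ns(8, same_row=False))   # 1600
```

Under these assumptions, eight accesses within one row take roughly a third of the time of eight accesses that each require a new row address.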
Now, high-speed accessing operation of the DRAM will be briefly described with reference to FIG. 1.
As shown in FIG. 1, multiplexed row and column addresses are supplied to the row address buffer 1 and the column address buffer 2, respectively. When the signal RAS falls to "L" level, an internal row address RA is supplied from the row address buffer 1 to the row decoder 3 in response to the falling edge thereof, so that the internal row address RA is decoded. The word driver 4 is driven by the decoded row address from the row decoder 3, thereby activating one word line in the memory cell array 5 selected by the internal row address RA. As a result, data of the respective memory cells connected to the selected (activated) word line appear on the related bit lines to be transmitted to the sense amplifiers 6. The sense amplifiers 6 detect, amplify and latch the data supplied thereto. Thus, at this time, data of one row corresponding to the designated row address are latched by the sense amplifiers 6. Thereafter, if a memory cell on the same row designated by that row address is to be accessed, the above described page mode and static column mode can be utilized.
More specifically, in the page mode, the column address buffer 2 transmits the column address supplied thereto to the column decoder 8 in response to a falling edge of the signal CAS. As a result, one of the data latched by the sense amplifiers 6 (in the case of a ×1-bit structure) is selected by the decoded address and provided as output data D_OUT through the output buffer 15.
In the static column mode, a trigger of column (bit line) selection is given by a change in the multiplexed address MXA, namely, a transition in the column address supplied to the column address buffer 2. Other operation is the same as in the page mode.
A description of static column mode operation and a description of a cache system using DRAMs operable in the static column mode are given in "The Use of Static Column RAM as a Memory Hierarchy" by J. G. Goodman et al, IEEE 11th Annual Symposium on Computer Architecture, 1984 pp. 167-174.
Page mode operation and ripple mode/static column mode operation as well as a cache system using DRAMs operable in those modes are described in an Application Note on 256K CMOS DRAM of Intel Corp., pp. 1-276 to 1-279.
Referring to FIG. 5, description is now made of construction and operation of a simple cache memory system using a fast access mode such as the above described page mode or static column mode.
A main memory system shown in FIG. 5 comprises eight DRAMs 22-1 to 22-8 each capable of performing fast serial access operation. Each of the DRAMs 22-1 to 22-8 has a 1M × 1 bit structure. More specifically, each of the DRAMs 22-1 to 22-8 has a capacity of 1 megabit (2^20 bits) and data is inputted and outputted on a 1-bit basis. Consequently, the main memory system has a 1M byte structure. An identical address is multiplexed and supplied to the respective DRAMs 22-1 to 22-8. Accordingly, an address of 10 bits is supplied to each DRAM at a time.
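The way eight one-bit-wide DRAMs together form a byte-wide main memory can be illustrated as follows. This is a minimal sketch, not the circuit of FIG. 5: the functions write_byte and read_byte are hypothetical names, each DRAM is represented by a plain dictionary mapping a 20-bit address to one bit, and the assignment of bit 0 to the first DRAM is an assumption that the text does not specify.

```python
NUM_DRAMS = 8            # DRAMs 22-1 to 22-8
BITS_PER_DRAM = 2 ** 20  # 1M x 1 bit each


def write_byte(drams, address, value):
    """Store one byte: every DRAM receives the same address and one bit.

    Assumption: drams[0] holds bit 0 of the byte, drams[7] holds bit 7.
    """
    for bit_position, dram in enumerate(drams):
        dram[address] = (value >> bit_position) & 1


def read_byte(drams, address):
    """Reassemble one byte from the single bit supplied by each DRAM."""
    value = 0
    for bit_position, dram in enumerate(drams):
        value |= (dram.get(address, 0) & 1) << bit_position
    return value


drams = [dict() for _ in range(NUM_DRAMS)]
write_byte(drams, 0x12345, 0xA5)
byte = read_byte(drams, 0x12345)

# Total capacity: eight 1-megabit devices give 2^20 bytes.
total_bytes = NUM_DRAMS * BITS_PER_DRAM // 8
```

Because all eight devices see the identical address, a single row address selects the same row in every DRAM, so a row of 1024 bits per device corresponds to 1024 bytes of the main memory.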
In order to control access to the main memory, there are provided an address generator 17, a latch (TAG) 18, a comparator 19, a state machine 20 and an address multiplexer 21.
The address generator 17 generates an address of data required by the CPU, in response to address information from the CPU (not shown). If the main memory system is of the 1M byte structure, addresses of 20 bits (namely, a row address of 10 bits and a column address of 10 bits) are simultaneously transmitted onto a 20-bit address bus 40.
The latch (TAG) 18 receives the addresses from the generator 17 and stores the row address selected in the preceding cycle. The row address stored by the latch (TAG) 18 is not updated at the time of hit in the cache memory (hereinafter referred to as "cache hit"). It is updated by a row address newly generated by the address generator 17 at the time of miss in the cache memory (hereinafter referred to as "cache miss").
The comparator 19 compares the row address from the address generator 17 with the row address stored in the latch (TAG) 18 and generates a signal CH (cache hit) indicating the result of comparison. The signal CH is supplied to the latch (TAG) 18. Thus, updating of a content stored in the latch (TAG) 18 is controlled. The signal CH is also supplied to the state machine 20.
The state machine 20 generates control signals RAS, CAS and WE in response to the signal CH and supplies those signals to the respective DRAMs 22-1 to 22-8. The signal WE is a signal for designating input and output of data to and from the main memory system. Data is read out at "H" level of the signal WE and data is written at "L" level thereof. The signal WE is supplied to the data input buffer and the data output buffer of each DRAM. Data is written in response to whichever of the signals CAS and WE falls later. When the signal CH from the comparator 19 indicates a mismatch (a cache miss), the state machine 20 temporarily raises the signals RAS and CAS to "H" level and then lowers those signals sequentially, whereby each DRAM executes a normal operation cycle. At the same time, the state machine 20 supplies a signal WAIT to the CPU to bring the CPU into a wait state. When the signal CH indicates a match (a cache hit), the state machine 20 maintains the signal RAS at "L" level and toggles the signal CAS, so that each DRAM performs page mode operation.
The address multiplexer 21 multiplexes the addresses from the address generator 17 and transmits the multiplexed addresses onto the 10-bit address bus 41 to supply the same to the respective DRAMs 22-1 to 22-8 under control of the state machine 20. When the signal CH indicates a mismatch, the address multiplexer 21 multiplexes the address of 20 bits supplied from the address generator 17 and generates a row address of 10 bits and a column address of 10 bits successively under control of the state machine 20. When the signal CH indicates a match, only a column address of 10 bits out of the addresses supplied is generated under control of the state machine 20.
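The behaviour of the address multiplexer can be sketched in a few lines. This is an illustrative sketch only: the function names split_address and bus_sequence are hypothetical, and the choice of the upper 10 bits as the row address and the lower 10 bits as the column address is an assumption, since the text does not state which half of the 20-bit address is which.

```python
ROW_BITS = 10
COL_BITS = 10


def split_address(address):
    """Split a 20-bit address into (row, column) 10-bit fields.

    Assumption: the row address occupies the upper 10 bits.
    """
    row = (address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = address & ((1 << COL_BITS) - 1)
    return row, col


def bus_sequence(address, cache_hit):
    """Return the 10-bit values placed successively on the address bus 41.

    On a mismatch the row address and then the column address are sent;
    on a match only the column address is sent.
    """
    row, col = split_address(address)
    return [col] if cache_hit else [row, col]


address = (5 << COL_BITS) | 9
print(bus_sequence(address, cache_hit=False))  # [5, 9]
print(bus_sequence(address, cache_hit=True))   # [9]
```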
Referring now to FIG. 6 indicating an operation waveform diagram, operation of the cache memory system shown in FIG. 5 will be described. The system clock shown in FIG. 6 is a clock for applying operation timing to the memory system and the CPU, and one machine cycle is defined by one clock.
According to procedures of a program, the CPU generates address information of necessary data. The address generator 17 generates, in response thereto, an address showing a location of storage of the data required by the CPU, at a rise of the system clock and supplies the address to the 20-bit address bus 40. The comparator 19 compares the 10-bit row address (RA2) out of the generated addresses with the row address (RA1) stored by the latch (TAG) 18. When those addresses match (RA1=RA2), which means that the same row as that of the memory cells accessed in the preceding cycle is being accessed, the comparator 19 generates the signal CH of "H" level, for example, indicating a cache hit. The state machine 20 toggles the signal CAS in response to the signal CH of "H" level from the comparator 19 with the signal RAS being maintained at "L" level (till then, the signal RAS is at "L" level and each DRAM is enabled). On the other hand, the address multiplexer 21 transmits the 10-bit column address to the 10-bit address bus 41 under control of the state machine 20 when the signal CH is generated. As a result, the respective DRAMs 22-1 to 22-8 perform page mode operation and provide data at high speed to the CPU in the access time T_CAC. (Input and output of the data are instructed by the signal WE, whose instruction is given by the CPU and provided through the state machine 20.)
On the other hand, when the row address (RA1) stored by the latch (TAG) 18 does not match the row address (RA2) generated by the address generator 17, the comparator 19 does not generate the signal CH (or keeps the signal CH at "L" level). In this case, since the memory cells of a row different from that accessed in the preceding cycle are accessed, it is necessary to newly supply a row address to the respective DRAMs 22-1 to 22-8. When the signal CH is not generated, the state machine 20 brings the signals RAS and CAS temporarily into an inactive state at "H" level, so that the respective DRAMs 22-1 to 22-8 can execute the normal operation cycle. The address multiplexer 21 multiplexes the 20-bit address from the address generator 17 and transmits the row address and the column address successively, 10 bits at a time, to the address bus 41 under control of the state machine 20. The respective DRAMs 22-1 to 22-8 accept the row address at a fall of the signal RAS to select one word line and accept the column address at a fall of the signal CAS to select one column, whereby information of the selected memory cell is outputted.
Thus, in the case of a cache miss, the normal operation cycle beginning with RAS precharging is executed. The minimum value of the RAS precharge period is predetermined and the succeeding operation cycle cannot be started before the elapse of the RAS precharge period. In addition, the access time until valid data is outputted is the slow RAS access time T_RAC. Since this time T_RAC is longer than one operation cycle time of the CPU, the state machine 20 supplies a signal WAIT to the CPU to bring it into a wait state. In the case of a cache miss, the latch (TAG) 18 stores the new row address on the address bus 40 and holds it. Whether the content stored in the latch (TAG) 18 is to be changed or not is controlled by the signal CH.
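The hit and miss handling described above can be summarised in a short behavioural sketch. It is not a description of the actual circuits: the class SimpleCacheController and its access method are hypothetical names, the row address is assumed to be the upper 10 bits of the 20-bit address, and the returned dictionary merely labels the decisions (signal CH, signal WAIT and the resulting DRAM operation) that the comparator 19, the latch (TAG) 18 and the state machine 20 are described as making.

```python
class SimpleCacheController:
    """Toy model of the latch (TAG), comparator and state machine.

    Hypothetical sketch: names are illustrative, and the row address is
    assumed to be the upper 10 bits of the 20-bit address.
    """

    def __init__(self):
        self.tag = None  # row address held by the latch; None means no open row

    def access(self, address):
        row = (address >> 10) & 0x3FF
        hit = self.tag is not None and row == self.tag
        if hit:
            # Match: RAS stays at "L", CAS is toggled, the tag is not updated.
            return {"CH": True, "WAIT": False, "mode": "page"}
        # Mismatch: RAS precharge, then a normal cycle; the tag is updated
        # with the new row address and the CPU is put into a wait state.
        self.tag = row
        return {"CH": False, "WAIT": True, "mode": "normal"}


controller = SimpleCacheController()
first = controller.access((5 << 10) | 1)    # miss: row 5 becomes the tag
second = controller.access((5 << 10) | 2)   # hit: same row, different column
third = controller.access((6 << 10) | 2)    # miss: row 6 replaces row 5
```

The very first access is treated as a miss here because no row address has been latched yet; this initial state is an assumption of the sketch.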
In the above described construction, the latch (TAG) 18 stores a row address, and a match or a mismatch between the stored row address and a row address to be newly accessed is determined. In other words, in the conventional simple cache memory system, data for one row of a DRAM (1024 bits in the case of a 1M device) is formed as one block and a cache hit or a cache miss with respect to this data block is determined.
However, there is not a high probability that all the data of one block (1024 bits of one row for each DRAM in the above described prior art) are continuously accessed by the CPU. Therefore, the block size (namely, 1024 bits/DRAM) is unnecessarily large.
In addition, in the construction utilizing the page mode or the static column mode as in the above described prior art, the latch (TAG) 18 holds only one block (entry) and the capacity can not be further increased. Consequently, the cache hit rate can not be sufficiently increased. In other words, a cache hit occurs only in the case where the same row address is continuously accessed. Accordingly, if a program routine related with two consecutive row addresses is repeatedly executed, a cache miss always occurs and thus the function of the cache memory can not be satisfactorily performed.
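The limitation of a single entry can be made concrete with a small simulation of the hit rate for a given sequence of row addresses. This is a minimal sketch under stated assumptions: the function hit_rate is a hypothetical name, the model keeps exactly one row address as the tag, and the first access is counted as a miss.

```python
def hit_rate(row_sequence):
    """Fraction of accesses that hit when only one row address is held.

    Assumption: the tag starts empty, so the first access is a miss.
    """
    tag = None
    hits = 0
    for row in row_sequence:
        if row == tag:
            hits += 1
        else:
            tag = row  # a miss replaces the single stored row address
    return hits / len(row_sequence)


print(hit_rate([7] * 8))      # 0.875
print(hit_rate([7, 8] * 4))   # 0.0
```

Eight accesses to one row miss only once, whereas a routine that alternates between two rows on every access misses every time, because each miss evicts the row address that the next access needs.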
A dynamic semiconductor memory device comprising a serial shift register having a number of stages equal to the number of columns in the memory cell array and connected to the columns through transfer gates is disclosed in U.S. Pat. No. 4,330,852 entitled "Semiconductor Read/Write Memory Array Having Serial Access", issued to D. J. Redwine et al, filed Nov. 23, 1973. In this device, data of cells of one row are transmitted in parallel between the shift register and an addressed row of memory cells. Data in the shift register are serially shifted from the register to external for a read operation. The device of the prior art comprises a data register which is serially accessed, and thus the device can not be employed as a cache memory which requires random access to the column on an addressed row.
The same device as discussed above is also described in a publication entitled "A High Speed Dual Port Memory with Simultaneous Serial and Random Mode Access for Video Application" by R. Pinkham et al, IEEE Journal of Solid-State Circuits, Vol. SC-19, No. 6, December 1984, pp. 999-1007.
A memory device with on-chip cache is disclosed by Matick et al. in U.S. Pat. No. 4,577,273 entitled "Distributed On-Chip Cache", filed Jan. 1, 1984. This prior art on-chip cache comprises a cell array and a master-slave register. The cell array is accessed through a first port, while the slave register is accessed through a second port. The master-slave register is employed as a cache. However, in this prior art, the master register receives data from the columns connected to an addressed row of the cell array. Therefore, this prior art also has disadvantages such as too large a data block size and too small a number of entries in the latch (TAG).