1. Field of the Invention
This invention relates to computers, and more particularly, to a cache memory system and a method for accessing this cache memory system, which allow the computer system to operate with high performance even though a low-speed tag RAM is used in the cache memory system.
2. Description of Related Art
In the use of computers, performance is a primary concern. A computer system's performance can be enhanced in various ways, such as by using a high-speed CPU instead of a low-speed one. In the past, the PC/XT-based IBM-compatible personal computers (PCs) were driven by a system clock of only 4.77 MHz. Nowadays, however, most IBM-compatible PCs run at 100 MHz or higher. The use of a high-speed CPU can undoubtedly increase the overall performance of the computer system. However, a high-speed CPU also requires high-speed peripheral devices to be used in conjunction with it. If a low-speed peripheral device, such as a low-speed memory, is used in conjunction with a high-speed CPU, the overall performance of the computer system remains unsatisfactorily low.
A computer system typically includes two types of memories: ROM (read-only memory) and RAM (random-access memory). The ROM is used to permanently store repeatedly used programs and data, such as the booting routines, while the RAM is used to store frequently updated or changed programs and data. ROMs are typically slower than RAMs. Therefore, in operation, the programs stored in the ROM are customarily moved to the RAM after the computer has been booted. This scheme allows an increase in the performance of the computer system. Furthermore, there are two types of RAMs: SRAM (static random-access memory) and DRAM (dynamic random-access memory). SRAMs are higher in speed than DRAMs. However, since SRAMs are significantly lower in packing density and more difficult to manufacture than DRAMs, DRAMs are more cost-effective to use than SRAMs. Therefore, although lower in speed, DRAMs are widely used as the primary memory in most computer systems.
Using a high-speed CPU in conjunction with a low-speed DRAM gives rise to the problem of a performance bottleneck. A solution to this problem is to provide a so-called cache memory in addition to the primary memory. In this solution, low-speed DRAMs are used as the primary memory of the computer system, while high-speed SRAMs are used as the cache memory. The cache memory stores the most frequently accessed blocks of programs and data from the primary memory. When requesting data, the CPU first checks whether the requested data are stored in the cache memory; if not, the CPU then turns the request to the primary memory. The use of cache memory can significantly increase the overall performance of the computer system. However, since the cache memory is much smaller in capacity than the primary memory, the requested data may not always be found in the cache memory. It is called a hit if the requested data are currently stored in the cache memory and a miss if not. The term "cache hit rate" refers to the proportion of CPU requests for which the requested operand is found in the cache memory. The cache hit rate is therefore a measure of the performance of a cache algorithm. In the case of IBM-compatible PCs, if the cache memory is larger than 512 KB (kilobytes) in capacity, the cache hit rate can be higher than 90%, which can considerably help improve the overall performance of the computer system. Furthermore, the use of a new type of SRAM, called PBSRAM (pipelined burst static random-access memory), as the cache memory can further increase the overall performance of the computer system.
FIG. 1 is a schematic diagram showing the architecture of a conventional cache memory system used in conjunction with a computer system. The cache memory system here is the part enclosed in a dashed box indicated by the reference numeral 110. As shown, the cache memory system 110 includes a cache memory module 111, which includes a data RAM unit 113 and a tag RAM unit 114, and a cache control circuit 112. All of the constituent components of the cache memory system 110 are coupled via a common data bus 150 to the CPU 120 and the primary memory unit 140 of the computer system for data exchange. The cache control circuit 112 is used to control access to the cache memory module 111 in response to any read/write requests from the CPU 120. When a block of data in the primary memory unit 140 is placed in the cache memory module 111, the data values thereof are stored in the data RAM unit 113 while the tag values used to help map the addresses in the data RAM unit 113 to the primary memory unit 140 are stored in the tag RAM unit 114. Moreover, the tag RAM unit 114 stores a so-called "dirty bit" that is used to indicate whether the data currently stored in the data RAM unit 113 have been updated by the CPU 120.
The scheme for mapping the data and address values from the primary memory unit 140 to the cache memory module 111 is depicted in FIG. 2A. As mentioned earlier, when a block of data in the primary memory unit 140 is placed in the cache memory module 111, the data values thereof are stored in the data RAM unit 113 while the tag values are stored in the tag RAM unit 114. As shown in FIG. 2B, the physical addresses of this block of data can be determined by combining the tag values with the index values. When the CPU 120 references a particular address in the primary memory unit 140, the value of that address can be directly mapped by a direct mapping method to the cache memory module 111 so as to fetch the requested data from the mapped addresses in the cache memory module 111.
To determine whether the request from the CPU is a hit or a miss, the address values issued by the CPU 120 are compared with the contents stored in the tag RAM unit 114. If matched, the requested data are currently stored in the cache memory module 111; otherwise, the requested data are not stored in the cache memory module 111 and access is turned to the primary memory unit 140. The access speed to the tag RAM unit 114 is therefore one of the primary factors that affect the overall performance of the computer system.
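The direct-mapping and tag-comparison scheme described above can be sketched in C as follows. The cache geometry (a 512 KB direct-mapped cache with 32-byte lines) and the resulting bit widths are illustrative assumptions, not values taken from this description.

```c
#include <stdint.h>

/* Assumed geometry: 512 KB direct-mapped cache with 32-byte lines. */
#define LINE_BITS   5                      /* 32-byte line: offset bits   */
#define INDEX_BITS  14                     /* 512 KB / 32 B = 16384 lines */
#define NUM_LINES   (1u << INDEX_BITS)

typedef struct {
    uint32_t tag;    /* upper address bits stored in the tag RAM       */
    int      valid;  /* line currently holds data                      */
    int      dirty;  /* line has been updated by the CPU               */
} tag_entry;

static tag_entry tag_ram[NUM_LINES];       /* models tag RAM unit 114  */

/* Split a physical address into index and tag, as in FIGS. 2A and 2B. */
static uint32_t addr_index(uint32_t addr) {
    return (addr >> LINE_BITS) & (NUM_LINES - 1);
}
static uint32_t addr_tag(uint32_t addr) {
    return addr >> (LINE_BITS + INDEX_BITS);
}

/* Hit/miss check: compare the issued address's tag with the stored tag. */
static int is_hit(uint32_t addr) {
    const tag_entry *e = &tag_ram[addr_index(addr)];
    return e->valid && e->tag == addr_tag(addr);
}
```

Because every primary-memory address maps to exactly one cache line, a single tag comparison suffices to decide hit or miss; the speed of that comparison is bounded by the tag RAM access time discussed below.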
FIG. 3 is a flow diagram showing the procedural steps involved in a conventional cache read algorithm for reading data from the cache memory system 110. This algorithm is carried out by the cache control circuit 112 in response to a data read request signal from the CPU 120.
As shown, in the initial step 310, the CPU 120 issues a data read request signal to the cache memory system 110.
In the next step 311, the cache memory system 110 checks whether the data read request signal is a hit or a miss to the cache memory system 110.
If it is a hit, the procedure goes to step 320, in which the requested data are transferred from the cache memory module 111 to the CPU 120.
Otherwise, if it is a miss, the procedure goes to step 313, in which the cache control circuit 112 checks whether the data currently stored in the cache memory module 111 have been updated.
If not updated, the procedure goes to step 332 in which the data requested by the CPU 120 are moved from the primary memory unit 140 to the cache memory module 111, and subsequently transferred from the cache memory module 111 to the CPU 120. This completes the response to the request from the CPU 120.
Otherwise, if updated, the procedure goes to step 330 in which the updated data are moved from the cache memory module 111 to the primary memory unit 140. The procedure then goes on to step 331 in which the data requested by the CPU 120 are moved from the primary memory unit 140 to the cache memory module 111, and subsequently transferred from the cache memory module 111 to the CPU 120. This completes the response to the request from the CPU 120.
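The read procedure of FIG. 3 (steps 310 through 332) can be sketched as follows. The toy model here, with a handful of single-word lines and a small primary memory array, is an illustrative assumption chosen only to make the control flow concrete.

```c
#include <stdint.h>

enum { NUM_LINES = 8, MEM_WORDS = 64 };    /* tiny illustrative sizes  */

typedef struct { uint32_t tag, data; int valid, dirty; } line_t;

static line_t   cache[NUM_LINES];          /* models data/tag RAM      */
static uint32_t primary_mem[MEM_WORDS];    /* models primary unit 140  */

/* Cache read following FIG. 3, with a write-back policy. */
static uint32_t cache_read(uint32_t addr) {
    uint32_t idx = addr % NUM_LINES;
    uint32_t tag = addr / NUM_LINES;
    line_t *l = &cache[idx];

    if (l->valid && l->tag == tag)         /* step 311: hit check      */
        return l->data;                    /* step 320: data to CPU    */

    if (l->valid && l->dirty)              /* step 313: line updated?  */
        primary_mem[l->tag * NUM_LINES + idx] = l->data;  /* step 330 */

    l->tag   = tag;                        /* steps 331/332: refill    */
    l->data  = primary_mem[addr];          /*   from primary memory    */
    l->valid = 1;
    l->dirty = 0;
    return l->data;                        /* then data to CPU         */
}
```

Note that on a miss the dirty line is first written back (step 330) before the requested block is fetched, exactly as the flow diagram requires.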
FIG. 4 is a flow diagram showing the procedural steps involved in a conventional cache write algorithm for writing data into the cache memory module 111 in the cache memory system 110. This algorithm is carried out by the cache control circuit 112 in response to a data write request signal from the CPU 120.
As shown, in the initial step 410, the CPU 120 issues a data write request signal to the cache memory system 110. The data write request signal indicates that the CPU 120 has generated new or updated data that are to be written back in place of the original data.
In the next step 411, the cache memory system 110 checks whether the data write request signal is a hit or a miss to the cache memory system 110.
If it is a miss, the procedure goes to step 430, in which the output data from the CPU 120 are written into the primary memory unit 140.
Otherwise, if it is a hit, the procedure goes to step 420, in which the output data from the CPU 120 are written into the cache memory module 111. The procedure then goes to step 421, in which the dirty bit is set to indicate that the data currently stored in the data RAM unit 113 have been updated.
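The write procedure of FIG. 4 (steps 410 through 430) can be sketched in the same toy model; as before, the cache geometry is an illustrative assumption. On a hit the data are written to the cache and the dirty bit is set; on a miss the data bypass the cache and go directly to primary memory.

```c
#include <stdint.h>

enum { NUM_LINES = 8, MEM_WORDS = 64 };    /* tiny illustrative sizes  */

typedef struct { uint32_t tag, data; int valid, dirty; } line_t;

static line_t   cache[NUM_LINES];          /* models data/tag RAM      */
static uint32_t primary_mem[MEM_WORDS];    /* models primary unit 140  */

/* Cache write following FIG. 4: write to cache and set the dirty bit
   on a hit (steps 420-421); write around to primary memory on a miss
   (step 430). */
static void cache_write(uint32_t addr, uint32_t data) {
    uint32_t idx = addr % NUM_LINES;
    uint32_t tag = addr / NUM_LINES;
    line_t *l = &cache[idx];

    if (l->valid && l->tag == tag) {       /* step 411: hit check      */
        l->data  = data;                   /* step 420: write to cache */
        l->dirty = 1;                      /* step 421: set dirty bit  */
    } else {
        primary_mem[addr] = data;          /* step 430: write to mem   */
    }
}
```

The dirty bit set in step 421 is what later triggers the write-back in step 330 of the read procedure, keeping the primary memory eventually consistent with the cache.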
The conventional cache memory system described performs well, provided that it is used in conjunction with a low-speed CPU. If used in conjunction with a high-speed CPU, such as Intel's 100 MHz P54C CPU, the overall performance of the computer system is still very low. The reason for this is described in the following.
Since Intel's P54C CPU runs at 100 MHz, the period of the clock signal is 10 ns (nanosecond). In the prior art of FIG. 1, when the cache memory system 110 receives a data read/write request signal from the CPU 120, it first checks whether the request is a hit or a miss to the data currently stored in the cache memory module 111. However, the P54C CPU is designed to receive the requested data (in the case of a read request) or output the updated data (in the case of a write request) at the third clock period after the issuing of the request signal. Therefore, the cache memory system 110 should complete the hit/miss checking process in less than three clock periods; i.e., step 311 shown in FIG. 3 and step 411 shown in FIG. 4 should be completed in just one or two clock periods. The conventional cache memory system 110, however, is hardly able to achieve this. The reason is described in the following.
When the cache memory system 110 checks whether the request signal is a hit or a miss, it must first access the data currently stored in the tag RAM unit 114. The access time to the tag RAM unit 114 should therefore be less than two clock periods, i.e., 20 ns. The RAM products currently on the market that can serve as the tag RAM unit 114 come with access times of 7.2 ns, 8 ns, 10 ns, 12 ns, and 15 ns. According to their nominal specifications, these tag RAM units are all less than 20 ns in access time. However, when actually used in a cache memory system, a number of delay times are involved, which can add up to an overall access time of greater than 20 ns.
The term "valid delay" refers to the period from the time point when the address bus starts to change voltage states to the time point when the voltage states representative of the output address values are stabilized. In the case of the P54C CPU, the valid delay is 4 ns (which can be found in the operating manual of P54C CPU).
Moreover, it takes a delay time of about 2 ns to transfer the outputted address values from the CPU 120 over the common data bus 150 to the tag RAM unit 114. This delay time is generally proportional to the length of the printed data lines on the motherboard over which the address values are transferred.
In response to the request signal, it then takes a delay time of another 2 ns to transfer the outputted data from the tag RAM unit 114 to the cache control circuit 112.
Next, it takes a setup time of about 3.8 ns for the cache control circuit 112 to wait until the received data from the tag RAM unit 114 are stabilized in voltage states on the data bus.
Still further, although the cache control circuit 112 and the CPU 120 are driven by the same system clock signal, there exists a lag in synchronization that causes the cache control circuit 112 to receive the clock signal by a delay time of about 0.5 ns.
Assume that a fast tag RAM of 8 ns is chosen to serve as the tag RAM unit 114 in the cache memory system 110 of FIG. 1. Summing up all the above-mentioned delay times then gives an overall delay time of 20.3 ns, which is greater than the allowed delay time of 20 ns. If an even faster tag RAM of 7.2 ns is chosen, the overall delay time can be reduced to 19.5 ns. Although this is just under the allowed delay time of 20 ns, such a fast tag RAM considerably increases the implementation cost. The overall delay time can be further reduced by shortening the printed data bus over which the data are transferred between the CPU and the cache memory system, but this provides only a slight, insubstantial improvement in the access time. Another solution is to alter the CPU specification to allow three waiting periods instead of two. This increases the allowable response time from 20 ns to 30 ns, thus permitting the use of a low-cost tag RAM with an access time of 10 ns, 12 ns, or 15 ns. However, since one additional waiting period is required, the overall performance of the computer system would be significantly reduced.
In summary, the conventional cache memory system has the following disadvantages.
(1) First, when it is used in conjunction with a high-speed CPU, it can degrade the overall performance of the computer system, in that access to the tag RAM can cause a long waiting time for the CPU.
(2) Second, the use of a high-speed tag RAM makes the manufacturing cost of the cache memory system high, making the computer system less competitive on the market.
(3) Third, the use of a low-speed tag RAM to save manufacturing cost makes the overall performance of the computer system low, making the computer system less appealing to customers.