This invention relates in general to digital computers and, more particularly, to cache memories for such computers.
Modern computers typically include one or more processors for performing the calculations and logical operations generally associated with such machines. Instructions which are to be executed by the processor are stored in a main memory. When a program is run or executed on a computer, its instructions are called out of the main memory and sent to the processor where they are executed. This process takes valuable time.
It is known that providing a processor cache memory for use by such processors is one way to effectively accelerate the pace at which instructions are executed by the processor. Such a cache memory is a relatively small memory when compared with the size of the main memory. However, this cache memory exhibits a much faster access time than the access time associated with the main memory. The cache memory thus provides relatively quick access to instructions and data which are the most frequently used.
For example, in a typical personal computer application, the main memory may consist of 1-64 Mbytes or more of relatively slow (80 nsec access time) dynamic random access memory (DRAM). However, the cache memory associated with a microprocessor may consist of typically 8 Kbytes to 256 Kbytes or more of fast (20 nsec access time) static random access memory (SRAM). Computers are not designed with a main memory consisting entirely of fast SRAM because SRAM is extremely expensive when compared with DRAM. When instructions and data are called up from the main memory by the microprocessor for execution, they are also stored in the relatively small high speed cache. Thus, the microprocessor has ready access to the most recently executed instructions and data should these instructions and data be needed again by the microprocessor. When the microprocessor needs to use an instruction or data a second time, rather than initiating a relatively slow memory cycle to the slow main memory to retrieve that information, instead the microprocessor quickly accesses the information from the high speed processor cache.
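The benefit of the cache can be illustrated with a rough calculation. The following Python sketch is illustrative only: it computes an average access time from the example figures given above (20 nsec SRAM, 80 nsec DRAM). The function name and the hit rates are assumptions chosen for illustration, and the model assumes that a hit costs one cache access and a miss costs one main memory access, with no additional penalty.

```python
def average_access_time_ns(hit_rate, cache_ns=20.0, main_ns=80.0):
    """Weighted average of cache and main memory access times.

    Simplifying assumption: a hit costs one cache access and a miss
    costs one main memory access, with no extra lookup penalty.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * main_ns


# Hypothetical hit rates, for illustration only.
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: {average_access_time_ns(rate):.1f} nsec")
```

Under these assumptions, a cache that satisfies nine requests out of ten brings the average access time down from 80 nsec to about 26 nsec.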
Some microprocessors such as the Intel 80386 are designed to use a local processor cache closely coupled to the local bus of the 80386 microprocessor by a cache controller such as the Intel 82385 cache controller. Other microprocessors such as the Intel 80486 employ a small 8K cache integrated within the microprocessor chip itself. Still other microprocessors such as the Motorola 68040 include dual caches within the microprocessor, one cache being used for code caching and the other cache being used for data caching. For simplicity, both lines of code and lines of data will be referred to as instructions.
Clearly, it is important to keep track of precisely which lines of code and data are stored in the cache. One technique is to use a cache which includes TAGs to help identify a request for an instruction or data which is presently located within the cache. The cache includes memory locations for storing TAG addresses which correspond to addresses of the particular information presently stored in the cache. The microprocessor generates a request for an instruction or data in the form of an address. This information is stored in main memory but might also be stored in the cache due to recent prior use. The TAGs are used to determine if the address generated by the microprocessor is one for which the cache contains the needed information. To accomplish this, the address generated by the microprocessor is compared to the TAG addresses. If the address generated in the request from the microprocessor matches a TAG address, then the cache contains the requested information, a TAG hit occurs, and the microprocessor obtains the requested information directly from the local processor cache. However, if the address generated by the microprocessor fails to match any TAG address, then a TAG miss occurs. In this case, the requested information is not contained in the local processor cache and a memory cycle to the main memory must be generated to obtain the requested information therefrom. The memory cycle takes valuable time.
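The TAG comparison described above can be sketched in software as follows. This is a minimal Python model and not a description of any particular cache controller: the names TagCache, lookup, store and read are hypothetical, the main memory is modeled as a dictionary, and the cache is given unlimited capacity so that replacement can be ignored.

```python
class TagCache:
    """Toy cache: each slot pairs a TAG address with the cached information."""

    def __init__(self):
        self.slots = []  # list of (tag_address, information) pairs

    def store(self, address, information):
        # Assumption: unlimited capacity, so nothing is ever replaced.
        self.slots.append((address, information))

    def lookup(self, address):
        """Compare the requested address with every stored TAG address."""
        for tag_address, information in self.slots:
            if tag_address == address:
                return True, information  # TAG hit
        return False, None  # TAG miss


def read(cache, main_memory, address):
    """Return (information, source) for a request issued by the processor."""
    hit, information = cache.lookup(address)
    if hit:
        return information, "cache"
    # TAG miss: a main memory cycle is needed, and the result is cached.
    information = main_memory[address]
    cache.store(address, information)
    return information, "main memory"


main_memory = {0x100: "instruction A", 0x104: "instruction B"}
cache = TagCache()
print(read(cache, main_memory, 0x100))  # first request: TAG miss
print(read(cache, main_memory, 0x100))  # repeated request: TAG hit
```

In hardware the comparisons are typically made by dedicated comparator logic rather than by a sequential loop; the loop is used here only to make the comparison explicit.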
As the microprocessor processes instructions and data over time, the contents of the cache change. The most frequently used addresses may also change. For this reason, a situation can arise in which the cache is full of information which is valid and was recently used, yet that information does not correspond to the information which is currently being used most frequently.
To address this problem, least recently used (LRU) logic has been created to help keep the information in the cache current as well as being valid. To do this, LRU logic keeps track of those cache address locations which are least recently used. When a cache miss occurs (no TAG address match was found), a main memory access results. The main memory then provides the requested information to the microprocessor at which time the cache also stores this information and the corresponding TAG address at one of the TAG locations in the cache. The LRU logic determines which particular TAG location in the cache should be overwritten with the most recent address requested and which resulted from the cache miss. The TAG location where the replacement occurs is the TAG location which the LRU logic determined contained the least recently used TAG address.
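The replacement behavior described above can be sketched as follows. This is a minimal Python model under stated assumptions, not the LRU logic of any actual cache: the class name LruTagCache and its read method are hypothetical, the capacity of two TAG locations is arbitrary, and an OrderedDict stands in for the hardware that tracks the order of use.

```python
from collections import OrderedDict


class LruTagCache:
    """Toy cache with a fixed number of TAG locations and LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps TAG address -> information, ordered from least recently
        # used (first) to most recently used (last).
        self.entries = OrderedDict()

    def read(self, address, main_memory):
        """Return (information, hit) for the requested address."""
        if address in self.entries:
            self.entries.move_to_end(address)  # now the most recently used
            return self.entries[address], True
        # Cache miss: a main memory access results.
        information = main_memory[address]
        if len(self.entries) >= self.capacity:
            # Overwrite the location holding the least recently used TAG.
            self.entries.popitem(last=False)
        self.entries[address] = information
        return information, False


main_memory = {1: "a", 2: "b", 3: "c"}
cache = LruTagCache(capacity=2)
for address in (1, 2, 1, 3):
    cache.read(address, main_memory)
print(list(cache.entries))  # TAG addresses still held, oldest use first
```

In this usage example, address 2 is the least recently used TAG address when the miss on address 3 occurs, so it is the one that is replaced.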
At one extreme, a cache may have fixed addresses. In the case where the cache addresses are so fixed, there is no necessity for keeping track of which TAG address is least frequently used. The information itself which corresponds to the fixed TAG address is the only thing which can be updated. In this situation, testing to determine whether or not a cache hit occurs for a particular TAG address is straightforward because the TAG addresses are hard-wired. At the other end of the spectrum is a cache arrangement wherein any TAG location can have any address generated by the microprocessor. In this situation, the determination of a TAG address hit requires reading all of the TAG addresses stored in the cache and testing each TAG address to see if a match occurs between the stored TAG address and the particular address requested by the microprocessor. The latter type of cache is referred to as a "fully associative cache".
It is helpful at this point to briefly review how microprocessors execute instructions stored in main memory. In a computing system, the processor obtains its instructions from a sequence of instruction words which are located in main memory. The sequence of instructions is placed in memory in an orderly fashion, one after another at respective sequential addresses. The sequence of instructions is executed by the processor in a serial mode, taking one instruction after another in address order, until either a branch instruction jumps to a new section of code stored elsewhere in main memory or until a "call" or "interrupt" instruction temporarily jumps to a new section of code. Later, process flow returns to the point in the code from which the call or interrupt occurred and execution of subsequent instructions continues.
When a branch or call is executed, processing must stop until the new instructions are fetched from memory. Any time spent by the microprocessor waiting during this time period is critical to the effective execution speed of the microprocessor. Modern microprocessors include a memory prefetch for the next line of code. In this manner, the instruction buffer of the microprocessor is kept full. However, this does not reduce the system overhead time spent accessing main memory during branch and call return instructions, namely, the problem discussed earlier. Unfortunately, increasing the prefetch line size can overload the data bus and cause delays for other work.
More specifically, the instruction queue for the microprocessor can be viewed as a pipe. As the pipe starts to become empty, the microprocessor performs a code prefetch from the next sequential code execution address in main memory to refill the pipe. When a branch instruction is encountered, an out-of-sequence memory access is requested, and the pipe is refilled starting from the branch target address. The microprocessor's prefetch ability is mainly used to prevent stalls during which the microprocessor would have to wait in the idle state until new code is available.
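The pipe analogy can be sketched as a small simulation. The following Python model is a simplification made for illustration, not a model of any actual microprocessor: the function name run, the tuple encoding of instructions, the queue depth of four and the one-address-per-instruction layout are all assumptions, and the pipe is topped up before every instruction rather than only when it runs low.

```python
from collections import deque


def run(program, start, queue_depth=4, max_steps=20):
    """Execute a toy program with a sequentially prefetched instruction queue.

    `program` maps an address to ("op",), ("branch", target) or ("halt",).
    Returns (executed_addresses, out_of_sequence_refills).
    """
    queue = deque()  # the "pipe" of prefetched instruction addresses
    next_fetch = start
    executed = []
    refills = 0
    for _ in range(max_steps):
        # Sequential prefetch: top up the pipe in address order.
        while len(queue) < queue_depth and next_fetch in program:
            queue.append(next_fetch)
            next_fetch += 1
        if not queue:
            break
        address = queue.popleft()
        executed.append(address)
        instruction = program[address]
        if instruction[0] == "halt":
            break
        if instruction[0] == "branch":
            # The prefetched sequential code is not wanted; the pipe is
            # emptied and refilled starting at the branch target.
            queue.clear()
            next_fetch = instruction[1]
            refills += 1
    return executed, refills


program = {
    0: ("op",),
    1: ("branch", 5),
    2: ("op",),  # prefetched but never executed
    3: ("op",),  # prefetched but never executed
    5: ("op",),
    6: ("halt",),
}
print(run(program, start=0))
```

In this example the instructions at addresses 2 and 3 are fetched into the pipe and then discarded when the branch at address 1 is taken, which illustrates how sequential prefetching can gather code that is never used.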
A code/data cache between the main memory and the microprocessor, such as the processor cache discussed above, helps ameliorate these problems to some degree. However, difficulty still exists in filling the processor cache quickly enough with new code and code returns without overloading the bus by gathering many extra code sequences that will not be used.