1. Field of the Invention
This invention relates in general to the field of data coherency within cache memory, and more particularly to a method and apparatus for sharing a cache line within a split data/code cache.
2. Description of the Related Art
Modern computer systems employ a number of different memory devices and memory architectures to store instructions that will be executed, and/or data that will be processed. The types of devices used, and the manner in which they are coupled, vary from system to system, depending on a designer's criteria. To illustrate a common manner of employing different memory devices, reference is now made to FIG. 1.
In FIG. 1, a computer system 100 is shown. The computer system 100 includes a microprocessor 102, connected to a main memory system 106 via a system bus 116. The microprocessor 102 also includes an on-chip level 1 cache 104 for temporarily storing a portion of the data in the main memory 106. Also connected to the main memory 106 via the system bus 116 are a plurality of bus master devices 110, 112, and 114, and a look-aside level 2 cache 108.
In operation, the microprocessor 102 fetches instructions from the main memory 106, executes the instructions, and reads data from, or writes data to, the main memory 106. The main memory 106 is also connected to a secondary storage device, such as a hard disk drive (not shown), which provides permanent storage for instructions and data.
The main memory 106 provides the primary storage for instructions and data that will be utilized by the microprocessor 102. The main memory 106 is typically provided by a plurality of DRAM (dynamic random access memory) chips, which are relatively inexpensive, but are slow in comparison to the speed of the microprocessor 102.
Closer to the microprocessor 102 is the level 2 cache 108. The level 2 cache 108 provides significantly less storage than the main system memory 106, and is much more expensive than the main memory 106. However, it provides a significant performance advantage over the main memory 106 because it responds to read and write operations much faster than the main memory 106.
Within the microprocessor 102 is the level 1 cache 104. The level 1 cache 104 stores even less data than the level 2 cache 108. However, because it is located within the microprocessor 102, it can respond to read and write requests much faster than can the level 2 cache 108, or the main memory 106. In addition, if read or write requests by the microprocessor 102 can be responded to by the level 1 cache 104, the processor 102 does not have to access the system bus 116. This improves the performance of the microprocessor 102, as well as that of the entire computer system 100. The performance of the computer system 100 is improved because other devices, such as the bus master devices 110-114, can utilize the system bus 116 while the microprocessor 102 is executing instructions using the level 1 cache 104. One skilled in the art should readily appreciate the advantages of using the three-level memory system discussed above with respect to the computer system 100, particularly the use of intermediate caches, and on-chip caches. However, for a more complete background on the use of cache memory, the reader is referred to Chapter 14, ISA SYSTEM ARCHITECTURE, by Shanley/Anderson, 3rd edition, which is hereby incorporated by reference.
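The performance advantage of the three-level memory system can be sketched with a simple expected-latency model. The cycle counts and hit rates below are hypothetical values chosen purely for illustration; real figures vary widely by system.

```python
# Illustrative model of the three-level memory hierarchy's benefit.
# All latencies (in cycles) and hit rates are hypothetical.

L1_LATENCY = 2      # on-chip level 1 cache
L2_LATENCY = 10     # look-aside level 2 cache
MEM_LATENCY = 60    # DRAM main memory over the system bus

def average_access_time(l1_hit_rate, l2_hit_rate):
    """Expected cycles per access for the L1 -> L2 -> main memory path."""
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * L1_LATENCY
            + l1_miss * l2_hit_rate * L2_LATENCY
            + l1_miss * l2_miss * MEM_LATENCY)

# With good hit rates, most accesses are served on-chip and never
# touch the system bus, leaving it free for the bus master devices.
print(round(average_access_time(0.90, 0.80), 2))
```

Under these assumed numbers the average access costs a few cycles rather than the sixty required for a trip to main memory, which is the motivation for the intermediate and on-chip caches described above.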
A recently developed improvement in memory architecture, particularly on-chip cache architecture, is the separation of caches within a microprocessor: one dedicated to storing data, the other dedicated to storing code. An example of this improvement is shown in FIG. 2, to which reference is now made.
In FIG. 2, a computer system 200 is shown. The computer system 200 includes a microprocessor 202 connected to a main memory 206, and a look-aside level 2 cache 208, via a system bus 216. Other devices may be connected to the system bus 216, but have not been shown. Within the microprocessor 202 is a level 1 cache 204. The level 1 cache 204 is connected to the system bus 216, and thus to the main memory 206 and the level 2 cache 208, via a bus unit 220.
Within the level 1 cache 204 is a split code cache 205 and data cache 207. The code cache 205 responds to instruction fetch requests by a control unit 222. The data cache 207 responds to read and write requests by the ALU 224, providing data to, or accepting data from, a register file 226.
By splitting the level 1 cache 204 into separate code and data caches 205, 207, respectively, instruction fetches by the control unit 222 no longer need to wait on read requests by the ALU 224 to complete before instructions can be delivered to the control unit 222. To illustrate this point, presume that the level 1 cache 204 is unified, rather than split. In addition, presume the microprocessor 202 is executing an instruction that moves data from a memory location in the main memory 206 into a register within the register file 226. If the cache 204 is not split, the control unit 222 is essentially stalled from fetching the next instruction, until the data has been retrieved from the main memory 206, and provided to the level 1 cache 204. Separating the code and data caches 205, 207 within the level 1 cache 204 eliminates this internal contention problem. The use of a split data/code cache, as described above, has been incorporated into modern x86 microprocessors.
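The contention scenario above can be illustrated with a toy timeline model. The cycle counts are hypothetical, and the model deliberately reduces the scenario to its essentials: the ALU's data read misses and must go to main memory while the control unit wants the next instruction, which is already cached on-chip.

```python
# Toy comparison of a unified level 1 cache versus a split code/data
# cache during the move-to-register scenario above. Cycle counts are
# hypothetical values for illustration only.

MISS_PENALTY = 60   # cycles to retrieve the data from main memory
HIT_TIME = 2        # cycles for a level 1 cache hit

def fetch_delay(split_cache):
    """Cycles until the next instruction reaches the control unit
    while a data read miss to main memory is outstanding."""
    if split_cache:
        # Code cache serves the instruction fetch immediately; the
        # data miss proceeds in the data cache independently.
        return HIT_TIME
    # Unified cache: the instruction fetch is stalled behind the
    # outstanding data miss before it can be serviced.
    return MISS_PENALTY + HIT_TIME

assert fetch_delay(split_cache=True) == 2
assert fetch_delay(split_cache=False) == 62
```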
However, when the level 1 cache 204 is split into separate data and code portions, another problem is created, to which the present invention is addressed. The problem is best illustrated by reference to FIG. 3.
In FIG. 3, a computer system 300 is shown. The computer system 300 is essentially like that described above in FIG. 2. Like elements have like numbers, with the hundreds digit replaced with a 3.
At any given time, a particular byte of data (or code) may be simultaneously stored in the main memory 306, the level 2 cache 308, and the level 1 cache 304. If the data is stored within the level 1 cache, it can be provided to the control unit 322, or utilized by the register file 326, without requiring the microprocessor 302 to access the system bus 316.
This byte of data is stored within what is commonly referred to as a cache line. A cache line is typically composed of a number of bytes of data that can be read from, or written to, the main memory in back-to-back transfers. Shown in FIG. 3 is a cache line 330 that stores 32 bytes of data. The cache line 330 is retrieved from the main memory 306 locations a,b,c,d 332, in four back-to-back transfers of eight bytes each, and placed into the level 2 cache 308, and into the level 1 cache 304. Future requests by the microprocessor 302 to any of the memory locations a,b,c,d 332 may then be fulfilled by the cache line 330 within the level 1 cache.
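The mapping of an address onto a 32-byte cache line can be sketched as follows. The line size and transfer size match the text; the example addresses are hypothetical.

```python
# Sketch of how an address maps onto a 32-byte cache line, as with the
# cache line 330 of FIG. 3. Example addresses are hypothetical.

LINE_SIZE = 32          # bytes per cache line
TRANSFER_SIZE = 8       # bytes moved per bus transfer

def line_address(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)

def byte_offset(addr):
    """Position of addr within its cache line."""
    return addr & (LINE_SIZE - 1)

# Filling one line takes LINE_SIZE / TRANSFER_SIZE back-to-back transfers.
assert LINE_SIZE // TRANSFER_SIZE == 4

# Any address in [0x1000, 0x101F] falls within the same line, so after
# one fill, all 32 bytes can be served from the level 1 cache.
assert line_address(0x1007) == line_address(0x101F) == 0x1000
assert byte_offset(0x1007) == 7
```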
However, prior to the present invention, the memory locations 332 could only be stored in the cache line 330 within either the code cache 305, or the data cache 307, but not in both at the same time. When the cache line 330 is required, first by the control unit 322 (for instruction fetches), and second by the ALU 324 (for data reads and writes), the level 1 cache 304 first retrieves the cache line 330 into the code cache 305 over the system bus 316. Later, the cache line 330 is retrieved into the data cache 307 over the system bus 316, and the cache line 330 in the code cache 305 is invalidated. If the control unit 322 again requires access to the cache line 330, another fetch is made by the level 1 cache 304, over the system bus 316, to place the cache line 330 into the code cache 305. In addition, the cache line 330 in the data cache 307 is invalidated, since the cache line 330 can only reside in the code cache 305, or the data cache 307, but not in both. This back-and-forth contention for the cache line 330, which forces the microprocessor 302 to retrieve a cache line from the main memory 306, and invalidate the cache line 330 in the competing cache, is known as thrashing. Thrashing significantly affects the performance of a computer system, because it requires a microprocessor to halt processing every time a cache line has to be retrieved over the system bus.
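The thrashing just described can be reduced to a minimal sketch: a line may reside in the code cache or the data cache, but not both, so alternating instruction fetches and data accesses to the same line force repeated bus fetches. The function name and access pattern below are illustrative, not from the source.

```python
# Toy model of cache line thrashing between a code cache and a data
# cache that cannot share a line. Illustrative only.

def simulate(accesses):
    """accesses: sequence of 'code' or 'data' touches to one cache line.
    Returns the number of fetches made over the system bus."""
    owner = None        # which cache currently holds the line
    bus_fetches = 0
    for cache in accesses:
        if owner != cache:
            # Miss: fetch the line over the system bus and invalidate
            # the copy in the competing cache.
            bus_fetches += 1
            owner = cache
    return bus_fetches

# Alternating code/data accesses to the same line: every access misses.
assert simulate(["code", "data"] * 4) == 8

# Repeated accesses from one side hit after the first fetch.
assert simulate(["code"] * 4) == 1
```

If the line could instead coexist in both caches, only the first code access and the first data access would need the bus, which is the improvement the invention is directed to.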
Thrashing is common in a level 1 cache when code modification software is executing, particularly when the code that is being modified is in the same cache line as the code that is performing the modification. It is also common in situations where a program is processing data, and the data that is being processed resides within the same cache line as the program. Both of these situations are very common and will be discussed further in the Detailed Description below.
However, what should be clear from the above is that an invention is needed that allows a cache line to coexist in both a code cache and a data cache for as long as possible, so that fetches of code and reads of data within the same cache line can be handled entirely within an on-chip split data/code cache, without requiring a microprocessor to retrieve the cache line over the system bus.