FIG. 1 is an illustration of a general purpose computer 10. The computer 10 includes a central processing unit (CPU) 12. The CPU 12 includes execution units 14, such as a floating point unit, an integer execution unit, etc.
The instruction unit 16 of the CPU 12 reads instructions from a computer program. Each instruction is located at a memory address. Similarly, the data associated with an instruction is located at a memory address. The CPU 12 accesses a specified memory address to fetch the instruction or data stored there.
Most CPUs include an on-board memory called an internal cache 18. The internal cache 18 of FIG. 1 includes a data cache 20, an instruction cache (not shown), and a load/store unit 22. The load/store unit 22 includes a load buffer 24, a store buffer 26, and a priority control circuit 27 in the form of a write-after-read (WAR)/read-after-write (RAW) circuit. These elements will be discussed below. Attention presently focuses on a more general discussion of the processing of addresses associated with instructions or data of a computer program executed by the CPU 12.
If a specified address is not in the internal, or L1, cache 18, then the CPU 12 looks for the specified address in an external cache, also called an L2 cache 30. The external cache 30 has an associated external cache controller 28, which may be internal to the CPU 12 or external to the CPU 12.
If the address is not in the external cache 30 (a cache miss), then the primary memory 34 is searched for the address. The primary memory 34 has an associated primary memory controller 32, which may be internal to the CPU 12 or external to the CPU 12.
If the address is not in primary memory 34, then the CPU 12 requests access to the system bus 36. The system bus 36 is used to access secondary memory 40 through an input/output controller 38.
The foregoing discussion of a typical memory hierarchy in a computer makes apparent the desirability of finding an address within the internal cache 18. In the absence of a "hit" in the internal cache, the address must be located in the external cache 30. This operation takes more time. Similarly, if the address cannot be found in the external cache 30, then it must be sought in the primary memory 34. This operation is also time consuming. Finally, if the address cannot be found in primary memory 34, then it must be secured from secondary memory 40. Access to secondary memory 40 is particularly time consuming. Therefore, it is highly desirable to have a high "hit" rate in the internal cache 18 of a CPU 12.
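The miss path just described can be sketched, purely for illustration, as a sequence of lookups proceeding from fastest to slowest level. The dictionary contents and cycle counts below are assumptions of this sketch, not values taken from FIG. 1; they serve only to show why a high internal-cache hit rate matters:

```python
# Each memory level is modeled as an address -> data dictionary.
# Cycle counts are illustrative assumptions, not measured values.
l1 = {0x100: 'a'}        # internal cache 18
l2 = {0x200: 'b'}        # external cache 30
primary = {0x300: 'c'}   # primary memory 34
secondary = {0x400: 'd'} # secondary memory 40, reached over the system bus 36

L1_CYCLES, L2_CYCLES, PRIMARY_CYCLES, SECONDARY_CYCLES = 1, 10, 50, 10000

def lookup(address):
    """Search each level in turn, following the miss path of FIG. 1."""
    if address in l1:
        return l1[address], L1_CYCLES
    if address in l2:
        return l2[address], L2_CYCLES
    if address in primary:
        return primary[address], PRIMARY_CYCLES
    # Miss at every level above: access secondary memory via the system bus.
    return secondary[address], SECONDARY_CYCLES
```

An address found in the internal cache costs one cycle in this model, while an address that must be secured from secondary memory costs four orders of magnitude more, which is the disparity the text describes.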
FIG. 1 also depicts additional devices connected to the system bus 36. For example, FIG. 1 illustrates an input/output controller 42 operating as an interface between a graphics device 44 and the system bus 36. In addition, the figure illustrates an input/output controller 46 operating as an interface between a network connection circuit 48 and the system bus 36.
The overall computer architecture associated with the present invention has now been described in the context of a similar prior art device. In addition, the prior art problem of fetching memory addresses has been described. The foregoing discussion has also set forth the importance of the performance of the internal cache 18 to the overall performance of the CPU 12. Attention presently turns to a more particular discussion of the operation of the prior art internal cache 18.
As known in the art, cache memories store addresses and corresponding instructions or data that are expected to be used by the CPU execution units 14 in the near future. For example, if the CPU execution units 14 perform a calculation, the result of the calculation will generally be expected to be used again in the near future. Thus, the result of that calculation, referred to here as data, will be stored at a specified memory address. The CPU execution units 14 will attempt to store the data in the data cache 20 of the internal cache 18. If space does not exist in the data cache 20, the data is written to the external cache 30.
As indicated above, the internal cache 18 has a load/store unit 22, with a load buffer 24 and a store buffer 26. As known in the art, the load buffer may be used to store prefetch load commands. In other words, the instruction unit 16, using techniques known in the art, may generate a prefetch load command, which is enqueued in the load buffer 24. The prefetch load command is then used to fetch the data corresponding to the address specified in the prefetch load command. The data is loaded into the data cache 20 in the anticipation that it will subsequently be used by the CPU execution units 14. The present invention is directed toward prior art circuit architectures of the type wherein the load buffer 24 is capable of allocating, or reserving, a data line in the data cache 20.
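The prefetch-load behavior of the allocating load buffer described above may be sketched as follows. The function `fetch_from_memory` and the sample address are hypothetical stand-ins for the external cache and primary memory path; they are assumptions of this sketch, not elements of FIG. 1:

```python
from collections import deque

def fetch_from_memory(addr):
    """Placeholder for the external cache 30 / primary memory 34 path."""
    return addr * 2  # illustrative data, not a real memory model

data_cache = {}        # data cache 20, modeled as addr -> data
load_buffer = deque()  # load buffer 24, holding prefetch load commands

def enqueue_prefetch(addr):
    """Instruction unit 16 enqueues a prefetch load command."""
    load_buffer.append(addr)

def service_load_buffer():
    """Each prefetch allocates a line in the data cache and fills it."""
    while load_buffer:
        addr = load_buffer.popleft()
        data_cache[addr] = fetch_from_memory(addr)

enqueue_prefetch(0x40)
service_load_buffer()
```

The key property modeled here is that the load buffer is allocating: servicing a prefetch reserves and fills a line in the data cache, in anticipation of use by the execution units.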
The store buffer 26 of the load/store unit 22 is used to store addresses and data that are to be returned to the primary memory 34, the external cache 30, or, in the case of a cache hit, the internal cache 18. Given this functionality, the prior art store buffers 26 of relevance to the present invention are non-allocating. That is, unlike the allocating load buffer 24, the store buffer 26 does not allocate a line in the data cache 20.
The operation of the load buffer 24 and store buffer 26 must be supervised by a priority control circuit 27. The priority control circuit 27 is implemented as a write-after-read/read-after-write (WAR/RAW) circuit 27. The WAR/RAW circuit 27 operates to preserve the processing priority of entries within the load/store unit 22. For example, a load from the same address as a previous store must be prevented from executing before that store.
The priority control circuit 27 enforces two types of hazards to ensure proper program execution. The first type of hazard is called a read-after-write (RAW) hazard. A RAW hazard is placed on a load buffer 24 entry so that the load buffer 24 does not read data from the data cache 20 or primary memory 34 until the store buffer 26 writes data to the data cache 20 or primary memory 34. After this write operation by the store buffer 26, the load buffer 24 can perform its read operation.
The second type of hazard is called a write-after-read (WAR) hazard. A WAR hazard is placed on a store buffer 26 entry so that the store buffer 26 does not write data to the data cache 20, the external cache 30, or primary memory 34 until the load buffer 24 reads from the data cache 20, the external cache 30, or primary memory 34. This read operation allocates a line in the data cache 20 on misses. After the read operation by the load buffer 24, the store buffer 26 can perform its write operation.
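The two hazard checks may be modeled, under simplifying assumptions, as follows. Buffer entries carry program-order sequence numbers; the buffer contents and addresses below are illustrative only, and the real priority control circuit 27 is, as the text notes, considerably more complex:

```python
# Entries carry a program-order sequence number so age can be compared.
load_buffer = []   # load buffer 24 entries: (seq, addr)
store_buffer = []  # store buffer 26 entries: (seq, addr, data)

def raw_hazard(load_seq, addr):
    """RAW: a load may not read while an older store to the same
    address is still pending in the store buffer."""
    return any(s < load_seq and a == addr for (s, a, _) in store_buffer)

def war_hazard(store_seq, addr):
    """WAR: a store may not write while an older load from the same
    address is still pending in the load buffer."""
    return any(s < store_seq and a == addr for (s, a) in load_buffer)

# A store (sequence 0) followed by a load (sequence 1) to the same
# address: the load carries a RAW hazard until the store completes.
store_buffer.append((0, 0x80, 42))
load_buffer.append((1, 0x80))
```

In this model, `raw_hazard(1, 0x80)` holds because an older store to the same address is pending, so the load must wait; a store younger than a pending load to the same address would analogously be held by `war_hazard`.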
Complex priority control circuits 27 exist in the prior art to implement these WAR/RAW operations. It would be highly desirable to use these complex prior art priority control circuits 27, without modification, in different circuit architectures. For example, if the external cache 30 of the system of FIG. 1 were eliminated, then an alternate mechanism would be required to return data to memory in the event of a data cache write miss. One approach to solving this problem would be to supply a copy-back data cache. It would be desirable to use an existing architecture to implement the copy-back data cache. However, as indicated above, a problem with the architecture of the apparatus of FIG. 1 is that the store buffer 26 is non-allocating; that is, it does not allocate a line in the data cache 20. To implement a copy-back data cache, the store buffer 26 could be made allocating. However, this would require that the priority control circuit 27 be re-designed. As indicated above, the priority control circuit 27 is rather complex, and it would therefore be desirable to identify a technique for using the circuit without modification.
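The copy-back (write-back) policy referred to above may be sketched as follows. The addresses and data values are illustrative assumptions, and the sketch shows only the policy itself, not the allocation or priority-control machinery discussed in the surrounding text:

```python
memory = {0x10: 0}  # backing memory, addr -> data
cache = {}          # copy-back data cache, addr -> data
dirty = set()       # addresses modified in the cache but not yet in memory

def cache_write(addr, data):
    """Copy-back policy: a write is absorbed by the cache and the line
    is marked dirty; memory is not updated at write time."""
    cache[addr] = data
    dirty.add(addr)

def evict(addr):
    """Only when a dirty line is evicted is it copied back to memory."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]

cache_write(0x10, 99)
# At this point memory still holds the stale value 0.
evict(0x10)
```

This deferral of the memory write is what allows a copy-back data cache to stand in for the eliminated external cache on write misses, provided the line can first be allocated in the cache.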
In sum, it would be highly desirable to exploit an allocating load buffer, non-allocating store buffer, and priority control circuit of a prior art load store unit to implement a copy-back data cache. The copy-back data cache could be used in a low-cost system without an external cache.