1. Field of the Invention
This invention relates to microprocessors and more particularly, to cache replacement schemes within microprocessor caches.
2. Description of the Relevant Art
Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors by allowing the concurrent execution of multiple instructions. As a result of executing multiple instructions per clock cycle, a superscalar processor""s performance is heavily impacted by the processor""s ability to quickly move instructions from memory to the execution units within the processor that actually execute the instructions. Since main system memory is typically designed for density rather than speed, microprocessor designers have added caches to their designs to reduce the microprocessor""s need to directly access main memory. A cache is a small memory that is more quickly accessible than the main memory. Computer systems may have a number of different levels of caches. For example, a computer system may have a xe2x80x9clevel onexe2x80x9d cache that is internal to the microprocessor (i.e., on-chip), and a xe2x80x9clevel twoxe2x80x9d cache that is external to the microprocessor. Caches are typically constructed of fast memory cells such as static random access memories (SRAMs) which have faster access times and bandwidth than the memories used for the main system memory (typically dynamic random access memories (DRAMs) or synchronous dynamic random access memories (SDRAMs)). The faster SRAMs are not typically used for main system memory because of their lower density and corresponding higher cost. Other types of caching are also possible. For example, the main system memory may act as a cache for the system""s slower direct access storage devices (e.g., hard disk drives).
When microprocessors need data from memory, they typically first check their level one cache to see the if the required data has been cached. If not, the data is requested from memory. If the second level cache is storing the data, it provided the data to the microprocessor (typically at much higher rate than the main system memory is capable of). If the data is not cached in the first or second level caches (referred to as a xe2x80x9ccache missxe2x80x9d), the data is read from main system memory or some type of mass storage device (e.g., a hard disk drive). Relative to accessing the data from the level one cache, accesses to memory take many more clock cycles.
Caches typically operate on the principal of locality of reference, which states that the data most recently used (and the data in that locality) is more likely to be accessed than the rest of the data in general. This principle holds because computer software tends to be somewhat linear in execution and typically has loops and branches that cause previously executed code to be re-executed. By storing recently accessed instructions and data in a cache, system performance may be increased because the microprocessor need not wait for the instructions and data to read from main memory.
Microprocessor and computer system architects have taken this principle one step further by using techniques such as branch prediction to proactively store instructions and data in the cache before they are actually needed by the microprocessor. In addition, when an instruction or byte of data is read from memory, additional bytes following the instruction or data are read and cached. Once again, the principal of locality of reference dictates that these instruction and data bytes are more likely to be needed by the processor than the other data or instructions at large.
Since cache size is limited by a number of factors (including die size, power consumption, and cost), care must be taken when loading information into the cache. Once particular area of concern for the designer is when to overwrite or invalidate existing instructions and data in a cache to make room for new instructions and data. A common solution is to track the frequency of accesses and then replace the least recently used instructions or data with new instructions or data. Other solutions include random replacement, and first-in first-out techniques. While these techniques are all effective to a certain extent, none of them are able to take advantage of the underlying structure of the program being executed.
Given the heavy penalty associated with cache misses, a more accurate method for determining which instructions and data should be cached and which instructions and data should be overwritten is needed.
The problems outlined above may at least in part be solved by a method and apparatus that allows programmers to prioritize instructions and/or data with respect to caching and cache replacement. Advantageously, this may improve performance by reducing cache misses. Furthermore, encoding cache replacement information may also caches of a particular size to be utilized more efficiently, thereby freeing precious die space for other purposes (e.g., more functional/execution units). Depending upon the exact implementation, the method may be applied to both instruction and data caches.
A microprocessor configured to utilize cache priority information is contemplated. In one embodiment, the microprocessor comprises a cache, a predecode unit, and a cache controller. The cache may comprise a plurality of storage locations (referred to as cache lines), each storing a predetermined number of instruction and/or data bytes. The predecode unit is configured to receive/predecode instructions bytes and detect the presence of cache replacement priority prefix bytes or opcodes. This information may then be routed to the cache""s control logic (i.e., the cache controller). The cache controller may then store this information along with other predecode information and the actual instruction bytes in the cache. The cache control logic is configured to utilize the stored cache replacement priority information to determine which instruction bytes stored in the cache should be overwritten. Note, as used herein the term xe2x80x9coverwrittenxe2x80x9d may mean flushed, invalidated, or written back to memory, depending upon the type of cache being used. In addition to a microprocessor, a computer system capable of executing code that contains cache replacement information is also contemplated.
Note, as used herein, a xe2x80x9chigh priority instructionxe2x80x9d is an instruction that will likely receive preferable treatment during the cache line replacement process. When a cache is full and new data is read from memory, the cache""s control logic must determine where to store the new information (i.e., which of the old cache lines to overwrite). The present invention allows cache replacement priority (CRP) data regarding which bytes in the cache should or should not be overwritten to be stored. For example, if a programmer knows that a particular subroutine will be called repeatedly through a program, then the programmer may indicated to the compiler that the instructions forming the subroutine should be marked as high priority instructions. The compiler may do this in a number of different ways, depending upon the implementation. In one embodiment, special prefix bytes are added to the high priority code. These prefix bytes may be detected by the prefetch/predecode unit of the microprocessor executing the program. The prefetch/predecode unit may then signal the microprocessor""s cache that the instructions should be given preferential treatment during the cache line replacement process (i.e., instructions marked low priority or those not marked high priority should be overwritten before the high priority instructions are overwritten).
In some embodiments, the compiler may also allow the programmer to indicate one or more points in the program after which the high priority instructions shall be converted to normal or low priority instructions. For example, in a computer graphics program, there may be one portion of the program that is high priority at the beginning of execution, e.g., a subroutine that processes a user""s keyboard input. This routine may be repeated a number of times. However, once the user selects a particular function (e.g., rendering a three-dimensional image), the programmer may want to make all of the microprocessor""s resources available to the rendering routine. Thus the programmer may indicate that once the rendering subroutine is invoked, the subroutine that processes the user""s keyboard input should be reclassified as normal or low priority (with the rendering subroutine subsequently being classified as high priority). indicate that once the rendering subroutine is invoked, the subroutine that processes the user""s keyboard input should be reclassified as normal or low priority (with the rendering subroutine subsequently being classified as high priority).
A computer software program embodied on a computer readable medium (e.g., diskette, CD-ROM, RAM, or ROM) and configured to compile source code into object code that includes cache replacement priority information is also contemplated. In some embodiments, the compiler may be configured to translate source code instructions into object code instructions, determine which source code instructions are high priority for caching, and encode cache priority information into the object code. The cache priority information may be encoded as special prefix bytes, special opcode bytes, or as a separate data file. In some embodiments the computer program may be configured to receive input from the user indicating which portions of the source or object code are high priority. The computer program may also be configured to receive input from the user indicating at which points of the source or object code instructions designated as high priority should be reclassified as normal or low priority.
In some embodiments, the computer software program may be configured to operate somewhat autonomously, by (i) tracking the execution of the object code, (ii) identifying portions of the object code that are repeated the greatest number of times, and (iii) embedding cache replacement priority information in the object.