This invention relates in general to cache accesses in a processor, and in particular to a system and method for speculatively accessing data in a cache before verifying a tag hit for such cache.
Computer systems may employ a multi-level hierarchy of memory, with relatively fast, expensive but limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost but higher-capacity memory at the lowest level of the hierarchy. The hierarchy may include a small fast memory called a cache, either physically integrated within a processor or mounted physically close to the processor for speed. The computer system may employ separate instruction caches and data caches. In addition, the computer system may use multiple levels of caches. The use of a cache is generally transparent to a computer program at the instruction level and can thus be added to a computer architecture without changing the instruction set or requiring modification to existing programs.
Computer processors typically include cache for storing data. When executing an instruction that requires access to memory (e.g., read from or write to memory), a processor typically accesses cache in an attempt to satisfy the instruction. Of course, it is desirable to have the cache implemented in a manner that allows the processor to access the cache in an efficient manner. That is, it is desirable to have the cache implemented in a manner such that the processor is capable of accessing the cache (i.e., reading from or writing to the cache) quickly so that the processor may be capable of executing instructions quickly.
Prior art cache designs for computer processors typically require "control data" to be available before a cache data access begins. Such "control data" indicates whether a desired address (i.e., an address required for a memory access request) is contained within the cache. Accordingly, prior art caches are typically implemented in a serial fashion, wherein upon the cache receiving a memory access request, control data is obtained for the request, and thereafter if the control data indicates that the desired address is contained within the cache, the cache's data array is accessed to satisfy the memory access request.
Thus, prior art cache designs typically generate control data indicating whether a true cache "hit" has been achieved for a level of cache, and only after a true cache hit has been achieved is the cache data actually accessed to satisfy the memory access request. A true cache "hit" occurs when a processor requests an item from a cache and the item is actually present in the cache. A cache "miss" occurs when a processor requests an item from a cache and the item is not present in the cache. The control data indicating whether a "true" cache hit has been achieved for a level of cache typically comprises a tag match signal. The tag match signal indicates whether a match was made for a requested address in the tags of a cache level. However, such a tag match signal alone does not indicate whether a true cache hit has been achieved.
As an example, in a multi-processor system, a tag match may be achieved for a cache level, but the particular cache line for which the match was achieved may be invalid. For instance, the particular cache line may be invalid because another processor has snooped out that particular cache line. Accordingly, in multi-processor systems a MESI signal is also typically utilized to indicate whether a line in cache is "Modified, Exclusive, Shared, or Invalid." Therefore, the control data that indicates whether a true cache hit has been achieved for a level of cache typically comprises a MESI signal, as well as the tag match signal. Only if a tag match is found for a level of cache and the MESI protocol indicates that such tag match is valid does the control data indicate that a true cache hit has been achieved. In view of the above, in prior art cache designs a determination is first made as to whether a tag match is found for a level of cache, and then a determination is made as to whether the MESI protocol indicates that the tag match is valid. Thereafter, if a determination has been made that a true tag hit has been achieved, access begins to the actual cache data requested.
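By way of illustration only, the combination of the tag match signal and the MESI state described above can be sketched as follows. The names `Mesi` and `is_true_hit` are hypothetical and do not appear in the original; the sketch assumes that any state other than Invalid makes a tag match valid.

```python
from enum import Enum


class Mesi(Enum):
    """The four MESI line states (hypothetical encoding)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_true_hit(tag_match: bool, state: Mesi) -> bool:
    """A true hit requires a tag match on a line that is not Invalid."""
    return tag_match and state is not Mesi.INVALID
```

Under this sketch, a tag match on a line that another processor has snooped out (state Invalid) is reported as a miss.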
Typically, in multi-level cache designs, the first level of cache (i.e., L0) is first accessed to determine whether a true cache hit for a memory access request is achieved. If a true cache hit is not achieved for the first level of cache, then a determination is made for the second level of cache (i.e., L1), and so on, until the memory access request is satisfied by a level of cache. If the requested address is not found in any of the cache levels, the processor then sends a request to the system's main memory in an attempt to satisfy the request. In many processor designs, the time required to access an item for a true cache hit is one of the primary limiters for the clock rate of the processor if the designer is seeking a single-cycle cache access time. In other designs, the cache access time may be multiple cycles, but the performance of a processor can be improved in most cases when the cache access time in cycles is reduced. Therefore, optimization of access time for cache hits is critical for the performance of the computer system.
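The level-by-level search described above can be sketched as follows. This is a simplified model under stated assumptions: each cache level is represented as a dictionary holding only lines that would produce a true hit, and the function name `lookup` is hypothetical.

```python
def lookup(address, levels, main_memory):
    """Probe L0, L1, ... in order; fall back to main memory if every level misses.

    `levels` is a list of dicts (index 0 models L0), an assumption of this sketch.
    Returns a (source, data) pair.
    """
    for depth, level in enumerate(levels):
        if address in level:          # models a true cache hit at this level
            return f"L{depth}", level[address]
    return "memory", main_memory[address]
```

For example, with `levels = [{0x10: "a"}, {0x20: "b"}]`, a request for `0x20` misses in L0 and is satisfied by L1.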
Turning to FIG. 1, an example of a typical cache design of the prior art is shown. Typically, when an instruction requires access to a particular address, a virtual address is provided from the processor to the cache system. As is well-known in the art, such virtual address typically contains an index field and a virtual page number field. The virtual address is input into a translation look-aside buffer ("TLB") 10. TLB 10 is a common component of modern cache architectures that is well known in the art. TLB 10 provides a translation from the received virtual address to a physical address. Within a computer system, the virtual address space is typically much larger than the physical address space. The physical address space is the actual, physical memory address of a computer system, which includes cache, main memory, a hard drive, and anything else that the computer can access to retrieve data. Thus, for a computer system to be capable of accessing all of the physical address space, a complete physical mapping from virtual addresses to physical addresses is typically provided.
Once the received virtual address is translated into a physical address by the TLB 10, the index field of such physical address is input into the cache level's tag(s) 12, which may be duplicated N times for N "ways" of associativity. As used herein, the term "way" refers to a partition of the cache. For example, the cache of a system may be partitioned into any number of ways. Caches are commonly partitioned into four ways. The physical address index is also input to the cache level's data array(s) 16, which may also be duplicated N times for N ways of associativity.
From the cache level's tag(s) 12, a way tag match signal is generated for each way. The way tag match signal indicates whether a match for the physical address was made within the cache level's tag(s) 12. As discussed above, in multi-processor systems, a MESI protocol is typically utilized to indicate whether a line in cache is modified, exclusive, shared, or invalid. Accordingly, in such multi-processor systems the MESI protocol is combined with the way tag match signal to indicate whether a "true" tag hit has been achieved for a level of cache. Thus, in multi-processor systems a true tag hit is achieved when both a tag match is found for tag(s) 12 and the MESI protocol indicates that such tag match is a valid match. Accordingly, in FIG. 1, MESI circuitry 14 is utilized to calculate a "true" tag hit signal to determine whether a true tag hit has been achieved for that level of cache. Once it is determined from the MESI 14 that a "true" tag hit has been achieved for that level of cache, then that cache level's data array(s) 16, which may also be duplicated N times for N ways of associativity, are accessed to satisfy the received memory access request. More specifically, the true tag hit signal may be used to control a multiplexer ("MUX") 18 to select the appropriate data array way to output data to satisfy the received memory access request. The selected data from data array(s) 16 is output to the chip's core 20, which is the particular execution unit (e.g., an integer execution unit or floating point execution unit) that issued the memory access request to the cache.
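The serial flow of FIG. 1 can be sketched in software as follows. This is a minimal behavioral model, not a description of any actual circuit: the field widths (`PAGE_BITS`, `LINE_BITS`, `INDEX_BITS`), the function names, and the reduction of the MESI state to a per-line `valid` flag are all assumptions made for illustration.

```python
PAGE_BITS = 12    # assumed 4 KiB pages
LINE_BITS = 6     # assumed 64-byte cache lines
INDEX_BITS = 4    # assumed 16 sets per way


def translate(tlb, vaddr):
    """TLB 10: map the virtual page number to a physical page number."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return (tlb[vpn] << PAGE_BITS) | offset


def split(paddr):
    """Extract the set index and the tag from a physical address."""
    index = (paddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = paddr >> (LINE_BITS + INDEX_BITS)
    return index, tag


def serial_read(tlb, tags, valid, data, vaddr):
    """Prior-art order: translate, match tags, qualify with MESI, then read data."""
    index, tag = split(translate(tlb, vaddr))
    # Tags 12: one way tag match signal per way.
    way_match = [tags[w][index] == tag for w in range(len(tags))]
    # MESI 14: a match counts only if the line is valid.
    true_hit = [m and valid[w][index] for w, m in enumerate(way_match)]
    if not any(true_hit):
        return None                      # miss at this level
    # Only now is data array 16 read; MUX 18 selects the hitting way.
    way = true_hit.index(True)
    return data[way][index]
```

Note that in this model the read of `data` is the last step, which mirrors the serial dependency discussed in the text.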
In view of the above, prior art caches are typically implemented in a serial fashion, wherein the physical address is first determined, then whether a tag match is achieved for the requested physical address within a particular level of cache is determined, then whether a "true" cache hit is achieved within the particular level of cache is determined, and finally the data array(s) for the particular level of cache are accessed if a "true" cache hit has been achieved. Thus, even though prior art caches determine the physical address relatively early, the cache's data is not accessed until it has been determined whether a "true" cache hit has been achieved for the cache. Such a serial access of cache data is disadvantageous in that it is slow. Such a serial cache implementation generally adds one to two clock cycles of data access latency because the cache tags and MESI must first complete to determine whether a valid hit has been achieved for the cache before beginning the data access. Thus, such a serial access of cache data requires an undesirably long time to access the cache data. Therefore, serial cache designs of the prior art increase latency in retrieving data from cache, which slows the execution unit within the core of a chip. That is, while an execution unit is awaiting data from cache, it is stalled, which results in a net lower performance for a system's processor.
In view of the above, a desire exists for a cache design that allows for cache data to be accessed in a timely manner. That is, a desire exists for a cache design that decreases the latency in retrieving data from cache that is present in prior art cache designs. A further desire exists for a cache design that allows for cache data to be accessed in a timely manner, while still verifying that a true tag hit has been achieved for the cache in order to ensure that the appropriate data is being accessed to satisfy a memory access request. Accordingly, a desire exists for a cache design that allows for cache data to be accessed quickly, thereby reducing the number of stalls required in the execution units requesting memory access and enhancing the overall performance of the system.
These and other objects, features and technical advantages are achieved by a system and method which provide a cache design that, in response to receiving a memory access request, begins an access to a cache level's data before a determination has been made as to whether a true hit has been achieved for the cache level. That is, a system and method are provided which enable cache data to be speculatively accessed before a determination is made as to whether a memory address required to satisfy a memory access request is truly present in the cache.
In a preferred embodiment, a cache structure is provided that receives memory access requests from at least one processor of a computer system. In response to receiving such a memory access request, the cache structure begins an access of its data array(s) in an attempt to satisfy the received request, without first determining whether a memory address required to satisfy the received memory access request is truly present in the cache structure. In a most preferred embodiment, such a cache structure is a level of a multi-level cache implemented for a computer system.
In a preferred embodiment, the cache is implemented such that a determination is made as to whether a memory address required to satisfy a received memory access request is truly present in the cache structure. That is, a preferred embodiment determines whether a true cache hit is achieved in the cache structure for a received memory access request. However, such a determination is not made before the cache data begins to be accessed. Rather, in a preferred embodiment, a determination of whether a true cache hit is achieved in the cache structure is performed in parallel with the access of the cache structure's data for a received memory access request. That is, a preferred embodiment determines whether a tag match is achieved for the cache structure's tags and whether a MESI protocol verifies that an achieved tag match is a valid match in parallel with accessing the cache structure's data array(s). Therefore, rather than the serial path of prior art cache designs in which a tag match is first determined, then a match is verified with a MESI protocol, then the cache data is accessed to satisfy a received memory access request, a preferred embodiment implements a parallel path by beginning the cache data access while a determination is being made as to whether a true cache hit has been achieved. Thus, the cache data is retrieved early from the cache structure and is available in a timely manner for use by a requesting execution unit, once it is determined that a true cache hit has been achieved for the cache structure.
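The parallel path described above can be sketched as follows. This is a behavioral illustration only: software executes the steps one after another, so the sketch conveys the absence of a data dependency rather than true concurrency. The function name `speculative_read`, the per-way list layout, and the use of a `valid` flag in place of a full MESI state are assumptions.

```python
def speculative_read(tags, valid, data, index, tag):
    """Read every way's data without waiting for the hit determination."""
    # Data access begins at once; it depends only on the index.
    speculative = [data[w][index] for w in range(len(data))]
    # Hit determination does not use the data read above, so hardware
    # could evaluate it at the same time.
    true_hit = [tags[w][index] == tag and valid[w][index]
                for w in range(len(tags))]
    # Late select: forward already-fetched data only on a true hit.
    for way, hit in enumerate(true_hit):
        if hit:
            return speculative[way]
    return None   # no true hit: the speculatively read data is discarded
```

In this sketch the hit result is consulted only at the final selection step, so the data is already available when a true hit is confirmed.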
In a preferred embodiment, the data access is begun before determining whether a true cache hit is achieved only for memory access requests that are data read requests. Thus, for "write" requests a preferred embodiment first determines whether a true cache hit is achieved before beginning the requested write.
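The distinction between read and write requests can be expressed as a simple gating rule. The following is a hypothetical sketch; the function name `begin_data_access_now` and the string encoding of the request kind are assumptions.

```python
def begin_data_access_now(request_kind: str, true_hit_confirmed: bool) -> bool:
    """Decide whether the data array access may start at this point."""
    if request_kind == "read":
        return True                    # reads start speculatively
    if request_kind == "write":
        return true_hit_confirmed      # writes wait for a confirmed true hit
    raise ValueError(f"unknown request kind: {request_kind!r}")
```

Under this rule a read proceeds regardless of whether the hit determination has completed, while a write is held until a true hit has been confirmed.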
It should be appreciated that a technical advantage of one aspect of the present invention is that a cache structure is implemented to allow for faster access of the cache structure's data by beginning the data access before it is determined whether a true cache hit is achieved for the cache structure. Accordingly, the cache design allows for cache data to be accessed in a timely manner. A further technical advantage of one aspect of the present invention is that a cache structure is implemented that allows for cache data to be accessed in a timely manner, while still verifying that a true tag hit has been achieved for the cache in order to ensure that the appropriate data is being accessed to satisfy a received memory access request. Yet a further technical advantage of one aspect of the present invention is that a cache structure is implemented such that the cache structure's data is accessed in parallel with determining whether a true cache hit is achieved for the cache structure, thereby decreasing the latency in retrieving data from cache that is present in prior art cache designs. That is, a cache structure is implemented that allows for cache data to be accessed quickly, thereby reducing the number of stalls required in the execution units requesting memory access and enhancing the overall performance of the system.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.