1. Field of the Invention
This invention relates in general to the field of data storage tracking in computer systems, and more particularly to an improved method and apparatus for using vectors to track and update which alternative way within an aliased storage is the least recently used (LRU).
2. Description of the Related Art
Modern computer systems employ a number of different memory devices and memory architectures to store instructions that will be executed, or data that will be processed. Memory systems typically include a combination of different types of memory devices, each provided for a particular purpose. For example, a hard disk drive is a memory device that provides a large area for permanent storage of instructions and data at a relatively nominal cost. A second type of memory device is a dynamic random-access-memory, or DRAM. DRAM provides temporary storage for instructions and data, and is several orders of magnitude faster than a hard disk drive. However, DRAM is also much more expensive than hard disk memory. A third type of memory device is a static random-access-memory, or SRAM. SRAM provides temporary storage for instructions and data, and is significantly faster than DRAM. However, it is even more expensive than DRAM. Thus, a system designer typically uses a mixture of memory types of varying sizes to obtain differing cost/performance specifications.
A typical memory configuration utilizes a hard disk drive as the primary storage location for instructions and data, and places DRAM between the hard disk drive and the processing system for temporary storage of data. A small portion of SRAM is then usually placed between the processor and the DRAM to store a subset of the data in DRAM.
An exemplary operation of these memory systems is as follows. A processing system requests an instruction/data from the memory system. If the instruction/data is present in SRAM, the instruction/data is provided to the processing system with zero wait states, i.e., the SRAM does not cause the processing system to wait for the instruction/data. If the SRAM does not contain a copy of the requested instruction/data, the processing system must go to the DRAM. If the instruction/data is available in DRAM, the data is provided to the SRAM, along with other instructions/data that will likely be needed in subsequent operations. The SRAM then provides the requested instruction/data to the processing system. In some instances, the requested data is provided directly to the processing system in parallel with providing it to the SRAM. However, if the instructions/data has to be provided to the SRAM from the DRAM, the processing system is often required to wait, or halt processing, until the transfer has been made. Of course, if the requested instructions/data is not in the DRAM, a transfer of the data must first be made from the hard disk drive to the DRAM, and then subsequently to the SRAM. This situation requires the processing system to halt processing for a significant amount of time.
Once the data is in the SRAM, it must be referenced or xe2x80x9ctaggedxe2x80x9d so that it can later be identified as associated with the data in DRAM. The purpose of a tag is to store the address in DRAM of the data that is stored in the associated SRAM location. Then, when a processing system requests data from DRAM, the address of the requested data is compared to the contents of the tags in SRAM, and if a match is found, data in the associated SRAM location is provided to the processing system, without the delay usually associated with data retrieval from DRAM.
In early small cache systems (fully associative), tags often contained enough bits to store the full physical address of the DRAM location that was copied to the associated SRAM location. However, this arrangement required that each tag be compared to a generated address to determine whether a xe2x80x9chitxe2x80x9d in the SRAM occurred. This was too time consuming, and was impractical for larger caches.
Caches of lesser associativity than fully associative caches, termed xe2x80x9cset-associative cachesxe2x80x9d, were introduced to correct this problem and to allow larger caches to be used. Set-associative caches view cache memory the same way the processing system views pages in main memory. Addressable memory is divided into a number of pages, where each page has a number of sets of data contained therein. To address a particular memory location, the page (the physical page number) is determined from the upper bits of the address, and the set (the offset or index) is determined from the lower bits of the address.
By organizing the cache memory to contain the same amount of data as a page in DRAM, each set in the cache had a one to one correspondence with a set on a page in DRAM. The lower address bits of an address could then be used to select a particular set, or tag. The tag could contain the upper address bits that corresponded to the page in DRAM of the data copied into the associated SRAM location. Thus, when an address was generated by a processing system, the lower address bits referenced a particular tag (or set), and the upper address bits were compared to the contents of the referenced tag. Thus, in a set associative cache, determining whether a cache memory contained the desired data only required comparison of the upper bits of the address to a single tag within each set. A four-way set-associative cache, for example, would only require four comparisons, often performed in parallel.
Since cache memory can only store a subset of information/data required by a processor, it is desired to replace the data that will most likely not be used anymore (or in near future) when placing new data into one of the sets. A method that is often used to select which data/set to replace picks the data/set that was least recently accessed/used (LRU).
One method commonly used to track which xe2x80x9csetxe2x80x9d within a cache contains the least recently used data assigns two bits (in a 4 way set associative cache) to each line in each of the cache ways to track the states of the cache lines. That is, in a 4-way cache, there are 4 distinct possibilities, or states, that each cache line can be in. These are typically identified as least recently used (LRU), LRU+1, most recently used (MRU), and MRUxe2x88x921. So, for a 4-way cache, 8-bits are used for each cache line, that identify the states of the cache lines in each way. If there are 64 cache lines in each way, the LRU information is typically stored in an LRU table that is 8 bits by 64 lines, for example. This is particularly described in U.S. Pat. No. 5,765,191 entitled xe2x80x9cMETHOD FOR IMPLEMENTING A FOUR-WAY LEAST RECENTLY USED (LRU) MECHANISM IN HIGH PERFORMANCExe2x80x9d, which is hereby incorporated by reference.
A number of LRU algorithms have been developed to try to reduce the number of bits required to track the states of the cache lines within a multiple way cache. One method, discussed in the above referenced patent, eliminates two bits associated with each line in one of the cache ways, and deduces the state of the cache lines in that cache way from the remaining three cache ways. That is, if the states of corresponding cache lines in the other three ways are known, then the state of the corresponding cache line in the remaining way may be determined. The above referenced patent takes this one step further, and reduces the number of bits per cache line to 5 to properly determine the state of the cache lines in each of the cache ways (in a 4-way set associative cache).
Regardless of the number of bits used to determine the tate of cache lines in a multiple way cache, what has not et been discussed is the need to update each of the xe2x80x9cLRU statexe2x80x9d bits any time a read or a write occurs to a cache line. That is, when a processing system reads data from a cache line in one of the cache ways, the LRU state of that cache line must be marked as MRU. When a corresponding cache line in another way is read from or written to, its state must be changed to MRU, and the state of the previously accessed cache line must be changed to MRUxe2x88x921. Going further, when a corresponding cache line in a third way is accessed, its state must be changed to MRU, and the other two lines must have their states changed to MRUxe2x88x921, and LRU+1, respectively. Finally, when a corresponding cache line in a fourth way is accessed, its state is changed to MRU, and the other three lines have their states changed to MRUxe2x88x921, LRU+1 and LRU, respectively. So, for each read or write to a cache line, the state bits associated with each way must be read, modified, and rewritten to reflect the current LRU state of each of the cache lines.
Requiring the processing system to perform a read-modify-write (r-m-w) procedure to update the LRU table to reflect the current state of cache lines in each way every time a cache is accessed, is both time consuming, and hardware intensive. In some instances, updates of the LRU table may not occur within the same clock cycle as the access to the cache. In other instances, subsequent cache accesses may be delayed until the state of the LRU table is either updated, or at least determined.
What is needed is an apparatus and method that eliminates the need to determine the state of all of the LRU bits associated with a cache line when a line is accessed.
Furthermore, what is needed is a method and apparatus that allows the state of an LRU table to be updated when a cache line is accessed, without requiring a read-modify-write sequence.
What is also needed is a method of updating the state of a cache line when it is accessed, without consideration or regard for the pre-existing states of the corresponding cache lines in the other ways.
For the foregoing reasons, there is a need for an apparatus and method that allows an LRU table to be updated when an entry in a cache way is accessed, without regard to the LRU states of entries in the alternate cache ways.
To address the above-detailed deficiencies, it is an object of the present invention to provide a method for tracking an LRU state for each of a plurality of memory locations (ways or nodes). The method includes: associating vectors between aliased memory locations, each of the vectors having a direction; and when one of the nodes is accessed, updating the direction of its associated vectors to reflect the access. The LRU state for the nodes is determined by the direction of its associated vectors.
Another object of the present invention is to provide an LRU tracking mechanism for aliased entries in alternate cache ways. The mechanism includes a vector table, and vector r/w control. The vector table has a number of vector entries, each entry having a number of vector fields, each vector field associated with one of the aliased entries in the cache ways. The vector r/w control is coupled to the vector table, to update the vector fields associated with an entry in a particular one of the cache ways when the entry is accessed.
Yet another object of the present invention is to provide a method for tracking the LRU state of entries in aliased nodes. The method establishes vectors between each of the entries in each of the aliased nodes, each of the vectors having a direction towards or away from an entry. The method includes determining whether an access is being made to a particular entry in a particular one of the aliased nodes; and if an access is being made, updating the direction of each of the vectors associated with the particular entry to reflect an LRU state after the access.
A further object of the present invention is to provide a computer program product for use with a computing device, the computer program product having a computer usable medium of computer readable program code embodied in the medium for causing an LRU vector tracking apparatus to be described. The computer readable program code includes first program code for providing a vector table, the vector table having vector fields associated with LRU states for corresponding data storage entries; and second program code for providing vector r/w control to update the vector fields when said corresponding data storage entries are accessed.
Additional objects, features and advantages of the invention will be described hereinafter. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention.