1. Field of the Invention
This invention relates to processors and computer systems, and more particularly to address translation memory systems used within computer systems and processors.
2. Description of the Related Art
A typical computer system includes a processor which reads and executes instructions of software programs stored within a memory system. In order to maximize the performance of the processor, the memory system must supply the instructions to the processor such that the processor never waits for needed instructions. There are many different types of memory from which the memory system may be formed, and the cost associated with each type of memory is typically directly proportional to the speed of the memory. Most modern computer systems employ multiple types of memory. Smaller amounts of faster (and more expensive) memory are positioned closer to the processor, and larger amounts of slower (and less expensive) memory are positioned farther from the processor. By keeping the smaller amounts of faster memory filled with instructions (and data) needed by the processor, the speed of the memory system approaches that of the faster memory, while the cost of the memory system approaches that of the less expensive memory.
Most modern computer systems also employ a memory management technique called “virtual” memory which allocates memory to software programs upon request. This automatic memory allocation effectively hides the memory hierarchy described above, making the many different types of memory within a typical memory system (e.g., random access memory, magnetic hard disk storage, etc.) appear as one large memory. Virtual memory also provides for isolation between different programs by allocating different physical memory locations to different programs running concurrently.
A typical modern processor includes a cache memory unit coupled between an execution unit and a bus interface unit. The execution unit executes software instructions. The cache memory unit includes a relatively small amount of memory which can be accessed very quickly. The cache memory unit is used to store instructions and data (i.e. data items) recently used by the execution unit, along with data items which have a high probability of being needed by the execution unit in the near future. Searched first, the cache memory unit makes needed data items readily available to the execution unit. When a needed data item is not found in the cache memory unit, the bus interface unit is used to fetch the needed data item from a main memory unit external to the processor. The overall performance of the processor is improved when needed data items are often found within the cache memory unit, eliminating the need for time-consuming accesses to the main memory unit.
Modern processors (e.g., x86 processors) support a form of virtual memory called “paging”. Paging divides a physical address space, defined by the number of address signals generated by the processor, into fixed-sized blocks of contiguous memory called “pages”. If paging is enabled, a “virtual” address is translated or “mapped” to a physical address. For example, in an x86 processor with paging enabled, a paging unit within the processor translates a “linear” (i.e., virtual) address produced by a segmentation unit to a physical address. If an accessed page is not located within the main memory unit, paging support constructs (e.g., operating system software) load the accessed page from secondary memory (e.g., magnetic disk) into main memory. In x86 processors, two different tables stored within the main memory unit, namely a page directory and a page table, are used to store information needed by the paging unit to perform the linear-to-physical (i.e., virtual-to-physical) address translations.
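The two-table translation described above can be illustrated with a minimal sketch. It assumes the classic 32-bit x86 layout with 4 KB pages (a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset). The function names and the use of dictionaries to stand in for the in-memory tables are hypothetical and chosen only for illustration.

```python
PAGE_SHIFT = 12                      # assumed 4 KB pages: low 12 bits are the offset
INDEX_BITS = 10                      # assumed 10-bit directory and table indices
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split_linear_address(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    offset = linear & OFFSET_MASK
    table_index = (linear >> PAGE_SHIFT) & INDEX_MASK
    dir_index = (linear >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    return dir_index, table_index, offset


def translate(linear, page_directory):
    """Walk the two tables; return the physical address, or None if unmapped.

    page_directory maps a directory index to a page table (a dict), and each
    page table maps a table index to a physical page frame number.
    """
    dir_index, table_index, offset = split_linear_address(linear)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        return None                  # no page table present
    frame = page_table.get(table_index)
    if frame is None:
        return None                  # page not in main memory
    return (frame << PAGE_SHIFT) | offset
```

Each translation in this sketch touches both tables, which is the cost that a translation lookaside buffer is intended to avoid.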
In order to reduce the number of required main memory unit accesses to retrieve information from the page directory and page table, a small cache memory system called a translation lookaside buffer (TLB) is typically used to store the most recently used virtual-to-physical address translations. As the amount of time required to access a virtual-to-physical address translation in the TLB is relatively small, overall processor performance is increased as needed address translations are often found in the readily accessible TLB.
In general, processor performance increases with the number of address translations (i.e., entries) in the TLB. When an entry corresponding to an input linear (i.e., virtual) address is found within the TLB, the TLB asserts a “HIT” signal. As the number of entries in the TLB increases, the time required to generate the HIT signal also increases. Any increase in the time required to generate the HIT signal may increase the amount of time which must be allocated to address translation. Address translation may be on a critical timing path within the processor; thus, increasing the number of TLB entries beyond a certain number may result in a reduction in processor performance.
Data items from main memory are stored within cache memory units (i.e., “caches”) in groups called “blocks”. Cache memory systems are distinguished from one another by where a given data block may be placed within or “mapped into” the caches. In a “direct mapped” cache, there is only one set of locations, collectively referred to as a “line”, within the cache where a given block may be placed. In a “fully associative” cache, a given block may be placed in any line within the cache. In a “set associative” cache, a given block can only be placed in one of a restricted set of lines within the cache.
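The three placement schemes differ only in how many lines are candidates for a given block. The sketch below treats all three as one parameterized case. The function name, the `ways` parameter, and the modulo rule for choosing a set are assumptions made for illustration (modulo indexing is a common choice, not one stated here).

```python
def candidate_lines(block_number, num_lines, ways):
    """Return the line indices in which a block may be placed.

    ways == 1          models a direct mapped cache (one candidate line)
    ways == num_lines  models a fully associative cache (any line)
    otherwise          models a set associative cache (a restricted set)
    """
    if ways < 1 or num_lines % ways != 0:
        raise ValueError("num_lines must be a positive multiple of ways")
    num_sets = num_lines // ways
    set_index = block_number % num_sets      # assumed modulo set selection
    first_line = set_index * ways
    return list(range(first_line, first_line + ways))
```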
When a needed data item is not found within the cache, a new block containing the data item must be fetched from main memory and placed within a line of the cache. If all of the lines where the block may be placed (i.e., “candidate” lines) are filled with valid data, one of the candidate lines must be removed from the cache to make room for the new block. In the case of a direct-mapped cache, there is only one candidate line, and this line must be removed from the cache to make room for the block. In a fully associative or set-associative cache, there are multiple candidate lines. A replacement “policy” or “strategy” is used to select the candidate line to be removed from the cache in order to make room for the new block.
Common cache line replacement policies include random, least recently used (LRU), and first in first out (FIFO). In a random replacement strategy, one of the candidate lines is randomly selected for replacement. The LRU replacement strategy involves replacing the candidate line which has remained “unused” for the longest period of time. A candidate line is referred to as “unused” when a needed data item is not found within the candidate line. The FIFO replacement strategy replaces the candidate line which has been stored in the cache for the longest period of time.
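As a rough comparison of the three policies, the sketch below selects a line to replace from a list of candidate lines. It assumes each line carries two timestamps, the time it was filled and the time a needed data item was last found in it. The function and parameter names are hypothetical.

```python
import random


def choose_victim(policy, candidates, last_hit_time, fill_time, rng=random):
    """Pick the candidate line to replace under the named policy.

    last_hit_time[line]: when a needed data item was last found in the line
    fill_time[line]:     when the line's current block was stored
    """
    if policy == "random":
        return rng.choice(candidates)
    if policy == "lru":
        # Oldest hit time: the line that has gone unused the longest.
        return min(candidates, key=lambda line: last_hit_time[line])
    if policy == "fifo":
        # Oldest fill time: the line resident in the cache the longest.
        return min(candidates, key=lambda line: fill_time[line])
    raise ValueError("unknown policy: " + policy)
```

Note that LRU and FIFO can disagree: a line filled long ago but hit recently is the FIFO choice yet is not the LRU choice.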
For some cache sizes and configurations, employing the LRU replacement strategy may result in a greater number of needed data items being found within the cache. A TLB is one form of cache memory; it would thus be desirable to have a TLB which implements an LRU replacement strategy.
The problems outlined above are in large part solved by a memory unit (e.g., a translation lookaside buffer or TLB) employing a least recently used (LRU) replacement strategy. The memory unit may include a memory subunit for storing data items, circuitry coupled to the memory subunit for determining if the memory subunit contains a needed data item, and a control unit for controlling the storing of data items within the memory subunit. The memory subunit may include, for example, n entry locations for storing data items where n ≥ 2. The memory unit may generate a first signal indicating which of the n entry locations are currently in use (i.e., contain valid data items), and the circuitry coupled to the memory subunit may produce a second signal indicating which of the n entry locations contains the needed data item. When a needed data item is not found within the memory subunit, the data item may be obtained from another source and provided to the memory subunit as a new data item. The new data item may be accompanied by a control signal identifying which of the n entry locations is to be used to store the new data item.
The control unit may receive the first and second signals and produce the control signal dependent upon the first and second signals. The control signal may identify either: (i) one of the n entry locations not currently in use, or (ii) a least recently used one of the n entry locations. The least recently used one of the n entry locations is the entry location in which a needed data item has not been found for the longest period of time. If the first signal indicates that at least one of the n entry locations is not currently in use, the control signal may identify one of the n entry locations not currently in use. On the other hand, if all of the n entry locations are in use, the control signal may indicate the least recently used one of the n entry locations.
Each of the n entry locations may be identified by a unique identifier. For example, each of the n entry locations may be assigned a different number. The control unit may maintain a list of the unique identifiers of the n entry locations in chronological order of needed data items being found within each of the n entry locations. The control unit may maintain the list dependent upon the second signal, and use the list to determine the least recently used one of the n entry locations of the memory subunit.
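The ordered list of identifiers can be modeled in software as follows. This is a minimal sketch, assuming the entry locations are numbered 0 through n-1 and that the list is kept with the least recently used identifier at the front. The class and method names are hypothetical, and a hardware implementation would shift register contents rather than manipulate a list.

```python
class LruList:
    """Identifiers of n entry locations, ordered from least to most recently used."""

    def __init__(self, n):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.order = list(range(n))      # index 0 holds the LRU identifier

    def touch(self, entry):
        """Record that a needed data item was found in the given entry location."""
        self.order.remove(entry)
        self.order.append(entry)         # move to the most recently used end

    def least_recently_used(self):
        return self.order[0]
```

For example, with n = 4, finding needed items in entry 0 and then entry 2 leaves the order as 1, 3, 0, 2, so entry 1 is the least recently used.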
In one embodiment of the memory unit, the memory subunit described above is a first memory subunit. The control unit includes a second memory subunit having n entry locations, where each of the n entry locations of the second memory subunit stores an identifier uniquely identifying a different one of the n entry locations of the first memory subunit. For example, each of the n entry locations of the first memory subunit may be assigned a different number, and each of the n entry locations of the second memory subunit may store a number assigned to a different one of the n entry locations of the first memory subunit. The control unit adjusts the relative locations of the identifiers within the n entry locations of the second memory subunit dependent upon the second signal such that the identifiers are maintained in chronological order of needed data items being found within each of the n entry locations of the first memory subunit. When a new data item is to be stored within the first memory subunit, and the first signal indicates that at least one of the n entry locations of the first memory subunit is not currently in use, the control signal identifies one of the n entry locations of the first memory subunit not currently in use. If, however, all of the n entry locations of the first memory subunit are in use, the control signal indicates the least recently used one of the n entry locations of the first memory subunit.
In one embodiment, the memory unit may be a translation lookaside buffer (TLB). The TLB may be used to store at least portions of virtual addresses and at least portions of physical addresses corresponding to the virtual addresses. The TLB may receive a virtual address and produce a physical address corresponding to the virtual address, thereby translating a virtual address to the corresponding physical address. The virtual address may include a higher-ordered “virtual page number” portion and a lower-ordered “offset” portion. The TLB may use stored data to produce a “translated” portion of a physical address from the virtual page number portion of a virtual address. The TLB may then append the offset (i.e., “untranslated”) portion to the translated portion of the physical address in order to produce the physical address corresponding to the virtual address.
The TLB may include a first memory unit having a tag array, a data array, and a valid bit array. The tag array may have n entry locations for storing a b-bit virtual page number portion of a virtual address, where 2 ≤ n < 2^b. The b-bit virtual page number portion of a virtual address may be the highest-ordered b bits of the virtual address. In one specific example, n may be equal to 32 and b may be equal to 20. The data array may have n entry locations for storing a translated portion of a physical address, wherein each of the n entry locations of the data array is associated with a different one of the n entry locations of the tag array. Thus the first memory unit may have n lines each including a different tag array entry location and the associated data array entry location. The valid bit array may be used to store n valid bits, wherein each of the n valid bits is associated with a different one of the n lines of the first memory unit and has a value indicating if the contents of the associated line are valid. Thus each of the n valid bits has a value indicating if the contents of the associated entry location of the tag array and the corresponding contents of the data array are valid. The first memory unit may produce a first signal including the values of the n valid bits of the valid bit array.
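The tag array, data array, and valid bit array can be modeled together as in the sketch below, using the example values n = 32 and b = 20. The 32-bit address width (and hence the 12-bit offset) is an assumption, as are the class and method names. The lookup compares every valid tag against the virtual page number, as a fully associative structure would, and appends the untranslated offset to the translated portion.

```python
class TlbArrays:
    """Software model of a first memory unit: tag, data, and valid bit arrays."""

    def __init__(self, n=32, vpn_bits=20, addr_bits=32):
        self.n = n
        self.offset_bits = addr_bits - vpn_bits   # assumed 12-bit offset
        self.tag = [0] * n        # virtual page numbers
        self.data = [0] * n       # translated portions of physical addresses
        self.valid = [False] * n  # one valid bit per line

    def store(self, line, vpn, translated):
        """Store a new data item in the line identified by the control signal."""
        self.tag[line] = vpn
        self.data[line] = translated
        self.valid[line] = True

    def lookup(self, virtual_address):
        """Return (per-line hit flags, physical address or None on a miss)."""
        vpn = virtual_address >> self.offset_bits
        offset = virtual_address & ((1 << self.offset_bits) - 1)
        hits = [self.valid[i] and self.tag[i] == vpn for i in range(self.n)]
        if not any(hits):
            return hits, None
        line = hits.index(True)
        return hits, (self.data[line] << self.offset_bits) | offset
```

The list `self.valid` plays the role of the first signal, and the `hits` list returned by `lookup` plays the role of the second signal.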
The first memory unit may receive a new data item and a control signal. The new data item may include a virtual page number portion of a virtual address and a corresponding translated portion of a physical address. The control signal may identify one of the n lines of the first memory unit in which the new data item is to be stored. Thus the control signal may identify one of the n entry locations of the tag array in which the virtual page number portion of the virtual address is to be stored. The translated portion of a physical address of the new data item is to be stored in the data array entry location associated with the identified tag array entry location.
The TLB may also include circuitry coupled to the first memory unit for determining if the first memory unit contains a needed translated portion of a physical address. The circuitry may produce a second signal indicating which of the n entry locations of the tag array is associated with the entry location of the data array containing the needed translated portion of the physical address.
The TLB may also include a control unit. The control unit of the TLB may receive the first and second signals, and may produce the control signal dependent upon the first and second signals. The control unit may include a second memory unit including n entry locations, each storing an identifier uniquely identifying a different one of the n lines of the first memory unit (i.e., a different one of the n tag array entry location/data array entry location combinations). For example, each of the n lines of the first memory unit may be assigned a different number, and each of the n entry locations of the second memory unit may store a number assigned to a different one of the n lines of the first memory unit. The control unit may adjust the relative locations of the identifiers within the n entry locations of the second memory unit dependent upon the second signal such that the identifiers are maintained in chronological order of needed translated portions of physical addresses being found within each of the n lines of the first memory unit (i.e., within the entry location of the data array associated with each of the n entry locations of the tag array).
When a new data item is to be stored within the first memory unit and the first signal indicates that at least one of the n lines of the first memory unit is not currently in use (i.e., does not contain valid data), the control signal identifies one of the n lines of the first memory unit not currently in use as the line in which the new data item is to be stored. The control signal thus identifies one of the n entry locations of the tag array not currently in use as the entry location of the tag array in which the virtual page number portion of the virtual address of the new data item is to be stored. The translated portion of a physical address of the new data item is to be stored in the data array entry location associated with the identified tag array entry location.
When a new data item is to be stored within the first memory unit and the first signal indicates that all of the n lines of the first memory unit are in use, the control signal indicates a least recently used line of the first memory unit as the line in which the new data item is to be stored. The least recently used line is the line in which a needed translated portion of a physical address has not been found for the longest period of time. The least recently used line includes a least recently used entry location of the tag array and an associated least recently used entry location of the data array. The least recently used entry location of the data array is the entry location of the data array in which a needed translated portion of a physical address has not been found for the longest period of time. The control signal thus indicates the least recently used entry location of the tag array as the entry location of the tag array in which the virtual page number portion of the virtual address of the new data item is to be stored. The translated portion of the physical address of the new data item is to be stored in the least recently used data array entry location.
The control unit may include least recently used (LRU) logic, invalid entry locator logic, and selection logic. The LRU logic may be coupled to the second memory unit, and may receive the second signal. The LRU logic may adjust the relative locations of the identifiers within the n entry locations of the second memory unit dependent upon the second signal, and may produce an LRUE signal indicating the least recently used line of the first memory unit. The LRUE signal thus indicates the least recently used line within the first memory unit (i.e., a least recently used tag array entry location and a corresponding least recently used data array entry location). The invalid entry locator logic may receive the first signal and produce: (i) an EE signal indicating the presence or absence of at least one of the n lines within the first memory unit not currently in use, and (ii) an FIE signal identifying one of the n lines not currently in use. The selection logic may receive the LRUE, EE, and FIE signals, and produce either the LRUE signal or the FIE signal as the control signal dependent upon the EE signal. For example, the EE signal may be asserted if at least one of the n entry locations of the tag array is not currently in use. The selection logic may produce the LRUE signal when the EE signal is deasserted, and may produce the FIE signal when the EE signal is asserted.
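The interaction of the invalid entry locator logic and the selection logic can be sketched as follows. The function names are hypothetical. Choosing the lowest-numbered invalid line for the FIE signal is an assumption: the text only requires that FIE identify one of the lines not currently in use.

```python
def invalid_entry_locator(valid_bits):
    """Model the invalid entry locator logic; return (EE, FIE).

    EE is True when at least one line is not in use. FIE is the number of
    one such line (here the lowest-numbered, by assumption), or None.
    """
    for line, valid in enumerate(valid_bits):
        if not valid:
            return True, line
    return False, None


def select_line(valid_bits, lrue):
    """Model the selection logic: output FIE when EE is asserted, else LRUE."""
    ee, fie = invalid_entry_locator(valid_bits)
    return fie if ee else lrue
```

With valid bits of (1, 0, 1, 0) the sketch selects line 1 regardless of the LRUE value. Once all valid bits are set, the LRUE value is passed through.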
The TLB described above may be dual ported, and may include a first port for receiving a first virtual address and a second port for receiving a second virtual address. The circuitry coupled to the memory unit may be a first set of circuitry for determining if the memory unit contains a needed translated portion of a physical address corresponding to the first virtual address. The first set of circuitry may produce the second signal, wherein the second signal indicates which of the n lines of the first memory unit contains the needed translated portion of the physical address corresponding to the first virtual address (i.e., which of the n entry locations of the tag array is associated with the data array set containing the needed translated portion of the physical address corresponding to the first virtual address).
The dual port TLB may also include a second set of circuitry coupled to the memory unit for determining if the memory unit contains a needed translated portion of a physical address corresponding to the second virtual address. The second set of circuitry may produce a third signal indicating which of the n lines of the first memory unit contains the needed translated portion of the physical address corresponding to the second virtual address (i.e., which of the n entry locations of the tag array is associated with the data array set containing the needed translated portion of the physical address corresponding to the second virtual address).
The control unit may receive the first, second, and third signals, and may produce the control signal dependent upon the first, second, and third signals. The control unit may include the second memory unit described above, and may adjust the relative locations of the identifiers within the n entry locations of the second memory unit dependent upon the second and third signals such that the identifiers are maintained in chronological order of needed translated portions of physical addresses being found within each of the n lines of the first memory unit (i.e., within the data array entry location associated with each of the n tag array entry locations).
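For the dual-ported case, the ordering must account for hits reported on both ports. The sketch below is one possible software model, assuming each hit signal is a list of per-line flags with at most one flag set. Applying the first port's hit before the second port's hit when both occur together is an assumption, and the function name is hypothetical.

```python
def update_lru_order(order, hit_port1, hit_port2):
    """Return a new LRU-to-MRU ordering after hits on either or both ports.

    order:     list of line numbers, index 0 being the least recently used
    hit_portX: per-line hit flags for that port (all False on a miss)
    """
    new_order = list(order)
    for hits in (hit_port1, hit_port2):   # assumed order: port 1, then port 2
        if any(hits):
            line = hits.index(True)
            new_order.remove(line)
            new_order.append(line)        # most recently used goes last
    return new_order
```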
As described above, when a new data item is provided to the memory unit and the first signal indicates that at least one of the n lines of the first memory unit is not currently in use, the control signal produced by the control unit identifies one of the n lines of the first memory unit not currently in use as the line in which the new data item is to be stored. As described above, the control signal thus identifies one of the n entry locations of the tag array not currently in use as the entry location of the tag array in which the tag portion of the partial virtual address of the new data item is to be stored. The translated portion of the physical address of the new data item is to be stored within the data array entry location associated with the identified tag array entry location.
If, on the other hand, all of the n entry locations of the tag array are in use, the control signal indicates a least recently used one of the n lines of the first memory unit (i.e., the line of the first memory unit in which a needed translated portion of a physical address has not been found for the longest period of time) as the line in which the new data item is to be stored. The control signal thus indicates a least recently used entry location of the tag array as the entry location of the tag array in which the tag portion of the partial virtual address of the new data item is to be stored. The translated portion of the physical address of the new data item is to be stored in the least recently used data array entry location associated with the least recently used tag array entry location.
A cache unit may include a cache memory coupled to a TLB implementation of the memory unit described above. The cache unit may store multiple data items, and may be configured to produce a stored data item when provided with a virtual address corresponding to a physical address of the data item. The cache memory may be used to store the data items and corresponding physical addresses, and may be configured to produce one of the data items when provided with the corresponding physical address of the data item. The TLB may receive the virtual address, and may produce the physical address corresponding to the virtual address and provide the physical address to the cache memory.
A processor may include the cache unit described above, and a computer system may include such a processor. The computer system may also include a bus coupled to the processor, and a peripheral device coupled to the bus. For example, the bus may be a peripheral component interconnect (PCI) bus. In this case, the peripheral device may be, for example, a network interface card, a video accelerator, an audio card, a hard disk drive, or a floppy disk drive. Alternately, the bus may be an extended industry standard architecture (EISA)/industry standard architecture (ISA) bus, and the peripheral device may be, for example, a modem, a sound card, or a data acquisition card.