1. Technical Field
The present invention relates in general to a method for effective page number (EPN) to real page number (RPN) translation in processors. Specifically, the present invention relates to a method for optimizing EPN to RPN translation when a data miss occurs.
2. Description of Related Art
Processor-generated memory accesses require address translation before they go out to the memory subsystem. In present-day computing, it is common to have a process executing only in main, or “physical,” memory, while the user perceives a much larger “virtual” memory which is allocated on an external disk. To address the virtual memory, many processors contain a translator that translates virtual addresses, or effective page numbers (EPN), in virtual memory to physical addresses, or real page numbers (RPN), in physical memory. Many processors also contain a translation look-aside buffer (TLB), which caches recently generated virtual-physical address pairs, or page table entries (PTE). A group of eight PTEs is called a page table entry group (PTEG).
Most processors have a load store unit (LSU). There are usually one or more arrays in the LSU that serve as a data effective to real address translation (D-ERAT) location. These locations hold pairs of linked EPNs and RPNs. When the instruction decoding unit (IDU) issues an instruction, the real address is looked up in the D-ERAT. Usually, if the RPN is missing from the D-ERAT, the TLB will check the recently accessed PTEGs and find the missing address. To do so, the PTEs must be checked to find the missing RPN. Each PTE is checked to see whether its abbreviated virtual page number (AVPN) and page attributes match the AVPN and page attributes of the EPN associated with the missing RPN. Once a match is found, the RPN from the matching PTE is installed in the D-ERAT.
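The lookup sequence described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware implementation: the `PTE` class, the `search_pteg` and `translate` functions, the dictionary used as the D-ERAT, and the integer encoding of the page attributes are all hypothetical names and simplifications introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PTE:
    """Hypothetical, simplified page table entry (field layout assumed)."""
    avpn: int   # abbreviated virtual page number
    attrs: int  # page attributes, modeled here as a plain integer
    rpn: int    # real page number


def search_pteg(pteg: List[PTE], avpn: int, attrs: int) -> Optional[int]:
    """Return the RPN of the first PTE whose AVPN and attributes both match."""
    for pte in pteg:
        if pte.avpn == avpn and pte.attrs == attrs:
            return pte.rpn
    return None  # no PTE in this group matches


def translate(derat: Dict[int, int], epn: int,
              pteg: List[PTE], avpn: int, attrs: int) -> Optional[int]:
    """Look up an EPN in the D-ERAT; on a miss, search the PTEG and install."""
    rpn = derat.get(epn)
    if rpn is not None:
        return rpn  # D-ERAT hit: no PTE search needed
    rpn = search_pteg(pteg, avpn, attrs)
    if rpn is not None:
        derat[epn] = rpn  # install the matching RPN in the D-ERAT
    return rpn
```

In this sketch, a second call to `translate` with the same EPN returns from the D-ERAT without searching the PTEG again, which mirrors the purpose of installing the RPN after a match.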
However, not all processors have TLBs. For example, the International Business Machines p-series p6 processor chip design does not have a TLB. Therefore, when a D-ERAT miss occurs, the PTEG must be reloaded from the level two cache memory (L2). This has a negative impact on performance and overhead, as the current instruction is paused until the missing RPN is found. In a best-case scenario, the 128-byte PTEG reload, at a 32-byte data width between the L2 and the core, would take four nest clocks, which is equivalent to eight processor clocks, during which the eight PTEs are analyzed for a match. Once a match is found, the PTE's RPN data is then installed in the D-ERAT, and the next-to-complete instruction is restarted.
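The best-case figures above can be checked with a short calculation. The four-clock figure is consistent with a PTEG of 128 bytes moving over a 32-byte-wide data path. The 16-byte PTE size is an assumption made for this example, and the ratio of two processor clocks per nest clock is inferred from the statement that four nest clocks equal eight processor clocks. The function name `reload_clocks` and the constant names are hypothetical.

```python
PTES_PER_PTEG = 8               # from the text: eight PTEs per PTEG
PTE_BYTES = 16                  # assumption: 16-byte PTEs (128 bytes per PTEG)
BUS_BYTES_PER_NEST_CLOCK = 32   # from the text: 32-byte data width
PROC_CLOCKS_PER_NEST_CLOCK = 2  # inferred: 4 nest clocks = 8 processor clocks


def reload_clocks(ptes=PTES_PER_PTEG, pte_bytes=PTE_BYTES,
                  bus_bytes=BUS_BYTES_PER_NEST_CLOCK,
                  ratio=PROC_CLOCKS_PER_NEST_CLOCK):
    """Return (nest_clocks, processor_clocks) for one best-case PTEG reload."""
    pteg_bytes = ptes * pte_bytes
    nest_clocks = -(-pteg_bytes // bus_bytes)  # ceiling division
    return nest_clocks, nest_clocks * ratio


nest, proc = reload_clocks()
print(f"{nest} nest clocks, {proc} processor clocks")
```

Under these assumptions the calculation gives four nest clocks and eight processor clocks per reload, matching the best-case scenario described above.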
Processors with TLBs can also experience TLB misses. In such a case, the usual process is to reload all of the PTEGs first into the TLB and then look up the missing address from the TLB. In some instances, this can take more than 100 processor cycles and can cause code to run as much as thirty times slower than normal.
Therefore, in order to mitigate the impact on performance, it would be advantageous to have an improved method for EPN to RPN translation and resumption of the execution stream.