The technical field is computer microarchitectures using translation lookaside buffers.
In modern computer memory architectures, a central processing unit (CPU) produces virtual addresses that are translated by a combination of hardware and software to physical addresses. The physical addresses are used to access a main memory. A group of virtual addresses may be dynamically assigned to a page. Virtual memory requires a data structure, sometimes called a page table, that translates the virtual address to the physical address. To reduce address translation time, computers may use a specialized associative cache dedicated to address translation, called a translation lookaside buffer (TLB).
In some instances, a desired address translation (virtual-to-physical address mapping) entry may be missing from the TLB. The computer system could then generate a fault, invoke the operating system, and install the desired TLB entry using software.
Alternatively, a hardware page walker in the CPU may be used to install a missing TLB entry for a memory reference. If the TLB does not contain a virtual-to-physical address mapping that applies to a memory reference, the hardware page walker is invoked to find the desired mapping in a page table stored in main memory. The new mapping (if found) is installed in the TLB. Then, the original memory reference re-accesses the TLB and uses the newly-installed mapping to generate the appropriate physical address. The hardware page walker has the advantage of generally being faster than generating a fault and installing a new mapping using operating system software.
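The miss-handling sequence described above can be illustrated with a minimal software sketch. It is not a description of any particular hardware. The 4 KB page size, the single-level dictionary page table, and the names `TLB`, `HardwarePageWalker`, `walk`, and `translate` are all assumptions introduced only for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; the text does not specify one


class HardwarePageWalker:
    """Toy stand-in for the page walker: looks up a mapping in a page table."""

    def __init__(self, page_table):
        # Assumed layout: a flat dict from virtual page number to physical frame number.
        self.page_table = page_table

    def walk(self, vpn):
        # Returns the physical frame number, or None if no mapping exists.
        return self.page_table.get(vpn)


class TLB:
    """Toy TLB that invokes the page walker on a miss."""

    def __init__(self, walker):
        self.entries = {}   # cached vpn -> pfn mappings
        self.walker = walker
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.entries:
            # Miss: ask the page walker to find the mapping in the page table.
            self.misses += 1
            pfn = self.walker.walk(vpn)
            if pfn is None:
                # No mapping found; a real system would raise a fault here.
                raise LookupError(f"no mapping for virtual page {vpn:#x}")
            # Install the new mapping in the TLB.
            self.entries[vpn] = pfn
        # The reference re-accesses the TLB and uses the installed mapping.
        return self.entries[vpn] * PAGE_SIZE + offset


walker = HardwarePageWalker({0x12: 0x80})
tlb = TLB(walker)
first = tlb.translate(0x12345)   # miss, then walk, install, and re-access
second = tlb.translate(0x12FFF)  # hit on the newly installed entry
```

In this sketch the second reference to the same page does not invoke the walker again, which is the benefit the installed mapping provides.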
Challenges in the design of a hardware page walker include bus arbitration and wiring complexity, especially when multiple TLBs (e.g., an instruction TLB and a data TLB) must access the hardware page walker. Since each of the multiple TLBs can operate independently, a request from one TLB may occur close to, or simultaneously with, requests from another TLB or responses from the hardware page walker. In addition, virtual address requests from each TLB to the hardware page walker, and virtual and physical addresses together with permissions and attributes from the hardware page walker back to each TLB, can total hundreds of signals.
One existing solution connects two TLBs and the hardware page walker using one long, bidirectional bus, with a central arbiter to prevent bus contention. Each of the TLBs and the hardware page walker can both drive data onto and receive data from the bidirectional bus. However, the bidirectional bus is up to twice as long as separate busses from the hardware page walker to each TLB would need to be. And, since the bus is bidirectional, large drivers are required with each unit (i.e., with each TLB and the hardware page walker). This arrangement may also require large and complex bidirectional repeaters between units, and may lead to a large capacitance due to the large drivers and repeaters. As a result, this arrangement may be unacceptably slow.
Another current solution connects the hardware page walker to both TLBs using separate unidirectional busses, but does not include the central arbiter. In this solution, each unit is allowed to drive information onto its outgoing data bus at any time. However, since the hardware page walker can receive information from both TLBs simultaneously, the complexity and circuitry needed to receive and handle the information is increased.
To overcome limitations inherent in existing solutions, an arbitration logic block (or arbiter) is provided. The arbiter reduces complexity within the hardware page walker and makes multiple-state data transfer easy to implement. Each unit (i.e., the hardware page walker, a data TLB, and an instruction TLB) has a unidirectional bus that the unit always drives, and the arbiter informs the hardware page walker (and the driving units) which of the busses has been selected to be enabled during any given clock cycle. Thus, the hardware page walker can receive only one command per clock cycle, and needs no extra logic to handle multiple requests at a time. The arbiter also simplifies the TLBs, because a TLB will never receive an incoming bus command at the same time as it is driving an outgoing bus command.
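The single-grant behavior can be sketched in a few lines of software. This is only an illustrative model: the fixed priority order in `PRIORITY` is an assumption (the text does not state how the arbiter chooses among simultaneous requests), and the names `arbitrate` and `cycle` are hypothetical.

```python
# Assumed fixed priority order; the selection policy is not specified in the text.
PRIORITY = ("page_walker", "data_tlb", "instruction_tlb")


def arbitrate(requests):
    """Return the one unit whose bus is enabled this cycle, or None if idle."""
    for unit in PRIORITY:
        if requests.get(unit, False):
            return unit
    return None


def cycle(requests, buses):
    """Model one clock cycle.

    requests: dict of unit name -> bool (unit is requesting the bus)
    buses:    dict of unit name -> value the unit is driving on its own bus
    Returns (granted unit, value seen by the receivers), with at most one
    command visible per cycle.
    """
    grant = arbitrate(requests)
    command = buses[grant] if grant is not None else None
    return grant, command


# Both TLBs request in the same cycle; only one bus is enabled.
grant, command = cycle(
    {"data_tlb": True, "instruction_tlb": True},
    {"data_tlb": "walk 0x12", "instruction_tlb": "walk 0x40", "page_walker": None},
)
```

Because every unit drives only its own bus and only the granted bus is enabled, the receiver in this model never has to resolve two commands in one cycle.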
The arbiter also supports transfers that require multiple clock cycles to complete. If a unit that requires a multiple-cycle transfer is selected for bus ownership, that unit is guaranteed to have its bus grant maintained for enough consecutive clock cycles to complete a data transfer. The arbiter can also support different numbers of transfer cycles for each unit. This allows each unit to have a very simple bus interface. The unit requests the bus and begins driving the first cycle of data for its transfer. The unit continues to drive the first cycle of data, if necessary, until the unit receives a bus grant. The unit then proceeds to drive successive cycles of data.
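The multiple-cycle behavior can be modeled as follows. This is a sketch under stated assumptions, not the disclosed circuit: the priority order, the per-unit transfer lengths, and the names `Arbiter`, `Unit`, `step`, and `simulate` are hypothetical and chosen only to make the grant-holding rule concrete.

```python
class Arbiter:
    """Holds a grant for as many consecutive cycles as the owner's transfer needs."""

    def __init__(self, transfer_cycles, priority):
        self.transfer_cycles = transfer_cycles  # unit name -> cycles per transfer
        self.priority = priority                # assumed fixed priority order
        self.owner = None
        self.remaining = 0

    def step(self, requests):
        # Re-arbitrate only when no transfer is in progress.
        if self.remaining == 0:
            self.owner = next((u for u in self.priority if u in requests), None)
            if self.owner is not None:
                self.remaining = self.transfer_cycles[self.owner]
        if self.owner is None:
            return None
        self.remaining -= 1
        return self.owner


class Unit:
    """Drives the current cycle of data until granted, then advances."""

    def __init__(self, name, data):
        self.name = name
        self.data = list(data)
        self.index = 0

    @property
    def requesting(self):
        return self.index < len(self.data)

    def drive(self):
        return self.data[self.index]

    def on_grant(self):
        self.index += 1  # move on to the next cycle of data


def simulate(units, priority, cycles):
    by_name = {u.name: u for u in units}
    arbiter = Arbiter({u.name: len(u.data) for u in units}, priority)
    trace = []
    for _ in range(cycles):
        requests = {u.name for u in units if u.requesting}
        driven = {name: by_name[name].drive() for name in requests}
        grant = arbiter.step(requests)
        trace.append((grant, driven[grant] if grant is not None else None))
        if grant is not None:
            by_name[grant].on_grant()
    return trace


trace = simulate(
    [Unit("data_tlb", ["D0", "D1"]), Unit("instruction_tlb", ["I0"])],
    priority=("data_tlb", "instruction_tlb"),
    cycles=4,
)
```

In this run the data TLB keeps its grant for two consecutive cycles, while the instruction TLB simply keeps driving its first cycle of data until it is granted the bus in the third cycle.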