Computing systems use a variety of techniques to improve performance and throughput. One technique is known in the art as multiprocessing. In multiprocessing, multiple processors perform tasks in parallel to increase throughput of the overall system.
A variation of multiprocessing is known in the art as multithreading. In multithreading, multiple logical processors, which may be implemented on a single physical processor or on multiple physical processors, perform tasks concurrently. These tasks may or may not cooperate with each other or share common data. Multithreading may be useful for increasing throughput by permitting useful work to be performed during otherwise latent periods, during which the performance level of the overall system might suffer.
Another technique to improve performance and throughput is known in the art as pipelining. A pipelined processor performs a portion of one small task or processor instruction in parallel with a portion of another small task or processor instruction. Since processor instructions commonly include similar sequences of component operations, pipelining has the effect of reducing the average duration required to complete an instruction by working on component operations of multiple instructions in parallel.
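The effect of pipelining on average instruction duration can be illustrated with a simple cycle count. The following is a minimal sketch under idealized assumptions that are not part of the description above: every instruction passes through the same number of stages, each stage takes one cycle, and no stalls occur. The function names are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction finishes all of its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Idealized pipeline: the first instruction takes num_stages cycles to
    complete, and one further instruction completes every cycle after that."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def average_cycles_per_instruction(total_cycles: int, num_instructions: int) -> float:
    """Average duration, in cycles, required to complete one instruction."""
    return total_cycles / num_instructions
```

Under these assumptions, 100 instructions on a 5-stage processor take 500 cycles without pipelining and 104 cycles with it, so the average duration per instruction falls from 5 cycles to about 1.04 cycles. Real pipelines fall short of this bound whenever stalls or dependencies occur.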
One such component operation is a translation from virtual addresses to physical addresses. This operation is often performed by using a translation lookaside buffer (TLB). It is a function of the TLB to permit access to high-speed storage devices, often referred to as caches, by quickly translating a virtual address from a task, software process or thread of execution into a physical storage address.
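The translation performed by a TLB may be sketched in software as a small cache of recent translations placed in front of a slower page table lookup. The sketch below is illustrative only: the 4096-byte page size, the capacity of four entries, the least-recently-used replacement policy, and the name SimpleTLB are all assumptions rather than details taken from the description above.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class SimpleTLB:
    """Hypothetical software model of a TLB backed by a page table."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table    # virtual page number -> physical frame number
        self.capacity = capacity        # assumed number of TLB entries
        self.entries = OrderedDict()    # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        """Return the physical address for virtual_address."""
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]           # slow path: consult the page table
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used entry
            self.entries[vpn] = frame
        return self.entries[vpn] * PAGE_SIZE + offset
```

In this model the first access to a page is a miss that consults the page table, and later accesses to the same page are served from the cached entry, which is what permits the quick translation described above.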
In systems which permit multiprocessing, including those systems that permit multithreading, identical virtual addresses from two different threads or software processes may translate into two different physical addresses. On the other hand, multiple threads or software processes may share a common address space, in which case some identical virtual addresses may translate into identical physical addresses. To prevent one thread or software process from mistakenly accessing data belonging to another in high-speed storage, the data may be stored according to physical addresses instead of virtual addresses.
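The two cases may be illustrated with per-address-space page tables. This is a minimal sketch; the page size, the frame numbers, and all names are hypothetical values chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(page_table, virtual_address):
    """Translate a virtual address using one address space's page table."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


# Two software processes with separate address spaces: virtual page 0
# is backed by a different physical frame in each.
process_a = {0: 10}
process_b = {0: 20}

# Two threads sharing one address space: both use the same page table.
shared_space = {0: 30}
thread_1 = shared_space
thread_2 = shared_space

VIRTUAL_ADDRESS = 0x0040
```

Here translate(process_a, VIRTUAL_ADDRESS) and translate(process_b, VIRTUAL_ADDRESS) return different physical addresses, while the two threads obtain the same physical address. A cache indexed only by virtual address could not tell the first two accesses apart, which is why storing data by physical address avoids the ambiguity.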
If a high-speed storage device is accessed by multiple logical processors, the size of the TLB may be increased to allow storage of virtual address translations for each logical processor or thread of execution. Unfortunately, the time required to perform a virtual address translation increases with the size of the TLB, thereby reducing access speed and overall system performance. Alternatively, smaller, faster TLBs may be physically duplicated for each logical processor, but physically duplicating these hardware structures may be expensive. Furthermore, in cases where multiple threads or software processes share a common address space, the duplicated TLBs may hold redundant copies of some virtual address translations, thereby wasting space in this expensive resource. Providing private TLBs in a multithreaded processor, therefore, uses this resource inefficiently and prevents the logical processors from sharing translations when they share code or data. The inability to share translations is particularly harmful to the performance of multithreaded software, such as a database, wherein the logical processors often run threads that share a single address space. By contrast, sharing TLBs allows logical processors to dynamically partition the available resources based on the run-time needs of each processor and to share translations, leading to more efficient use of the resource.
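The space cost of private TLBs relative to a shared TLB may be sketched by counting the entries each arrangement needs for the same sequence of accesses. The sketch below makes assumptions not stated above: each entry is identified by a hypothetical address space identifier together with a virtual page number, capacity limits are ignored, and the function names are illustrative.

```python
def private_tlb_entries(accesses):
    """Total entries held when every logical processor has its own TLB.

    accesses is an iterable of (logical_processor, address_space, vpn) tuples.
    """
    tlbs = {}
    for logical_processor, address_space, vpn in accesses:
        tlbs.setdefault(logical_processor, set()).add((address_space, vpn))
    return sum(len(entries) for entries in tlbs.values())


def shared_tlb_entries(accesses):
    """Total entries held when all logical processors share one TLB.

    A translation is stored once, however many logical processors use it.
    """
    return len({(address_space, vpn) for _, address_space, vpn in accesses})


# Two logical processors run threads of the same address space (identifier 1)
# and both touch virtual pages 0, 1 and 2.
same_space = [(lp, 1, vpn) for lp in (0, 1) for vpn in (0, 1, 2)]
```

For this access pattern the private arrangement holds six entries, two copies of each of three translations, while the shared arrangement holds three. When the logical processors run unrelated processes, the two counts are equal, so the benefit of sharing in this model comes entirely from common address spaces.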