As the number of transistors available on a die has increased in accordance with Moore's Law, and as general-purpose processors have come to feature long execution pipelines and superscalar execution cores, the efficiency of transistor utilization in mainstream processor architectures has decreased. At some point, even with increased transistor counts, gains in processor performance will level out and only minor improvements will be possible. This expectation has stimulated much research. One way to improve performance has been to develop multiprocessor architectures, including single-chip multiprocessors (CMP). Another way, aimed at improving the efficiency of large out-of-order processors, is to run more than one thread on each processor using multithreading, for example simultaneous multithreading (SMT).
The CMP and SMT approaches to increased efficiency appear to be two extremes of a viable design spectrum. Most CMP approaches use relatively simple processors, which have higher inherent efficiency. SMT processors are usually larger and more complex, which lowers their single-threaded efficiency, but they share almost all processor resources between threads to increase overall efficiency.
Between these two extremes, it is possible to have a range of processors sharing varying degrees of hardware between threads. At the end of the range closest to CMPs, pairs of modestly more complex processors could be designed to share a few common components. At the end of the range closest to SMTs, processors could be designed that possess private copies of particular critical resources.
In the past, some systems shared second-level caches, branch predictors, and divide/square-root hardware. Sharing these functional units, however, incurred a significant performance loss and led to complications in scheduling the shared hardware.