1. Field
Advancements in microprocessor architecture are needed to provide improvements in performance, efficiency, cost, power tradeoffs, and utility of use.
2. Related Art
Unless expressly identified as being publicly or well known, mention herein of techniques and concepts, including for context, definitions, or comparison purposes, should not be construed as an admission that such techniques and concepts are previously publicly known or otherwise part of the prior art. All references cited herein (if any), including patents, patent applications, and publications, are hereby incorporated by reference in their entireties, whether specifically incorporated or not, for all purposes. Nothing herein is to be construed as an admission that any of the references are pertinent prior art, nor does it constitute any admission as to the contents or date of actual publication of these documents.
Semiconductor fabrication technology has undergone continuous improvement for four decades at a rate defined by Moore's Law: The number of transistors integrated in a single circuit has grown by a factor of four every three years. Such continuous improvement has allowed the number of transistors available for a single-chip microprocessor to increase from several thousand transistors for the first microprocessor in 1971 to more than one billion transistors currently.
As technology advanced through the early 1980s, integration levels of about 100,000 transistors per chip made it possible to develop a complete processor partitioned into a handful of components, such as a 32-bit integer pipeline, a 64-bit floating-point (FP) pipeline, a memory-management unit, and cache memory. Around 1990, technology permitted integration of about one million transistors, enabling a complete processor to be fabricated in a single chip. As the technology continued to advance through about 1995, microprocessors began to replicate multiple functional units to execute instructions in parallel, a type of computer architecture called “superscalar” that exploits instruction-level parallelism (ILP). Further integration through about 2000 led to microprocessors that executed multiple threads on a common set of functional units, a type of architecture called simultaneous multithreading (SMT) that exploits thread-level parallelism (TLP). Presently, multiple complete processors are being integrated on a single chip, a type of architecture known as chip multiprocessing (CMP).
As the level of integration reaches one billion transistors and beyond, several limitations of technology and applications have grown increasingly important. One technology limitation is in the amount of power that can be supplied and dissipated economically by a single chip. A related limitation is the amount of energy that can be stored in batteries or other power sources for portable applications. In addition to dynamic power, which is dissipated as devices switch, static power caused by leakage current becomes more important as transistor size is reduced. Another technology limitation is that interconnection characteristics do not scale with transistor size and speed, so interconnection of transistors and logic blocks becomes increasingly important in determining overall chip size, power, and performance. Technology further limits chip yield and reliability as device sizes shrink and more devices are integrated. The characteristics of software applications impose additional limitations in the amount of ILP and TLP that can be exploited by individual processors, as well as further limitations that arise when memory used by an application is partitioned across multiple processors.
A number of techniques have been practiced and proposed to address such limitations. For an individual processor, clocks may be gated off to idle logic, and voltage may be lowered to reduce power consumption. Such voltage-scaling techniques also reduce operating frequency and consequently performance. Similar techniques may be applied to complete processors for CMP applications. For example, complete processors may have clocks gated off or the power supply may be disconnected. In practice, such techniques provide coarse control over the tradeoffs between power and performance.
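The tradeoff described above between voltage scaling, power, and performance can be illustrated with the widely used first-order approximation for CMOS switching power, P = a · C · V² · f. The following is a minimal sketch only: the function names, the operating-point values, and the assumption that clock frequency scales in proportion to supply voltage are hypothetical simplifications chosen for illustration and are not taken from this document.

```python
def dynamic_power(c_eff, voltage, freq, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff * voltage ** 2 * freq


def scale_voltage(voltage, freq, factor):
    """Scale the supply voltage by `factor`.

    Assumption (simplification): the achievable clock frequency scales
    by the same factor as the voltage.
    """
    return voltage * factor, freq * factor


# Hypothetical operating point, chosen only for illustration.
C_EFF = 1e-9   # effective switched capacitance in farads (assumed)
V_NOM = 1.0    # nominal supply voltage in volts (assumed)
F_NOM = 2e9    # nominal clock frequency in hertz (assumed)

p_nom = dynamic_power(C_EFF, V_NOM, F_NOM)

# Voltage scaling: halve the voltage, and with it the frequency.
v_low, f_low = scale_voltage(V_NOM, F_NOM, 0.5)
p_low = dynamic_power(C_EFF, v_low, f_low)

power_ratio = p_low / p_nom   # dynamic power relative to nominal
perf_ratio = f_low / F_NOM    # clock frequency as a rough performance proxy

# Clock gating: idle logic does not switch, so its activity factor is zero.
p_gated = dynamic_power(C_EFF, V_NOM, F_NOM, activity=0.0)
```

Under these assumptions, halving the voltage reduces dynamic power to about one eighth of nominal while the clock frequency, used here as a rough proxy for performance, falls to one half. Gating the clock removes dynamic power entirely but leaves static leakage power, which this sketch does not model.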
Contemporary techniques provide additional flexibility with asymmetric processors in CMP or multi-chip configurations. For example, an integrated circuit may integrate one large, high-performance processor and a second, smaller processor with lower power and lower performance. Either or both complete processors may be active depending on application performance requirements, power limitations, and battery life. The processors may also be implemented in separate voltage domains, so that voltage-scaling techniques may be applied separately to individual processors. In addition, techniques have been proposed to provide asymmetric functional units within a processor. For example, a processor may integrate two integer arithmetic logic units (ALUs) with differing power and performance. The processor may be configured to execute instructions using either or both ALUs.
Currently practiced and proposed techniques still suffer a number of disadvantages. For example, when asymmetric processors are employed, the resources of an inactive processor are unavailable and wasted. In addition, the resources of an active processor may be underutilized; for example, the floating-point unit (FPU) of a high-performance processor is underutilized when the processor is allocated to an application that executes floating-point instructions infrequently or not at all. When asymmetric functional units are employed, the additional unused resources result in unnecessary interconnection, leading to lower performance and additional power dissipation. When voltage is scaled, the processor needs to remain idle for a considerable time until clocks are stable at a new frequency appropriate for the scaled voltage.
Throughout the evolution of microprocessor technology it has also been necessary to make tradeoffs between fixed-function and programmable components. Fixed-function components may be optimized for manufacturing cost and performance, but their market volume is limited by demands of their target application(s). Programmable components may have higher manufacturing cost and lower performance for a specific application, but the programmable capability enables much higher volumes covering a range of applications. In addition, programmable components may be more economical because development costs are spread over a larger manufacturing volume. Similarly, systems developed with programmable components may be brought to market more quickly than those developed with fixed-function components because the time to complete and correct a program is generally less than the time to develop a new fixed-function circuit.
As a consequence of the tradeoffs between fixed-function and programmable elements, applications are commonly transitioned from fixed-function solutions to programmable solutions as circuit costs decrease and performance increases over time. For example, the first microprocessor was introduced in 1971 by Intel as an alternative to developing a calculator with fixed-function circuits. Similarly, gate arrays have been applied to customize functions either by programming metal interconnections between logic blocks during the last manufacturing steps or by blowing fuse connections after the circuit is manufactured. Further advances have enabled functions to be customized by loading the contents of SRAM cells associated with programmable logic gates and interconnection, a technology used in Field-Programmable Gate Arrays (FPGAs). FPGAs enable functions to be programmed multiple times after manufacturing, a characteristic often described as reconfigurable.
Microprocessors occupy a unique position on the spectrum of fixed-function and programmable components: microprocessors may be programmed, but they commonly implement a fixed function, namely executing instructions of a predefined architecture. For example, x86 processors, used extensively in IBM-compatible computer systems, generally execute a fixed set of instructions, but are programmed for a wide variety of applications.
Interest has grown in making microprocessors more configurable; that is, in supplementing a microprocessor that executes a standard instruction set with fixed functions that have been optimized for a specific application. Addressing the challenges in microprocessor resource management and reconfiguration requires improved microprocessor architectures that provide better power efficiency, better performance characteristics, and better tradeoffs between the two.