1. Technical Field
The present invention relates generally to data processors and, in particular, to improving instruction throughput and processor execution frequency.
2. Description of the Related Art
Instruction execution throughput is an important measure of processor efficiency. Throughput correlates directly with the frequency at which the central processing unit (CPU) processes the instructions being executed thereon. Conventional CPU cores are typically designed to run at a high frequency but are limited in actual execution frequency by critical subunits. That is, the CPU core executes instructions at the highest frequency supported by the critical subunits, which frequency is typically lower than the highest design frequency of the processor. These subunits comprise execution stages of the processor pipeline that execute particular types of operations, such as multiply operations, which are frequency-limiting operations. The subunits limit the maximum operating frequency of the CPU because execution of these particular operations cannot be completed at the higher processor frequency. In some processor designs, attempts at such higher-frequency execution of these operations result in errors and/or stalls in the execution path, effectively reducing processor throughput.
The frequency-limiting operations (such as multiply instructions) occur only infrequently in the instruction execution stream, yet they force the processor's frequency, and thus its throughput, down to the lower limit at all times, not merely when these instructions actually occur within the instruction stream. For example, a multiply instruction may take three cycles to complete and may limit the achievable frequency to only 80% of the design frequency. To accommodate these multiply instructions, the entire processing sequence for all instructions is run at 80% of the design frequency, limiting the processor to 80% throughput at all times. As a specific example, a multiply operation in the execution pipe may be limited (based on current designs) to 800 MHz. With the design frequency of the processor being 1000 MHz, the multiply operation becomes a limiting factor to high-frequency execution.
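The frequency cap described above can be sketched as a simple model: when every instruction must be clocked at the rate the slowest subunit can sustain, the whole pipeline runs at that subunit's frequency. The figures below (a 1000 MHz core and an 800 MHz multiply subunit) are the illustrative numbers from the example, not measurements of any particular CPU, and the function name is a hypothetical convenience.

```python
def effective_frequency_mhz(core_design_mhz: float,
                            subunit_limits_mhz: dict) -> float:
    """Clock the entire pipeline at the slowest subunit's limit.

    In this simplified model, the core cannot run faster than its own
    design frequency or any subunit's maximum frequency.
    """
    return min(core_design_mhz, *subunit_limits_mhz.values())

core_mhz = 1000.0                   # design frequency of the processor
limits = {"multiply": 800.0}        # multiply subunit tops out at 800 MHz

f_eff = effective_frequency_mhz(core_mhz, limits)
print(f"Effective frequency: {f_eff:.0f} MHz")         # 800 MHz
print(f"Throughput fraction: {f_eff / core_mhz:.0%}")  # 80%
```

This makes the point of the example concrete: a single infrequent operation type drags the entire instruction stream down to 80% of the design throughput.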
Certain enhancements have been implemented, or proposed, to address the frequency limitations introduced by these subunits. For example, in one design, additional stages are introduced within the execution pipe. Adding more stages to the multiply subunit is one way to increase the frequency, but the additional stages degrade latency and increase die area. In another design, a degree of parallelism is provided, and additional transistors are introduced so that the frequency-limiting elements are processed faster. However, both of these proposals involve substantially more hardware on the processor die, which results in larger area requirements, greater power consumption, and an associated increase in cost.
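The stage-splitting tradeoff noted above can be illustrated with a hypothetical first-order timing model: splitting the multiply logic across more pipeline stages shortens each stage and raises the achievable clock, but each added stage contributes fixed register overhead, so total latency grows. The delay figures below are assumptions chosen only to make the single-stage case match the 800 MHz example, not data from any real multiplier.

```python
def pipelined_multiplier(stages: int,
                         logic_delay_ns: float = 1.15,
                         reg_overhead_ns: float = 0.10):
    """Return (max frequency in MHz, total latency in ns) when the
    multiply logic is split evenly across `stages` pipeline stages.

    Each stage's delay is its share of the logic delay plus a fixed
    register overhead (setup plus clock-to-Q), a standard first-order
    pipelining approximation.
    """
    stage_delay_ns = logic_delay_ns / stages + reg_overhead_ns
    f_max_mhz = 1000.0 / stage_delay_ns
    latency_ns = stage_delay_ns * stages
    return f_max_mhz, latency_ns

for n in (1, 2, 3):
    f, lat = pipelined_multiplier(n)
    print(f"{n} stage(s): {f:6.0f} MHz, latency {lat:.2f} ns")
```

Running this shows the tension the passage describes: each added stage raises the clock frequency, yet end-to-end multiply latency worsens with every stage (and each stage also adds pipeline registers, i.e., area and power).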
Such proposals run contrary to the design options desired for high-density System on Chip (SoC) devices. In SoC designs today, there is a growing focus on reducing on-chip area and creating power-efficient designs. The latest methods, such as Voltage Islands, Adaptive Voltage Controls, Software Voltage Controls, and Adaptive Frequency Controls, are all focused on lowering Application Specific Integrated Circuit (ASIC)/SoC power while maintaining the highest levels of performance.
The PPC4xx CPU core is one of the leading CPU cores in the industry for performance/power capability in the 32-bit general-purpose microprocessor arena. With the advent of 90 nm, 65 nm, and 45 nm technologies, ASIC power density has become one of the most critical design hurdles. Since CPU cores are the main functional part of the ASIC and are designed to run faster than any other functional component of the ASIC, the CPU/microprocessor core is the main focus in improving the power efficiency and performance of ASICs. Within the CPU core there are numerous functional building blocks, each with its own power/performance attributes. It is thus not uncommon that a small set of units or subunits within the core have operating constraints that limit the performance attributes of those units. These units tend to be the units within the execution stages that process the frequency-limiting operations. Thus, as described above, these units may either dictate the overall performance (i.e., throughput) of the entire CPU or may be designed with additional components to achieve the desired performance goal at the sacrifice of power efficiency.