Processors designed to be embedded in system-on-a-chip products are required not only to deliver high performance but also to provide this performance with low power consumption and small area. A low power design is achieved through innovations in architecture, micro-architecture design, implementation physical design, process, and software control. A low power processor architecture defines an instruction set and programming model that facilitate low power. The micro-architecture design represents the data flow paths, control logic, and state machine design of the defined processor architecture. For a low power architecture, the micro-architecture design takes advantage of the instruction set architecture to reduce power in the implementation of the core design.
The general power equation is $P = C V^2 f$, where $C$ is the capacitance, $V$ is the power supply voltage, and $f$ is the frequency of change of the signals. More specifically, the power consumption of an embedded core processor can be split into three major components: $P = P_{\text{logic}} + P_{\text{RAM}} + P_{\text{I/O}}$. The $P_{\text{logic}}$ portion is the power attributable to the logic, the $P_{\text{RAM}}$ portion is the power attributable to the embedded RAM, and the $P_{\text{I/O}}$ portion is the power attributable to external pin changes caused directly by the embedded core processor. Examples of $P_{\text{I/O}}$ power are data movement on and off the core and paging in new program segments. Minimizing data and program code movement, reducing capacitance by minimizing path lengths through good floor planning, minimizing the amount of required embedded RAM, and reducing the number of register file and embedded RAM accesses would all reduce power consumption in an embedded core processor.
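As an illustration of the two relations above, the following minimal Python sketch evaluates the switching power for each component and sums them. The function names and all numeric values (capacitances, voltages, frequencies) are assumptions chosen only to make the arithmetic concrete; they are not measurements of any actual core.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Switching power in watts from the general equation P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def core_power(p_logic_w, p_ram_w, p_io_w):
    """Total core power as the sum of the logic, RAM, and I/O components."""
    return p_logic_w + p_ram_w + p_io_w


# Hypothetical example values (assumptions, not data from the text).
p_logic = dynamic_power(1.0e-9, 1.0, 100e6)  # on-chip logic
p_ram = dynamic_power(0.5e-9, 1.0, 100e6)    # embedded RAM
p_io = dynamic_power(0.2e-9, 2.0, 50e6)      # external pins at a higher voltage

total = core_power(p_logic, p_ram, p_io)
print(f"P_logic={p_logic:.3f} W, P_RAM={p_ram:.3f} W, "
      f"P_I/O={p_io:.3f} W, total={total:.3f} W")
```

With these assumed numbers, the I/O term is comparable to the RAM term even though it switches a smaller capacitance at half the frequency, because the pin voltage enters the equation squared.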
Reducing embedded RAM at the expense of expanding external RAM, however, is not necessarily a good tradeoff. Reducing embedded RAM because programs use the on-chip resources more efficiently is a good tradeoff. Appropriate control or management of other functions, such as clock gating, is also important to minimize power. When functions are not used during different time periods, gating the clock off to the unused logic for those periods reduces the switching of signals, thereby reducing power. Reducing path lengths through good floor planning reduces capacitance, thereby also reducing power. Various prior art processor implementation processes provide technologies that run at low voltage. Such low voltage operation has a large impact on power by reducing the $V^2$ term of the power equation. The implementation process also has a characteristic capacitance that all signals see and that has a direct effect on power use. Finally, the software controls how the hardware is used and can therefore affect the power used to accomplish a given task. Each of these areas contributes to the overall power utilization of the final processor design, and each area must be designed to obtain the lowest power.
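The effects of voltage reduction and clock gating described above can be sketched numerically. The Python example below assumes an idealized model in which clock-gated logic contributes no switching power, so that gating simply scales the power by the fraction of logic that remains clocked. The function name, the `active_fraction` parameter, and the numeric values are hypothetical and serve only as an illustration.

```python
def gated_dynamic_power(capacitance_f, voltage_v, frequency_hz, active_fraction=1.0):
    """Idealized switching power: active_fraction * C * V^2 * f.

    active_fraction is the assumed share of the logic whose clock is
    running; gated-off logic is modeled as consuming no switching power.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points (assumptions for illustration only).
baseline = gated_dynamic_power(1.0e-9, 1.8, 100e6)           # everything clocked
half_voltage = gated_dynamic_power(1.0e-9, 0.9, 100e6)       # supply voltage halved
half_gated = gated_dynamic_power(1.0e-9, 1.8, 100e6, 0.5)    # half the logic gated off

print(f"halving V scales power by {half_voltage / baseline:.2f}")
print(f"gating half the logic scales power by {half_gated / baseline:.2f}")
```

Under this model, halving the supply voltage cuts switching power to one quarter, while gating off half of the logic cuts it to one half, which is why the voltage term dominates when a process allows low voltage operation.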
Low power approaches can often conflict with high performance requirements. This conflict typically occurs because the primary approach to achieving high performance is through high clock rates. For example, complex, high path length instructions minimize register file, instruction, and data memory accesses and significantly improve the efficiency of processing an algorithm, and consequently can lower power use, yet they would not typically be used in a processor designed for a high clock rate. If such a complex instruction were implemented in a high clock rate design, the complex function would be broken up into multiple pipeline stages, which directly affects the complexity of the design and of the programming model. Thus, the increased hardware complexity and less efficient programming utilization can mask out any power improvements obtained from the higher clock rates.
Consequently, another approach to achieving high performance is needed. In the ManArray processor, high performance is achieved through parallelism and the use of highly efficient instructions rather than through high clock rates. This approach allows the full benefit of lowering the voltage in new processes to be achieved. By requiring short signal lengths and low power memories, the ManArray processor can achieve both high performance and low power. Even so, all five areas for lowering power (architecture, micro-architecture design, implementation physical design, process, and software) need to be addressed in order to maximize battery life in portable products containing a ManArray processor. The ManArray architecture and micro-architecture provide novel features that are scalable and can lower power utilization in each member of the scalable array family of cores, as will be described.
The sequential model of instruction execution is used in the advanced indirect very long instruction word (iVLIW) scalable ManArray processor, even though multiple PEs operate in parallel, each executing up to five packed data instructions. The ManArray family of core processors provides multiple cores (1×1, 1×2, 2×2, 2×4, 4×4, and so on) that provide different performance characteristics depending upon the number and type of processor elements (PEs) used in the cores.
Each PE typically contains its own register file and local PE memory, resulting in a distributed memory and distributed register file programming model. Each PE, if not masked off, executes instructions in synchronism and in a sequential flow as dictated by the instruction sequence fetched by a sequence processor (SP) array controller. The SP controls the fetching of the instructions that are sent to all the PEs. The ManArray architecture in one exemplary implementation uses multiple forms of selectable parallelism, including iVLIW with up to 5 instructions issued in parallel, packed data operations with up to 8 byte operations per instruction per cycle, and array PE parallelism with up to 16 PEs. Each PE is capable of 5 instructions × 8 byte operations = 40 operations per cycle, for a total of 16 × 40 = 640 operations per cycle in a 4×4 array. Since the parallel operations are selectable and since many algorithms use varying degrees of parallelism in their coding, the control of the processor array for low power operation is highly advantageous.
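The peak operation count above can be reproduced with a short Python sketch. It assumes that an R×C core contains R × C PEs and that every unmasked PE issues the maximum of 5 instructions with 8 byte operations each; the function name and the `active_pes` parameter, which models PEs that are masked off, are hypothetical names introduced for illustration.

```python
MAX_VLIW_SLOTS = 5       # up to 5 instructions issued in parallel per PE
MAX_PACKED_BYTE_OPS = 8  # up to 8 byte operations per instruction per cycle


def peak_ops_per_cycle(rows, cols, active_pes=None):
    """Peak byte operations per cycle for a rows x cols array.

    active_pes is the assumed number of PEs that are not masked off;
    by default all rows * cols PEs are active.
    """
    num_pes = rows * cols
    if active_pes is None:
        active_pes = num_pes
    if not 0 <= active_pes <= num_pes:
        raise ValueError("active_pes must be between 0 and rows * cols")
    return active_pes * MAX_VLIW_SLOTS * MAX_PACKED_BYTE_OPS


# Array sizes named for the family of cores.
for rows, cols in [(1, 1), (1, 2), (2, 2), (2, 4), (4, 4)]:
    print(f"{rows}x{cols}: {peak_ops_per_cycle(rows, cols)} operations per cycle")
```

Calling `peak_ops_per_cycle(4, 4, active_pes=4)` gives the peak for a 4×4 array in which only four PEs are unmasked, which illustrates how the selectable parallelism changes the amount of hardware that is active in a given cycle.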
Thus, it is recognized that it will be highly advantageous to have architecture and micro-architecture low power features provided in a scalable processor family of embedded cores based on a single architecture model that uses common tools to support software configurable processor designs optimized for performance, power, and price across multiple types of applications using standard application specific integrated circuit (ASIC) processes, as discussed further below.