1. Field of Invention
The present invention relates generally to the field of processing devices, in particular to an energy-efficient processing device, and more particularly to microcoded processing devices.
2. Description of Related Art
Prior art processing devices may include microprocessors, microcontrollers or digital signal processors. In computer engineering, microarchitecture is the design and layout of a microprocessor, microcontroller, or digital signal processor.
Microarchitecture considerations include overall block design, such as the number of execution units, the type of execution units (e.g. floating point, integer, branch prediction), the nature of the pipelining, cache memory design, and peripheral support.
Microcode is the underlying programming methodology for microprocessors such as the AAMP family of proprietary microprocessors from Rockwell Collins Inc. The term microcode is hereby defined to be: “a level of programming language for microprocessors such that at each line of microcode every internal data path and logical operation is available, without the need for further decoding”; i.e., a line of microcode comprises a plurality of micro-orders, where each micro-order controls a fundamental data path or operation internal to the microprocessor. Depending on the complexity of the microarchitecture, a typical line of microcode may contain 5 to 15 micro-orders.
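The relationship between a line of microcode and its constituent micro-orders may be sketched as follows. The field names and widths here are hypothetical, chosen only to illustrate that each micro-order directly drives one internal data path or operation, with no further decoding required; they do not describe the AAMP family or any particular microarchitecture.

```python
# Hypothetical microcode word: each named field is one micro-order that
# directly drives an internal control point, with no further decoding.
MICRO_ORDER_FIELDS = {
    "alu_op":      3,  # selects the ALU function
    "src_bus_sel": 2,  # gates a register onto the source bus
    "dst_reg_en":  4,  # enables a destination register's input gate
    "mem_rd":      1,  # asserts a memory read strobe
    "mem_wr":      1,  # asserts a memory write strobe
    "next_addr":   8,  # next control-store address for the sequencer
}

def pack_microinstruction(values):
    """Pack micro-order values into a single microcode word (an int)."""
    word, shift = 0, 0
    for name, width in MICRO_ORDER_FIELDS.items():
        v = values.get(name, 0)
        assert v < (1 << width), f"{name} overflows its field"
        word |= v << shift
        shift += width
    return word

def unpack_microinstruction(word):
    """Recover each micro-order from a packed microcode word."""
    out, shift = {}, 0
    for name, width in MICRO_ORDER_FIELDS.items():
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```

With six fields, this illustrative word already carries six micro-orders per line; a real microarchitecture of the complexity described above would carry 5 to 15.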
A computer operation is an operation specified by an instruction stored in a computer's memory. A control unit in a typical computer takes the instruction and decodes its operation code (opcode) and other bits to perform the required micro-operations. Decoding of the opcode increases the time required to execute an instruction. Micro-operations are implemented by hardware, often involving combinational circuits. In a CPU, a control unit is said to be hardwired when the control logic expressions are directly implemented with logic gates or in a PLA (programmable logic array). In contrast to this hardware approach, a more flexible software approach may be employed with a microprogrammed control unit, in which the control signals to be generated at a given time step are stored together in a control word called a microinstruction. The collection of these microinstructions or micro-orders is the microprogram or microcode, which is stored in a memory element termed the control store; the control store is a memory that contains the CPU's microprogram and is accessed by a microsequencer.
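A minimal sketch of such a microprogrammed control unit follows: the control store is modeled as a list of control words, and the microsequencer steps through them, emitting each word's control signals in turn. The signal names and the three-step fetch sequence are illustrative assumptions, not those of any particular processor.

```python
# Minimal sketch of a microprogrammed control unit.  Each entry in the
# control store pairs the control signals asserted in that time step
# with the next control-store address for the microsequencer.
CONTROL_STORE = [
    ({"pc_to_mar": 1, "mem_rd": 1}, 1),   # fetch: address memory with PC
    ({"mdr_to_ir": 1, "pc_inc": 1}, 2),   # latch instruction, advance PC
    ({"decode": 1},                 0),   # dispatch on the opcode, loop
]

def run_microsequencer(steps):
    """Execute `steps` microinstructions, returning the signal trace."""
    trace, addr = [], 0
    for _ in range(steps):
        signals, next_addr = CONTROL_STORE[addr]
        trace.append(signals)       # these outputs drive the datapath
        addr = next_addr            # sequencing requires no opcode decode
    return trace
```

Note that the stored `next_addr` field replaces hardwired sequencing logic, which is the essential flexibility of the software approach described above.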
Thus the outputs of the control unit direct the CPU operations, and a control unit can be thought of as a finite state machine. Words of the microprogram are selected by a microsequencer and the bits from those words directly control the different parts of the device, including the registers, arithmetic and logic units, instruction registers, buses, and off-chip input/output. In modern computers, each of these subsystems may itself have its own subsidiary controller, with the control unit acting as a supervisor.
All types of control units generate electronic control signals that control other parts of a CPU. Control units are usually one of these two types: microcoded control units or hardware control units. In a microcoded control unit, a program reads values from memory, and generates control signals. The program itself is executed by a digital circuit called a microsequencer. In a hardware control unit, a digital circuit generates the control signals directly from combinational logic.
Hence microprogramming is a systematic technique for implementing the control unit of a computer via a microcoded control unit. Microprogramming is a form of stored-program logic that substitutes for sequential-logic control circuitry. A microinstruction is an instruction that controls data flow and instruction-execution sequencing in a processor at a more fundamental level than machine instructions; thus, a series of microinstructions is commonly necessary to perform a particular computing function.
A central processing unit (CPU) in a computer system generally comprises a data path unit and a control unit, with the control unit directing the data path unit. The data path unit or datapath includes registers, function units such as ALUs (arithmetic logic units), shifters, interface units for main memory and I/O, random access memory (RAM), including scratchpad RAM, and internal buses. Scratchpad RAM is typically a local dedicated memory cache reserved for direct and private usage by the CPU. A cache is typically used in the prior art to temporarily store copies of data that reside in slower main memory.
The control unit controls the steps taken by the data path unit during the execution of a machine instruction, microinstruction or macroinstruction (e.g., load, add, store, conditional branch) by the datapath. Each step in the execution of a macroinstruction is a transfer of information within the data path, possibly including the transformation of data, address, or instruction bits by the function units. The transfer is often a register transfer and is accomplished by sending a copy of (i.e., gating out) register contents onto internal processor buses, selecting the operation of ALUs, shifters, and the like, and receiving (i.e., gating in) new values for registers. Control signals consist of operation-selection signals and of enabling signals to the gates, termed control points, that control the sending or receiving of data at the registers. The control signals identify the micro-operations required for each register transfer and are supplied by the control unit. A complete macroinstruction is executed by generating an appropriately timed sequence of groups of control signals.
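A single such register transfer may be sketched as below: one register is gated out onto an internal bus, an ALU operation is selected, and the result is gated in to a destination register. The register names, the "inc"/"pass" operation set, and the three-signal interface are hypothetical, used only to make the gate-out / operate / gate-in sequence concrete.

```python
# Sketch of one register transfer under control-unit signals: gate a
# source register onto the bus, apply the selected ALU operation, and
# gate the bus value into the destination register.
def register_transfer(regs, gate_out, alu_op, gate_in):
    bus = regs[gate_out]          # enabling signal: copy contents onto bus
    if alu_op == "inc":
        bus += 1                  # operation-selection signal chooses this
    elif alu_op == "pass":
        pass                      # transfer without transformation
    else:
        raise ValueError(f"unknown ALU operation: {alu_op}")
    regs[gate_in] = bus           # enabling signal: latch bus into register
    return regs

regs = register_transfer({"A": 5, "B": 0}, "A", "inc", "B")
```

A macroinstruction would be executed as an appropriately timed sequence of such transfers, each driven by one group of control signals.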
A complex instruction set computer (CISC) is a microprocessor instruction set architecture (ISA) in which each instruction can execute several low-level operations, such as a load from memory, an arithmetic operation, and a memory store, all in a single instruction, and from within the CPU. The ISA specifies the instructions, their binary formats, the complete effect of each operation in a CPU, the visible registers of the machine, and any other aspects of the system that affect how it is programmed. The term CISC was coined to contrast with the ISA for a reduced instruction set computer (RISC). Before RISC processors were designed, many computer architects designed instruction sets to support high-level programming languages by providing “high-level” instructions such as procedure call and return, loop instructions such as “decrement and branch if non-zero”, and complex addressing modes, to allow data structure and array accesses to be combined into a single instruction. An example of a CISC CPU is the Intel iAPX 432 microprocessor architecture, which supported object-oriented programming in hardware, even providing for automatic garbage collection for deallocated objects in memory. The iAPX architecture was so complex for its time that it had to fit on multiple chips. Another example is the Intel x86 microprocessor, used in personal computers. Further, Directly Executable High Level Language (Directly Executable HLL) design CPUs can take a high level language and directly execute it by microcode, without compilation. The IBM Future Systems project and Data General Fountainhead Processor are examples of Directly Executable HLL design.
The compact nature of the CISC ISA results in smaller program sizes and fewer calls to main memory. A control store (fast memory within the CPU) is often prominent in a CISC design, and a CISC CPU will lack the decoding logic stage found in a RISC CPU.
A Very Long Instruction Word (VLIW) architecture refers to a CPU architectural approach to take advantage of instruction level parallelism. In Very Long Instruction Word CPUs, many statically scheduled, tightly coupled, fine-grained operations execute in parallel within a single instruction stream. A processor that executes different sub-steps of sequential instructions simultaneously (pipelining), that employs parallel (superscalar) execution and that executes instructions out of order (branch prediction) can achieve significant performance improvements, at the cost of increased hardware complexity and power consumption. The VLIW approach offers benefits similar to these techniques but employs a compiler to determine which operations may be executed in parallel, and which branch is most likely to be executed, during compiling of a computer program. VLIW architectures therefore may offer improved computational power with less hardware, at the cost of greater compiler complexity. One VLIW instruction may perform multiple operations; with one instruction operation for each execution unit of the device. For example, if a VLIW CPU has three execution units, then a VLIW instruction for that chip would have three operation fields, each field specifying what operation should be done on that corresponding execution unit. To accommodate these operation fields, VLIW instructions are usually at least 64 bits in width, and on some architectures wider.
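The packing of one operation field per execution unit into a single wide word can be illustrated as follows. The three-unit layout and the 21-bit field width are assumptions chosen only to show why such instructions reach 64 bits; real VLIW encodings vary.

```python
# Hypothetical 3-issue VLIW word: one operation field per execution
# unit.  At 21 bits per field, three fields occupy 63 bits, which is
# why a VLIW instruction of this shape needs a 64-bit word.
FIELD_BITS = 21

def pack_vliw(op_alu, op_mem, op_branch):
    """Pack one operation per execution unit into a single wide word."""
    assert all(op < (1 << FIELD_BITS) for op in (op_alu, op_mem, op_branch))
    return op_alu | (op_mem << FIELD_BITS) | (op_branch << 2 * FIELD_BITS)

def unpack_vliw(word):
    """Split the wide word back into its three operation fields."""
    mask = (1 << FIELD_BITS) - 1
    return (word & mask,
            (word >> FIELD_BITS) & mask,
            (word >> 2 * FIELD_BITS) & mask)
```

Because the compiler fills all three fields statically at compile time, the hardware can issue the three operations in parallel without the scheduling logic a superscalar design would require.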
As stated, a reduced instruction set computer, or RISC, is a microprocessor instruction set architecture (ISA) that favors a simpler set of instructions. The idea was originally inspired by the discovery that many available instructions in CISC CPU architectures were seldom used by the programs that were running on them. Also, these more complex features took several processor cycles to be performed. Additionally, the performance gap between the processor and main memory was increasing. This led to a number of techniques to streamline processing within the CPU, while at the same time attempting to reduce the total number of memory accesses. A RISC microprocessor utilizes and emphasizes a decoding logic stage rather than emphasizing a control store, as in a CISC chip. In addition, the term “load-store” is often used to describe RISC processors. Instead of the CPU handling many addressing modes, a load-store architecture uses a separate unit dedicated to handling very simple forms of load and store operations, and only register-to-register operations are allowed. By contrast, CISC processors are termed “register-memory” or “memory-memory”. Thus RISC compilers keep operands in registers (the operand being the part of a machine instruction that references data or a peripheral device; in the instruction ADD A to B, A and B are the operands, and ADD is the operation code), in order to employ register-to-register instructions. CISC compilers select an addressing mode and the shortest instruction format to add operands in memory, and make repeated memory accesses in a calculation. RISC compilers, however, prefer to use LOAD and STORE instructions to access memory, so that operands are not implicitly discarded after being fetched, as in a CISC memory-to-memory architecture.
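The register-memory versus load-store contrast above can be made concrete with the same addition expressed both ways. The mnemonics and the tiny interpreter are generic illustrations, not any real instruction set: the register-memory sequence names a memory operand directly, while the load-store sequence touches memory only through an explicit LOAD, keeping the fetched operand in a register for reuse.

```python
# Tiny interpreter for two instruction styles: register-memory
# (CISC-style ADD_RM reads memory itself) and load-store (RISC-style,
# where only LOAD/STORE touch memory and the ALU sees registers only).
def run(program, mem, regs):
    for op, dst, src in program:
        if op == "LOAD":          # load-store: memory read happens only here
            regs[dst] = mem[src]
        elif op == "STORE":
            mem[dst] = regs[src]
        elif op == "ADD_RR":      # register-to-register ALU operation
            regs[dst] += regs[src]
        elif op == "ADD_RM":      # register-memory: ALU operand from memory
            regs[dst] += mem[src]
    return mem, regs

cisc_style = [("ADD_RM", "r0", "b")]        # one compact instruction
risc_style = [("LOAD", "r1", "b"),          # explicit load, then the
              ("ADD_RR", "r0", "r1")]       # register-to-register add
```

Both programs compute the same sum; the load-store version costs an extra instruction but leaves the fetched operand available in r1 for subsequent calculations, rather than implicitly discarding it.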
Notwithstanding the above, differences between RISC and CISC processors have blurred over time. A program's execution time equals the total number of executed instructions times the cycles per instruction times the time per cycle. In modern CPU ISAs, many instructions, no matter how rarely used, are often included if the cycle time can be made small and the hardware and/or control store exists for implementing the instructions. Thus the number of instructions is not reduced in modern CPUs; only the cycle time is reduced. Further, even more designs with fanciful acronyms have appeared, including NISC (No Instruction Set Computer) and WISC (Writable Instruction Set Computer) for embedded processors that have rewritable microcode. There are ideas for processors that are reconfigurable, even reconfigurable during runtime, in FPGA logic. An example of a recent processor that provides direct support for the Java language, in hardware, is U.S. Pat. No. 6,317,872, incorporated herein by reference in its entirety. An example of the use of microcode as the executable language on an embedded processor is in public use from the assignee of the present invention in a GPS product placed in public use around 1997.
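The execution-time relation recited above can be illustrated numerically; the instruction count, CPI, and cycle times below are arbitrary illustrative values.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Execution time = instruction count x cycles-per-instruction
    x time per cycle, as stated in the text."""
    return instructions * cpi * cycle_time_s

# Illustrative comparison: halving the cycle time halves execution
# time even when the instruction count and CPI are unchanged, which
# is the lever modern ISAs pull instead of shrinking the instruction set.
t_slow = execution_time(1_000_000, 2.0, 2e-9)   # 2 ns cycle
t_fast = execution_time(1_000_000, 2.0, 1e-9)   # 1 ns cycle
```

This is why rarely used instructions survive in modern ISAs: they cost nothing in this product so long as they do not lengthen the cycle time.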
However, ultimately a microprocessor ISA is useful only if it enables one to achieve suitable performance for the task at hand. What is not found in the prior art is a system and method to process data with much reduced power consumption and heat generation. The present invention addresses these concerns.