Digital computers are being used today to perform a wide variety of tasks. Many different areas of business, industry, government, education, entertainment, and most recently, the home, are tapping into the enormous and rapidly growing list of applications developed for today's increasingly powerful computer devices. Computers and other types of “smart” devices have also become a key technology for communicating ideas, data, and trends between and among business professionals. Additionally, digital computers, or more particularly, digital central processor units (CPUs), are increasingly being embedded in a variety of devices that are not traditionally associated with information technology. Examples include microcontrollers for machine tools, mechanisms, engines, and the like. The power and flexibility of CPUs make them well-suited for incorporation into a large number of different types of devices.
As embedded computer systems become increasingly widespread, there is increasing interest in improving the performance and software execution speed of these systems. One of the methods used by designers to increase software execution speed is to increase the processor “clock speed.” Clock speed refers to the rate at which the CPU steps its way through the individual software instructions. For a given number of clock cycles per instruction, increasing the number of clock cycles per second directly increases the number of instructions executed per second.
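The relationship between clock speed and instruction rate can be illustrated with a minimal sketch. The function name and the example figures below are illustrative assumptions only, and the model assumes a fixed average number of clock cycles per instruction:

```python
def instructions_per_second(clock_hz: float, cycles_per_instruction: float) -> float:
    """Estimate the instruction rate for a given clock speed.

    Assumes a fixed average number of clock cycles per instruction;
    real processors vary with instruction mix and memory behavior.
    """
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return clock_hz / cycles_per_instruction


# Hypothetical example: doubling the clock doubles the instruction rate
# as long as the cycles-per-instruction figure stays the same.
slow = instructions_per_second(100e6, 4)  # 100 MHz, 4 cycles per instruction
fast = instructions_per_second(200e6, 4)  # 200 MHz, 4 cycles per instruction
print(slow, fast)
```

Under this simple model, the same gain could also be obtained by reducing the number of clock cycles per instruction while leaving the clock speed unchanged.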
Another method used by designers is to increase the density of the electrical components within integrated circuit dies. For example, many high-performance integrated circuit processors include tens of millions of transistors integrated into a single die (e.g., 60 million transistors or more). As density increases, the clock speeds possible within a given design also increase, for example, as circuit traces are packed ever more closely together.
Another method for increasing performance is to increase the efficiency of heat removal from a high-density, high-performance integrated circuit. As component density and clock speed increase, the thermal energy that must be dissipated per unit area of silicon also increases. To maintain high performance, a stable operating temperature must be maintained. Accordingly, the use of carefully designed heat dissipation devices (e.g., heat sink fans, liquid cooling, heat spreaders, etc.) with high-performance processors has become relatively standardized.
There are limits to the extent to which each of the above methods of improving computer system performance can be reasonably implemented. For example, with respect to increasing clock speed, high clock speeds lead to excessively tight tolerances for wiring, chipsets, printed circuit board design, and the like in order to ensure reliable operation. Additionally, high clock speeds tend to increase the power consumption of the CPU. With respect to increasing the density of the electrical components within the integrated circuit die, as more and more transistors and other circuit elements can be incorporated within an integrated circuit die, there is increasing pressure to incorporate into the die other functions that were previously implemented as separate discrete chips. Often, greater performance can be realized by incorporating additional amounts of memory, controller hardware, and the like, as opposed to incorporating circuit elements designed purely to increase the speed of the CPU. Accordingly, silicon area tends to be just as valuable with a high-density fabrication process as with a less advanced, lower-density fabrication process. With respect to heat removal, the use of carefully designed heat dissipation devices limits the packaging options available to a device designer. This is especially cumbersome in the case of embedded computer devices. As described above, high clock speeds tend to directly cause high heat dissipation requirements, thus requiring more expensive and more space-consuming heat dissipation devices in order to ensure high performance.
Because of these limitations, CPU designers also concentrate on designing the circuitry of the CPU such that instructions can execute as efficiently as possible. For example, with many microprocessor designs, one or more instructions are capable of being executed per clock cycle. RISC (reduced instruction set computing) CPUs are specifically designed to have instruction sets wherein the majority of the instructions are capable of being executed within a single clock cycle. Additionally, RISC CPUs are designed to be simple in comparison to more complex CPUs, such that they require less silicon area in their manufacture. Because of these advantages, many microcontroller type devices are often based on RISC type CPUs.
Many such processors, although RISC based, still require separate instruction fetch, decode, execute, and write-back stages in order to implement many types of commonly used instructions. A separate clock cycle is required for each of these stages. Thus, for example, read-modify-write type instructions require a minimum of four clock cycles to complete. In order to complete instructions at a rate faster than one every four clock cycles, complex pipelining and microcode are required. This leads to a very complex processor design which consumes a large amount of silicon area to implement on chip. Although pipelining improves instruction throughput, the latency of each instruction is still four clock cycles. The complex pipelining also imposes additional penalties with respect to instruction branching. In the event of a branch, all the instructions in the pipeline need to be flushed and the pipeline needs to be refilled with new instructions, thereby imposing a significant performance penalty.
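The difference between latency and throughput in such a four-stage design, and the cost of flushing the pipeline on a branch, can be illustrated with a simplified cycle-count sketch. The function names and example figures are hypothetical, and the model assumes that each taken branch is resolved only in the final stage, so that every stage behind it must be refilled:

```python
STAGES = 4  # fetch, decode, execute, write-back


def unpipelined_cycles(n_instructions: int, stages: int = STAGES) -> int:
    """Each instruction occupies the processor for every stage in turn."""
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, taken_branches: int = 0,
                     stages: int = STAGES) -> int:
    """Idealized pipeline: one instruction completes per cycle once full.

    Assumption: each taken branch flushes the instructions behind it,
    costing (stages - 1) refill cycles. Stalls from other hazards
    are ignored in this sketch.
    """
    if n_instructions == 0:
        return 0
    fill = stages                     # latency of the first instruction
    steady = n_instructions - 1       # one completion per cycle afterwards
    flush_penalty = taken_branches * (stages - 1)
    return fill + steady + flush_penalty


# Hypothetical workload: 100 instructions, 20 of which are taken branches.
print(unpipelined_cycles(100))        # 400 cycles
print(pipelined_cycles(100))          # 103 cycles with no branches
print(pipelined_cycles(100, 20))      # 163 cycles with 20 pipeline flushes
```

In this model the latency of any single instruction remains four clock cycles in both cases; pipelining only overlaps instructions, and each flush gives back part of that overlap.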
Thus, a need exists for a RISC type processor solution that can execute many different types of instructions within a single clock cycle. A need exists for a solution that increases processor performance without relying solely upon increased clock speed, component density, or heat dissipation.