1. Field of Invention
This invention relates generally to processors and more specifically to high performance and low power processors.
2. Discussion of Related Art
Processors are well known and widely used in many applications. Because processors execute instructions that can be arranged in an effectively unlimited number of combinations and sequences, they can be programmed for almost any application. Even though such programmability makes processors very flexible, there are nonetheless many kinds of processors available.
High end processors are used in supercomputers and other computationally intensive applications. Some such processors employ vector architectures. A vector architecture allows the processor to fetch an instruction once and then execute multiple iterations of the instruction with different data in each iteration. In applications with relatively large vectorizable loops, a vector architecture reduces the total time and the energy required to execute a program because each instruction needs to be fetched fewer times per loop. A vector processor typically includes a scalar processor to execute the parts of a program that are not vectorizable.
Some processors employ a multi-issue architecture. A multi-issue architecture contains multiple paths, each of which can execute an instruction. As the processor executes a program, it groups instructions into “bundles,” and applies each instruction in the bundle to one of the paths so that the instructions of the bundle execute concurrently. Concurrent execution increases the rate at which a program executes.
Various approaches are used to form bundles. In statically scheduled multi-issue processors, a compiler groups instructions into bundles as part of generating a program for the processor. In dynamically scheduled processors, hardware within the processor groups instructions into bundles as the program executes. Regardless of how the bundles are formed, a mechanism is used to avoid conflicts that can occur when multiple instructions are executed concurrently. Conflicts could be created, for example, if multiple instructions in a bundle simultaneously need to access the same hardware resource in the processor or if one instruction in the bundle requires as an input a value that is output when another instruction in the bundle executes. For statically scheduled processors, the compiler recognizes potential conflicts and defines the bundles so that conflicting instructions do not appear in the same bundle. In a dynamically scheduled processor, the processor contains scheduling logic that groups instructions into bundles only if the instructions do not conflict.
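The conflict check described above can be illustrated with a minimal sketch. It is not taken from any particular compiler or processor; the `Instr` record, the `conflicts` test, and the greedy in-order `form_bundles` routine are hypothetical names introduced here, and the sketch models only the two kinds of conflict named above (a shared hardware resource, and an input that depends on the output of another instruction in the same bundle).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    unit: str             # hardware resource the instruction occupies
    reads: frozenset      # registers the instruction takes as inputs
    writes: frozenset     # registers the instruction produces


def conflicts(earlier, later):
    """Return True if the two instructions cannot share a bundle."""
    same_resource = earlier.unit == later.unit
    needs_output = bool(earlier.writes & later.reads)
    return same_resource or needs_output


def form_bundles(instrs, width):
    """Greedily pack instructions, in program order, into bundles.

    A new bundle is started when the current one is full or when the
    next instruction conflicts with any instruction already in it.
    """
    bundles = []
    current = []
    for ins in instrs:
        full = len(current) == width
        if full or any(conflicts(prev, ins) for prev in current):
            bundles.append(current)
            current = []
        current.append(ins)
    if current:
        bundles.append(current)
    return bundles
```

In this sketch the same routine stands in for either the compiler of a statically scheduled processor or the scheduling logic of a dynamically scheduled one; only the time at which it runs would differ.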
Even relatively small electronic devices, such as hand held electronic devices, employ processors. Processors used in small electronic devices tend to have a statically scheduled scalar architecture, which could be a single-issue or multi-issue architecture. A processor with a scalar architecture fetches an instruction and data for the instruction each time the instruction is executed. In executing a loop that requires an instruction to be executed multiple times, a processor with a scalar architecture will fetch the instruction multiple times. Consequently, processors with scalar architectures tend to execute programs that include vectorizable loops more slowly and dissipate more energy doing so than those with vector architectures. However, they tend to occupy a smaller area on a silicon die, which can be a significant advantage in making a small or low cost processor for an embedded application.
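The difference in instruction fetches between scalar and vector execution of a loop can be made concrete with a simple counting model. The function name and the `vector_length` parameter are assumptions made for illustration; the model ignores data fetches and any scalar code outside the loop.

```python
def instruction_fetches(loop_body_len, iterations, vector_length=1):
    """Count instruction fetches needed to run a vectorizable loop.

    vector_length=1 models a scalar architecture, in which every
    instruction of the loop body is fetched once per iteration.
    A larger vector_length models a vector architecture in which one
    fetch of an instruction covers that many iterations.
    """
    # Ceiling division: number of passes over the loop body.
    passes = -(-iterations // vector_length)
    return loop_body_len * passes


scalar = instruction_fetches(loop_body_len=4, iterations=64)
vector = instruction_fetches(loop_body_len=4, iterations=64, vector_length=16)
print(scalar, vector)
```

Under these assumptions, a four-instruction loop body run for 64 iterations requires 256 fetches on the scalar model and 16 on the vector model with a vector length of 16.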
Some scalar processors have been adapted to execute multiple operations for one fetch of an instruction. However, these processors have required that the multiple operations be encoded in one instruction word. Such architectures proved difficult to use in practice. The instruction set for the processor needed to be expanded to accommodate many new instructions encoding multiple operations. In addition, making a compiler that could identify patterns of instructions in a program that could be mapped to an instruction encoding multiple operations proved difficult.
A related concept is called “software pipelining.” In software pipelining, the execution of successive iterations of a loop is overlapped, and the order in which instructions are processed is selected to reduce the total execution time of a block of code.
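The overlap of successive loop iterations can be sketched as a schedule. The sketch assumes, purely for illustration, that each stage of the loop body takes one cycle and that the hardware can execute one instance of every stage in the same cycle; `pipelined_schedule` and the stage names are hypothetical.

```python
def pipelined_schedule(stages, iterations):
    """Return, for each cycle, the (stage, iteration) pairs that execute.

    In cycle t, stage number s works on iteration t - s, so a new
    iteration starts every cycle while earlier ones are still finishing.
    """
    total_cycles = iterations + len(stages) - 1
    schedule = []
    for t in range(total_cycles):
        active = [
            (stage, t - s)
            for s, stage in enumerate(stages)
            if 0 <= t - s < iterations
        ]
        schedule.append(active)
    return schedule


for cycle, work in enumerate(pipelined_schedule(["load", "mul", "store"], 4)):
    print(cycle, work)
```

With three stages and four iterations, the overlapped schedule in this sketch occupies six cycles, whereas running each iteration to completion before starting the next would occupy twelve.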
Some processors employ a “rotating register file.” A rotating register file provides a series of register locations that can be readily accessed by a processor. Successive reads or writes to the same address in the register file result in access of successive locations in the file. When the last location is reached, the succession “rotates” back to the first location. A rotating register file may be used during software pipelining to reduce code size.
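The rotating behavior described above can be modeled in a few lines. This is a sketch under stated assumptions, not a description of any specific processor: the class name is hypothetical, the file exposes a single address, and reads and writes are assumed to advance through the locations independently of each other.

```python
class RotatingRegisterFile:
    """Toy model of a rotating register file with one logical address."""

    def __init__(self, size):
        self._slots = [0] * size
        self._write_pos = 0   # assumption: writes have their own position
        self._read_pos = 0    # assumption: reads have their own position

    def write(self, value):
        """Store a value in the next location, rotating at the end."""
        self._slots[self._write_pos] = value
        self._write_pos = (self._write_pos + 1) % len(self._slots)

    def read(self):
        """Return the value in the next location, rotating at the end."""
        value = self._slots[self._read_pos]
        self._read_pos = (self._read_pos + 1) % len(self._slots)
        return value
```

Because successive accesses use the same address, a software-pipelined loop body in this model would not need a distinct register name for each overlapped iteration, which is one way such a file can reduce code size.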
Notwithstanding the many types of processors available, it would be desirable to provide an improved processor architecture.