1. Field
The present application relates to computer processors.
2. Related Art
A computer processor (or central processing unit or CPU) executes a sequence of instructions, typically obtained from main memory, which are executed in positional order except when redirected by a branch, jump, call, or similar control-flow operation (hereinafter “control-flow operation”). The order is important because there are often semantic dependencies among instructions in a sequence, and the machine state would be different if the instructions were executed in a different order. However, some sequences of instructions do not have to be issued in strict order. An important class of CPU architectures, the so-called “wide issue” architectures, can issue more than one instruction simultaneously.
Multi-threading, a common approach to parallel execution, specifies the program not as a single sequential stream of instructions, but as several such streams. Each stream may be executed by its own sub-CPU or pipeline, or the streams may be interleaved on a single CPU such that each uses resources left idle by the other streams. Sequential semantics are enforced within any single stream of instructions, but the streams themselves are considered to be independent, meaning that the execution order of instructions in one stream versus instructions in another stream does not matter, except for certain specialized instructions that serve to synchronize the streams.
In another approach, typified by Very Long Instruction Word (VLIW) architectures, there is only one instruction stream, but each instruction may contain not just one, but several “operations”, and all of these operations are executed simultaneously. The several operations within a single instruction are synchronized at every instruction issue cycle and thus advance in lock step. Consequently, a given operation executed in a given instruction may be semantically dependent on any operation executed in an earlier instruction, and operations that are executed in later instructions may be semantically dependent on the given operation, but operations within the same instruction cannot be dependent on each other. Compilers and other code generation software analyze the program and “schedule” individual operations into a sequence of instructions so as to maximize “instruction-level parallelism” (ILP), in other words, to maximize the number of operations per instruction. Maximizing ILP in this way generally improves performance.
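By way of illustration only, the scheduling constraint described above can be sketched as a simple greedy packer. The function name schedule_vliw, the dependency map deps, and the issue width parameter are hypothetical and are introduced solely for this example; actual compilers use considerably more elaborate heuristics.

```python
from typing import Dict, List, Set


def schedule_vliw(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into instructions of at most `width` operations.

    `deps[op]` is the set of operations whose results `op` needs (a
    hypothetical representation chosen for this sketch). An operation is
    placed only in an instruction later than every operation it depends on,
    so operations within one instruction never depend on each other.
    """
    placed: Dict[str, int] = {}        # operation -> index of its instruction
    instructions: List[List[str]] = []
    remaining = list(deps)             # program order, for determinism
    while remaining:
        current: List[str] = []
        for op in remaining:
            # `placed` is updated only after the instruction is complete,
            # so an operation is never ready because of a same-cycle peer.
            ready = all(d in placed for d in deps[op])
            if ready and len(current) < width:
                current.append(op)
        if not current:
            raise ValueError("unsatisfiable dependency (cycle or unknown operation)")
        for op in current:
            placed[op] = len(instructions)
        instructions.append(current)
        remaining = [op for op in remaining if op not in placed]
    return instructions


example_deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}
example_schedule = schedule_vliw(example_deps, width=2)
```

In this sketch, operation "c" cannot share an instruction with "a" or "b", and "d" cannot share an instruction with "c", so five operations occupy three two-wide instructions rather than the ideal of two and a half.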
In existing art, there are CPUs that support multiple instruction streams for a single thread of execution. In these CPUs, the instructions of the various streams are interleaved in memory. In some designs, a single instruction looks much like an instruction for a single-stream machine, and the instructions for each stream occupy every Nth instruction in memory. In other schemes, a group of sub-instructions to be executed in a single cycle is concatenated into a single instruction, which is then fetched as a unit. This approach can yield smaller programs because the instruction encoding can have a compact representation of idle streams and often can merge common information from several sub-instructions into a single shared representation.
Branches and other control-flow operations occur frequently in programs, control-flow target addresses are large, and many programs assume that a code pointer is the same size as a data pointer. Multiple instruction streams present problems with control-flow operations. In a single-stream machine, a control-flow operation contains or computes a single code address which is to be the start of subsequent execution. If there are multiple streams, then each stream needs a target address to branch to. Requiring branches and other control-flow operations to have multiple targets (one for each stream) makes it impossible to express a control-flow target in a simple address pointer of normal size. However, if the streams are interleaved, then control-flow operations require only a single address, namely the point at which the interleaved streams start. Likewise, in a VLIW architecture, an instruction can branch to a target instruction, which necessarily redirects all the operation streams of the instruction as well.
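By way of illustration only, the following sketch shows how a single target address can serve all streams when each stream occupies every Nth instruction in memory. The sketch assumes fixed-size instruction slots, and the names interleaved_address and stream_entry_points are hypothetical and introduced solely for this example.

```python
from typing import List


def interleaved_address(base: int, stream: int, index: int,
                        num_streams: int, instr_bytes: int) -> int:
    """Address of the `index`-th instruction of `stream`.

    Assumes (for this sketch only) that all instructions are `instr_bytes`
    long and that stream s owns slots s, s + N, s + 2N, ... counted from `base`.
    """
    if not 0 <= stream < num_streams:
        raise ValueError("stream out of range")
    return base + (index * num_streams + stream) * instr_bytes


def stream_entry_points(target: int, num_streams: int, instr_bytes: int) -> List[int]:
    """Per-stream start addresses derived from one control-flow target."""
    return [interleaved_address(target, s, 0, num_streams, instr_bytes)
            for s in range(num_streams)]


# A single code pointer (0x2000) is enough to restart three streams.
entries = stream_entry_points(0x2000, num_streams=3, instr_bytes=4)
```

Under these assumptions, the control-flow operation carries only the one address, and each stream derives its own entry point from that address and its stream number.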
Unfortunately, sequential interleave has problems too. Instructions are represented as bit patterns encoding the intended operation, arguments, options, and so on. Variable-length bit pattern encoding for an instruction (referred to herein as a “variable-length instruction”) can be used to reduce the size of an instruction or to fit as much information as possible within an instruction size (such as 32 bits) dictated by other CPU design considerations. However, variable-length instructions can be difficult to parse, and constraints with respect to power, circuit area, and timing can result in practical limitations on the number of variable-length instructions that can be decoded in a machine cycle. Variable-length instructions are used in x86 instruction set architectures. Fixed-length bit pattern encodings for an instruction (referred to herein as a “fixed-length instruction”) do not impose such constraints on parallel decoding, but they are wasteful of bits and are quicker to thrash in the cache system, thus limiting the effectiveness of the cache system. Fixed-length instructions are used in the Intel® Itanium® Architecture and in RISC instruction set architectures.
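By way of illustration only, the following sketch contrasts how instruction boundaries are located under the two encodings. The variable-length format used here, in which the low two bits of the first byte give the instruction length minus one, is a hypothetical encoding invented for this example and does not correspond to x86 or to any other real instruction set.

```python
from typing import List


def fixed_length_starts(code: bytes, instr_bytes: int) -> List[int]:
    """Start offsets of fixed-length instructions.

    Every offset is known without inspecting the code bytes, so each
    instruction could be handed to a separate decoder at once.
    """
    return list(range(0, len(code), instr_bytes))


def variable_length_starts(code: bytes) -> List[int]:
    """Start offsets under the hypothetical variable-length encoding.

    The start of instruction k+1 is unknown until instruction k has been
    at least partly decoded, which serializes boundary detection.
    """
    starts: List[int] = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = (code[offset] & 0b11) + 1   # hypothetical length field
        offset += length
    return starts


sample = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
variable_starts = variable_length_starts(sample)
fixed_starts = fixed_length_starts(bytes(12), instr_bytes=4)
```

The loop in variable_length_starts carries a dependency from each instruction to the next, which is the kind of serialization that limits the number of variable-length instructions decoded per cycle; the fixed-length case has no such dependency.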