In the field of microprocessors and other programmable logic devices, many advances made in recent years have resulted in significant performance improvements. One such advance is the implementation of pipelined architectures. In a pipelined microprocessor, as is well known in the art, multiple sequential program instructions are processed simultaneously, each at a different stage of processing from instruction fetch through execution and writeback. As a result, later instructions (in program order) are fetched prior to completion of the execution of earlier instructions. Because of pipelining, the effective rate at which instructions are executed can approach one instruction per machine cycle in a single-pipeline microprocessor, even though each individual instruction may require multiple machine cycles for processing from fetch through execution and writeback.
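The throughput claim above can be illustrated with a short calculation. The following is a minimal sketch, assuming an ideal pipeline with no stalls or flushes; the function names and the five-stage depth are hypothetical and chosen only for illustration.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to complete all instructions in an ideal, stall-free pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to reach writeback;
    # each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction must finish before the next is fetched."""
    return num_instructions * num_stages


if __name__ == "__main__":
    stages = 5  # assumed stage count, for illustration only
    for n in (1, 10, 1000):
        total = pipelined_cycles(n, stages)
        # The last column is cycles per instruction, which approaches 1
        # as n grows, although every instruction still takes 5 cycles.
        print(n, total, unpipelined_cycles(n, stages), round(total / n, 3))
```

Under these assumptions, the number of cycles per instruction tends toward one as the instruction count grows, while the latency of each individual instruction remains equal to the number of stages.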
Another approach to increasing instruction throughput is the use of a so-called superscalar microprocessor architecture. Superscalar microprocessors issue multiple instructions for execution in each machine cycle, and thus effectively have multiple pipelines operating in parallel, providing even higher theoretical performance levels than the scalar, pipelined microprocessor architecture.
As is well known in the art, microprocessors having pipelined architectures, especially if superscalar, are vulnerable to operating hazards that are commonly referred to as dependencies. In general, dependencies are situations in which a resource, such as a register or memory location, is required by multiple instructions that are at different stages in the pipeline. For example, a first instruction may be directed to write the results of its operation to a particular register while a second instruction, later in the program flow, may require the contents of that same register as an input operand. If, in this example, the input operand to the second instruction is fetched prior to execution of the first instruction, the second instruction operates on a stale value, and a program error due to a register dependency results. Other dependencies include similar conflicts relative to memory locations, and conflicts between instructions regarding microprocessor resources. An example of a resource conflict in a complex instruction set (CISC) microprocessor may involve the execution of microcode sequences by multiple instructions in the pipeline, where the later instruction is precluded from accessing microcode from the microsequencer until completion of an earlier-issued microcode instruction. Other dependencies arising in pipelined microprocessors will, of course, be well known to those of ordinary skill in the art.
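The register dependency in the above example can be sketched in software as follows. This is only an illustrative model, not a description of any particular hardware; the `Instr` record, the register names, and the `window` parameter (the number of earlier instructions assumed to be still in flight) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]      # register written, or None if no register result
    srcs: Tuple[str, ...]    # registers read as input operands


def find_register_dependencies(
    program: List[Instr], window: int
) -> List[Tuple[str, str, str]]:
    """Return (writer, reader, register) triples for read-after-write cases.

    For each source register of each instruction, the nearest earlier
    instruction within `window` positions that writes that register is
    reported as the writer on which the reader depends.
    """
    found = []
    for i, later in enumerate(program):
        in_flight = program[max(0, i - window):i]
        for src in later.srcs:
            # Scan backward so the most recent writer is found first.
            for earlier in reversed(in_flight):
                if earlier.dest == src:
                    found.append((earlier.name, later.name, src))
                    break
    return found


if __name__ == "__main__":
    program = [
        Instr("I1", "r1", ("r2", "r3")),  # first instruction writes r1
        Instr("I2", "r4", ("r1", "r5")),  # second instruction reads r1
        Instr("I3", "r6", ("r7",)),       # unrelated instruction
    ]
    print(find_register_dependencies(program, window=4))
```

In this sketch, the second instruction is reported as dependent upon the first because it reads the register that the first instruction writes; a real pipeline would respond to such a case with a stall, operand forwarding, or similar handling.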
Each instance of a dependency involves some amount of handling, such as a pipeline stall, register renaming, and the like. Since the likelihood of a dependency increases with pipeline depth (including both the number of stages in a pipeline and the number of effective pipelines), the performance degradation resulting from dependencies limits the practical pipeline depth in a microprocessor. In addition, the cost of exceptions such as mispredicted branches and other events causing a pipeline flush increases dramatically with the depth and parallelism of the pipelines. Accordingly, the design of pipelined microprocessors generally involves a tradeoff between the performance improvement obtained with increasing pipeline depth and superscalar width, on one hand, and the deleterious effects of increased dependency and exception frequency and overhead, on the other hand.
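The growth of flush overhead with depth and width can be illustrated by a deliberately simple model. The sketch below assumes that a flush discards every instruction slot in the stages ahead of the point at which the flushing event is detected, in every parallel pipeline; the function name and the example figures are hypothetical.

```python
def slots_lost_per_flush(stages_before_detection: int, issue_width: int) -> int:
    """Instruction slots discarded by one flush, under the stated assumption.

    stages_before_detection: pipeline stages an instruction passes through
        before the flushing event (e.g. a mispredicted branch) is detected.
    issue_width: number of instructions issued per machine cycle.
    """
    return stages_before_detection * issue_width


if __name__ == "__main__":
    # A shallow scalar pipeline versus a deeper, wider one (assumed figures).
    print(slots_lost_per_flush(3, 1))
    print(slots_lost_per_flush(10, 4))
```

Under this assumption, the work discarded per flush scales with the product of depth and width, which is one way of viewing the tradeoff described above.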
By way of further background, U.S. Pat. No. 5,430,851 describes a microprocessor having multiple instruction streams, each having an instruction setup unit for fetching and decoding the instruction. A common scheduling unit is provided according to this architecture, receiving decoded instructions from each of the multiple instruction streams, and issuing or scheduling each decoded instruction to an appropriate one of the multiple execution units. This reference also discloses multiple scheduling units, each coupled to multiple instruction streams.
In each of the disclosed examples in the above-referenced U.S. Pat. No. 5,430,851, dependencies are checked and handled in each of the multiple instruction streams, prior to scheduling of the instructions. In this arrangement, no instruction is apparently issued to the scheduling units unless dependencies have been checked and, if any are found, handled. This approach appears to be limited in its ability to efficiently handle exceptions, mispredicted branches, interrupts, and other events that cause pipeline flushes. In addition, none of the examples in the above-referenced U.S. Pat. No. 5,430,851 is shown as using microcoded instructions or sequences. As a result, the implementations illustrated in the above-referenced U.S. Pat. No. 5,430,851 are primarily useful in connection with RISC (Reduced Instruction Set Computer) architectures, where the complexity of each instruction is maintained at a minimum, with complex sequences handled in the compilation of the code, or by upstream circuitry.
By way of further background, multi-stream microprocessors of the so-called "barrel" type are also known in the art. According to this approach, instruction execution alternates among the streams at instruction boundaries. For example, a conventional two-stream barrel processor will process instructions from one stream on even cycles and from the other stream on odd cycles. Because the alternation among streams is fixed in this manner, this type of processing limits the flexibility and performance of the microprocessor.
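The fixed alternation of a barrel processor can be sketched as follows. This is a simplified model under stated assumptions: each stream is represented as a list of instruction names, each cycle belongs to exactly one stream in round-robin order, and a cycle whose owning stream has nothing to issue is modeled as going unused. The function name and the instruction labels are hypothetical.

```python
from typing import List, Optional


def barrel_schedule(streams: List[List[str]], cycles: int) -> List[Optional[str]]:
    """Return the instruction issued in each cycle, or None for an unused cycle.

    Cycle c belongs to stream (c % number_of_streams); with two streams,
    stream 0 owns the even cycles and stream 1 owns the odd cycles.
    """
    positions = [0] * len(streams)  # next instruction index for each stream
    issued: List[Optional[str]] = []
    for cycle in range(cycles):
        owner = cycle % len(streams)
        pos = positions[owner]
        if pos < len(streams[owner]):
            issued.append(streams[owner][pos])
            positions[owner] = pos + 1
        else:
            # Assumed behavior: the other stream may not use this cycle.
            issued.append(None)
    return issued


if __name__ == "__main__":
    stream_a = ["A0", "A1", "A2"]
    stream_b = ["B0"]
    print(barrel_schedule([stream_a, stream_b], cycles=6))
```

In this model, once the second stream has no further instructions, its odd cycles go unused and the first stream still issues only on even cycles, which is one illustration of the inflexibility noted above.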