The pipeline architecture is used in the design of many of today's computers. The architecture resembles an assembly line. It partitions the execution of each instruction into a sequence of tasks (e.g., fetching the instruction, decoding it, executing it, and storing its results). Each of these tasks is provided with a dedicated station of resources. As instructions flow through the pipeline, their tasks are serviced by the stations in succession. Each instruction is followed by its next sequential instruction, which occupies each station as soon as possible after the preceding instruction vacates it. Because the execution of successive instructions overlaps, the time delay between the initiation of a group of instructions and their completion is compacted, and the throughput of the computer is increased.
An inefficient station in a pipeline computer creates a bottleneck. A bottleneck station dictates the throughput of the computer because it dictates the speed at which instructions flow through the pipeline. If a bottleneck station can be accelerated, the throughput of the pipeline computer is increased.
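The effect of a bottleneck station can be illustrated with a minimal sketch. It assumes a simplified lockstep model in which every station advances once per period and the period is set by the slowest station. The function names and the station latencies are hypothetical and chosen only for illustration.

```python
def pipeline_time(stage_latencies, num_instructions):
    """Total time to complete num_instructions in a lockstep pipeline.

    Assumption: all stations advance together, so the period equals the
    latency of the slowest (bottleneck) station.
    """
    period = max(stage_latencies)
    fill_periods = len(stage_latencies)  # periods until the first instruction completes
    return (fill_periods + num_instructions - 1) * period


def throughput(stage_latencies, num_instructions):
    """Instructions completed per unit of time under the same model."""
    return num_instructions / pipeline_time(stage_latencies, num_instructions)


# Hypothetical latencies for fetch, decode, execute, store.
balanced = [1, 1, 1, 1]
slow_decode = [1, 3, 1, 1]  # decode takes three time units

print(pipeline_time(balanced, 100))     # 103
print(pipeline_time(slow_decode, 100))  # 309
```

Under this model, slowing only the decode station by a factor of three slows the whole pipeline by the same factor, which is why accelerating the bottleneck station raises throughput.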
A common bottleneck in pipeline computers is the decoding of instructions with opcodes of non-uniform lengths.
Computer instructions normally have an opcode from which the signals that direct the processing of the corresponding instruction are generated. The size of the opcode for a computer usually depends on the width of its data path, which in turn depends on its hardware (arithmetic-logic unit, buses, decoder, etc.). If the opcode is n bits long, it can be decoded into 2^n different bit combinations, and the computer can have a set of up to 2^n different types of instructions. In most of today's computer designs, n is an integer multiple of eight (i.e., a whole number of bytes).
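The relationship between opcode width and the number of distinct instruction types can be checked with a short calculation. This is only an arithmetic illustration; the function name is hypothetical.

```python
def opcode_combinations(n_bits):
    """Number of distinct bit patterns an n-bit opcode can take (2**n)."""
    return 1 << n_bits


for n_bits in (8, 16):
    print(n_bits, opcode_combinations(n_bits))
# 8 256
# 16 65536
```

A one-byte opcode therefore distinguishes at most 256 instruction types, and a second byte raises that limit to 65,536.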
There are occasions, however, where the instruction set of a computer needs to be expanded without a corresponding expansion of its data path. One such occasion arises when a computer must be upgraded to provide more instructions without having to make substantial changes to its hardware. When such occasions arise, one or more bytes would commonly be added to the opcode.
In prior art computers, a multi-byte opcode is decoded by examining its bytes one at a time. The examination of each byte takes one cycle. A disadvantage of this prior art approach is that decoding a multi-byte opcode takes multiple cycles, which creates a bottleneck in the pipeline and decreases throughput.
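The byte-at-a-time approach can be sketched as follows. The text does not specify how an additional opcode byte is signalled, so the sketch assumes a hypothetical escape value (ESCAPE) meaning that another opcode byte follows. The function name and the byte values are likewise illustrative only.

```python
ESCAPE = 0x0F  # hypothetical prefix value: "another opcode byte follows"


def decode_serial(stream):
    """Examine opcode bytes one per cycle.

    Returns (opcode_bytes, cycles), where cycles counts the bytes examined.
    """
    opcode = []
    cycles = 0
    for byte in stream:
        opcode.append(byte)
        cycles += 1  # one cycle per byte examined
        if byte != ESCAPE:
            break  # last opcode byte reached
    if not opcode or opcode[-1] == ESCAPE:
        raise ValueError("truncated opcode")
    return tuple(opcode), cycles


print(decode_serial([0x90, 0x12]))        # ((144,), 1)
print(decode_serial([0x0F, 0xA2, 0x33]))  # ((15, 162), 2)
```

In this sketch a one-byte opcode occupies the decode station for one cycle and a two-byte opcode for two, so each instruction with an extended opcode holds up the instructions behind it.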