1. Field of the Invention
The present invention relates generally to processing systems, and more particularly to long instruction word processing with instruction extensions.
2. Description of the Prior Art
Computer processors can generally be sorted into two classes: general-purpose processors that can be adapted to a multitude of applications and application-specific processors that are optimized to serve specific applications. General-purpose processors are designed to run a general instruction set, namely a set of instructions that the processor will recognize and execute. Such general instruction sets tend to include a large number of instructions in order to support a wide variety of programs.
Application-specific processors are designed to run a more limited instruction set, where the instructions are more tailored or specific to a particular application. While an application-specific processor can enable certain programs to execute much faster than when run on a general-purpose processor, it is more limited in functionality due to the limited instruction set it runs.
One technique to increase speed of instruction execution is to bundle instructions within a Long Instruction Word (LIW) instruction to allow for parallel processing of the operations. LIW instructions comprise two or more operations. An operation is an individual instruction. LIW instructions are limited to the operations that are native to the processor. Some examples of LIW instructions are very long instruction word instructions and ultra long instruction word instructions.
Another technique for improving speed of instruction execution is pipelining, in which multiple individual instructions overlap in execution. The processor will generally process individual instructions through several consecutive stages. When individual instructions are pipelined, the stages of one individual instruction may overlap with those of another. The individual instruction data marches through the pipeline until it reaches the end of the pipeline. The length of the pipeline is determined by the maximum cycle at which a write can occur. As the number of stages increases, the length of the pipeline also increases.
A stage is a step to process an individual instruction. For example, within a five-stage instruction processor, each individual instruction is first issued (stage 1), the registers of the individual instruction are read (stage 2), an operation is executed or an address calculated (stage 3), an operand is accessed from a first register (stage 4), and the result is written into a second register (stage 5). Generally, the processor may process each stage during one clock cycle. Depending on the architecture of the system, there may be any number of stages.
One problem with pipelining is that pipelining requires numerous hardware components. Individual instructions frequently depend upon values within registers produced by other individual instructions. If a second individual instruction depends upon a value produced by a first individual instruction, the second individual instruction may have to stall one or more clock cycles until the needed value is written to the correct register. The process of stalling an individual instruction in the prior art requires several hardware components for every processing stage. As the length of the pipeline increases, the amount of area dedicated to the pipeline grows. As a result, the pipeline can greatly dwarf the size of a register file.
Further, not all individual instructions are committed upon issuance. In some instances, an individual instruction is not committed until a later processing stage. As a result, the pipeline holds the individual instruction for several processing stages until the individual instruction is committed. This process requires additional hardware components for every stage.
FIG. 1 is an illustration of a pipeline implementation with a five-stage instruction processor in the prior art. In this illustration, three individual instructions 110, 115, and 120 are pipelined.
Each stage may be processed once within any one of the clock cycles 101-108. In one example not depicted, there may be a different individual instruction in each stage, and each stage will be processed once within the single clock cycle. Subsequently, each individual instruction proceeds to the next stage.
In the first clock cycle 101, the first individual instruction 110 is issued (“ISSUE”) 125a in stage one. In the second clock cycle 102, the registers of the first individual instruction 110 are read (“READ”) 130a and the second individual instruction 115 is issued 130b. In the third clock cycle 103, the first individual instruction 110 is executed (“EXE”) 135a but the second individual instruction 115 and the third individual instruction 120 are stalled.
The second individual instruction 115 depends upon the value within register $A1 from the first individual instruction 110. The value of register $A1 will not be available until the first individual instruction 110 writes (“WRITE”) 145a the value to register $A1 in the fifth clock cycle 105. If the second individual instruction 115 executes 145b before the first individual instruction 110 writes the value to register $A1, the second individual instruction 115 will produce an erroneous result. Consequently, the second individual instruction 115 must stall within the pipeline. The second individual instruction 115 will stall during clock cycle 103 and then proceed to the next stage in clock cycle 104.
The third individual instruction 120 does not depend on either individual instruction 110 or individual instruction 115; however, each stage must be processed only once within a clock cycle. If the third individual instruction 120 is not stalled, then the third individual instruction 120 will attempt to proceed with the read 145c stage at the same time as the second individual instruction 115. Consequently, the third individual instruction 120 stalls for one clock cycle 103 and then proceeds to the next stage in clock cycle 104.
In a fourth clock cycle 104, the operand of the first individual instruction 110 is accessed from a register or the operand is accessed from another memory (“MEM”) 140a. The registers of the second individual instruction 115 are read 140b and the third individual instruction 120 issues 140c. 
In the fifth clock cycle 105, the register $A1 of the first individual instruction 110 is written to (“WRITE”) 145a and the processing of the first individual instruction 110 is completed. Now that the $A1 register is written, the second individual instruction 115 may proceed so the second individual instruction 115 is executed 145b. The registers of the third individual instruction 120 are read 145c. Subsequent clock cycles 106, 107, and 108 proceed without stall since there are no more dependencies or stage conflicts between the individual instructions 110, 115, and 120.
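The timing described for FIG. 1 can be reproduced with a minimal sketch. The function name `schedule`, the tuple layout, and the two scheduling rules are illustrative assumptions rather than part of the prior art described: (a) an individual instruction issues in the cycle in which the preceding individual instruction leaves the issue stage, and (b) a dependent individual instruction does not execute before the cycle in which its producer writes.

```python
# Hypothetical model of the five-stage pipeline of FIG. 1 (names are illustrative).
STAGES = ["ISSUE", "READ", "EXE", "MEM", "WRITE"]

def schedule(instructions, first_cycle=101):
    """Return, per instruction, the clock cycle in which it occupies each stage.

    instructions: list of (label, producer_index) tuples, where producer_index
    is the position of the instruction whose result is needed, or None.
    """
    timeline = []
    for idx, (label, producer) in enumerate(instructions):
        # Rule (a): the issue stage frees up when the previous instruction
        # moves on to its READ stage.
        issue = first_cycle if idx == 0 else timeline[idx - 1]["READ"]
        read = issue + 1
        if producer is not None:
            # Rule (b): EXE (one cycle after READ) may not precede the
            # producer's WRITE cycle, so READ is delayed as needed.
            read = max(read, timeline[producer]["WRITE"] - 1)
        timeline.append({
            "label": label,
            "ISSUE": issue,
            "READ": read,
            "EXE": read + 1,
            "MEM": read + 2,
            "WRITE": read + 3,
        })
    return timeline

# Instruction 115 depends on register $A1 written by instruction 110;
# instruction 120 has no dependency.
fig1 = schedule([("110", None), ("115", 0), ("120", None)])
for row in fig1:
    print(row["label"], [row[stage] for stage in STAGES])
```

Under these assumptions, the second individual instruction holds the issue stage for cycles 102 and 103, which is what delays the third individual instruction even though it has no data dependency.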
FIG. 2 is an illustration of a four-stage pipeline 200 in the prior art. Register values enter flip flop 265 and multiplexers (“MUX”) 230, 235, and 240 through signal paths 205, 210, 215, and 220, respectively. Control signals 250, 255, and 260 control the multiplexers 230, 235, and 240, respectively. The register value may pass through each multiplexer 230, 235, and 240 to flip flops 270, 275, and 280, respectively.
A register value that is available within the first stage is sent through the signal path 205 into the flip flop 265. The register values that are available within the second, third, and fourth stages are sent through signal paths 210, 215, and 220, respectively. The register values sent through signal paths 210, 215, and 220 proceed to multiplexers 230, 235, and 240.
The register values may proceed from multiplexers 230, 235, and 240 to flip flops 270, 275, and 280, respectively. Multiplexer (“MUX”) 285 may access the register values from the flip flops 265, 270, 275, and 280 as well as the register values from the register file 287. Control signal 292 controls multiplexer 285.
Even if multiplexer 285 accesses a register value at flip flop 265, 270, 275, or 280, all register values will proceed to the register file 287. For example, if a register value is available in the first stage, the register value may be accessed by multiplexer 285 and still pass from the flip flop 265 through the multiplexers 230, 235, 240 and flip flops 270, 275, 280, until being sent to the register file 287. The register file 287 comprises one or more registers that may store register values, one or more read ports, and one or more write ports. Any register values received by the multiplexer 285 are sent to the flip flop 290, which delivers the register value to a processor through signal path 295.
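The data path of FIG. 2 can be approximated in software as a rough behavioral sketch. The class name `BypassPipeline`, its methods, and the selection policy of the read multiplexer (the earliest-stage flip flop holding the requested register wins) are assumptions made for illustration; the description above does not specify how control signal 292 chooses among the sources.

```python
# Hypothetical behavioral model of the four-stage pipeline 200 (illustrative only).
class BypassPipeline:
    def __init__(self):
        # Positions 0..3 stand in for flip flops 265, 270, 275, and 280.
        self.flip_flops = [None, None, None, None]
        # Stand-in for register file 287: register name -> value.
        self.register_file = {}

    def clock(self, new_values=(None, None, None, None)):
        """Advance one clock cycle.

        new_values[k] is a (register, value) pair that becomes available in
        stage k + 1 (signal paths 205, 210, 215, 220), or None.
        """
        # A value leaving the last flip flop is written to the register file.
        if self.flip_flops[3] is not None:
            register, value = self.flip_flops[3]
            self.register_file[register] = value
        nxt = [new_values[0], None, None, None]
        for k in range(1, 4):
            # Each stage multiplexer (230, 235, 240) takes either a newly
            # available value or the value held by the previous flip flop.
            if new_values[k] is not None:
                nxt[k] = new_values[k]
            else:
                nxt[k] = self.flip_flops[k - 1]
        self.flip_flops = nxt

    def read(self, register):
        """Stand-in for multiplexer 285: in-flight value first, else register file."""
        for entry in self.flip_flops:
            if entry is not None and entry[0] == register:
                return entry[1]
        return self.register_file.get(register)

pipe = BypassPipeline()
pipe.clock((("$A1", 7), None, None, None))   # value available in the first stage
early_read = pipe.read("$A1")                # served from a flip flop, not the file
file_before = dict(pipe.register_file)
for _ in range(4):                           # value keeps moving to the register file
    pipe.clock()
```

In this sketch the value is readable as soon as it is latched in the first stage, yet it still travels through every remaining flip flop before it is stored, which is the behavior that makes the hardware count grow with the number of stages.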
Although pipelining saves time by overlapping individual instructions, numerous hardware components are necessary to implement the pipeline. For example, the number of registers necessary to implement a pipeline may be calculated as follows: Number of registers=(number of stages)×(width of stage)+(staging registers). In one example, assuming 32 stages, each 128 bits wide, and 7072 staging registers for pipelining, 32×128+7072=11,168 registers. As a result, the number of hardware components, including flip flops, multiplexers, and registers, may be cost-prohibitive. Moreover, as the number of components increases, valuable space taken by the components within a chip or printed circuit board also increases.
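The register-count formula can be evaluated directly. The function name `pipeline_register_count` and its parameter names are illustrative assumptions; the input figures are those of the example above.

```python
# Hypothetical helper that evaluates the register-count formula given above.
def pipeline_register_count(num_stages, stage_width_bits, staging_registers):
    """Number of registers = (number of stages) x (width of stage) + (staging registers)."""
    return num_stages * stage_width_bits + staging_registers

# Figures from the example: 32 stages, 128 bits per stage, 7072 staging registers.
example_total = pipeline_register_count(32, 128, 7072)
print(example_total)

# Doubling the stage count shows how the stage term grows linearly with depth
# (the staging-register count is held fixed here purely for illustration).
deeper_total = pipeline_register_count(64, 128, 7072)
print(deeper_total)
```

The stage term alone contributes 32×128=4,096 registers in the example, so the pipeline storage scales with both the depth and the width of the pipeline.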