Microprocessor designers have increasingly endeavored to exploit parallelism to improve performance. One parallel architecture that has found application in some modern microprocessors, including digital signal processors, is the very long instruction word, or VLIW, architecture. VLIW architecture microprocessors are so named because they execute instructions in the VLIW format.
A VLIW format instruction is a long fixed-width instruction that encodes multiple concurrent operations. VLIW systems use multiple independent functional units. Instead of issuing multiple independent instructions to the units, a VLIW system combines the multiple operations into one very long instruction. In a VLIW system, computer instructions for multiple integer operations, floating point operations, and memory references may be combined in a single, wide, VLIW instruction.
Thus, a VLIW machine consists of multiple independent functional units controlled on a cycle-by-cycle basis by a VLIW format instruction (62 or more bits). All of the functional units can be arbitrarily pipelined, i.e., they can start a new operation every cycle and take a fixed number of cycles to complete an operation, although the number of cycles for completion can vary from one functional unit to another. The pipeline stages of all units operate in lock step, controlled by a single global clock. The VLIW instruction is the concatenation of a plurality of operation subfields, one for each functional unit to be controlled.
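By way of illustration only, the concatenation of operation subfields into a single instruction word may be sketched as follows. The field widths, the opcode/source/destination layout, and the names Operation, encode_slot, encode_vliw, and decode_vliw are hypothetical assumptions made for this sketch and do not describe any particular VLIW instruction set.

```python
from typing import List, NamedTuple

# Hypothetical field widths, chosen only for illustration.
OPCODE_BITS = 6
REG_BITS = 5
SLOT_BITS = OPCODE_BITS + 3 * REG_BITS  # width of one operation subfield


class Operation(NamedTuple):
    """One operation subfield: what to start, and where operands and result live."""
    opcode: int
    src1: int
    src2: int
    dst: int


def encode_slot(op: Operation) -> int:
    """Pack one operation into a SLOT_BITS-wide subfield."""
    fields = ((op.opcode, OPCODE_BITS), (op.src1, REG_BITS),
              (op.src2, REG_BITS), (op.dst, REG_BITS))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def encode_vliw(ops: List[Operation]) -> int:
    """Concatenate one subfield per functional unit; slot 0 is most significant."""
    word = 0
    for op in ops:
        word = (word << SLOT_BITS) | encode_slot(op)
    return word


def decode_vliw(word: int, num_slots: int) -> List[Operation]:
    """Split an instruction word back into its per-unit subfields."""
    slot_mask = (1 << SLOT_BITS) - 1
    reg_mask = (1 << REG_BITS) - 1
    ops = []
    for i in range(num_slots):
        slot = (word >> (SLOT_BITS * (num_slots - 1 - i))) & slot_mask
        ops.append(Operation(
            opcode=slot >> (3 * REG_BITS),
            src1=(slot >> (2 * REG_BITS)) & reg_mask,
            src2=(slot >> REG_BITS) & reg_mask,
            dst=slot & reg_mask,
        ))
    return ops
```

Under these assumed widths, each subfield occupies 21 bits, so an instruction controlling four functional units would be 84 bits wide. A fixed slot position per functional unit means that no routing information needs to be encoded in the instruction itself.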
All functional units are connected to a shared multiport register file from which they take their operands and into which they write their results. Any previously computed result can therefore be used as the operand for any functional unit. A VLIW instruction is loaded every cycle. Each functional unit is controlled during that cycle by its own operation subfield, which identifies the source and destination locations in the multiport register file and the operation to be started. A typical architecture includes a plurality of arithmetic and logic units, a plurality of memory interface units, and a branching control unit. All three of these types of functional units are typically pipelined to maximize the speed of operation.
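The cycle-by-cycle behavior described above may be modeled, in simplified form, as shown below. This is a minimal sketch under stated assumptions: the class VLIWMachine, its step and drain methods, and the representation of a subfield as a (function, src1, src2, dst) tuple are hypothetical, and each functional unit is assumed to have a fixed latency after which its result becomes visible in the shared register file.

```python
import operator


class VLIWMachine:
    """Toy model: functional units in lock step sharing one register file."""

    def __init__(self, num_regs, latencies):
        self.regs = [0] * num_regs   # shared multiport register file
        self.latencies = latencies   # fixed latency (in cycles) per functional unit
        self.in_flight = []          # pending results: (ready_cycle, dst, value)
        self.cycle = 0

    def step(self, instruction):
        """Execute one cycle. `instruction` holds one subfield per unit, or None."""
        if len(instruction) != len(self.latencies):
            raise ValueError("one subfield is required per functional unit")
        # Write back every result whose latency has elapsed.
        pending = []
        for ready, dst, value in self.in_flight:
            if ready <= self.cycle:
                self.regs[dst] = value
            else:
                pending.append((ready, dst, value))
        self.in_flight = pending
        # Each unit reads its operands from the shared file and starts its operation.
        for unit, slot in enumerate(instruction):
            if slot is None:         # unit idles this cycle
                continue
            func, src1, src2, dst = slot
            value = func(self.regs[src1], self.regs[src2])
            self.in_flight.append((self.cycle + self.latencies[unit], dst, value))
        self.cycle += 1

    def drain(self):
        """Issue empty instructions until all pending results are written back."""
        while self.in_flight:
            self.step([None] * len(self.latencies))


# Usage: a one-cycle adder and a two-cycle multiplier started in the same cycle.
machine = VLIWMachine(num_regs=8, latencies=[1, 2])
machine.regs[0], machine.regs[1] = 3, 4
machine.step([(operator.add, 0, 1, 2), (operator.mul, 0, 1, 3)])
machine.drain()
```

In this model, a result written by one unit is readable by any other unit once its latency has elapsed, which reflects the shared register file described above. Reading a destination register before that point returns its old value, so correct ordering is left to whatever produces the instruction stream.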
While VLIW microprocessors are capable of providing greatly improved performance as compared with other, less parallelized microprocessors, it is desirable to further improve the performance of such devices. One problem area for performance arises from the types of instructions that are packed together to form a VLIW instruction. The theory of VLIW operation is that the compiler selects operations that can be executed in parallel and combines them into a single VLIW instruction. Operations that depend on one another cannot be combined in this way and must therefore be executed in sequential cycles. When too few independent operations are available, some operation subfields of the VLIW instruction go unused and the corresponding functional units sit idle, which results in inefficient utilization of the microprocessor.
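The effect of dependencies on utilization may be illustrated with the following sketch of a simple greedy packer. It is not a description of any actual compiler. The functions pack and utilization are hypothetical, every operation is assumed to complete in one cycle, every slot is assumed able to execute any operation, and each register is assumed to be written at most once, so that only read-after-write dependencies need to be tracked.

```python
def pack(ops, width):
    """Greedily pack operations into instructions of `width` slots each.

    `ops` is a list of (dst, srcs) tuples in program order. Returns a list of
    instructions, each a list of indices into `ops`.
    """
    producer_cycle = {}  # register name -> index of the instruction that writes it
    instructions = []
    for index, (dst, srcs) in enumerate(ops):
        # A dependent operation must issue after every operation it reads from.
        earliest = 0
        for src in srcs:
            if src in producer_cycle:
                earliest = max(earliest, producer_cycle[src] + 1)
        # Take the first instruction at or after `earliest` with a free slot.
        cycle = earliest
        while cycle < len(instructions) and len(instructions[cycle]) >= width:
            cycle += 1
        while len(instructions) <= cycle:
            instructions.append([])
        instructions[cycle].append(index)
        producer_cycle[dst] = cycle
    return instructions


def utilization(instructions, width):
    """Fraction of operation slots that hold a real operation."""
    used = sum(len(instruction) for instruction in instructions)
    return used / (len(instructions) * width)


# Usage: two independent additions followed by a chain of dependent operations.
program = [
    ("r1", ("a", "b")),
    ("r2", ("c", "d")),
    ("r3", ("r1", "r2")),  # needs r1 and r2
    ("r4", ("r3", "e")),   # needs r3
]
schedule = pack(program, width=4)
```

For this hypothetical program on a four-slot machine, the first two operations share one instruction, while the two dependent operations each occupy an instruction of their own. The four operations therefore take three instructions, leaving eight of the twelve available slots empty.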
The present invention overcomes such inefficiencies.