1. Field of the Invention
The present invention is generally in the field of processors. In particular, the present invention is in the field of VLIW processors.
2. Background Art
VLIW (Very Long Instruction Word) processors use an approach to parallelism according to which several instructions are included in a very long instruction word or a “VLIW packet.” A VLIW packet typically contains a number of instructions which can be executed in the same clock cycle. Each instruction in a VLIW packet typically requires two source operands and the result of execution of each instruction is typically a single destination operand. For example, a VLIW packet containing six instructions would typically require concurrent access to twelve source operands. Moreover, the result of execution of the six instructions would typically be six destination operands.
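By way of illustration only, the operand counts in the above example can be reproduced with a short calculation. The following Python sketch is not part of any described processor; the function name `operand_demand` and its parameters are hypothetical, and the defaults simply encode the typical case stated above of two source operands and one destination operand per instruction.

```python
def operand_demand(instructions_per_packet,
                   sources_per_instruction=2,
                   destinations_per_instruction=1):
    """Return (source reads, destination writes) needed per VLIW packet.

    Assumes every instruction in the packet uses the typical operand counts;
    real instruction sets may include instructions that use fewer or more.
    """
    reads = instructions_per_packet * sources_per_instruction
    writes = instructions_per_packet * destinations_per_instruction
    return reads, writes


# A packet of six instructions, as in the example above.
reads, writes = operand_demand(6)
print(reads, writes)  # 12 6
```

The calculation shows that operand traffic grows linearly with the number of instructions in the packet.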
Typically, the source operands in a VLIW processor are processed by multiple data path blocks, each data path block having a number of execution units such as ALUs and multipliers. Reading twelve source operands in a single clock cycle and/or writing back six destination operands in a single clock cycle requires the VLIW processor to have multiple register file banks to accommodate the reading of a large number of source operands or the writing back of a large number of destination operands. As such, a typical VLIW processor includes a number of register file banks from which source operands are read prior to execution in multiple execution units and to which destination operands are written back after execution of various instructions. Each register file bank is typically associated with, and coupled to, a respective data path block.
The fact that a VLIW processor typically has a number of register file banks and a number of execution units presents a challenge in VLIW busing architecture. In particular, a number of buses are required to transport source and destination operands from and to the various register file banks. Moreover, the buses carrying source and/or destination operands are wide buses, since each operand can be 32 bits wide or, in some processors, 64 bits wide.
Thus, despite their advantages, the multiple execution units and register file banks also present certain disadvantages in processor design. For example, as mentioned above, multiple execution units and register file banks require a large number of wide buses to accommodate transport of source and destination operands to and from various execution units. As the number of these wide buses grows, more chip area and more power are consumed. Moreover, it is possible that a desired source operand is not present in the register file bank which is coupled to the data path block requiring that operand. To address this problem, a recent VLIW design interconnects various register file banks to each other via “move” buses which can accommodate transport of two source operands from one register file bank into another. As such, when a source operand that is required by a certain data path block does not exist in the register file bank coupled to that data path block, the source operand is transferred by a “move” operation from the register file bank in which the operand resides to the register file bank which is coupled to the subject data path block. The move operation requires a clock cycle and as such slows down the VLIW processor. Moreover, the move operation consumes power and the move buses take up valuable chip area.
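The cycle penalty of the move operation described above can be sketched with a simple model. The following Python sketch is illustrative only and does not describe any actual processor. The function `move_cycles`, the constant `MOVE_BUS_CAPACITY`, and the register names are hypothetical. The sketch assumes that each move operation carries at most two operands from a single register file bank, that each move operation costs one clock cycle, that moves from different banks are serialized, and that a moved operand is copied into the destination bank.

```python
from collections import defaultdict

# Assumption: one move carries up to two source operands, per the move bus
# described above.
MOVE_BUS_CAPACITY = 2


def move_cycles(banks, block, sources):
    """Count extra clock cycles spent on moves before `block` can execute.

    banks:   list of sets of register names; banks[i] is the register file
             bank coupled to data path block i (hypothetical layout).
    block:   index of the data path block that needs the operands.
    sources: register names of the source operands the block requires.
    """
    local = banks[block]
    missing_by_bank = defaultdict(list)
    for reg in sources:
        if reg in local:
            continue  # operand already resides in the coupled bank
        for owner, bank in enumerate(banks):
            if reg in bank:
                missing_by_bank[owner].append(reg)
                break
        else:
            raise KeyError(f"operand {reg!r} is not in any register file bank")

    cycles = 0
    for regs in missing_by_bank.values():
        # Ceiling division: one cycle per group of up to two operands.
        cycles += -(-len(regs) // MOVE_BUS_CAPACITY)
        local.update(regs)  # assumption: a move copies into the local bank
    return cycles


banks = [{"r0", "r1"}, {"r8", "r9"}]
print(move_cycles(banks, 0, ["r0", "r1"]))  # 0: both operands are local
print(move_cycles(banks, 0, ["r0", "r8"]))  # 1: r8 must be moved from bank 1
```

Under these assumptions, any instruction whose source operands reside in a bank other than the one coupled to its data path block incurs at least one additional clock cycle.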
There is presently no known desirable technique or processor architecture to adequately address the problem of consumption of chip area for wide buses, such as wide “move” buses linking various register file banks. Moreover, there is presently no known desirable architecture or technique that, in addition to reducing chip area consumed by wide buses utilized to transport source and destination operands, also speeds up the VLIW processor and reduces power consumption. As such, there is a need in the art for a novel VLIW processor architecture and for new techniques to speed up the VLIW processor, reduce power consumption, and reduce chip area associated with wide buses utilized to transport operands between multiple register file banks and from multiple register file banks to multiple execution units.