1. Field of the Invention
The present invention is generally in the field of processors. In particular, the invention is in the field of VLIW processors.
2. Background Art
VLIW (Very Long Instruction Word) processors use an approach to parallelism according to which several instructions are included in a very long instruction word. Each very long instruction word fetched from the memory is part of a packet referred to in this application as a “VLIW packet” or an “instruction packet.”
By way of background, a VLIW packet typically contains a number of instructions which can be executed in the same clock cycle. Instructions in a VLIW packet which can be executed in the same clock cycle form a single “issue group.” By definition, instructions belonging to the same issue group do not depend on the result of execution of other instructions in that same issue group. However, instructions in one issue group may or may not depend on the result of execution of instructions in another issue group. The “length” of an issue group specifies how many instructions are in that issue group. For example, a particular issue group may have a length of two, three, four, five, or six instructions. Thus, the individual instructions in a VLIW packet are arranged in different issue groups and there can be a number of issue groups in a VLIW packet.
Instructions which are in a same issue group are concurrently forwarded (i.e. “issued”) to their respective execution units for execution in a same clock cycle. Accordingly, execution of all instructions in a VLIW packet takes as many clock cycles as there are issue groups in that VLIW packet. For example, if a particular VLIW packet contains two issue groups, two clock cycles are required to execute that VLIW packet.
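The relationship between issue groups and clock cycles described above can be illustrated with a minimal sketch. The representation of a packet as a list of issue groups, the function names, and the instruction labels are assumptions made purely for illustration and do not describe any particular processor.

```python
from typing import List

# Hypothetical model: an issue group is a list of instruction labels,
# and a VLIW packet is an ordered list of issue groups.
IssueGroup = List[str]


def cycles_to_execute(issue_groups: List[IssueGroup]) -> int:
    """One clock cycle is needed per issue group in the packet."""
    return len(issue_groups)


def issue_group_lengths(issue_groups: List[IssueGroup]) -> List[int]:
    """The length of an issue group is its number of instructions."""
    return [len(group) for group in issue_groups]


# Illustrative packet with two issue groups of two instructions each.
two_group_packet = [["insn0", "insn1"], ["insn2", "insn3"]]

print(cycles_to_execute(two_group_packet))    # 2
print(issue_group_lengths(two_group_packet))  # [2, 2]
```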
Referring to FIG. 1, one possible composition of a VLIW packet, such as VLIW packet 100, is now discussed. As shown in FIG. 1, seven individual instructions in VLIW packet 100 are placed in “instruction slots” 104 through 116. More specifically, instruction 0 is placed in instruction slot 104, instruction 1 is placed in instruction slot 106, instruction 2 is placed in instruction slot 108, instruction 3 is placed in instruction slot 110, instruction 4 is placed in instruction slot 112, instruction 5 is placed in instruction slot 114, and instruction 6 is placed in instruction slot 116. In exemplary VLIW packet 100, each individual instruction 0 through 6 is a 16-bit instruction.
Exemplary VLIW packet 100 also includes template 102 which contains information such as how many issue groups exist in VLIW packet 100 and which instructions in exemplary VLIW packet 100 belong to the same issue group. Moreover, template 102 typically contains information for assigning instructions to particular instruction slots in a VLIW packet for execution in appropriate execution units. In exemplary VLIW packet 100, template 102 comprises 16 bits. Thus, the entire VLIW packet 100 consists of 128 bits, i.e. seven 16-bit instructions plus a 16-bit template.
FIG. 2 shows another possible composition of a VLIW packet. As shown in FIG. 2, four individual instructions in VLIW packet 200 are placed in “instruction slots” 204 through 210. More specifically, instruction 0 is placed in instruction slot 204, instruction 1 is placed in instruction slot 206, instruction 2 is placed in instruction slot 208, and instruction 3 is placed in instruction slot 210. In exemplary VLIW packet 200, each of individual instructions 0 through 2 is a 32-bit instruction while individual instruction 3 is a 16-bit instruction.
As with exemplary VLIW packet 100, exemplary VLIW packet 200 also includes a template, i.e. template 202, which contains information such as how many issue groups exist in VLIW packet 200 and which instructions in exemplary VLIW packet 200 belong to the same issue group. Moreover, template 202 typically contains information for assigning instructions to particular instruction slots in a VLIW packet for execution in appropriate execution units. In exemplary VLIW packet 200, template 202 comprises 16 bits. Thus, the entire VLIW packet 200 consists of 128 bits, i.e. three 32-bit instructions plus one 16-bit instruction and a 16-bit template.
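The bit counts of exemplary VLIW packets 100 and 200 can be checked with a short sketch. The constant and function names below are hypothetical; only the instruction widths and the 16-bit template size are taken from the description above.

```python
from typing import List

TEMPLATE_BITS = 16  # template size stated for both exemplary packets

# Instruction widths, in bits, in slot order (names are illustrative).
PACKET_100_WIDTHS = [16] * 7          # seven 16-bit instructions
PACKET_200_WIDTHS = [32, 32, 32, 16]  # three 32-bit plus one 16-bit


def packet_bits(instruction_widths: List[int],
                template_bits: int = TEMPLATE_BITS) -> int:
    """Total packet size: the template plus every instruction slot."""
    return template_bits + sum(instruction_widths)


print(packet_bits(PACKET_100_WIDTHS))  # 128
print(packet_bits(PACKET_200_WIDTHS))  # 128
```

Both compositions fill the same 128-bit packet even though they carry different numbers of instructions.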
Although VLIW processors offer a great advantage in parallel processing of a large number of instructions, there is a need to improve the speed and power consumption of conventional VLIW processors and also to achieve a more area-efficient processor. To illustrate these points, reference is made to exemplary VLIW packet 200. In exemplary VLIW packet 200 there are three “long instructions” (i.e. three 32-bit instructions) and one “short instruction” (i.e. one 16-bit instruction). Moreover, suppose that there are two issue groups in exemplary VLIW packet 200: a first issue group consisting of long instructions 0 and 1, and a second issue group consisting of long instruction 2 and short instruction 3.
After exemplary VLIW packet 200 is fetched from a cache or an external memory, the four instructions in VLIW packet 200 must be forwarded to appropriate execution units for execution. To account for the possibility that all of the instructions in a given VLIW packet may belong to a single issue group, the instruction bus coupled to the execution units of the VLIW processor must be 112 bits wide to carry all four instructions in the VLIW packet at the same time. However, as illustrated in the present example, the first issue group consists of merely two long instructions requiring an instruction bus that is only 64 bits wide while the second issue group consists of merely one long instruction and one short instruction requiring an instruction bus that is only 48 bits wide. Thus, in the case of exemplary VLIW packet 200, an instruction bus that is 64 bits wide, i.e. the width of the wider of the two issue groups, is all that is needed to handle the processing of both the first and second issue groups in the VLIW packet. As such, a 112-bit wide instruction bus would result in unnecessary power consumption associated with 48 bus lines that are not needed in the processing of exemplary VLIW packet 200. Further, an instruction bus which is 112 bits wide requires considerably greater chip area as compared with an instruction bus which is only 64 bits wide.
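The bus-width figures for exemplary VLIW packet 200 can be reproduced with a brief sketch. The function names and the encoding of issue groups as lists of slot indices are assumptions for illustration only; the widths and the grouping are those given in the example above.

```python
from typing import List

# Instruction widths of exemplary VLIW packet 200, in slot order.
PACKET_200_WIDTHS = [32, 32, 32, 16]
# Assumed encoding: each issue group lists the indices of its instructions.
PACKET_200_GROUPS = [[0, 1], [2, 3]]


def worst_case_bus_bits(widths: List[int]) -> int:
    """Bus width needed if every instruction were in one issue group."""
    return sum(widths)


def widest_group_bits(widths: List[int], groups: List[List[int]]) -> int:
    """Bus width actually needed: the widest single issue group."""
    return max(sum(widths[i] for i in group) for group in groups)


worst = worst_case_bus_bits(PACKET_200_WIDTHS)                   # 112
needed = widest_group_bits(PACKET_200_WIDTHS, PACKET_200_GROUPS)  # 64
unused = worst - needed                                           # 48

print(worst, needed, unused)  # 112 64 48
```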
Moreover, many of the VLIW processor's logic units and resources would not be used in an effective manner during the execution of exemplary VLIW packet 200, which requires two clock cycles for its execution. During the execution of the first issue group consisting of long instructions 0 and 1, some of the processor's logic units, such as the instruction fetch unit, are not being used. The reason is that an instruction fetch operation is not required until after completion of the execution of the second issue group, which would not occur until the second clock cycle. However, the clocked circuitry inside the instruction fetch unit consumes power even though no instruction is being fetched. As such, the additional clock cycle required for the execution of the second issue group results in unnecessary power consumption in various logic units such as the instruction fetch unit. In essence, all logic units in the VLIW processor which are being clocked, but not utilized until the completion of the execution of the second issue group, contribute to unnecessary power consumption. Examples of units contributing to the unnecessary consumption of power are the fetch logic unit, the decode logic unit, and various buses.
Furthermore, during the execution of the first issue group of exemplary VLIW packet 200, it would be desirable to utilize the VLIW processor resources and logic units to execute an independent issue group belonging to another VLIW packet. Execution of two independent issue groups in the same clock cycle would, manifestly, result in a significant increase in the speed of the VLIW processor. However, it is desirable to achieve this increase in speed, i.e. to execute two independent issue groups belonging to two different VLIW packets in the same clock cycle, without causing a significant increase in the power consumption of the VLIW processor.
Thus, the conventional VLIW processor architecture results in unnecessary power consumption while permitting the execution of only a single issue group per clock cycle. Moreover, the conventional VLIW processor requires a relatively large chip area for an instruction bus which is too wide and not effectively used. As such, there is a need in the art to overcome the above-discussed shortcomings in the conventional VLIW processors.