1. Field of the Invention
The present invention is generally in the field of processors. In particular, the invention is in the field of VLIW (Very Long Instruction Word) processors.
2. Background Art
VLIW (Very Long Instruction Word) processors use an approach to parallelism according to which several instructions are included in a long instruction word which is fetched from memory every clock cycle. The long instruction word fetched from the memory is part of a packet referred to in this application as a VLIW packet or a "packet of instructions."
Instructions in a VLIW packet can be of different "instruction types." For example, a certain VLIW packet can have integer ALU type instructions such as "Shift and Add" and "Compare" instructions, and non-integer ALU type instructions such as "Shift L Variable," "Shift R Variable," "Move to BR," and "Move from BR" instructions. Other exemplary instruction types in a typical VLIW packet are memory type instructions such as "Integer Load," "Integer Store," and "Line Prefetch" instructions; floating point type instructions such as "Floating Point Compare" and "Floating Point Clear Flags" instructions; and branch type instructions such as "Indirect Branch" and "Indirect Call" instructions.
Each of the several instructions in a VLIW packet is placed in a particular "instruction slot." Each instruction type is usually assigned to one or two specific logic units in a VLIW data path for execution. Each such logic unit is referred to as an "execution unit" in the present application.
The individual instructions in a VLIW packet are arranged in different "issue groups" and there can be a number of issue groups in the VLIW packet. By way of background, a VLIW packet typically contains a number of instructions which can be executed in the same clock cycle. Instructions in a VLIW packet which can be executed in the same clock cycle form a single "issue group." By definition, instructions belonging to a same issue group do not depend on the result of execution of other instructions in that same issue group. However, instructions in one issue group may depend on the result of execution of instructions in another issue group. The "length" of an issue group specifies how many instructions are in that issue group. For example, a particular issue group may have a length of two instructions.
Thus, instructions which are in a same issue group are concurrently forwarded (i.e. "issued") to their respective execution units for execution in a same clock cycle. Accordingly, execution of all instructions in a VLIW packet may take as many clock cycles as there are issue groups in that VLIW packet.
Referring to FIG. 1, one known technique for identifying the issue groups in a VLIW packet, such as VLIW packet 100, is now discussed. As shown in FIG. 1, eight individual instructions in VLIW packet 100 are placed in instruction slots 102 through 116. More specifically, instruction 0 is placed in instruction slot 102, instruction 1 is placed in instruction slot 104, instruction 2 is placed in instruction slot 106, instruction 3 is placed in instruction slot 108, instruction 4 is placed in instruction slot 110, instruction 5 is placed in instruction slot 112, instruction 6 is placed in instruction slot 114, and instruction 7 is placed in instruction slot 116.
In this known technique for identifying the issue groups in VLIW packet 100, a designated bit in each instruction slot 102 through 116 is used to identify the different issue groups in the VLIW packet. In the example shown in FIG. 1, the designated bit used for this purpose is isolated by a dashed line. For example, instruction slot 102 shows that the designated bit used for the purpose of identifying the issue group to which instruction 0 belongs is a "0". Likewise, instruction slots 104, 106, and 108 show that the respective designated bits used for the purpose of identifying the issue groups to which instructions 1, 2, and 3 respectively belong are all "0". Instruction slot 110 shows that the designated bit used for the purpose of identifying the issue group to which instruction 4 belongs is a "1" while instruction slot 112 shows that the designated bit used for the purpose of identifying the issue group to which instruction 5 belongs is a "0". Finally, instruction slots 114 and 116 show that the respective designated bits used for the purpose of identifying the issue groups to which instructions 6 and 7 respectively belong are both "1".
According to this known technique for specifying and identifying issue groups, when the designated bit in a particular instruction is a "0", that instruction is the last instruction in the issue group. Referring to the above example, instructions 7 and 6 are in the same issue group with instruction 5 which is the last instruction in that issue group. The reason is that the designated bit in instruction 5 is a "0". Instruction 4 is in the same issue group with instruction 3 which is the last instruction in that issue group. The reason is that the designated bit in instruction 3 is a "0". Instruction 2 is the first and last instruction in an issue group by itself. The reason is that the designated bit in instruction 2 is a "0". Likewise, instruction 1 is in an issue group by itself and the same is the case for instruction 0. The reason is that the respective designated bits in instructions 1 and 0 are both "0".
Thus, as shown in FIG. 1, instructions 7 through 5 are in an issue group referred to by numeral 118; instructions 4 and 3 are in an issue group referred to by numeral 120; instruction 2 is in an issue group by itself which is referred to by numeral 122; instruction 1 is in an issue group by itself which is referred to by numeral 124; and instruction 0 is in an issue group by itself which is referred to by numeral 126. Accordingly, there are a total of five issue groups in the exemplary VLIW packet shown in FIG. 1.
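The grouping rule of this known technique can be illustrated with a minimal Python sketch. The function name and the list representation are hypothetical and serve only as illustration. The designated bits for instructions 6 and 7 are taken to be "1", which is consistent with the grouping stated above. The handling of instructions left open at the end of a packet is an assumption of the sketch and is not described in the text.

```python
def split_issue_groups(stop_bits):
    """Split a packet into issue groups using one designated bit per instruction.

    stop_bits[i] is the designated bit of instruction i. Instructions are
    scanned from the highest-numbered instruction down to instruction 0,
    and a bit of 0 marks the last instruction of an issue group.
    """
    groups, current = [], []
    for index in range(len(stop_bits) - 1, -1, -1):
        current.append(index)
        if stop_bits[index] == 0:
            groups.append(current)
            current = []
    if current:
        # Assumption: instructions left open at the end form a final group.
        groups.append(current)
    return groups


# Designated bits of instructions 0..7 in the FIG. 1 example.
FIG1_BITS = [0, 0, 0, 0, 1, 0, 1, 1]

groups = split_issue_groups(FIG1_BITS)
print(groups)       # [[7, 6, 5], [4, 3], [2], [1], [0]]
print(len(groups))  # 5 issue groups, so 5 clock cycles for this packet
```

Note that every one of the eight designated bits has to be read before the grouping is known.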
One disadvantage of the above-described known technique for specifying and identifying issue groups in a VLIW packet is that the VLIW processor must be designed to account for the possibility of existence of up to eight issue groups in each VLIW packet. Since each issue group takes one clock cycle for its execution, the VLIW processor must be designed to account for the possibility that it may take anywhere between one and eight clock cycles to complete the execution of all the individual instructions in a single VLIW packet. Manifestly, there is a large degree of uncertainty as to whether a VLIW packet fetched from the memory may take one, two, three, four, five, six, seven, or eight clock cycles for its execution. It also follows that the VLIW processor may have to "wait" anywhere between one and eight clock cycles before the processor can fetch another VLIW packet from the memory. It is also manifest that there is a large degree of uncertainty as to how many clock cycles the VLIW processor must "wait" before a new VLIW packet is fetched from the memory. The uncertainties associated with the number of clock cycles required for execution of a VLIW packet, and also with the number of clock cycles that the VLIW processor must wait, create difficulties in designing hardware units such as the fetch and decode logic, the scheduling logic, and the data dependency checking logic of the VLIW processor.
Another disadvantage of the known technique described above is that eight bits must be used to identify the issue groups existing in the VLIW packet. In other words, even if there is merely one or two issue groups in that VLIW packet, eight bits must still be used to identify the issue groups in the VLIW packet. The fact that eight bits are used to identify the issue groups existing in a VLIW packet means that all of the eight individual instructions in a VLIW packet must be scanned in order to determine the existing issue groups in the VLIW packet. The reason is that the value of each respective designated bit in each instruction must be known in order to determine the issue groups existing in the VLIW packet. The need to scan all of the eight instructions in a VLIW packet every time a VLIW packet is fetched results in an undesirable logic complexity.
From the above discussion it is apparent that there is need in the art for a VLIW packet packaging scheme which results in a greater certainty as to the possible number of issue groups in the VLIW packet. Moreover, it is preferable to use fewer than eight bits to designate all the possible issue groups in the VLIW packet and it is also desirable to avoid the need to scan all of the bits and instructions in a VLIW packet to identify the issue groups existing in the VLIW packet.
The present invention is an apparatus and method for issue grouping of instructions in a VLIW processor. The invention permits one, two, or three issue groups (but no greater than three issue groups) in each VLIW packet. The invention utilizes a template in each VLIW packet. In one embodiment of the invention, the template comprises two issue group end markers where each issue group end marker comprises three bits. The three bits in the first issue group end marker identify the instruction which is the last instruction in the first issue group. Likewise, the three bits in the second issue group end marker identify the instruction which is the last instruction in the second issue group.
Any instructions in the VLIW packet falling outside the two expressly defined first and second issue groups are placed in a third issue group. As such, three issue groups can be identified by use of the two issue group end markers. Using a template containing the two issue group end markers, the VLIW packet can have one, two, or three issue groups (but no greater than three issue groups).
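By way of illustration only, the following Python sketch shows how two three-bit end markers can partition eight instructions into at most three issue groups. The summary does not specify the instruction ordering or how a packet with fewer than three issue groups is encoded, so the sketch makes the following assumptions: instructions are ordered 0 through 7; a first end marker of 7 denotes a single issue group; a second end marker of 7 denotes two issue groups; and a second end marker that does not exceed the first is rejected. The function name `decode_template` is hypothetical.

```python
def decode_template(end1, end2, slots=8):
    """Partition instructions 0..slots-1 into one, two, or three issue groups.

    end1 and end2 are the 3-bit issue group end markers. The encoding of
    packets with fewer than three groups is an assumption of this sketch.
    """
    if not (0 <= end1 < slots and 0 <= end2 < slots):
        raise ValueError("end markers must fit in three bits")
    last = slots - 1
    if end1 == last:
        ends = [end1]                 # one group; second marker ignored
    else:
        if end2 <= end1:
            raise ValueError("second end marker must follow the first")
        ends = [end1, end2]
        if end2 != last:
            ends.append(last)         # remaining instructions: third group
    groups, start = [], 0
    for end in ends:
        groups.append(list(range(start, end + 1)))
        start = end + 1
    return groups


print(decode_template(2, 4))  # [[0, 1, 2], [3, 4], [5, 6, 7]]
print(decode_template(3, 7))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(decode_template(7, 0))  # [[0, 1, 2, 3, 4, 5, 6, 7]]
```

Whatever the marker values, the result never contains more than three issue groups.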
In one embodiment of the invention, the template of the VLIW packet includes a chaining bit. The chaining bit is used to "chain" instructions appearing after the last instruction of the last issue group of a first VLIW packet to the instructions in the first issue group of a second VLIW packet. As such, with the aid of the chaining bit, a combined issue group comprising instructions in the first and second VLIW packets can be formed.
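One possible reading of the chaining behavior is sketched below in Python. The representation is hypothetical: each packet is given as a list of non-empty issue groups together with its chaining bit, and the trailing instructions of a packet are modeled as its final list. When the chaining bit is set, the sketch merges those trailing instructions with the first issue group of the next packet. This is an assumption made for illustration, not a statement of the hardware behavior.

```python
def chain_packets(packets):
    """Build an issue schedule from (groups, chain_bit) pairs.

    groups is a non-empty list of issue groups (lists of instruction
    labels). If chain_bit is 1, the final list of the packet is carried
    over and issued together with the first group of the next packet.
    """
    schedule, carry = [], []
    for groups, chain_bit in packets:
        groups = [list(g) for g in groups]   # work on copies
        if carry:
            groups[0] = carry + groups[0]    # form the combined issue group
            carry = []
        if chain_bit:
            carry = groups.pop()             # hold back trailing instructions
        schedule.extend(groups)
    if carry:
        schedule.append(carry)               # nothing left to chain to
    return schedule


first = ([["a0", "a1"], ["a2"], ["a3"]], 1)   # chaining bit set
second = ([["b0", "b1"], ["b2"]], 0)
print(chain_packets([first, second]))
# [['a0', 'a1'], ['a2'], ['a3', 'b0', 'b1'], ['b2']]
```

In this example the two packets issue in four clock cycles rather than five, because "a3" shares a cycle with "b0" and "b1".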
In one embodiment, the invention uses a mask generation logic along with other logic blocks to generate an appropriate mask. The generated mask is used to pass through instructions in a VLIW packet which belong to a same issue group for execution in a same clock cycle.
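The role of the generated mask can be sketched in Python as follows. The sketch assumes an eight-bit mask in which bit i selects instruction i, valid end markers with the second greater than the first, and the same hypothetical encoding for packets with fewer than three issue groups (an end marker of 7 closes the packet). The actual mask generation logic and its companion logic blocks are not described in this summary, and the function names are hypothetical.

```python
def range_mask(first, last):
    """Mask with bits first..last (inclusive) set to 1."""
    return ((1 << (last + 1)) - 1) & ~((1 << first) - 1)


def issue_masks(end1, end2, slots=8):
    """Return one mask per issue group, in issue order."""
    ends = [end1]
    if end1 != slots - 1:
        ends.append(end2)
        if end2 != slots - 1:
            ends.append(slots - 1)
    masks, first = [], 0
    for last in ends:
        masks.append(range_mask(first, last))
        first = last + 1
    return masks


def pass_through(instructions, mask):
    """Select the instructions whose mask bit is set."""
    return [ins for i, ins in enumerate(instructions) if (mask >> i) & 1]


packet = ["i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7"]
for cycle, mask in enumerate(issue_masks(2, 4)):
    print(cycle, format(mask, "08b"), pass_through(packet, mask))
# 0 00000111 ['i0', 'i1', 'i2']
# 1 00011000 ['i3', 'i4']
# 2 11100000 ['i5', 'i6', 'i7']
```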
According to the present invention the uncertainty as to the number of clock cycles necessary to execute all of the individual instructions in a VLIW packet is substantially reduced since there can be no more than three issue groups in the VLIW packet. Moreover, one embodiment of the invention utilizes only six bits, i.e. two end markers each having three bits, to identify all the issue groups in a VLIW packet. Accordingly, fewer than eight bits are used to identify all issue groups in a VLIW packet and also there is no need to scan the entire VLIW packet and each individual instruction in the VLIW packet to identify all the issue groups existing in a VLIW packet.
The fact that the issue grouping information is entirely confined to the template in the VLIW packet permits the VLIW processor to extract the issue grouping information quickly with a simple mask instead of having to extract the issue grouping information in bits from diverse bit positions in the VLIW packet. Thus, the invention optimizes the speed and power consumption associated with various hardware units of the VLIW processor such as the fetch and decode logic, the scheduling logic, and the data dependency checking logic.
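The difference between the two extraction styles can be illustrated with a short Python sketch. The bit layout used here is purely hypothetical: the template is assumed to occupy the low seven bits of a packet word, with the first end marker in bits 2:0, the second end marker in bits 5:3, and the chaining bit in bit 6. The designated bit of the known technique is assumed to sit at the same hypothetical position in each instruction slot.

```python
def extract_template(word):
    """Read the template fields with one mask and two shifts.

    Hypothetical layout: bits 2:0 = first end marker,
    bits 5:3 = second end marker, bit 6 = chaining bit.
    """
    template = word & 0x7F
    end1 = template & 0b111
    end2 = (template >> 3) & 0b111
    chain = (template >> 6) & 1
    return end1, end2, chain


def gather_designated_bits(slots, bit_position):
    """Known technique: read one bit from each of the eight slots."""
    return [(slot >> bit_position) & 1 for slot in slots]


print(extract_template(0b1100010))  # (2, 4, 1)
```

In the first function the grouping information is contiguous, so a single field extraction suffices. In the second, all eight slots must be visited before the grouping is known.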
Moreover, according to the present invention, as a result of limiting the number of issue groups in a VLIW packet to a maximum of three, each VLIW packet may take a maximum of three clock cycles to execute. This results in a simpler fetch and decode logic since the fetch and decode logic does not have to accommodate situations where there are four, five, six, seven, or eight issue groups in a single VLIW packet.
Further, since the number of issue groups in the present invention is limited to three, the logic used for the chaining of instructions from a first VLIW packet to an issue group in a second VLIW packet is also simpler since the chaining takes place either from the last instruction in the second issue group of the first VLIW packet or from the last instruction in the third issue group of the first VLIW packet. However, in other VLIW processor designs, chaining could take place from the last instruction in the second issue group, the last instruction in the third issue group, the last instruction in the fourth issue group, the last instruction in the fifth issue group, the last instruction in the sixth issue group, or the last instruction in the seventh issue group. To accommodate this wide range of chaining possibilities, the hardware unit for data dependency checking and the hardware for forwarding instructions to execution units in those other VLIW processors are more complex, slower, and consume more power.