Very long instruction word (VLIW) processors are known in the art, an example of which is shown in FIG. 1. As shown in FIG. 1, a conventional processor includes an instruction decoder 105, control sequencing hardware 115, an input/output buffer 130, one or more register files 110, and one or more functional units 120 (which are also referred to as issue slots).
Given this architecture, instructions enter the instruction decoder 105 from an external source. The instruction decoder 105 converts the received instructions into a decoded internal format that is wider but easier to process. The decoded instructions are then used to control the operation of the data path components, which include the input/output buffer 130, the register file 110, and the functional units 120. Since the operation of conventional processors is known in the art, only a brief discussion of such processors is provided herein.
The register file 110, which holds temporary working data, can be accessed relatively quickly compared to external memory. The functional units (or issue slots) 120 perform the actual computational work associated with the processor.
The control sequencing hardware 115, the register file 110, and the functional units 120 are shown in greater detail in FIG. 2. As shown in FIG. 2, the control sequencing hardware 115 issues an instruction word, which includes control bits (RFC) associated with the register file 110, and control bits (FnC) associated with each of the functional units 120. Given the multiple functional units 120a . . . 120d, the processor of FIG. 2 is capable of performing multiple operations per clock cycle. A more concrete example is provided with reference to FIG. 3.
Specifically, FIG. 3 shows control sequencing hardware 315 coupled to a 64-entry, 32-bit register file 310 and four functional units 322, 324, 326, 328. Given the four functional units 322, 324, 326, 328, the processor of FIG. 3 is capable of performing four operations per clock cycle. For illustrative purposes, the four functional units of FIG. 3 are a first adder 322, a second adder 324, a first multiplier 326, and a second multiplier 328. Thus, the four operations include two (2) addition operations and two (2) multiplication operations.
Each functional unit 322, 324, 326, 328 has two read ports, through which the functional unit receives data, and a single write port, through which the functional unit outputs data. In other words, for the example in FIG. 3, each functional unit receives two values, performs an operation on the two values, and outputs a single value as the result of that operation. Specifically, as shown in FIG. 3, the first adder 322 receives R1 and R2 from the register file 310, as well as control signal A1C from the control sequencing hardware 315. The first adder 322 performs an add operation on R1 and R2 in response to the control signal A1C, and the result of the operation is output as W1. Similarly, the second adder 324 receives R3 and R4 and outputs W2 in response to control signal A2C. The first multiplier 326 receives R5 and R6 and outputs W3 in response to control signal M1C. Finally, the second multiplier 328 receives R7 and R8 and outputs W4 in response to control signal M2C.
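The one-cycle data flow of FIG. 3 can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the circuit itself: the names SlotOp, execute_cycle, add32 and mul32 are hypothetical, results are assumed to wrap modulo 2^32, and the source does not define the 2-bit control codes, so treating a control value of 0 as "slot idle" is an assumed encoding.

```python
from dataclasses import dataclass
from typing import Callable, List

NUM_REGS = 64            # 64-entry register file 310
WORD_MASK = 0xFFFFFFFF   # 32-bit entries; results wrap modulo 2**32 (assumption)


@dataclass
class SlotOp:
    """One issue-slot operation (hypothetical container, not from the source)."""
    ra: int    # first read-port address (e.g. R1)
    rb: int    # second read-port address (e.g. R2)
    w: int     # write-port address (e.g. W1)
    ctrl: int  # 2-bit control field (e.g. A1C); 0 = idle is an assumed encoding


def add32(a: int, b: int) -> int:
    return (a + b) & WORD_MASK


def mul32(a: int, b: int) -> int:
    return (a * b) & WORD_MASK


# Slot order follows FIG. 3: adder 322, adder 324, multiplier 326, multiplier 328.
UNITS: List[Callable[[int, int], int]] = [add32, add32, mul32, mul32]


def execute_cycle(regs: List[int], ops: List[SlotOp]) -> None:
    """Run all four slots in one clock cycle, updating regs in place."""
    # Read all eight operands first, so every slot sees start-of-cycle values.
    operands = [(regs[op.ra], regs[op.rb]) for op in ops]
    # Each active slot drives its single write port with one result.
    for unit, op, (a, b) in zip(UNITS, ops, operands):
        if op.ctrl != 0:
            regs[op.w] = unit(a, b)


# Usage: four independent operations issued in a single cycle.
regs = [0] * NUM_REGS
regs[1], regs[2], regs[3], regs[4] = 5, 7, 6, 9
execute_cycle(regs, [
    SlotOp(1, 2, 10, 1),  # adder 322:      r10 = r1 + r2
    SlotOp(3, 4, 11, 1),  # adder 324:      r11 = r3 + r4
    SlotOp(1, 3, 12, 1),  # multiplier 326: r12 = r1 * r3
    SlotOp(2, 4, 13, 1),  # multiplier 328: r13 = r2 * r4
])
```

Reading every operand before any write models the eight read ports being sampled in parallel, which is what allows four operations to complete in the same clock cycle.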
If the register file 310 is a sixty-four (64) entry, thirty-two (32) bit register file, then six (6) bits are required to address any one of its entries (2^6 = 64). Each operation therefore requires three 6-bit register addresses (two reads and one write) plus a two (2) bit control field, for a total of 20 bits, and the four operations together require 80-bit instruction words (designated herein as INST[79:0]). For example, the fields R1 through R8 (the addresses of the values that appear on the read ports of the register file 310), W1 through W4 (the addresses of the entries written through the write ports of the register file 310), and the control bits for each of the functional units can be represented as:
R1=INST[79:74]
R2=INST[73:68]
W1=INST[67:62]
A1C=INST[61:60]
R3=INST[59:54]
R4=INST[53:48]
W2=INST[47:42]
A2C=INST[41:40]
R5=INST[39:34]
R6=INST[33:28]
W3=INST[27:22]
M1C=INST[21:20]
R7=INST[19:14]
R8=INST[13:8]
W4=INST[7:2]
M2C=INST[1:0]
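The field layout listed above can be expressed as a bit-field table with matching encode and decode routines. This is a minimal sketch assuming the instruction word is held as a Python integer; the names FIELDS, encode and decode are hypothetical. The block also checks that the sixteen fields tile INST[79:0] with no gaps or overlaps, which confirms the 4 × (3 × 6 + 2) = 80-bit width.

```python
# Field layout from the list above: (name, msb, lsb), most significant first.
FIELDS = [
    ("R1", 79, 74), ("R2", 73, 68), ("W1", 67, 62), ("A1C", 61, 60),
    ("R3", 59, 54), ("R4", 53, 48), ("W2", 47, 42), ("A2C", 41, 40),
    ("R5", 39, 34), ("R6", 33, 28), ("W3", 27, 22), ("M1C", 21, 20),
    ("R7", 19, 14), ("R8", 13, 8), ("W4", 7, 2), ("M2C", 1, 0),
]

ADDR_BITS = 6      # log2(64) for the 64-entry register file
CTRL_BITS = 2      # per-operation control field
OPS_PER_WORD = 4   # four functional units
INST_WIDTH = OPS_PER_WORD * (3 * ADDR_BITS + CTRL_BITS)  # 4 * 20 = 80

# Sanity check: the fields tile INST[79:0] contiguously.
_next_msb = INST_WIDTH - 1
for _name, _msb, _lsb in FIELDS:
    assert _msb == _next_msb and _lsb <= _msb, _name
    _next_msb = _lsb - 1
assert _next_msb == -1


def decode(inst: int) -> dict:
    """Extract every field of an 80-bit instruction word INST[79:0]."""
    return {name: (inst >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, msb, lsb in FIELDS}


def encode(fields: dict) -> int:
    """Pack field values into an 80-bit instruction word."""
    inst = 0
    for name, msb, lsb in FIELDS:
        width = msb - lsb + 1
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        inst |= value << lsb
    return inst


# Usage: a word in which only the first adder does work (r10 = r1 + r2).
fields = {name: 0 for name, _, _ in FIELDS}
fields.update(R1=1, R2=2, W1=10, A1C=1)
word = encode(fields)
assert decode(word) == fields
```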
Given the 64-entry, 32-bit register file 310 of FIG. 3, which has eight read ports and four write ports, the cost of the register file would be: (64 entries) × (32 bits) × (8 read ports + 4 write ports) = 24576 bits
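The cost figure above can be reproduced directly from the formula. In this minimal sketch the helper names are hypothetical; it assumes, as in FIG. 3, that each functional unit has two inputs and one output, so every added unit contributes two read ports and one write port. Under this metric the cost grows linearly with the number of issue slots.

```python
def register_file_cost_bits(entries: int, width: int,
                            read_ports: int, write_ports: int) -> int:
    """Cost metric used in the text: stored bits times total port count."""
    return entries * width * (read_ports + write_ports)


def cost_for_units(num_units: int, entries: int = 64, width: int = 32) -> int:
    """Each two-input, one-output functional unit adds 2 read ports and 1 write port."""
    return register_file_cost_bits(entries, width,
                                   read_ports=2 * num_units,
                                   write_ports=num_units)


print(cost_for_units(4))  # 24576, the FIG. 3 configuration
```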
As is known, VLIW instructions usually contain several operand address fields per operation. Because such instructions are wide, the cost of on-chip instruction storage increases while the efficiency of fetching instructions from off-chip memory decreases. This is often the primary limiting factor in system performance. For at least this reason, there is a heretofore-unaddressed need in the industry.
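To make the point about operand address fields concrete, the FIG. 3 encoding can be broken down by how many bits go to register addresses versus control. This is a minimal sketch with hypothetical helper names; it assumes three register addresses per operation, as in FIG. 3. For that configuration, 72 of the 80 instruction bits (90%) are operand addresses.

```python
def address_bits(entries: int) -> int:
    """Bits needed to name one of `entries` registers (6 for 64 entries)."""
    return (entries - 1).bit_length()


def instruction_breakdown(entries: int, operands_per_op: int,
                          ctrl_bits: int, ops_per_word: int) -> tuple:
    """Return (instruction width in bits, fraction spent on operand addresses)."""
    addr = operands_per_op * address_bits(entries)
    word_bits = ops_per_word * (addr + ctrl_bits)
    return word_bits, (ops_per_word * addr) / word_bits


word_bits, addr_fraction = instruction_breakdown(64, 3, 2, 4)
print(word_bits, f"{addr_fraction:.0%}")  # 80 90%
```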