The present invention relates to processing devices in general, and more particularly to processing devices whose designs are based on a very long instruction word (VLIW) architecture. More specifically, the present invention relates to register file access in a VLIW-based machine.
In response to the continuing demand for increased processing speed, designers have developed central processing unit (CPU) architectures in which a single CPU has characteristics of a conventional uni-processor and a parallel machine. A single instruction register and instruction sequence unit execute programs under a single flow of control. However, arithmetic and logic channels (ALCs) within the CPU perform multiple primitive operations (i.e., simple arithmetic, logic, or data transfer operations) simultaneously. An ALC provides integer computations and logic operations.
A compiler analyses the source code of a program and identifies all the operations that can be performed simultaneously. The compiler produces assembly code comprising instructions, each of which contains multiple operations to be executed in parallel. Since the instruction word held in the instruction register must specify multiple independent operations, each to be performed by a different ALC, this approach employs a very long instruction word (VLIW) instruction format. For this reason, such CPU designs are commonly known as VLIW architectures.
The memory of a VLIW machine is commonly referred to as a register file. A register file provides functionality similar to conventional general purpose registers, namely, temporary storage for intermediate results during arithmetic computations, loop execution, branching handling, and so forth. Ideally, there is a single register file. A single register file provides a straightforward memory model, thus simplifying the design of the processor.
Conventional VLIW architectures, however, are faced with the reality that such an approach is not practically feasible. One reason is that the very high number of read and write ports needed to implement a single register file design increases data access times exponentially. Secondly, circuit design rule limits are quickly reached because of the great number of data lines that must be brought to the one register file. Performance and design rule limits, therefore, impose a limit on the number of ports for any given size register file and any given number of ALCs.
Consequently, VLIW architectures are typically provided with multiple register files. For example, one register file may be provided for integer results and another register file for floating point results. Performance is slightly degraded, however, in situations involving integer-to-floating point conversion and vice-versa. The operation requires movement of data between the two register files, a time consuming operation. Some VLIW architectures use a special "roll-out" floating point register file. This adds further complexity to an already complex hardware design.
What is needed is a computer architecture which can address the foregoing shortcomings of conventionally designed VLIW-based central processing units. There is a need for a design which allows more efficient use of register files given the fact that data lines for read and write operations are limited. It is desirable to provide apparatus and methods which can realize increased access to register files in a wide instruction format central processing unit. It is further desirable to provide apparatus and methods for increased access to register files with respect to integer instructions and floating point instructions.
In a wide instruction architecture processor device, an instruction execution unit provides integer and floating point capability within its constituent arithmetic logic channels. Results are written out to a register file where integer results are given higher priority over floating point results, which are buffered, in order to increase integer operation throughput. By buffering floating point results and giving priority to integer results, fewer register file write ports are needed. A bypass mechanism allows access to floating point results during their pendency in the buffer. Dual serially-configured integer units are configured to enable two-operand and combined (three-operand) instructions to be delivered to an arithmetic and logic channel at every clock cycle. Similarly, dual parallel pipelined floating point units are configured to permit two-operand and combined (three-operand) floating point instructions to be delivered to an arithmetic and logic channel on each clock cycle.
A processing unit device in accordance with the invention includes an instruction execution unit having a plurality of arithmetic logic channels (ALCs). A register file in data communication with the instruction execution unit is provided with plural read ports and write ports. Each ALC includes a single ALC output coupled to a write port of the register file. First and second computation units are provided. Input selector circuitry selectively delivers data from read ports of the register file to the first and second computation units. An output selector selectively couples the outputs of the first and second computation units.
Control logic is provided to detect an output conflict wherein the first and second computation units produce results that are ready to be written to the register file. The control logic is configured to deliver one of the results to the ALC output. The control logic is further configured to deliver the other result to a buffer.
A bypass bus couples the ALCs together. Results produced by an ALC can be delivered directly to another ALC for subsequent operations. The bypass obviates the step of writing results to the register file, only to be read back by an ALC in the next machine cycle.
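The effect of the bypass can be illustrated with a minimal sketch. The names `BypassBus`, `publish`, and `read_operand`, and the use of a dictionary as a register file, are assumptions made for illustration only and do not describe the actual circuitry.

```python
class BypassBus:
    """Hypothetical model: holds results produced by ALCs but not yet written back."""

    def __init__(self):
        self.results = {}

    def publish(self, dest_reg, value):
        # An ALC places its result on the bus, tagged with its destination register.
        self.results[dest_reg] = value


def read_operand(reg, register_file, bypass):
    # Prefer a forwarded result; otherwise fall back to the register file.
    if reg in bypass.results:
        return bypass.results[reg]
    return register_file[reg]


register_file = {"r1": 5, "r2": 7, "r3": 0}
bypass = BypassBus()

# One ALC computes r3 = r1 + r2; the write to the register file is still pending.
bypass.publish("r3", register_file["r1"] + register_file["r2"])

# Another ALC needs r3 for its next operation and obtains it from the bus.
value = read_operand("r3", register_file, bypass)
stale = register_file["r3"]  # what a plain register file read would have returned
```

In this sketch the consuming ALC sees the new value without waiting for the register file write to complete.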
In an embodiment of the invention the first computation unit is integer computation logic and the second computation unit is floating point computation logic. In a further embodiment of the invention, the integer computation logic comprises dual integer units configured in a serial manner to provide two-operand and combined integer operations. The floating point computation unit comprises dual floating point units configured to provide two-operand and combined floating point operations.
Further in accordance with the invention, an arithmetic and logic channel includes first and second integer units. An output of the first integer unit is in data communication with an input of the second integer unit. Input selection circuitry selectively couples data from the read ports of the register file to the inputs of the first integer unit and to the second input of the second integer unit. This arrangement permits integer instructions to begin execution at each clock cycle.
The arithmetic and logic channel further includes first and second floating point units. The floating point units are configured for parallel, independent operation. The input selection circuitry is provided with a buffer which can selectively receive data from the read ports of the register file. Outputs of the floating point units are coupled to the input selection circuitry. The input selection circuitry is configured to couple data from the read ports, data from the buffer, and the floating point outputs to the inputs of the floating point units. This arrangement permits floating point instructions of the two-operand and three-operand variety to begin execution at every clock cycle.
In accordance with the invention, a method of operating an arithmetic and logic unit includes delivering first and second operands to a first computation unit. Similarly, third and fourth operands are delivered to a second computation unit. Upon detecting a conflict condition wherein a first result from said first computation unit and a second result from said second computation unit are produced in the same clock cycle, the first result is buffered. The second result is delivered to an output port. In a subsequent clock cycle, the first result is delivered to the output port from the buffer.
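The conflict handling described above can be sketched as a small cycle-by-cycle model. The class `OutputArbiter`, its `cycle` method, and the use of a queue as the buffer are hypothetical; the behavior when no conflict occurs is an assumption, since only the conflict case is specified here.

```python
from collections import deque


class OutputArbiter:
    """Hypothetical model of one ALC output port shared by two computation units."""

    def __init__(self):
        self.buffer = deque()  # holds first-unit results displaced by a conflict

    def cycle(self, first_result=None, second_result=None):
        """Return the value driven on the single output port in this clock cycle."""
        if first_result is not None and second_result is not None:
            # Conflict: buffer the first result, deliver the second.
            self.buffer.append(first_result)
            return second_result
        if second_result is not None:
            # Assumption: the second unit keeps priority over buffered results.
            return second_result
        if first_result is not None:
            # Assumption: drain in order, so older buffered results go out first.
            self.buffer.append(first_result)
            return self.buffer.popleft()
        # Port is otherwise idle: deliver a buffered result if one is waiting.
        return self.buffer.popleft() if self.buffer else None


arbiter = OutputArbiter()
outputs = [
    arbiter.cycle(first_result=3.5, second_result=42),  # both units finish together
    arbiter.cycle(),                                     # subsequent cycle
    arbiter.cycle(),                                     # nothing left to deliver
]
```

Here the second result occupies the output port in the conflict cycle, and the buffered first result follows in the next cycle, so a single output port serves both units.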
Further in accordance with the invention, a method of operating an arithmetic logic unit includes delivering first and second operands to a first integer unit in a first clock cycle to produce a first result. In a second clock cycle, the first result is produced and delivered to a second integer unit. Also in the second clock cycle, a third operand is delivered to the second integer unit, and fourth and fifth operands are delivered to the first integer unit. This arrangement enables two-operand and three-operand instructions to begin at every clock cycle.
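The timing of the serially configured integer units can be sketched as a two-stage pipeline. The function `run_serial_integer_units` is hypothetical, and addition is assumed as the operation in both units purely for illustration; a two-operand instruction is represented by a third operand of `None`, which the second unit passes through unchanged.

```python
def run_serial_integer_units(instructions):
    """Simulate two integer units in series; one instruction may begin per cycle.

    Each instruction is a tuple (a, b, c); c is None for a two-operand instruction.
    Returns the list of results and the number of clock cycles used.
    """
    latch = None   # value passed from the first unit to the second unit
    results = []
    cycles = 0
    # One extra cycle lets the last instruction pass through the second unit.
    for instr in list(instructions) + [None]:
        cycles += 1
        # Second unit: combine the previous cycle's partial result with operand c.
        if latch is not None:
            partial, c = latch
            results.append(partial if c is None else partial + c)
        # First unit: begin a new instruction in the same cycle.
        if instr is not None:
            a, b, c = instr
            latch = (a + b, c)
        else:
            latch = None
    return results, cycles


program = [(1, 2, 3), (4, 5, None), (6, 7, 8)]
results, cycles = run_serial_integer_units(program)
```

In this model three instructions complete in four cycles, because the first unit accepts new operands while the second unit finishes the previous instruction.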