The present invention relates generally to improvements in very long instruction word (VLIW) processing, and more particularly to advantageous register file indexing (RFI) techniques for providing indirect control of register addressing in a VLIW processor.
One important processor model is that of vector processing. This model has been used in prior art supercomputers for many years. Typical features of this model are the use of specialized vector instructions, specialized vector hardware, and the ability to efficiently operate on blocks of data. It is this very ability to operate typically only on vector data types that makes the model inflexible and unable to efficiently handle diverse processing requirements. In addition, in prior art vector processors, support for control scalar processing was typically provided in separate hardware or in a separate control processor.
Another processor model is the prior art very long instruction word (VLIW) processor model, which represents a parallel processing model based on the concatenation of standard uniprocessor type single function operations into a long instruction word with no specialized multicycle vector processing facilities.
To efficiently operate a block-data vector pipeline, it is important to have an efficient interface to deliver the individual vector elements. For this purpose, a successful class of prior art vector machines has been register based. The register based vector processors provide high performance registers for the vector elements, allowing efficient access of the elements by the functional execution units. A single vector instruction tied to an implementation specific vector length value causes a block data multicycle operation. In addition, many vector machines have provided a chaining facility where operations on the individual vector elements are directly routed to other vector functional units to improve performance.
These previous features and capabilities provide the background for the present invention. It is an object of the present invention to incorporate scalar, VLIW, and flexible vector processing capabilities efficiently in an indirect VLIW processor.
In typical reduced instruction set computer (RISC) and VLIW processors, the access of register operands is determined from short instruction word (SIW) bit-fields that represent the register address of operands stored in a register file. In register-based vector processors, specialized hardware is used. This hardware is initiated by a single vector instruction and automates the accessing of vector elements (operand data) from the dedicated vector registers. The multicycle execution on the block of data is also automated.
In the prior art, there have also been specialized hardware techniques used to support the automatic accessing of register operand data. For example, U.S. Pat. No. 5,680,600 describes a technique for accessing a register file using a loop or repeat instruction to automate the register file addressing. This approach ties the register addressing to a loop or repeat instruction which causes a load or store instruction to be repeated while directing the register address to increment through a register file's address space. An electronic circuit is specified for reducing controller memory requirements for multiple sequential instructions. Thus, this prior art approach appears to be applied only to load and store type operations invoked by a special loop or repeat instruction. As such, it is not readily applicable to indirect VLIW ManArray processors as addressed further below.
A ManArray family of processors may suitably consist of multiple "indirect VLIW" (iVLIW) processors and processor elements (PEs) that utilize a fixed length short instruction word (SIW) of 32 bits. An SIW may be executed individually by one of up to eight execution units per processor and in synchronism in multiple PEs in a SIMD mode of operation. Another type of SIW is able to reference a VLIW indirectly to cause the issuance of up to eight SIW instructions in parallel in each processor and in synchronism in multiple PEs to be executed in parallel.
Operands are stored in register files and each execution unit has one or more read and write ports connected to the register file or files. In most processors, the registers selected for each port are addressed using bit fields in the instruction. With the indirect VLIW technique employed in the ManArray processor, the SIWs making up a VLIW are stored in a VLIW memory. Since each SIW, by definition, fixes its register operand fields for a single operation on register accessed operand data, multiple VLIWs are required whenever even a single operand field must differ, as required by a processing algorithm. Thus, a suitable register file indexing technique for operation on blocks of data, for use in conjunction with such processors and extendible more generally to parallel array processors, will be highly advantageous.
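The fixed operand field problem described above can be illustrated with a minimal sketch. It assumes a hypothetical toy encoding in which an SIW is a tuple of an opcode and three register numbers and a VLIW is a tuple of SIWs; the function name, the register numbers, and the encoding are illustrative assumptions and are not taken from the ManArray instruction set.

```python
# Toy model (assumption): an SIW is (opcode, target_reg, source_x, source_y),
# and a VLIW is a tuple of SIWs. Register numbers are illustrative only.

def unrolled_vliws(block_len):
    """Build one single-SIW VLIW per block element, each with fixed register fields."""
    vliws = []
    for i in range(block_len):
        siw = ("add", 8 + i, 0 + i, 4 + i)  # same operation, different registers
        vliws.append((siw,))
    return vliws

vliws = unrolled_vliws(4)
distinct_entries = len(set(vliws))  # VLIW memory entries needed without indexing
print(distinct_entries)
```

In this toy model, the same operation applied to a block of four elements occupies four distinct VLIW memory entries, because the entries differ only in their register fields.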
This operand-data fixed register specification problem is solved by the present invention by providing a compact means of achieving pipelined computation on blocks of data using indirect VLIW instructions. A double indirect method of accessing the block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction, nor to repeat or loop instructions. Rather, the present technique, termed register file indexing (RFI), allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture, allowing its generalized use across the instruction set architecture without specialized vector instructions, and without being limited to use only with repeat or loop instructions. Utilizing the present invention, chaining operations are inherently available without any direct routing between functional units, further simplifying implementations. In addition, the present register file indexing architecture reduces the VLIW memory requirements, which can be particularly significant depending on the types of algorithms to be coded.
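The double indirect access described above may be pictured with the following minimal sketch. It is a toy model under stated assumptions, not the mechanism detailed later in this description: the names VLIW_MEMORY, execute_vliw, index_state, and rfi_enabled are hypothetical, and the update rule (advance every register address by one per execution) is assumed purely for illustration.

```python
# Toy model (assumption): one VLIW holding a single SIW is stored in VLIW memory.
VLIW_MEMORY = {0: [("add", 8, 0, 4)]}

def execute_vliw(vliw_addr, regs, index_state, rfi_enabled):
    """First indirection: vliw_addr selects a stored VLIW.
    Second indirection: index_state, rather than the stored fields alone,
    determines which registers are actually accessed."""
    offset = index_state["offset"] if rfi_enabled else 0
    for opcode, rt, rx, ry in VLIW_MEMORY[vliw_addr]:
        if opcode == "add":
            regs[rt + offset] = regs[rx + offset] + regs[ry + offset]
    if rfi_enabled:
        index_state["offset"] += 1  # assumed update rule: step by one register

regs = list(range(16))   # register i initially holds the value i
state = {"offset": 0}
for _ in range(4):       # the same stored VLIW is reused for every block element
    execute_vliw(0, regs, state, rfi_enabled=True)
print(regs[8:12])        # [4, 6, 8, 10]
```

Here a single VLIW memory entry serves the whole block, and a call with rfi_enabled set to False would use the stored register fields unchanged, which loosely mirrors the mixing of RFI and non-RFI instructions.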
Further, when expressed as unrolled loops of VLIW instructions, many computations exhibit clear register usage patterns. These patterns are characteristic of computational pipelines and can be exploited by the ManArray indirect vector processing embedded in an indirect VLIW processor, adapted as described further herein.
Among its other aspects, the present invention provides a unique initialization method for generating an operand register address, a unique double-indirect execution mechanism, and a unique controlling method, and allows a register file to be partitioned into independent circular buffers. It also allows the mixing of RFI and non-RFI instructions, and provides a scalable design applicable to multiple array organizations of VLIW processing elements. As addressed in further detail below, the invention reduces both the VLIW memory requirements and, as a consequence, the SIW memory requirements for parallel instruction execution in an iVLIW array processor.
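The partitioning of a register file into independent circular buffers can be sketched as simple modular address arithmetic. The function next_register, the buffer bases and sizes, and the 32-entry register file are hypothetical choices made for illustration; the actual address generation is addressed in the detailed description.

```python
def next_register(current, base, size, step=1):
    """Advance a register address within the circular buffer [base, base + size)."""
    return base + (current - base + step) % size

# Assumption: a 32-entry register file holding two independent circular buffers.
buf_a = {"base": 0, "size": 4}    # registers 0 through 3
buf_b = {"base": 16, "size": 8}   # registers 16 through 23

addr_a, addr_b = 2, 22
trace_a, trace_b = [], []
for _ in range(4):
    trace_a.append(addr_a)
    trace_b.append(addr_b)
    addr_a = next_register(addr_a, **buf_a)
    addr_b = next_register(addr_b, **buf_b)

print(trace_a)  # [2, 3, 0, 1]
print(trace_b)  # [22, 23, 16, 17]
```

Each address wraps only within its own buffer, so the two index sequences advance independently of one another.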
These and other features, aspects and advantages of the invention will be apparent to those skilled in the art from the following detailed description taken together with the accompanying drawings.