1. Field of the Invention
The present invention generally relates to parallel computer architectures and, more particularly, to a Very Long Instruction Word (VLIW) computer architecture having multiple arithmetic logic units (ALUs) with an optimal register file organization within chip boundaries to minimize off-chip crossing delays and to maximize communications between the ALUs.
2. Description of the Prior Art
There is a diversity of parallel computer architectures. Michael J. Flynn in "Very High Speed Computing Systems", Proceedings of the IEEE, Vol. 54, 1966, pp. 1901-1909, adopted a taxonomy to classify architectures based on the presence of single or multiple streams of instructions and data. There are generally four categories defined as follows:
SISD (single instruction, single data stream) which defines serial computers.
MISD (multiple instruction, single data stream) which involves multiple processors applying different instructions to a single data stream.
SIMD (single instruction, multiple data streams) which involves multiple processors simultaneously executing the same instruction on different data streams.
MIMD (multiple instruction, multiple data streams) which involves multiple processors autonomously executing diverse instructions on multiple data streams.
Using Flynn's taxonomy, the subject invention is related to architecture that may be loosely described as being MISD. In more detail, the computer architecture relevant to the invention is SISD in the sense that there is a single instruction stream operating on a single stream of data. However, unlike conventional SISD processors, each of the "single" instructions contains multiple operations. For example, a "single" instruction may include eight arithmetic operations, several load/store operations, and a multiple-way branch. The term that is attached to such processors is VLIW for Very Long Instruction Word. A description of a particular version of a VLIW computer architecture is provided by K. Ebcioglu in "Some Design Ideas for a VLIW Architecture for Sequential-Natured Software" published in Parallel Processing, M. Cosnard et al., editors, pp. 3-21, North Holland, 1988.
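By way of illustration only, the notion of a "single" instruction that carries multiple operations may be sketched as a simple data structure. The class name `VLIWInstruction`, its field names, and the tuple layouts are hypothetical and are not taken from the cited references; the sketch assumes only the operation mix given in the example above (arithmetic operations, load/store operations, and a multiple-way branch).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VLIWInstruction:
    """One long instruction word whose operations are issued together.

    Hypothetical layout for illustration; real encodings differ.
    """
    # Each ALU operation: (opcode, destination, source 1, source 2)
    alu_ops: List[Tuple[str, str, str, str]] = field(default_factory=list)
    # Each memory operation: ("load" or "store", register, address)
    mem_ops: List[Tuple[str, str, str]] = field(default_factory=list)
    # Multiple-way branch: (condition, target label) pairs tested together
    branch_ways: List[Tuple[str, str]] = field(default_factory=list)

    def operation_count(self) -> int:
        """Count operations, treating the multiple-way branch as one."""
        branch = 1 if self.branch_ways else 0
        return len(self.alu_ops) + len(self.mem_ops) + branch


# Example word: eight arithmetic operations, two memory operations,
# and a three-way branch (all values are made up for illustration).
word = VLIWInstruction(
    alu_ops=[("add", f"r{i}", f"r{i + 8}", f"r{i + 16}") for i in range(8)],
    mem_ops=[("load", "r24", "0x100"), ("store", "r25", "0x104")],
    branch_ways=[("r1 == 0", "L1"), ("r2 < 0", "L2"), ("always", "L3")],
)
print(word.operation_count())
```

In this sketch the single instruction word carries eleven operations, whereas a conventional SISD instruction would carry one.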
The connection between the VLIW and a conventional sequential machine is the VLIW compiler which, in principle, accepts a program for a sequential machine and re-compiles that program into VLIW instructions, with re-ordering of instructions taking place to keep the multiple operations of the VLIW hardware active while preserving the net effect of the original sequential program. Thus, from the point of view of the original sequential code, the engine is MISD; but from the point of view of the VLIW instructions, it is SISD.
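The re-ordering performed by such a compiler may be sketched, under strong simplifying assumptions, as a greedy packing of sequential operations into long instruction words. The sketch below is not the compiler of the cited references; the names `Op` and `pack` are hypothetical, each operation is assumed to write one register and read a tuple of registers, and memory operations and branches are ignored. An operation is placed in the earliest word that follows every earlier operation on which it depends, so that the net effect of the sequential program is preserved.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    dest: str               # register written by the operation
    srcs: Tuple[str, ...]   # registers read by the operation


def pack(ops: List[Op], width: int) -> List[List[int]]:
    """Greedily pack sequential operations into words of `width` slots.

    Returns a list of words, each a list of indices into `ops`.
    """
    words: List[List[int]] = []
    placed: List[int] = []  # word index chosen for each earlier operation
    for i, op in enumerate(ops):
        earliest = 0
        for j in range(i):
            prev = ops[j]
            raw = prev.dest in op.srcs   # reads a value written earlier
            waw = prev.dest == op.dest   # overwrites an earlier result
            war = op.dest in prev.srcs   # overwrites a value read earlier
            if raw or waw or war:
                # Conservative: a dependent operation goes in a later word.
                earliest = max(earliest, placed[j] + 1)
        w = earliest
        while w < len(words) and len(words[w]) >= width:
            w += 1                       # skip words that are already full
        if w == len(words):
            words.append([])
        words[w].append(i)
        placed.append(w)
    return words


# Five sequential operations (register names are made up).
ops = [
    Op("r1", ("a", "b")),
    Op("r2", ("c", "d")),
    Op("r3", ("r1", "r2")),
    Op("r4", ("e", "f")),
    Op("r5", ("r3", "r4")),
]
print(pack(ops, width=2))  # [[0, 1], [2, 3], [4]]
```

With two slots per word, the five sequential operations occupy three long instruction words; the fourth operation is moved ahead to share a word with the third because it is independent of the first three.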
The subject invention implements a VLIW parallel architecture employing register files. In the application of conventional multiport register file techniques to VLIW machines, the large number of ports quickly creates size and performance problems. One approach to the problem was presented by Junien Labrousse and Gerrit A. Slavenburg in their paper entitled "A 50 MHz Microprocessor with a Very Long Instruction Word Architecture" at the 1990 IEEE International Solid-State Circuits Conference, Feb. 14, 1990. Their 32-bit VLIW chip consists of several independent functional units controlled on a cycle-by-cycle basis by a 200-bit instruction. The functional units include two identical 32-bit ALUs, a 32-bit data memory interface, a register file, a constant-generation unit and a branch unit. All units are connected to a shared on-chip multiport memory from which the ALUs take operands and into which results are written. Any previously computed result can be used as an operand by either ALU.
In order to gain the necessary bandwidth, the multiport memory of the Labrousse and Slavenburg VLIW architecture requires a special design in which each operand of each functional unit has a local storage unit called a funnel file. All functional units and funnel files are located on both sides of a crossbar switch. The register file is considered as but one of the functional units and, therefore, access to the register file by the ALUs is through the multiport memory.