Typical general-purpose computer systems are built on one of many different architectures. Architecture, as used herein, refers to the instruction set and resources available to a programmer for a particular computer system. Thus, architecture includes instruction formats, instruction semantics, operation definitions, registers, memory addressing modes, address space characteristics, etc. An implementation is a hardware design or system that realizes the operations specified by the architecture. The implementation determines the most frequently measured characteristics of a microprocessor, e.g., price, performance, power consumption, heat dissipation, pin count, and operating frequency. Thus, a range of implementations of a particular architecture can be built, but the architecture influences the quality and cost-effectiveness of those implementations. That influence is exerted largely through the trade-offs that must be made to accommodate the complexity of the instruction set.
Most architectures try to increase the efficiency of their implementations by exploiting some form of parallelism. For example, in single instruction, multiple data (SIMD) architecture implementations, the processing elements (PEs) all perform the same operation at the same time, each on its own local (different) data.
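As a minimal sketch of the SIMD behavior just described, assuming each PE can be modelled as one element of a Python list, a single broadcast operation is applied by every PE to its own local datum in the same step. The function name `simd_step` is hypothetical and used only for illustration.

```python
from typing import Callable, List


def simd_step(op: Callable[[int], int], local_data: List[int]) -> List[int]:
    """Apply the same operation on every PE, each to its own local datum."""
    # One instruction is broadcast; every PE executes it in the same cycle.
    return [op(x) for x in local_data]


# Four hypothetical PEs, each holding a different local value.
pe_data = [1, 2, 3, 4]
result = simd_step(lambda x: x * 2, pe_data)
print(result)  # [2, 4, 6, 8]
```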
One common architecture is the very long instruction word (VLIW) architecture. Although similar to SIMD systems, a VLIW processor can perform a different operation on each PE within a single cycle. The grouping of operations that the PEs execute together in a cycle is statically determined; in other words, the choice of which operations execute simultaneously is made at compile time. Moreover, their execution is synchronous: all of the PEs process instructions in lock step. Note that VLIW PEs are sometimes referred to as function units (FUs).
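The following is a rough sketch of lock-step VLIW execution, assuming a shared register file and a three-operation instruction word; the slot format, register names, and `vliw_cycle` are hypothetical. Each slot of a word holds a different operation for a different PE, the grouping is fixed ahead of time (standing in for the compiler's static schedule), and all PEs read operands at the start of the cycle and commit results together at the end.

```python
from typing import Dict, List, Tuple

# Hypothetical register file shared by the PEs (function units).
Regs = Dict[str, int]
# One slot of a long instruction word: (operation, destination, source1, source2).
Slot = Tuple[str, str, str, str]

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def vliw_cycle(word: List[Slot], regs: Regs) -> Regs:
    """Execute one long instruction word: each slot runs on its own PE in the same cycle."""
    # All PEs read their operands from the old register state (lock step) ...
    # (the static schedule is assumed to give every slot a distinct destination)
    results = {dst: OPS[op](regs[s1], regs[s2]) for op, dst, s1, s2 in word}
    # ... and all results are committed together at the end of the cycle.
    new_regs = dict(regs)
    new_regs.update(results)
    return new_regs


# Statically grouped words: each cycle issues three different, independent operations.
program: List[List[Slot]] = [
    [("add", "r3", "r1", "r2"), ("sub", "r4", "r1", "r2"), ("mul", "r5", "r1", "r2")],
    [("add", "r6", "r3", "r5"), ("mul", "r7", "r4", "r4"), ("add", "r8", "r1", "r1")],
]
regs: Regs = {"r1": 7, "r2": 3}
for word in program:
    regs = vliw_cycle(word, regs)
```

Because operands are read before any slot writes back, two slots in the same word never observe each other's results, which is what lets the grouping be decided entirely at compile time.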
Another common architecture is the multiple instruction stream, multiple data stream (MIMD) architecture. In MIMD systems, each processor operates independently of the other processors. A MIMD processor may be as small as a single PE. MIMD is therefore more flexible than SIMD or VLIW, because it allows a wider range of parallel control-flow constructs to be implemented directly. However, MIMD asynchrony introduces many problems that neither SIMD nor VLIW machines exhibit. One problem is that communication between processors within a MIMD machine is very expensive, so MIMD parallelism often unexpectedly slows a program down when the communication overhead exceeds the speedup gained from parallel execution. By contrast, the static timing properties of SIMD and VLIW facilitate static orchestration, which enables communication between PEs without undue overhead.
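To make the communication-overhead argument concrete, here is a deliberately simple, hypothetical cost model: the work divides evenly across processors, and every message adds a fixed, non-overlapped cost. The functions and all numbers are illustrative assumptions, not measurements of any real MIMD machine.

```python
def parallel_time(serial_time: float, processors: int,
                  messages: int, cost_per_message: float) -> float:
    """Hypothetical model: ideal split of the work plus non-overlapped communication."""
    return serial_time / processors + messages * cost_per_message


def speedup(serial_time: float, processors: int,
            messages: int, cost_per_message: float) -> float:
    """Ratio of serial time to modelled parallel time (below 1.0 means a slowdown)."""
    return serial_time / parallel_time(serial_time, processors, messages, cost_per_message)


# Hypothetical numbers, in cycles, chosen only to illustrate the effect.
light = speedup(1000, 4, 5, 20)   # 1000 / (250 + 100)  ≈ 2.86
heavy = speedup(1000, 4, 50, 20)  # 1000 / (250 + 1000) = 0.8
```

With the same four processors, raising the message count from 5 to 50 turns a speedup into a slowdown, which is the effect the paragraph above describes.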
Another architecture is XIMD, which is similar to MIMD and was developed at Carnegie Mellon University. XIMD uses an array of PEs, each of which includes an independent branch unit. Thus, in one mode the PEs can run autonomously and independently, yet they can also share a branch condition: the PEs can all test the same branch condition and then branch in harmony. In other words, replicating the same control flow sequence on all PEs and having all PEs test a common set of branch conditions effectively converts the XIMD processor into a VLIW processor, because each processor branches the same way in response to the same branch condition each time. However, an XIMD processor cannot directly cause another processor to branch. Instead, it must change a shared Boolean condition code that is visible to the other PEs. The condition code itself does not force those processors to branch; they must simultaneously execute branch instructions that test this condition code and branch to their separate but closely related branch targets. To emulate a VLIW, all the participating PEs execute separate branch instructions that test the shared condition code, so they must execute highly orchestrated programs that follow closely related paths of execution. This greatly complicates many forms of branching, e.g., indexed branches, dynamically linked branches, and other multi-way branches. Consequently, generating software for XIMD PEs is very complex. Moreover, the XIMD architecture passes only a single-bit (Boolean) condition code to other processors. For further information, please review Wolfe, A., “A Variable Instruction Stream Extension to the VLIW Architecture,” in Proceedings of ASPLOS IV, 1991, pp. 2–14; and Newburn, C. J., et al., “Balancing Fine- and Medium-grained Parallelism in Scheduling Loops for the XIMD Architecture,” in Proceedings of Architecture and Compilation Techniques for Fine and Medium Grain Parallelism (A-23), 1993, pp. 39–52; both of which are hereby incorporated herein by reference.
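The branching constraint described above can be sketched as a small simulation. It models only the behavior stated in the text: each PE has its own instruction stream and branch unit, one PE writes a shared one-bit condition code, and only PEs that themselves execute a branch instruction react to it, each jumping to its own target. The instruction names (`setcc`, `brcc`, `mark`, `nop`), the `PE` class, and the one-cycle delay before a condition-code write becomes visible are all hypothetical assumptions, not the actual XIMD instruction set.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical instruction encoding for this sketch:
#   ("setcc", bool)  write the shared one-bit condition code
#   ("brcc", index)  jump to index in this PE's own program if the shared code is true
#   ("mark", label)  record that this PE reached a point in its program
#   ("nop",)         do nothing
Instr = Tuple[object, ...]


@dataclass
class PE:
    program: List[Instr]
    pc: int = 0
    trace: List[str] = field(default_factory=list)


def run(pes: List[PE], cycles: int) -> None:
    cc = False  # shared Boolean condition code, visible to every PE
    for _ in range(cycles):
        next_cc = cc
        for pe in pes:  # every PE has its own instruction stream and branch unit
            if pe.pc >= len(pe.program):
                continue
            instr = pe.program[pe.pc]
            pe.pc += 1
            if instr[0] == "setcc":
                next_cc = bool(instr[1])  # changes the code; forces no PE to branch
            elif instr[0] == "brcc" and cc:
                pe.pc = int(instr[1])  # only a PE executing brcc reacts to the code
            elif instr[0] == "mark":
                pe.trace.append(str(instr[1]))
        cc = next_cc  # assumption: a write becomes visible on the next cycle


pes = [
    PE([("setcc", True), ("brcc", 3), ("mark", "fall0"), ("mark", "taken0")]),
    PE([("nop",), ("brcc", 3), ("mark", "fall1"), ("mark", "taken1")]),
    PE([("nop",), ("nop",), ("mark", "fall2"), ("mark", "other2")]),
]
run(pes, 3)
print([pe.trace for pe in pes])  # [['taken0'], ['taken1'], ['fall2']]
```

The first two PEs branch together only because both were scheduled to execute `brcc` in the same cycle; the third PE never tests the code and simply falls through, which illustrates why emulating a VLIW requires closely orchestrated programs on every participating PE.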
SIMD, VLIW, and MIMD architecture systems can be implemented using field-programmable gate array (FPGA) devices. FPGAs can be electrically programmed to perform various specific logic functions and have been configured to operate as VLIW processors.