Architectures which provide high instruction level parallelism (ILP) are known. Such an architecture may be achieved by removing or reducing data flow and control flow constraints. Data flow constraints which are not inherent in the original code arise from a lack of sufficient resources for initiating and executing multiple instructions concurrently. Control flow constraints are caused by branches, which force unpredictable changes in the sequential order of code execution. Removing these obstacles allows for the formation of larger basic blocks of instructions, thereby resulting in higher instruction level parallelism. The data flow problems are reduced by increasing the number of functional units, registers, and condition bits, by pipelining the functional units, and by using nonblocking caches. The control flow problem is reduced by using techniques such as conditional execution, speculative execution, and software pipelining, which leverage hardware support. Accordingly, for high instruction level parallelism, the processor architecture includes closely tied hardware and compiler architectures. Such an architecture is discussed in Arya et al., An Architecture for High Instruction Level Parallelism, Proceedings of the 28th Annual Hawaii International Conference on System Sciences, 1995, p. 153, which is hereby incorporated by reference in its entirety.
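By way of illustration only, the effect of conditional execution on basic block size may be sketched as follows. The sketch is not taken from the cited reference; the function names and the modeling of a condition bit as an integer are assumptions made for this example. The first function contains a branch that splits the code into separate basic blocks, while the second computes both arms in a single straight-line block and commits only the result guarded by the condition bit.

```python
def abs_diff_branching(a: int, b: int) -> int:
    """Branching form: the conditional splits the code into separate basic blocks."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """Conditionally executed form: one straight-line block, no branch.

    The condition bit `p` is a hypothetical stand-in for a hardware
    condition bit set by a compare instruction.
    """
    p = int(a > b)       # compare sets the condition bit (1 or 0)
    t_true = a - b       # would execute under predicate p
    t_false = b - a      # would execute under predicate not-p
    # Only the value whose predicate is set contributes to the result.
    return p * t_true + (1 - p) * t_false
```

Because the two subtractions in the second form are independent of each other, they may in principle be issued concurrently, which is the sense in which removing the branch exposes additional instruction level parallelism.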
The success of a microprocessor architecture is highly dependent on the software applications that run on that processor architecture. With thousands of applications running on a particular processor architecture, it is difficult to design a new architecture with a new instruction set architecture (ISA) and expect every software vendor to port its software to the new design. The inability to change the instruction set architecture also limits processor improvement to the pace of improvements in process technology. Therefore, it is desirable to design an architecture with a new ISA and new features such as conditional and speculative execution, and perhaps a large number of registers, while still being able to execute old software. The old software should also run with competitive performance on the new machine. The technique can also be applied to architectures in which instruction scheduling is done in software and instruction grouping information is coded into each instruction. Similarly, the technique can be applied to superscalar machines to remove the grouping logic from the pipeline, allowing higher issue bandwidths than are otherwise possible with superscalar architectures.
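By way of illustration only, instruction grouping information coded into each instruction may be sketched as follows. The encoding shown, a single end-of-group bit per instruction, along with the names `Instr`, `mark_groups`, and `issue_groups`, are assumptions made for this example and are not drawn from any particular architecture. A software scheduler marks group boundaries ahead of time, so that the issue stage need only split the instruction stream at the marked bits rather than check dependencies itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instr:
    dest: str                   # register written by the instruction
    srcs: Tuple[str, ...]       # registers read by the instruction
    end_of_group: bool = False  # hypothetical grouping bit set by software


def mark_groups(program: List[Instr], width: int) -> List[Instr]:
    """Software side: greedy in-order grouping.

    A new group starts when an instruction reads or rewrites a register
    written earlier in the current group, or when the group is full.
    """
    written = set()
    count = 0
    for i, ins in enumerate(program):
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if count and (depends or count == width):
            program[i - 1].end_of_group = True  # close the previous group
            written.clear()
            count = 0
        written.add(ins.dest)
        count += 1
    if program:
        program[-1].end_of_group = True
    return program


def issue_groups(program: List[Instr]) -> List[List[Instr]]:
    """Hardware side: split on the grouping bit, with no dependency checks."""
    groups, current = [], []
    for ins in program:
        current.append(ins)
        if ins.end_of_group:
            groups.append(current)
            current = []
    return groups


program = mark_groups(
    [
        Instr("r1", ("r0",)),
        Instr("r2", ("r0",)),
        Instr("r3", ("r1", "r2")),
        Instr("r4", ("r3",)),
    ],
    width=2,
)
group_sizes = [len(g) for g in issue_groups(program)]
```

In this sketch the first two instructions are independent and share a group, while each of the remaining instructions depends on the group before it and is issued alone. All dependency analysis occurs in `mark_groups`, which models the sense in which grouping logic is removed from the pipeline.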