In an effort to increase the processing speed and flexibility of multiprocessor computer processing systems, the parent application to the present invention, application Ser. No. 07/459,083, now U.S. Pat. No. 5,197,130, issued Mar. 23, 1993, provides a cluster architecture for highly parallel multiprocessor systems wherein multiple processors and external interface means can make multiple and simultaneous requests to a common set of shared hardware resources.
Regardless of the manner in which such multiprocessor systems are organized, the individual performance of each of the processors in a multiprocessor system can and does affect the overall performance of the system. Similarly, the functional capabilities of the individual processor can also affect the performance of the system. Accordingly, most prior art supercomputers have attempted to increase both the performance and the functional capabilities of the individual computer processors in such supercomputers.
One of the first performance and functional improvements involved the use of both a scalar and a vector processing element in each of the processors as shown, for example, in U.S. Pat. No. 4,128,880, issued Dec. 5, 1978 to S. R. Cray Jr. Since that time, numerous improvements have been made to the functional and operational capabilities of various scalar/vector processors in an effort to increase the performance of such processors.
While these improvements have increased the performance of scalar/vector processors, there remain a number of areas in which the performance and operation of scalar/vector processors can be improved. Some of the areas of improvement include: providing coordination mechanisms between the scalar and vector processors, particularly with respect to instruction execution in each of the processors; allowing the functional units of the vector processor to complete different types of arithmetic operations in a different number of cycles; allowing both the scalar and vector processor to access shared resources in a non-sequential manner; providing mechanisms for accessing the vector registers that allow the vectorization of conditional IF statements and the ability to access the vector registers at relative start addresses; and improving the ability of the scalar/vector processor to perform context switches.
One of the areas in which present scalar/vector processors experience performance problems is in the instruction processing procedure for the vector processor. Current vector processors place a number of vector instructions in a wait-to-issue queue. As a vector instruction already in the vector processor pipeline completes, it releases the resources required by a waiting instruction. These resources include memory, vector registers, scalar values, and functional units. It is the function of the instruction control mechanism of the scalar/vector processor to determine when the required resources for a vector instruction are available. In present instruction control mechanisms, the vector instruction waits to enter the instruction pipeline during three intervals: the time that the control mechanism surveys whether the required resources are free, the time until those resources become free, and the time that the control mechanism takes to actually recognize that the resources are available. Only after these increments of time have elapsed is a new vector instruction issued. While this resource determination process is ongoing, no new vector instructions are fed into the instruction pipeline. As a result, a bubble or gap is created in the instruction pipeline of the vector processor, which decreases the processing performance of the vector processor.
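The effect of the recognition delay described above can be illustrated with a small timing model. The following Python sketch is illustrative only and is not taken from any actual processor design. The names `VectorInstr`, `issue_schedule`, and `recognize_latency`, the one-instruction-per-cycle issue rate, and the rule that an instruction holds a single resource for a fixed number of cycles are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VectorInstr:
    name: str
    resource: str     # hypothetical: the one resource this instruction needs
    busy_cycles: int  # hypothetical: cycles the resource stays occupied after issue


def issue_schedule(instrs, recognize_latency):
    """Return (issue_cycles, bubble_cycles) for in-order issue.

    Assumed model: at most one instruction issues per cycle, and an
    instruction whose resource was used earlier cannot issue until the
    resource has been released AND the control mechanism has spent
    `recognize_latency` further cycles recognizing that it is free.
    """
    free_at = {}    # resource -> cycle at which it is released
    issue_cycles = []
    bubbles = 0     # issue slots in which nothing could be issued
    next_slot = 0   # earliest cycle at which the issue stage is available
    for ins in instrs:
        if ins.resource in free_at:
            ready = free_at[ins.resource] + recognize_latency
        else:
            ready = 0  # resource never used, so there is nothing to wait for
        t = max(next_slot, ready)
        bubbles += t - next_slot
        issue_cycles.append(t)
        free_at[ins.resource] = t + ins.busy_cycles
        next_slot = t + 1
    return issue_cycles, bubbles


program = [
    VectorInstr("vadd1", "add_unit", 4),
    VectorInstr("vadd2", "add_unit", 4),  # must wait for the add unit
    VectorInstr("vmul1", "mul_unit", 4),  # stuck behind vadd2 (in-order issue)
]

print(issue_schedule(program, recognize_latency=0))
print(issue_schedule(program, recognize_latency=2))
```

In this model, every cycle of recognition latency is added directly to the gap in front of the waiting instruction, and later instructions that need unrelated resources are delayed as well because issue is in order.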
Although present scalar/vector processors are capable of increased performance as compared to traditional computer processors, areas still exist in which performance improvements can be made in the design of scalar/vector processors. Accordingly, it would be desirable to provide a design for a scalar/vector processor and methods and apparatus associated therewith that are capable of improving the performance and operation of the scalar/vector processor.