In an effort to increase the processing speed and flexibility of multiprocessor computer processing systems, the parent application to the present invention, application Ser. No. 07/459,083, provides a cluster architecture for highly parallel multiprocessor systems wherein multiple processors and external interface means can make multiple, simultaneous requests to a common set of shared hardware resources.
Regardless of the manner in which such multiprocessor systems are organized, the individual performance of each of the processors in a multiprocessor system can and does affect the overall performance of the system. Similarly, the functional capabilities of the individual processor can also affect the performance of the system. Accordingly, most prior art supercomputers have attempted to increase both the performance and the functional capabilities of the individual computer processors in such supercomputers.
One of the first performance and functional improvements involved the use of both a scalar and vector processing element in each of the processors as shown, for example, in U.S. Pat. No. 4,128,880, issued Dec. 5, 1978 to S.R. Cray Jr. Since that time, numerous improvements have been made to the functional and operational capabilities of various scalar/vector processors in an effort to increase the performance of such processors.
While these improvements have increased the performance of scalar/vector processors, there remain a number of areas in which their performance and operation can be improved. These areas include: providing coordination mechanisms between the scalar and vector processors, particularly with respect to instruction execution in each of the processors; allowing the functional units of the vector processor to complete different types of arithmetic operations in different numbers of cycles; allowing both the scalar and vector processors to access shared resources in a non-sequential manner; providing mechanisms for accessing the vector registers that allow the vectorization of conditional IF statements and the ability to access the vector registers at relative start addresses; and improving the ability of the scalar/vector processor to perform context switches.
One of the areas in which present scalar/vector processors experience performance problems is in the instruction processing procedure for the vector processor. Current vector processors put some number of vector instructions in a wait-to-issue queue. As a vector instruction already in the vector processor pipeline completes, the resources required by the waiting instruction are released. These resources include memory, vector registers, scalar values, and functional units. It is the function of the instruction control mechanism of the scalar/vector processor to determine when the required resources for a vector instruction are available. In present instruction control mechanisms, the waiting vector instruction is held out of the instruction pipeline for three intervals: while the control mechanism checks whether the required resources are free, while those resources become free, and while the control mechanism recognizes that they are available. Only after all three intervals have elapsed is a new vector instruction issued. While this resource determination process is ongoing, no new vector instructions are fed into the instruction pipeline. As a result, a bubble or gap is created in the instruction pipeline of the vector processor, which decreases the processing performance of the vector processor.
Although present scalar/vector processors are capable of increased performance as compared to traditional computer processors, areas still exist in which performance improvements can be made in the design of scalar/vector processors. Accordingly, it would be desirable to provide a design for a scalar/vector processor and methods and apparatus associated therewith that are capable of improving the performance and operation of the scalar/vector processor.
The present invention provides a data processing apparatus including vector registers and vector functional units. Program instruction initiation means responds to a first instruction for initiating processing of at least one vector operand of the first instruction in one of the vector functional units. The program instruction initiation means responds to a second instruction for initiating the processing of at least another vector operand of the second instruction in the one vector functional unit dependent upon completion of the first instruction. A first vector control (VVC) register corresponds to a first vector register containing the other vector operand and stores the identity of a second vector register containing the one vector operand. A second VVC register corresponds to the second vector register and maintains a busy or a non-busy status of the second vector register. Means operating with the second VVC register releases the second vector register to a non-busy status when the first instruction is completed. Logic means is coupled to the first VVC register and responds to the non-busy status of the second vector register for beginning processing of the other vector operand in the one functional unit.
In one form of the present invention, a vector processor includes vector registers, each including means for intermittently storing data as a plurality of elements of an ordered set of data. Addressing means for each vector register includes a read address counter for intermittently reading data from the vector register for processing the plurality of elements of the ordered set of data. The vector processor includes VVC registers each corresponding to a respective one of the vector registers. Program controlled means responds to a first instruction, which selects a first vector register as a source of an ordered set of operand data, for initializing a first VVC register, corresponding to the first vector register, with control data indicating: (1) a read busy condition of the first vector register; (2) an identity of a second operand coregister, if any; and (3) a vector length (VL) count of the first instruction. Logic means associated with the first corresponding VVC register increments a first address counter associated with the first vector register and decrements the VL count during each succeeding clock cycle for reading successive elements of said ordered set of operand data from said first vector register. The logic means responds to the VL count for terminating the incrementing and freeing the first vector register when said VL count reaches a value indicating the completion of said first instruction.
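By way of illustration only, the VVC register behavior described above can be approximated in software. The following Python sketch is not the disclosed hardware; the names `VVCRegister`, `initialize`, `clock`, and `read_vector` are hypothetical, and it assumes that one element is read per clock cycle and that a VL count of zero marks completion of the instruction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VVCRegister:
    """Hypothetical software model of one vector control (VVC) register."""
    read_busy: bool = False           # (1) read busy condition
    coregister: Optional[int] = None  # (2) identity of a second operand coregister, if any
    vl_count: int = 0                 # (3) vector length (VL) count still to be read
    read_addr: int = 0                # read address counter of the vector register

    def initialize(self, vl: int, coregister: Optional[int] = None) -> None:
        """Load the control data when an instruction selects the register."""
        self.read_busy = vl > 0
        self.coregister = coregister
        self.vl_count = vl
        self.read_addr = 0

    def clock(self, vector_register: List[int]) -> Optional[int]:
        """One clock cycle: read one element, or return None when not busy."""
        if not self.read_busy:
            return None
        element = vector_register[self.read_addr]
        self.read_addr += 1   # increment the address counter
        self.vl_count -= 1    # decrement the VL count
        if self.vl_count == 0:
            self.read_busy = False  # instruction complete: free the register
        return element


def read_vector(vector_register: List[int], vl: int) -> List[int]:
    """Clock a VVC register until it frees the vector register."""
    vvc = VVCRegister()
    vvc.initialize(vl)
    elements = []
    while vvc.read_busy:
        elements.append(vvc.clock(vector_register))
    return elements
```

In this sketch, calling `read_vector` with a four-element register and a VL count of three reads only the first three elements, after which the register is marked not busy.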
In one embodiment of the present invention, the program controlled means initializes the first VVC register with read and write port control data. In addition, the logic means is effective to control a transfer of an ordered set of data between one of said ports and said first vector register.
In one embodiment of the present invention, the program controlled means initializes the first VVC register with dependency register control data when the first instruction has been dependently initiated to a functional unit or port which is busy processing a current instruction. In this embodiment, the logic means is further effective to inhibit said incrementing of the first address counter and decrementing the VL count until a not-read-busy signal is sent from a second VVC register corresponding to a second operand register selected by the current instruction to said first VVC register indicating completion of the current instruction, such that the ordered set of operand data of the first instruction is read for processing immediately behind an ordered set of operand data of the current instruction with no gap in the operand pipeline data.
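By way of illustration only, the dependent initiation described above can be approximated by the following Python sketch. It is a simplified software model rather than the disclosed logic; the names `VVC`, `depends_on`, and `simulate` are hypothetical, and it assumes that the busy status is sampled at the start of each cycle and that a shared functional unit accepts one operand per cycle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VVC:
    """Hypothetical VVC register holding dependency register control data."""
    vl_count: int
    depends_on: Optional[int] = None  # register whose read must complete first
    read_addr: int = 0

    @property
    def read_busy(self) -> bool:
        return self.vl_count > 0


def simulate(registers: Dict[int, List[int]],
             vvcs: Dict[int, VVC]) -> List[Tuple[int, int]]:
    """Return (register number, element) pairs in the order they are read."""
    trace = []
    while any(v.read_busy for v in vvcs.values()):
        # Sample the busy flags once per cycle, as registered signals would be.
        busy_now = {n: v.read_busy for n, v in vvcs.items()}
        progressed = False
        for n, v in vvcs.items():
            if not busy_now[n]:
                continue
            if v.depends_on is not None and busy_now.get(v.depends_on, False):
                continue  # inhibited until the not-read-busy signal arrives
            trace.append((n, registers[n][v.read_addr]))
            v.read_addr += 1
            v.vl_count -= 1
            progressed = True
        if not progressed:
            raise RuntimeError("dependency can never be satisfied")
    return trace
```

With a three-element read of register 0 followed by a dependent two-element read of register 1, the trace in this model contains five consecutive reads, so the operands of the dependent instruction follow those of the current instruction without an idle cycle.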
In one embodiment of the present invention, the program controlled means further initializes the first VVC register with instruction chaining control data. The logic means further includes chaining control logic responsive to the instruction chaining control data and to the storing of elements of an ordered set of data into the first vector register and the reading of the ordered set of data from the first vector register for incrementing the read address counter to a new vector element position only if the data element has been or is being stored into said vector element position. In this embodiment, the elements of the ordered set of data are preferably received from a main memory via a read port and are stored into their respective vector element positions but in a sequence differing from the order in which the data elements were requested from the main memory. The chaining control logic responds to controls associated with said read port for controlling the reading of the ordered set of data from the first vector register in its ordered sequence.
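By way of illustration only, the chaining control described above can be approximated by the following Python sketch. It is a simplified model, not the disclosed chaining control logic; the function name `chained_read` and the per-element `valid` flags are hypothetical, and it assumes that at most one element arrives from the read port per cycle and that an element being stored in a cycle may be read in that same cycle.

```python
from typing import List, Tuple


def chained_read(arrivals: List[Tuple[int, str]], vl: int) -> List[Tuple[int, str]]:
    """Read a vector register in element order while it fills out of order.

    arrivals[c] is the (element position, value) pair stored during cycle c.
    Returns (cycle, value) pairs in the order the elements are read.
    """
    register = [None] * vl
    valid = [False] * vl   # set once an element has been or is being stored
    read_addr = 0
    reads = []
    cycle = 0
    while read_addr < vl:
        if cycle < len(arrivals):
            position, value = arrivals[cycle]
            register[position] = value
            valid[position] = True
        elif not valid[read_addr]:
            raise ValueError("element %d never arrives" % read_addr)
        # Advance the read address counter only if the element is present.
        if valid[read_addr]:
            reads.append((cycle, register[read_addr]))
            read_addr += 1
        cycle += 1
    return reads
```

In this model, if element 1 arrives before element 0, the read of element 0 stalls for one cycle and the elements are still delivered in their ordered sequence.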
In one form of the present invention, a data processing apparatus includes vector registers, and a segmented functional unit adapted to receive successive inputs of operands while holding data for operations still being completed. Program controlled means responds to at least a first and a second instruction for entering first and second ordered data sets into the segmented functional unit for processing. The data sets are held concurrently in the segmented functional unit and the first and second instructions have result register fields respectively defining first and second vector result registers. Link list registers are associated with the segmented functional unit. Each link list register is assigned to a respective one of the vector registers. Program controlled means responds to the result fields of the instructions for selecting the link list registers assigned to the respective first and second vector result registers. Means stores, in the link list register selected by the second instruction, a number of the first vector result register. Means operating with the link list register selected by the second instruction monitors the status of the first result register and directs output results of the segmented functional unit to the second result register when the status of the first result register changes to a non-busy status.
The link list register control mechanism controls the writing of result data into the vector registers to enable data to be stored to the selected elements in a vector register while data is being read from other elements of the vector register.
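By way of illustration only, the routing of results by the link list registers can be approximated by the following Python sketch. It is a simplified model, not the disclosed mechanism; the names `LinkListRegister`, `remaining`, `predecessor`, and `route_results` are hypothetical, and it assumes that a result register is busy while results are still owed to it and that the segmented functional unit emits results in instruction order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LinkListRegister:
    """Hypothetical link list register assigned to one vector result register."""
    remaining: int                     # results still owed to this register
    predecessor: Optional[int] = None  # number of the result register ahead of it


def route_results(outputs: List[int],
                  links: Dict[int, LinkListRegister]) -> Dict[int, List[int]]:
    """Direct each output of the functional unit to the proper result register."""
    written: Dict[int, List[int]] = {n: [] for n in links}
    for value in outputs:
        for n, link in links.items():
            pred_busy = (link.predecessor is not None
                         and links[link.predecessor].remaining > 0)
            # A register accepts results only once its predecessor is non-busy.
            if link.remaining > 0 and not pred_busy:
                written[n].append(value)
                link.remaining -= 1
                break
        else:
            raise ValueError("result produced with no waiting result register")
    return written
```

In this model, the link list register of the second result register holds the number of the first result register, so the second register begins receiving results only after the first has received all of its own.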