The present invention relates to a vector processor having an instruction for collectively processing vector data consisting of a plurality of data, and more particularly to an efficient system for transfer of vector data with data compression/expansion between a main storage and a processor.
A vector processor having a vector instruction which allows collective processing of a series of vector data has been developed and used primarily for high speed processing of technical and scientific calculations. In the vector processor, the vector instruction is normally processed in a pipeline fashion.
A vector processor which vector-processes a condition clause (that is, vector-processes a DO loop containing an IF clause in a FORTRAN program) has recently been developed. The condition clause processing system in the vector processor is disclosed in "An Efficient Vectorizing Method of Condition Clause in A Vector Processor", Paper 6F-6 of the 25th National Conference of the Information Processing Society of Japan. In order to efficiently vector-process the condition clause, it is important to avoid both unnecessary operations on data which does not meet the condition and unnecessary storing of such data into a main storage. For example, when the FORTRAN program shown in FIG. 2 is to be vector-processed, a trigonometric function, i.e., the sine of the I-th element B(I) of an array B, needs to be calculated, and the result stored into the I-th element A(I) of an array A, only when B(I) is larger than 0. The calculation of the sine function is carried out by a previously prepared subroutine, but the transfer of an argument to the subroutine (array B in FIG. 2) is usually performed through the main storage. Accordingly, in order to efficiently process the program of FIG. 2, it is desirable to store in the main storage only those values of B(I) which meet the condition B(I)>0 of the IF clause, transfer them to the subroutine for calculating sine functions, and store the calculated results in A(I). In order to vector-process this series of processes at a high speed, the vector processor HITAC S-810 developed by the assignee of the present application provides the following vector instructions.
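The conditional loop described for FIG. 2 can be sketched as follows. FIG. 2 itself is not reproduced here, so this is only an illustration based on the description above, written in Python rather than FORTRAN; the function names and the choice of lists for the arrays A and B are assumptions. The first function is the element-by-element form; the second packs only the qualifying B(I) values into a dense argument list, applies the sine routine to that list, and scatters the results back to the matching A(I).

```python
import math

def masked_sine_scalar(b, a):
    """Reference form: A(I) = SIN(B(I)) only where B(I) > 0."""
    out = list(a)  # elements that fail the condition keep their old value
    for i, x in enumerate(b):
        if x > 0:
            out[i] = math.sin(x)
    return out

def masked_sine_compressed(b, a):
    """Same result, but the sine routine only sees qualifying elements."""
    mask = [x > 0 for x in b]
    # Pack the qualifying arguments densely (what store compression achieves).
    packed = [x for x, m in zip(b, mask) if m]
    # Stand-in for the sine subroutine operating on the dense argument list.
    results = [math.sin(x) for x in packed]
    # Scatter results back under the same mask (what load expansion achieves).
    out = list(a)
    it = iter(results)
    for i, m in enumerate(mask):
        if m:
            out[i] = next(it)
    return out
```

Both functions should give the same array; the second does no sine work for elements that fail the condition.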
(1) Load expansion instruction
When vector data in the main storage is to be loaded into a vector register specified by the instruction, the vector elements are loaded only into those storage locations corresponding to "1" bits (mask bits) in a mask vector of a vector mask register. The vector elements at the storage locations whose corresponding mask bits are "0" are not changed.
(2) Store compression instruction
Of the vector elements in the vector register specified by the instruction, only those at the storage locations corresponding to "1" mask bits of the vector mask register are stored into the main storage.
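The behavior of the two instructions can be sketched in a few lines of Python. This is a minimal model, not the HITAC S-810 implementation: the main storage and the vector register are modeled as lists, addresses are list indices, consecutive elements are assumed to be one index apart, and the function names are hypothetical.

```python
def load_expand(memory, base, vreg, mask):
    """Load consecutive memory elements into the vreg slots whose mask bit is 1."""
    out = list(vreg)       # slots with a 0 mask bit keep their old contents
    addr = base
    for i, bit in enumerate(mask):
        if bit:
            out[i] = memory[addr]
            addr += 1      # memory address advances only for a 1 bit
    return out

def store_compress(vreg, mask, memory, base):
    """Store the vreg elements whose mask bit is 1 into consecutive memory."""
    out = list(memory)
    addr = base
    for elem, bit in zip(vreg, mask):
        if bit:
            out[addr] = elem
            addr += 1      # memory address advances only for a 1 bit
    return out
```

For example, with the mask 1, 0, 1, 1, `load_expand` places the first three memory elements into register slots 0, 2 and 3 and leaves slot 1 untouched.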
The load expansion instruction and store compression instruction in the prior art vector processor are shown in JP-A-58-214963. The processing unique to the load expansion instruction and store compression instruction, as compared with a simple load vector instruction and store vector instruction, is described below.
In the simple load vector instruction and store vector instruction, a data address on the main storage is updated for each processing of an element, while in the load expansion instruction and store compression instruction, it is updated only when an element whose corresponding mask bit is "1" is processed.
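The difference in address updating can be made concrete by listing the main storage address used for each vector element. The sketch below is illustrative only; the base address, the `stride` parameter (the distance between consecutive elements in the main storage) and the function names are assumptions.

```python
def simple_addresses(base, stride, n):
    """Simple load/store: the address is updated for every element."""
    return [base + i * stride for i in range(n)]

def masked_addresses(base, stride, mask):
    """Load expansion / store compression: the address is updated
    only after an element whose mask bit is 1 has been processed."""
    addrs = []
    addr = base
    for bit in mask:
        if bit:
            addrs.append(addr)
            addr += stride
        else:
            addrs.append(None)  # no main storage access for this element
    return addrs
```

With a base of 100 and a stride of 8, four elements of a simple load use addresses 100, 108, 116 and 124, whereas the mask 1, 0, 1, 1 uses 100, no access, 108 and 116.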
In the vector processor, the vector instruction is processed in a pipeline fashion at a high speed. In order to further improve the processing speed, parallel-by-element processing is used. In the parallel-by-element processing, multiple operation units (operation pipes) which process the vector instructions in a pipeline fashion and multiple circuits (load/store pipes) which load or store vector data in a pipeline fashion are provided, so that a plurality of vector elements are processed in parallel in a pipeline fashion in one machine cycle. A parallel-by-element processing type vector processor is known from the NEC vector processor SX and is disclosed in NEC Research and Development, No. 73, pages 1-6, April 1984.
The parallel-by-element processing utilizes the fact that there is no interaction between vector elements in the processing by the vector instruction, and simply multiplexes the pipeline circuits of the same construction to speed up the processing.
However, when the load expansion instruction or store compression instruction is to be processed by the load/store pipes of the prior art parallel-by-element processing type, the following problem is encountered. In the parallel-by-element processing type load/store pipes, the address calculation circuits of the multiple circuits (load/store sub-pipes) operate independently of each other. When a conventional load vector instruction or store vector instruction is processed, the address increments of the respective load/store sub-pipes are the same. For example, when four elements are processed in parallel, the address of each load/store sub-pipe is incremented by four elements for each element that the sub-pipe processes. Accordingly, there is no problem even if the sub-pipes are independent. However, the load expansion instruction or store compression instruction cannot be processed by the prior art load/store sub-pipes of the parallel-by-element processing type, for the following reasons. The address updating during the processing of the load expansion instruction or store compression instruction is done only when a load/store element whose corresponding mask bit is "1" is processed. Consequently, the increments of the address calculation circuits are not equal between the load/store sub-pipes. The address increment cannot be determined within a single load/store sub-pipe alone; it must be altered depending on the number of elements, among those processed in the other load/store sub-pipes, whose corresponding mask bits are "1". Such processing cannot be attained in the prior art parallel-by-element type pipes, in which the sub-pipes are basically independent.
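The dependence between sub-pipes can be illustrated with a short Python sketch. It only models the problem stated above, not any solution; the assignment of element i to sub-pipe i mod 4, the unit of one address per element, and the function names are assumptions made for illustration.

```python
def subpipe_addresses_simple(base, stride, n, pipes=4):
    """Conventional load/store: sub-pipe p handles elements p, p+pipes, ...
    Each sub-pipe can step its own address by pipes*stride independently."""
    return [[base + stride * i for i in range(p, n, pipes)]
            for p in range(pipes)]

def subpipe_addresses_masked(base, stride, mask, pipes=4):
    """Load expansion / store compression: the address of element i depends
    on how many 1 bits precede it, including bits seen by other sub-pipes."""
    prefix = []            # prefix[i] = number of 1 bits in mask[0:i]
    count = 0
    for bit in mask:
        prefix.append(count)
        count += 1 if bit else 0
    return [[base + stride * prefix[i] if mask[i] else None
             for i in range(p, len(mask), pipes)]
            for p in range(pipes)]
```

For eight elements, the conventional case gives each sub-pipe a fixed step of four (for instance 0 then 4 in the first sub-pipe). With the mask 1, 0, 1, 1, 0, 1, 1, 0 the steps differ from sub-pipe to sub-pipe, and each address can only be computed from the mask bits handled by the other sub-pipes.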