Many scientific data processing tasks involve extensive arithmetic manipulation of ordered arrays of data. Commonly, this type of manipulation, or "vector" processing, involves performing the same operation repeatedly on each successive element of a set of data. Most computers are organized with an arithmetic unit that can communicate with a memory and with input-output (I/O). To perform an arithmetic function, each of the operands (for example, two numbers to be added, subtracted, multiplied, or otherwise operated upon) must be brought in turn from memory to the arithmetic unit, the function must be performed, and the result must be returned to memory. Machines using this type of organization, called scalar machines, are not normally well suited to large-scale vector processing tasks.
To increase processing speed and hardware efficiency when dealing with ordered arrays of data, vector machines have been developed. A vector machine is one that deals with ordered arrays of data by virtue of its hardware organization, rather than through a software program and indexing, and thus attains a higher speed of operation. One such vector machine is disclosed in U.S. Pat. No. 4,128,880, issued Dec. 5, 1978. The vector processing machine of this patent employs one or more vector registers for receiving vector data sets from a central memory and supplying the data to segmented functional units, in which arithmetic operations are performed. More particularly, eight vector registers, each adapted to hold up to sixty-four vector elements, are provided. Each of these registers may be selectively connected to any one of a plurality of functional units, and one or more operands may be supplied to it on each clock period. Similarly, each of the vector registers may be selectively connected for receiving results. In a typical operation, two vector registers provide operands to a functional unit and a third vector register receives the results from that functional unit.
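The register-to-functional-unit data flow described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware of the cited patent: the class `VectorRegisterFile` and its methods are hypothetical, a Python function stands in for a segmented functional unit, and the `load` method stands in for a transfer from central memory.

```python
from operator import add

# Hypothetical software model of the organization described above:
# eight vector registers, each holding up to sixty-four elements.
NUM_VREGS = 8
VREG_LENGTH = 64


class VectorRegisterFile:
    def __init__(self):
        self.regs = [[0] * VREG_LENGTH for _ in range(NUM_VREGS)]

    def load(self, reg, values):
        """Copy up to VREG_LENGTH elements into a register
        (stands in for a transfer from central memory)."""
        if len(values) > VREG_LENGTH:
            raise ValueError("vector longer than register")
        self.regs[reg][:len(values)] = list(values)

    def execute(self, op, dst, src1, src2, length):
        """Stream element pairs from two source registers through a
        functional unit (modeled as the function `op`) and write the
        results into the destination register."""
        if length > VREG_LENGTH:
            raise ValueError("length exceeds register size")
        a, b = self.regs[src1], self.regs[src2]
        self.regs[dst][:length] = [op(a[i], b[i]) for i in range(length)]


# Two registers supply operands; a third receives the results.
vrf = VectorRegisterFile()
vrf.load(0, [1, 2, 3, 4])
vrf.load(1, [10, 20, 30, 40])
vrf.execute(add, dst=2, src1=0, src2=1, length=4)
print(vrf.regs[2][:4])  # [11, 22, 33, 44]
```

In hardware the element pairs would be streamed through a pipelined unit, one or more per clock period; the list comprehension here only captures the register-level data flow.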
Further vector machines are described in U.S. Pat. No. 4,661,900, issued Apr. 28, 1987, in which multiple processors are each connected to a central memory through a plurality of memory reference ports. The processors are also each connected to a plurality of shared registers, which may be addressed directly by any of the processors at rates commensurate with intraprocessor operation. Its vector register design provides each register with at least two independently addressable memories for delivering data to, or accepting data from, a functional unit.
Often, a program that requests an operation on an array of data in a vector register specifies a condition that must be true for a given element before the operation is performed on it. In this case, although all the operands may be present in the vector registers, the processor performs the operation only on some of the operand pairs. To make such processing more efficient, vector masks, which contain one bit position for each element of the operand vectors, are used to determine whether an operation is performed on the corresponding operand pair from the operand registers. If the mask bit at a given position is a one, the operation is performed on the corresponding pair. If the mask bit at a given position is a zero, a no-operation, commonly referred to as a no-op, is performed. Essentially, the processor does nothing for one processor cycle.
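The conventional masked behavior described above can be sketched in a few lines. This is an illustrative model only, assuming one element position per processor cycle and representing the mask as a list of 0/1 bits; the function name `masked_vector_op` is hypothetical.

```python
from operator import add


def masked_vector_op(op, a, b, dst, mask):
    """Conventional masked execution: every element position costs one
    cycle. Where the mask bit is 1 the operation is performed; where it
    is 0 a no-op occurs and the destination element is left unchanged."""
    cycles = 0
    result = list(dst)
    for i, bit in enumerate(mask):
        cycles += 1  # a cycle is spent whether or not the bit is set
        if bit:
            result[i] = op(a[i], b[i])
        # else: no-op; the machine state for element i is not altered
    return result, cycles


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
dst = [0] * 8
mask = [1, 0, 0, 1, 0, 0, 0, 1]
result, cycles = masked_vector_op(add, a, b, dst, mask)
print(result)  # [11, 0, 0, 44, 0, 0, 0, 88]
print(cycles)  # 8
```

Only three of the eight positions produce real results, yet all eight cycles are consumed.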
Each no-op consumes a valuable processor cycle, and because the operation may actually be performed and its result then canceled so that the machine state is not altered, no-ops can lead to severe performance degradation. Vector density is defined as the number of real operations to be performed on the data in one or more vector registers, relative to the number of elements those registers hold. Vectors with low density cause many no-ops to be performed and are therefore processed inefficiently; vectors with high density are processed more efficiently because proportionately less time is spent on no-ops. For example, for a 64-element vector whose mask has only 8 bits set, 56 of the 64 element cycles are spent on no-ops. There is a need to bring the total execution time for vectors in line with the number of real operations to be performed; that is, execution time should be directly proportional to vector density. There is also a need to skip the no-ops corresponding to zero values in the vector mask, improving execution time without greatly increasing the amount of circuitry required to skip them.
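The desired relationship between vector density and execution time can be illustrated by a version that issues only the positions whose mask bit is one. This is an idealized software sketch under loudly stated assumptions, not the circuit the text calls for: it assumes one real operation per cycle and zero overhead for locating the next set mask bit (the cost that real hardware must keep small). The names `vector_density` and `skip_noop_vector_op` are hypothetical.

```python
from operator import add


def vector_density(mask):
    """Fraction of element positions whose mask bit is 1 (real operations)."""
    return sum(mask) / len(mask) if mask else 0.0


def skip_noop_vector_op(op, a, b, dst, mask):
    """Idealized no-op skipping: only positions with a 1 in the mask are
    issued, so the cycle count equals the number of real operations.
    Assumes finding the next set bit costs no extra cycles."""
    active = [i for i, bit in enumerate(mask) if bit]  # real operations only
    result = list(dst)
    for i in active:
        result[i] = op(a[i], b[i])
    return result, len(active)


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
mask = [1, 0, 0, 1, 0, 0, 0, 1]
result, cycles = skip_noop_vector_op(add, a, b, [0] * 8, mask)
print(vector_density(mask))  # 0.375
print(result)                # [11, 0, 0, 44, 0, 0, 0, 88]
print(cycles)                # 3
```

The results match conventional masked execution, but the cycle count (3) equals the number of real operations, so execution time scales with density times vector length (0.375 × 8) rather than with the full length (8).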