Two types of processor architectures are widely recognized in the field of computer science: “scalar” and “vector”. A scalar processor is designed to execute instructions that perform operations on a single set of data, whereas a vector processor is designed to execute instructions that perform operations on multiple sets of data. FIGS. 1A and 1B present a comparative example that demonstrates the basic difference between a scalar processor and a vector processor.
FIG. 1A shows an example of a scalar AND instruction in which the operands of a single operand set, A and B, are ANDed together to produce a singular (or “scalar”) result C (i.e., AB=C). By contrast, FIG. 1B shows an example of a vector AND instruction in which the operands of two operand sets, A/B and D/E, are respectively ANDed together in parallel to simultaneously produce a vector result C, F (i.e., AB=C and DE=F).
As is well known in the art, both the input operands and the output result are typically stored in dedicated registers. For example, many instructions have two input operands, so two distinct input registers are used to temporarily store the respective input operands. Moreover, these same instructions produce an output value that is temporarily stored in a third (result) register. Respective input registers 101a,b and 102a,b and result registers 103a,b are observed in FIGS. 1A and 1B. Notably, the “scalar” vs. “vector” characterizations are readily discernible.
That is, input registers 101a and 102a of the scalar design of FIG. 1A are observed holding only scalar values (A and B, respectively). Likewise, the result register 103a of the scalar design of FIG. 1A is also observed holding only a scalar value (C). By contrast, the input registers 101b and 102b of the vector system of FIG. 1B are observed holding vectors (A,D in register 101b and B,E in register 102b). Likewise, the result register 103b of the vector system of FIG. 1B is also observed holding a vector value (C,F). As a matter of terminology, the contents of each of the registers 101b, 102b and 103b of the vector system of FIG. 1B can be globally referred to as a “vector”, and, each of the individual scalar values within the vector can be referred to as an “element”. Thus, for example, register 101b is observed to be storing “vector” A, D which is composed of “element” A and “element” D.
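The distinction drawn in FIGS. 1A and 1B can be illustrated with a minimal software sketch. The function names `scalar_and` and `vector_and`, the use of Python lists to stand in for vector registers, and the sample bit patterns are all assumptions made for illustration only; they do not appear in the figures. Note also that the software model visits the elements one after another, whereas the vector hardware of FIG. 1B is described as operating on them in parallel.

```python
from typing import List


def scalar_and(a: int, b: int) -> int:
    """Model of a scalar AND: one operand set (A, B) yields one result C."""
    return a & b


def vector_and(x: List[int], y: List[int]) -> List[int]:
    """Model of a vector AND: each element of x is ANDed with the
    corresponding element of y, yielding a vector result.

    A list stands in for a vector register (an illustrative assumption).
    """
    if len(x) != len(y):
        raise ValueError("vector operands must have the same number of elements")
    return [xe & ye for xe, ye in zip(x, y)]


# Hypothetical sample values for the elements A, B, D and E.
A, B = 0b1100, 0b1010
D, E = 0b1111, 0b0101

C = scalar_and(A, B)                 # scalar case: AB=C
C_F = vector_and([A, D], [B, E])     # vector case: AB=C and DE=F
```

Here the list `[A, D]` plays the role of input register 101b, `[B, E]` the role of input register 102b, and `C_F` the role of result register 103b.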
Some computer systems, regardless of whether the underlying processor is of scalar or vector design, effectively require a logical operation across the elements of a single vector. In the case of, for example, an eight-input AND operation (the logical diagram of which is shown in FIG. 2A), eight separate inputs (A, B, C, D, E, F, G, H) are ANDed together to produce a final scalar result (R). In the case of scalar processors, a loop has to be written in software that accumulates the result over seven iterations of a scalar AND instruction (the pseudo-code for which is shown in FIG. 2B). Thus, in the case of a scalar processor, the calculation requires multiple executions of the scalar AND instruction.
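The loop-based accumulation described above can be sketched as follows. This is not a reproduction of the pseudo-code of FIG. 2B; the function name `and_reduce_scalar` and the returned count of AND operations are assumptions introduced here to make the iteration count visible.

```python
from typing import List, Tuple


def and_reduce_scalar(inputs: List[int]) -> Tuple[int, int]:
    """AND all inputs together using one two-operand AND per loop iteration.

    Returns the final result R together with the number of scalar AND
    operations executed (a hypothetical counter added for illustration).
    """
    if not inputs:
        raise ValueError("at least one input is required")
    result = inputs[0]          # start the accumulation with the first input
    and_count = 0
    for value in inputs[1:]:    # one scalar AND per remaining input
        result = result & value
        and_count += 1
    return result, and_count


# Eight hypothetical one-bit inputs standing in for A through H.
R, executed = and_reduce_scalar([1, 1, 1, 1, 1, 1, 1, 1])
```

With eight inputs the loop body runs seven times, which corresponds to the seven iterations of the scalar AND instruction noted above.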
By contrast, a vector processor can entertain the prospect of implementing such an operation with the execution of a single instruction designed to perform the logical operation outright.