Two types of processor architectures are widely recognized in the field of computer science: “scalar” and “vector”. A scalar processor is designed to execute instructions that perform operations on a single set of data, whereas a vector processor is designed to execute instructions that perform operations on multiple sets of data. FIGS. 1A and 1B present a comparative example that demonstrates the basic difference between a scalar processor and a vector processor.
FIG. 1A shows an example of a scalar AND instruction in which a single operand set, A and B, is ANDed together to produce a singular (or “scalar”) result C (i.e., A.AND.B=C). By contrast, FIG. 1B shows an example of a vector AND instruction in which two operand sets, A/B and D/E, are respectively ANDed together in parallel to simultaneously produce a vector result C, F (i.e., A.AND.B=C and D.AND.E=F).
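The distinction between FIGS. 1A and 1B can be illustrated with a minimal Python sketch. The function names `scalar_and` and `vector_and` and the 4-bit operand values are hypothetical and chosen only for illustration. Note also that the list comprehension below visits the elements one after another, whereas a vector processor would perform the element-wise operations in parallel.

```python
def scalar_and(a: int, b: int) -> int:
    """One operand set in, one scalar result out (compare FIG. 1A)."""
    return a & b


def vector_and(v1: list[int], v2: list[int]) -> list[int]:
    """Element-wise AND over two equal-length vectors (compare FIG. 1B)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of elements")
    # Software stand-in for what hardware would do on all elements at once.
    return [x & y for x, y in zip(v1, v2)]


# Hypothetical operand values, not taken from the figures.
A, B, D, E = 0b1100, 0b1010, 0b0111, 0b0101

C = scalar_and(A, B)                 # scalar result C
C_F = vector_and([A, D], [B, E])     # vector result (C, F)
```

Here the scalar call yields the single value C, while the vector call yields a two-element result whose first element equals C and whose second element is F = D.AND.E.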
As is well known in the art, both the input operands and the output result are typically stored in dedicated registers. For example, many instructions have two input operands, so two distinct input registers are used to temporarily store the respective input operands. These same instructions produce an output value that is temporarily stored in a third (result) register. Respective input registers 101a,b and 102a,b and result registers 103a,b are observed in FIGS. 1A and 1B. Notably, the “scalar” vs. “vector” characterizations are readily discernible.
That is, input registers 101a and 102a of the scalar design of FIG. 1A are observed holding only scalar values (A and B, respectively). Likewise, the result register 103a of the scalar design of FIG. 1A is also observed holding only a scalar value (C). By contrast, the input registers 101b and 102b of the vector system of FIG. 1B are observed holding vectors (A,D in register 101b and B,E in register 102b). Likewise, the result register 103b of the vector system of FIG. 1B is also observed holding a vector value (C,F). As a matter of terminology, the contents of each of the registers 101b, 102b and 103b of the vector system of FIG. 1B can be globally referred to as a “vector”, and, each of the individual scalar values within the vector can be referred to as an “element”. Thus, for example, register 101b is observed to be storing “vector” A, D which is composed of “element” A and “element” D.
Only scalar or SIMD multiply operations are known to have been actually implemented in a semiconductor chip processor as a single processor instruction. Scalar or SIMD multiply instructions that are known to have been implemented in a semiconductor chip processor include the “multiply” instruction (MUL) which provides the lower ordered bits of the product of two integer input operands and the “multiply high” instruction (MULH) which provides the higher ordered bits of a scalar integer multiply operation.
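The relationship between the two multiply instructions can be sketched as follows. This is an illustrative model only: the 32-bit operand width, the unsigned interpretation of the operands, and the function names `mul` and `mulh` are assumptions, since the text specifies none of them.

```python
WIDTH = 32                 # assumed operand width in bits
MASK = (1 << WIDTH) - 1    # selects the low WIDTH bits of a value


def mul(a: int, b: int) -> int:
    """Return the lower ordered WIDTH bits of the product a * b."""
    return (a * b) & MASK


def mulh(a: int, b: int) -> int:
    """Return the higher ordered WIDTH bits of the product a * b
    (operands treated as unsigned in this sketch)."""
    return ((a * b) >> WIDTH) & MASK


# Hypothetical operands whose product does not fit in WIDTH bits.
a, b = 0xFFFFFFFF, 2
low, high = mul(a, b), mulh(a, b)
```

Taken together, the two results reconstruct the full double-width product, i.e., (high << WIDTH) | low equals a * b.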
Other instructions that are known to have been implemented in a semiconductor processor chip as scalar or SIMD instructions include the “count leading zeros” instruction CLZ, the “count trailing zeros” instruction CTZ, and the “count” instruction CNT. The scalar CLZ instruction accepts a scalar input A and returns the number of 0s in A that precede the highest ordered 1 in A (e.g., if A=1000, the result of CLZ=0; if A=0100, the result of CLZ=1; if A=0010, the result of CLZ=2; etc.). The scalar CTZ instruction accepts a scalar input A and returns the number of 0s in A that follow the lowest ordered 1 in A (e.g., if A=1000, the result of CTZ=3; if A=0100, the result of CTZ=2; if A=0010, the result of CTZ=1; etc.). The scalar CNT instruction accepts a scalar input A and returns the number of 1s in A (e.g., if A=1011, the result of CNT=3; if A=1001, the result of CNT=2; if A=0010, the result of CNT=1; etc.).
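The behavior of the three counting instructions can be modeled with a short Python sketch. The function names, the default 4-bit width (chosen to match the 4-bit examples above), and the value returned for an all-zero input are assumptions made for illustration; the text does not define the all-zero case.

```python
def clz(a: int, width: int = 4) -> int:
    """Count the 0s that precede the highest ordered 1 in a width-bit value."""
    a &= (1 << width) - 1          # keep only the low `width` bits
    # bit_length() is the position of the highest set bit, plus one.
    return width - a.bit_length()  # assumption: returns `width` when a == 0


def ctz(a: int, width: int = 4) -> int:
    """Count the 0s that follow the lowest ordered 1 in a width-bit value."""
    a &= (1 << width) - 1
    if a == 0:
        return width               # assumption: all-zero input returns `width`
    # a & -a isolates the lowest set bit of a.
    return (a & -a).bit_length() - 1


def cnt(a: int) -> int:
    """Count the 1s in a non-negative integer."""
    return bin(a).count("1")
```

For instance, clz(0b0100) evaluates to 1, ctz(0b0100) evaluates to 2, and cnt(0b1011) evaluates to 3, in agreement with the examples given above.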