Programs typically include many Boolean-valued expressions. Values produced by these expressions have a number of uses, including as data values, as branch conditions, and, in some computers, as predicates for conditional execution of operations.
A common application for Boolean-valued expressions is to perform a Boolean reduction, that is, to reduce a plurality of values to a single value using Boolean functions, e.g. AND and OR. An example of such an expression for performing a Boolean reduction is as follows:
$$r = u \land \bar{v} \land w \land x \tag{1}$$
The above expression reduces Boolean values u, the complement of v (represented by $\bar{v}$), w, and x to a single Boolean value r using the AND function (represented by $\land$). In general, any Boolean expression can be expressed in sum of products form, consisting of a number of AND-reductions followed by an OR-reduction.
Typically, the Boolean values used in programs result from comparisons or relational operations such as less than (<) and greater than (>). Accordingly, the following Boolean expression is more typical of AND-reduction expressions encountered in computer programs:
$$r = (a < b) \land \overline{(c > d)} \land (a < c) \land (b > d) \tag{2}$$
The speed with which reduction expressions of this type can be evaluated is extremely important to efficient program execution. This is especially true in the context of computer system architectures utilizing instruction-level parallelism, such as very long instruction word (VLIW) or superscalar architectures. Boolean expressions typically are used in computing branch conditions. Thus, evaluation of Boolean expressions is frequently in the critical path of a program. Also, some program transformations for exposing parallelism (i.e. operations that can be executed simultaneously) introduce Boolean reductions. The effectiveness of such transformations depends on how fast these expressions can be evaluated. Examples of such transformations include combining multiple branches that exit out of an unrolled loop into a single branch, and height-reduction of control dependence in an unrolled while loop.
In computers that lack the capability to execute operations in parallel, Boolean expressions are generally evaluated in a serial fashion. That is, the operations in the reduction expression are executed one at a time. In general, a purely sequential evaluation of a reduction expression containing n compare operations takes $(2n-1)\alpha$ machine cycles, where $\alpha$ is the number of cycles per compare or AND operation on the computer system. For example, the reduction expression (2) given above would take $7\alpha$ cycles.
In the context of architectures with instruction-level parallelism, compile-time height-reduction techniques are often used to accelerate the evaluation of Boolean expressions. A simple technique for accelerating evaluations is to issue the compare operations in parallel, then perform the reduction using a binary tree of AND operations. Consider, for example, the expression (2) given above. Assuming the processor in the computer system has at least four functional units, the expression can be evaluated as shown in the following table:
TABLE 1
Example of Height Reduction Technique.

Cycle        Instruction
0            [four compare operations of expression (2), issued in parallel; listing not recoverable]
$\alpha$     [two AND operations, each combining a pair of compare results; listing not recoverable]
$2\alpha$    [one AND operation combining the two partial results; listing not recoverable]
$3\alpha$    The value, r, is available for use.
In the example of Table 1, expression (2) is evaluated with a simple height-reduction technique in only $3\alpha$ cycles. In general, with this height-reduction technique, a reduction expression containing n compare operations can be evaluated in $(1+\log_2 n)\alpha$ cycles provided the computer system can execute n operations in parallel. However, even when resources providing additional parallelism are available, this technique still requires at least $(1+\log_2 n)\alpha$ cycles.
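The two cycle counts discussed so far can be compared with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and rounding the logarithm up for values of n that are not powers of two is an assumption, since the text states the formula only as $(1+\log_2 n)\alpha$.

```python
def sequential_cycles(n: int, alpha: int = 1) -> int:
    """Purely sequential evaluation: n compares plus n - 1 ANDs, one at a time."""
    return (2 * n - 1) * alpha


def tree_cycles(n: int, alpha: int = 1) -> int:
    """All n compares issued in parallel, followed by a binary tree of ANDs."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, computed in integers.
    # Rounding up for non-powers of two is an assumption of this sketch.
    tree_depth = (n - 1).bit_length()
    return (1 + tree_depth) * alpha


# Expression (2) has n = 4 compare operations.
print(sequential_cycles(4))  # 7
print(tree_cycles(4))        # 3
```

With the default alpha of 1 the results are expressed in multiples of $\alpha$, matching the $7\alpha$ and $3\alpha$ figures given for expression (2).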
The present invention provides a mechanism and technique for evaluating any reduction expression in effectively only $\alpha$ cycles, provided the computer system has sufficient parallelism. Performance with this method and technique is limited only by the number of functional units (resources) which can execute operations in parallel, and not by any dependencies between individual operations in the expression. Further, in accordance with the invention, the operations can be executed simultaneously or in any desired order. Thus, for example, a compiler is free to overlap the execution of compare operations with other operations in the program, which is important for computer systems with limited resources.
The invention has two primary aspects. According to a first aspect of the invention, one or more registers in a computer system permit multiple operations to simultaneously write a value into that register provided all values written by those operations are identical. In that case, the result stored in that register is well-defined, and equals any one of the values being written to the register. If, however, multiple operations simultaneously write different values into one register, the resulting stored value is undefined. The values written can be Boolean, integer, floating-point, or other values. The register can be a 1-bit register or a bit location in a condition or status register. The register also can be a general-purpose or floating-point register. Not all registers in the computer system need provide this capability for multiple simultaneous writes.
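The write semantics of such a register can be modeled in software. The Python sketch below is a behavioral illustration under stated assumptions, not a description of any hardware: the class name `MultiWriteRegister`, the `commit` method, and the `UNDEFINED` marker are all hypothetical names introduced here.

```python
class _Undefined:
    """Marker object for register contents that are undefined."""

    def __repr__(self) -> str:
        return "UNDEFINED"


UNDEFINED = _Undefined()


class MultiWriteRegister:
    """Behavioral model of a register that accepts several writes in one cycle."""

    def __init__(self, value=0):
        self.value = value

    def commit(self, writes):
        """Apply every write issued to this register during a single cycle."""
        writes = list(writes)
        if not writes:
            return  # no operation wrote; the old value is kept
        first = writes[0]
        if all(w == first for w in writes):
            self.value = first      # identical writes: well-defined result
        else:
            self.value = UNDEFINED  # conflicting writes: undefined result


r = MultiWriteRegister(1)
r.commit([0, 0, 0])  # three operations write the same value
print(r.value)       # 0
r.commit([0, 1])     # two operations write different values
print(r.value)       # UNDEFINED
```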
According to a second aspect of the invention, a computer system provides a set of reduction operations. The execution of each reduction operation is, in general, defined by two functions of the operation's input values, a result function and an enable function. The result function determines what value or values, if any, are stored by the operation. The enable function determines whether or not those values are written or stored into a target location or register. Accordingly, the result $r_{out}$ of a reduction operation can be expressed as:
$$r_{out} = F_{out}(r_{in_1}, r_{in_2}, \ldots, r_{in_n}) \quad \text{if } F_{en}(r_{in_1}, r_{in_2}, \ldots, r_{in_n}) \tag{3}$$
where $F_{out}$ is the result function, and $F_{en}$ is the enable function. The result and enable functions can include at least the following Boolean-valued functions: comparisons of integer or floating-point input values, and functions of Boolean input values, such as AND, OR, inverse, and identity functions. The specified target register preferably is of the type that handles multiple simultaneous writes.
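The split between a result function and an enable function can be sketched as follows. This Python fragment is a minimal illustration, assuming that "no write" is represented by `None`; the names `reduction_op`, `and_lt`, and `or_lt` are hypothetical and do not come from the text.

```python
from typing import Callable, Optional


def reduction_op(f_out: Callable[..., int],
                 f_en: Callable[..., bool],
                 *inputs) -> Optional[int]:
    """Return the value to be written, or None when the write is disabled."""
    if f_en(*inputs):
        return f_out(*inputs)
    return None


def and_lt(a, b) -> Optional[int]:
    """AND-type operation on a < b: writes 0 only when the comparison is false."""
    return reduction_op(lambda x, y: 0, lambda x, y: not (x < y), a, b)


def or_lt(a, b) -> Optional[int]:
    """OR-type operation on a < b (assumed form): writes 1 only when it is true."""
    return reduction_op(lambda x, y: 1, lambda x, y: x < y, a, b)


print(and_lt(1, 2))  # None (comparison true, register left unchanged)
print(and_lt(2, 1))  # 0
print(or_lt(1, 2))   # 1
```

In this sketch the result function is constant for each operation type, so every enabled write to a given target carries the same value, which is the condition under which simultaneous writes remain well-defined.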
In a computer system according to these two aspects of the invention, any reduction expression can be evaluated in effectively only $\alpha$ cycles. Further, any general Boolean expression, expressed in sum of products form, can be evaluated in effectively $2\alpha$ cycles. The reduction expression (2) above, for example, can be evaluated by simultaneously performing four operations which conditionally write, dependent on the result of a comparison, a Boolean zero value into a register which is preset to one, as illustrated in the following table.
TABLE 2
Example of Evaluating Expression (2) According to the Invention.

Cycle        Instruction
             r = 1; (overlapped with previous operations)
0            r = AND -< (a,b); r = $\mathrm{AND}_c$ -> (c,d); r = AND -< (a,c); and r = AND -> (b,d).
$\alpha$     The value in r is available for use.
In the above table, the register r is preset to one. The operations designated AND conditionally write a Boolean zero value into the register r if the result of the specified comparison operation (< or >) is a zero. The operation designated $\mathrm{AND}_c$ writes a Boolean zero value into the register r if the complement of the result of the specified comparison operation is a zero. Thus, although more than one of the operations may simultaneously write to the register, all the values that might be written are zero.
The four simultaneously executed operations effectively evaluate reduction expression (2) in $\alpha$ cycles. (This assumes the operation to preset the register can be overlapped with previous operations, for example during cycles when one of the functional units would otherwise remain idle. Thus, the preset operation takes effectively no additional execution time.) If the result of any of the comparisons performed by the AND operations, or the complement of the comparison performed by the $\mathrm{AND}_c$ operation, is a zero, the register will be set to a zero. As the register r conforms to the first aspect of the invention, multiple operations concurrently setting the register to zero will result in the register being set to a defined value of zero. The register otherwise remains set to one.
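The behavior of the four operations of Table 2 can be simulated in software. The Python sketch below is illustrative only: it applies the writes one after another rather than simultaneously, and the helper names `and_op`, `and_c_op`, and `evaluate_table2` are hypothetical. Because every enabled write stores zero, the order in which the writes are applied does not affect the result, which the optional shuffle is meant to show.

```python
import random


def and_op(cmp_result: bool):
    """AND: request a write of 0 when the comparison is false, else no write."""
    return 0 if not cmp_result else None


def and_c_op(cmp_result: bool):
    """AND_c: request a write of 0 when the complement of the comparison is false."""
    return 0 if cmp_result else None


def evaluate_table2(a, b, c, d, shuffle: bool = False) -> int:
    r = 1  # preset, assumed to be overlapped with earlier operations
    writes = [
        and_op(a < b),    # r = AND   -< (a,b)
        and_c_op(c > d),  # r = AND_c -> (c,d)
        and_op(a < c),    # r = AND   -< (a,c)
        and_op(b > d),    # r = AND   -> (b,d)
    ]
    if shuffle:
        random.shuffle(writes)  # any order gives the same final value
    for w in writes:
        if w is not None:
            r = w               # every enabled write stores 0
    return r


print(evaluate_table2(1, 5, 3, 4))  # 1
print(evaluate_table2(1, 5, 4, 3))  # 0
```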
Additionally, any general Boolean expression, expressed in sum of products form, can be evaluated in $2\alpha$ cycles. (Again, this assumes the preset operations are overlapped with previous operations so as not to require any additional cycles.) In the first $\alpha$ cycles, each AND term of the expression is performed as a separate AND reduction. In the next $\alpha$ cycles, an OR reduction is performed on the results of the AND reductions to obtain the value of the expression.
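The two-step evaluation of a sum of products can be sketched in the same style. This Python fragment is a sequential model under stated assumptions: the function names are hypothetical, and the OR reduction is assumed to mirror the AND reduction, with a register preset to zero and conditional writes of one.

```python
def and_reduce(literals) -> int:
    """AND reduction: register preset to 1, each false literal writes 0."""
    r = 1
    for value in literals:
        if not value:
            r = 0
    return r


def or_reduce(terms) -> int:
    """OR reduction (assumed form): register preset to 0, each true term writes 1."""
    r = 0
    for value in terms:
        if value:
            r = 1
    return r


def eval_sum_of_products(products) -> int:
    # Step 1: every product term is an independent AND reduction.
    partial = [and_reduce(term) for term in products]
    # Step 2: a single OR reduction combines the partial results.
    return or_reduce(partial)


# Example: (u AND NOT v) OR (w AND x) with u=1, v=1, w=1, x=1.
u, v, w, x = True, True, True, True
print(eval_sum_of_products([[u, not v], [w, x]]))  # 1
```

On a machine with enough functional units, all writes within each step could be issued together, which is where the two-step bound in the text comes from; the loops here only model the final register values.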
The values of the Boolean expressions are particularly useful for predicated execution. Predicated execution of instructions refers to execution which is conditioned on the value of an input, usually a Boolean value. For example, on a machine which supports predicated execution, an instruction which adds two input values may be conditioned on whether a third, predicate input has a certain value. The adding of the two input values, or at least the writing of the result of the add function, takes place only when the predicate input has the particular value. Thus, according to a further feature and advantage of the invention, instructions can be predicated on reduction expressions evaluated according to the invention.
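A predicated add of the kind described above might be modeled as follows. This is a hedged Python sketch: the register file is represented as a dictionary, and the function name `predicated_add` and the register names are hypothetical.

```python
def predicated_add(regs: dict, dest: str, src1: str, src2: str, pred: str) -> None:
    """Write regs[src1] + regs[src2] to regs[dest] only if the predicate is 1."""
    if regs[pred] == 1:
        regs[dest] = regs[src1] + regs[src2]
    # Otherwise the destination register is left unchanged.


regs = {"x": 10, "y": 20, "z": 0, "p": 1}
predicated_add(regs, "z", "x", "y", "p")
print(regs["z"])  # 30

regs["p"] = 0     # e.g. a reduction expression evaluated to zero
regs["z"] = 0
predicated_add(regs, "z", "x", "y", "p")
print(regs["z"])  # 0
```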
Additional features and advantages of the invention will be made apparent from the following detailed description of a preferred embodiment which proceeds with reference to the accompanying drawings.