Software “test and branch” instruction execution is a fundamental aspect of the compiled code that modern information handling system (IHS) architectures employ. In conventional software code instruction sequences, execution of a conditional branch instruction results in a branch to one of multiple code paths dependent on the analysis of a specified conditional event. Moreover, the processor that executes the code sequence must typically determine the conditional event prior to the execution of the branch sequence. A simplified example of conditional branching is the decision branch. In the typical decision branch, process flow stops at a point in a program and the processor decides which way to proceed among multiple code branches dependent on a test result. An “if-else” statement is a common example of such a decision branch. Once the processor takes the proper branch based on the test result, flow resumes and the processor continues executing the instructions in the now selected branch.
The pseudocode in TABLE 1 below represents a conventional conditional test and branch sequence. Pseudocode is not a direct input to a processing system, but rather is a language that programmers and non-programmers often use to first develop a more readable version of program code under development. Typically, an agent interprets or transforms pseudocode into the proper syntax of a machine dependent computer language before the processor executes the code.
TABLE 1
if condition
    do this
else
    do that
Before branch sequences execute conditionally, the code may translate to machine language instructions such as seen in TABLE 2 below.
TABLE 2
   branch if condition to label 1
   do that
   branch to label 2
label 1:
   do this
label 2:
   processing continues with code following if-else statement
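The relationship between TABLE 1 and TABLE 2 can be sketched in C, assuming that “do this” and “do that” each simply assign a value. The function names and parameters below are hypothetical and serve only for illustration. The first function keeps the structured if-else form, while the second spells out the conditional branch, the unconditional branch, and the two labels of TABLE 2 using goto statements.

```c
/* Structured form, corresponding to TABLE 1. */
int pick_structured(int condition, int this_value, int that_value)
{
    int result;
    if (condition) {
        result = this_value;      /* do this */
    } else {
        result = that_value;      /* do that */
    }
    return result;
}

/* Lowered form, corresponding to TABLE 2: one conditional branch,
 * one unconditional branch, and two labels. */
int pick_lowered(int condition, int this_value, int that_value)
{
    int result;
    if (condition) goto label_1;  /* branch if condition to label 1 */
    result = that_value;          /* do that */
    goto label_2;                 /* branch to label 2 */
label_1:
    result = this_value;          /* do this */
label_2:
    return result;                /* processing continues */
}
```

Both functions return the same value for the same inputs; only the control flow is written differently.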
As seen in TABLE 2, the machine level code is more complex than the original branching pseudocode. When a software program must first evaluate a condition prior to continuing, this may result in significant data flow delays. Such delays are particularly evident in a scalar processing environment while working directly on discrete terms. Scalar operations operate on integer and real argument types, but not directly on vectors or arrays. A vector is a one dimensional array of variables or data. Other techniques, such as predictive methodologies, can reduce some aspects of branching inefficiencies. However, prediction methods exhibit their own inherent inefficiencies related to misprediction and poor data event scheduling. Moreover, in pipelined systems, look-ahead operations may disrupt program flow when misprediction events occur.
Conventional processor systems may employ branch predication to manage branch sequences in program code. Branch predication provides a methodology of conditionally branching program code based on a predefined predicate. Predicate logic replaces conditional test and branch sequences with predicated sequences. Predicated branch sequence execution provides an increase in efficiency when the program code uses short branch lengths. In a pipelined system, the processor may execute both branch paths in advance of executing a conditional branch. As the processor catches up and determines which path is accurate, the processor may discard one path by using predication methodologies or specialized look-ahead processing. Using the previous pseudocode example, the simple branch sequence now converts to the predicated example in TABLE 3 which illustrates branch predication.
TABLE 3
(condition) do this
(not condition) do that
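The predicated form of TABLE 3 may be approximated in C as follows. This is only a sketch: C has no syntax for predicated instructions, so each guarded statement below stands in for an instruction that carries its own predicate, and a compiler remains free to emit branches for it. The function name and parameters are hypothetical.

```c
/* Sketch of TABLE 3: each statement is guarded by its own predicate,
 * with no else clause and no labels joining the two paths. */
int pick_predicated(int condition, int this_value, int that_value)
{
    int result = 0;
    int p = (condition != 0);      /* the predicate */
    if (p)  result = this_value;   /* (condition) do this */
    if (!p) result = that_value;   /* (not condition) do that */
    return result;
}
```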
The elimination of the specific branches by such branch predication desirably results in less code. However, if the “do this” and “do that” blocks of code are long themselves, i.e. correspond to long code paths, then this branch predication technique may also become inefficient. Branch predication is combinable with branch prediction techniques wherein register information helps predict the most likely branch path. Branch prediction methodologies can be complex. Moreover, branch prediction is prone to misprediction events which result in large resource inefficiencies and re-processing overhead. Additionally, sequences that predicated execution generates are not properly vectorizable for use in a SIMD (single instruction multiple data) environment. In an object oriented environment, the environment defines vectors as a single object. Each vector associates with functions that can operate specific to that object or vector. Because branch-based sequences are inherently scalar in nature, eliminating branch sequences may allow conversion of the code to a vector-based code. Such a vector-based code is more easily convertible to SIMD instruction-based sequences.
SIMD-based code is readily usable in multi-core processor systems such as those that include synergistic processor units (SPUs). Multi-core processor systems provide an excellent environment for parallel processing of complex software code. Moreover, multi-core systems also provide an environment for managing vectors more efficiently. However, even a parallel SIMD environment first converts vectors to scalar data when using conditional test and branch sequences. The SIMD environment unpacks the vectors, operates on the unpacked vectors, and then repacks the vectors before flow continues. In another limitation of the conventional SIMD environment, the environment may not easily adapt predicated sequences to data parallel operations. Predication inhibits the architectural execution of an entire instruction in a data parallel environment. Thus, predicated code is not easily vectorizable for use in a data parallel system.
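The unpack, operate, and repack pattern described above can be sketched in C. The four-slot vector type and the function name are assumptions made for illustration and do not correspond to any particular SIMD instruction set. The example computes a per-slot maximum by handling each slot as a scalar with its own test and branch.

```c
#include <stdint.h>

#define SLOTS 4  /* hypothetical vector width: four 32-bit slots */

typedef struct { uint32_t slot[SLOTS]; } vec4;

/* Per-slot maximum computed with scalar test and branch sequences. */
vec4 max_via_scalar_branches(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < SLOTS; i++) {
        uint32_t x = a.slot[i];   /* unpack one slot of each vector */
        uint32_t y = b.slot[i];
        uint32_t z;
        if (x > y) {              /* scalar test and branch */
            z = x;
        } else {
            z = y;
        }
        r.slot[i] = z;            /* repack the scalar result */
    }
    return r;
}
```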
In yet another aspect of managing branch sequences in conventional processor systems, a processor system may employ data parallel select execution methodology. Data parallel select execution provides for two data inputs and a select control input. A register file stores the data associated with the two data inputs and the control input. Data parallel select execution independently selects one of the two data inputs for each vector slot under the control of the select control input. The select control input effectively acts as input for the selection of the proper coded sequence. Using data parallel select methodology to compute the result of conditional program flow integrates conditional operation into SIMD-based computation by eliminating the need to convert between scalar and vector representation. The resulting vectorized code thus contains conditional expressions, which in turn lets a processor core or SPU execute conditional execution sequences in parallel.
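A data parallel select operation of this kind can be modeled in portable C as shown below. The vector type, the four-slot width, and the names vec_cmpgt, vec_select, and vec_max are hypothetical and are not taken from any specific instruction set. The loops merely model what SIMD hardware would typically perform across all slots in a single instruction.

```c
#include <stdint.h>

#define SLOTS 4  /* hypothetical vector width: four 32-bit slots */

typedef struct { uint32_t slot[SLOTS]; } vec4;

/* Per-slot compare: yields all ones where a > b, all zeros otherwise.
 * Negating the 0/1 comparison result as an unsigned value produces
 * the mask without a test and branch in the source. */
vec4 vec_cmpgt(vec4 a, vec4 b)
{
    vec4 m;
    for (int i = 0; i < SLOTS; i++)
        m.slot[i] = -(uint32_t)(a.slot[i] > b.slot[i]);
    return m;
}

/* Select: two data inputs and one select control input. Each result
 * bit comes from if_true where the mask bit is set, else from if_false. */
vec4 vec_select(vec4 if_false, vec4 if_true, vec4 mask)
{
    vec4 r;
    for (int i = 0; i < SLOTS; i++)
        r.slot[i] = (if_true.slot[i] & mask.slot[i]) |
                    (if_false.slot[i] & ~mask.slot[i]);
    return r;
}

/* Per-slot "if (a > b) r = a; else r = b;" expressed as compare plus select. */
vec4 vec_max(vec4 a, vec4 b)
{
    return vec_select(b, a, vec_cmpgt(a, b));
}
```

In this sketch both candidate results already exist as whole vectors, and the mask decides per slot which one survives, so no slot is unpacked into a scalar for a separate test and branch.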
In summary, conditional branch sequences are not well suited for pipelined or data parallel processor systems. Conditional branches often cause branch misprediction events and disruption of pipelined flow. Predicated executions are limited to processor systems exhibiting full predication capability. Moreover, predicated executions exhibit the limitation that they require scalar processing. In addition, predicated executions are inherently inefficient when the processor encounters long branch execution paths.
What is needed is a method of translating conditional test and branch operations into data parallel select operations that addresses the problems above.