One of the issues encountered in computing system design is dealing with computer program branching and its impact on processing speed. Computer programs typically include a number of “branch” instructions, which cause program execution to transfer to an alternate area of instructions some distance in memory from the branch instruction. In many cases these branch instructions are “conditional”: the branch is taken only if a particular condition is satisfied (e.g., a specified register or bit has a zero value). Thus it is unknown until the program is executed whether a given conditional branch will be taken (requiring a jump to the new set of instructions) or not taken (allowing execution of the current set of instructions to continue). This uncertainty has implications for processing throughput, because contemporary processors attempt to cache and “pre-process” instructions (as part of an instruction “pipeline”) ahead of the time of execution, in order to hide memory delays and instruction decoding delays. If the conditional branch is taken, the cached and pre-processed instructions may no longer be valid, requiring a pipeline “flush” and reload and thus creating additional processing delay.
To overcome the uncertainties that conditional branching poses for pre-processing, one technique used in contemporary processor designs allows for the conditional execution of instructions, typically through multiple execution pipelines within the processor (in some cases combined with branch prediction algorithms that make judgments about likely branch paths). Each pipeline pre-processes one potential path of a conditional branch, and the processor executes only the pipeline containing the instructions that actually need to be executed, based on the outcome of the conditional branch. Another technique uses “conditional execution” instructions, which are instructions that are executed only when a specified condition is true (the condition is said to “guard” the instruction from execution). Conditional execution instructions can thus be used to reduce the number of branches needed in a section of software. Several processor architectures support conditional execution instructions (e.g., ARM processors, Motorola MCORE processors).
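As an illustration of the guarded form described above, the following minimal C sketch contrasts a branching and a guarded version of the same computation. The function names are hypothetical, and the ARM mnemonics in the comments only indicate the kind of code a compiler might emit on a 32-bit ARM target; a compiler is free to generate either form for either function.

```c
/* Branching form: a compiler may emit a compare followed by a
 * conditional branch that skips over the assignment. */
static int max_branch(int a, int b)
{
    if (a < b) {
        a = b;
    }
    return a;
}

/* Guarded form: the assignment is written as a select on a guard.
 * On a 32-bit ARM target this could map to something like
 *     CMP   r0, r1      ; set the condition flags
 *     MOVLT r0, r1      ; executed only when a < b
 * with no branch instruction at all (illustrative, not guaranteed). */
static int max_guarded(int a, int b)
{
    int guard = (a < b);   /* the condition that guards the move */
    a = guard ? b : a;     /* the move takes effect only when guard is true */
    return a;
}
```

Both functions return the same result for all inputs; the only difference is whether the control flow contains a branch that the pipeline must resolve.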
To accommodate conditional execution, software compilers need to be configured to generate machine instructions that take advantage of its efficiencies. Current compilers attempt to perform internal tree optimizations on the processed high-level source code (for example, C or C++ source code). Although tree optimizations can improve execution speed, tree optimizers lack detailed information about the block size and the number of machine instruction groupings that the source code produces when compiled. Without such information, situations that are appropriate for optimization can be missed and situations that are not appropriate can be optimized anyway, reducing overall system throughput and memory efficiency.
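To show why block size at the machine-instruction level matters, the sketch below gives one possible profitability test for replacing a branch with conditional execution. It is an assumption-laden illustration, not a method taken from this document: the `region_info` structure, the function name, the one-cycle-per-instruction cost model, and the `BRANCH_PENALTY_CYCLES` value are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of an if/else region, measured in machine
 * instructions (information a tree-level optimizer does not have). */
struct region_info {
    unsigned then_insns;   /* instructions in the "then" block */
    unsigned else_insns;   /* instructions in the "else" block, 0 if none */
};

/* Assumed cost of a pipeline flush, in cycles; a real value would be
 * target-specific. */
enum { BRANCH_PENALTY_CYCLES = 3 };

/* Returns true when guarding every instruction is estimated to cost
 * no more than branching. Assumes one cycle per instruction. */
static bool should_use_conditional_execution(const struct region_info *r)
{
    /* Guarded code issues both blocks, whether or not the guard holds. */
    unsigned guarded_cost = r->then_insns + r->else_insns;

    /* Branching code runs one block; take the larger block and add the
     * assumed flush penalty as a pessimistic estimate. */
    unsigned larger = (r->then_insns > r->else_insns) ? r->then_insns
                                                      : r->else_insns;
    unsigned branch_cost = larger + BRANCH_PENALTY_CYCLES;

    return guarded_cost <= branch_cost;
}
```

Under these assumptions a two-instruction block with no else part favors conditional execution, while a region with ten and eight instructions favors a branch. That distinction cannot be drawn from the source-level tree alone.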