This invention relates generally to data processing, and more particularly, to a data processing system having optimized branch control and method thereof.
Low power design techniques have been gaining importance in microprocessor and microcontroller design due to the widespread use of portable and handheld applications. Such applications require long battery life and low system cost. A portable application typically alternates between two operating modes: (i) burst mode, where active computations are performed; and (ii) power-down mode (or sleep mode), where the system is asleep waiting for a new computational event to occur. If a subsystem (microprocessor included) consumes only a small fraction of the overall system power, then low cost and high performance should be the design goals for the subsystem.
Branches have long been recognized as a major factor in degrading the performance of a pipelined machine, because branches break the continuous flow of the instruction stream. Also, branches often can be resolved only deep in the execution pipeline. Techniques such as branch prediction and speculative execution are widely employed to reduce the adverse effect of branches. These techniques, unfortunately, often call for hardware intensive implementations. Alternative low cost approaches are needed to improve performance on branches.
Several prior methods have been used to optimize the execution of branches by microprocessors. One prior method of optimizing the execution of branch instructions is known as software loop unrolling. Software loop unrolling occurs at compile time when it is known that a specific loop will be executed many times. Software loop unrolling duplicates the code within the loop and reduces the number of iterations through the loop by a factor equal to the number of times the code has been duplicated. For example, if it is known at compile time that a specific software loop will be executed 100 times, it is possible to place two copies of the loop body within the loop and execute the actual branch only 50 times. However, such a technique, while potentially saving cycles per iteration, creates a larger static program size.
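The 100-iteration example above can be illustrated with a minimal sketch. The function names and the `branches` counter are hypothetical and serve only to count how many times the loop-closing branch is taken; the sketch assumes an even number of elements and is not drawn from any particular compiler.

```python
def sum_rolled(values):
    """Sum the values with one loop-closing branch per element."""
    total = 0
    branches = 0  # hypothetical counter for loop-closing branches
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
        branches += 1
    return total, branches


def sum_unrolled_by_two(values):
    """Same sum with the loop body duplicated once (unroll factor of 2)."""
    if len(values) % 2 != 0:
        raise ValueError("this sketch assumes an even number of elements")
    total = 0
    branches = 0
    i = 0
    while i < len(values):
        total += values[i]      # first copy of the loop body
        total += values[i + 1]  # second copy of the loop body
        i += 2
        branches += 1           # one branch now covers two original iterations
    return total, branches


data = list(range(100))
rolled_total, rolled_branches = sum_rolled(data)
unrolled_total, unrolled_branches = sum_unrolled_by_two(data)
```

Both versions compute the same result, but the unrolled version executes the loop-closing branch half as often, at the cost of a body that is twice as long in the static program.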
Another known prior art technique to optimize branching has been to use a special loop instruction. Prior art special loop instructions were designed such that a branch and a decrement (or increment) of the loop counter occur within a single instruction. As a result, one clock cycle per iteration is saved. However, each special loop instruction requires a unique instruction opcode. Therefore, while the use of special loop instructions can save a clock cycle per iteration, it is accomplished at the cost of a larger instruction set.
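The one-cycle saving can be made concrete with a toy cycle count. This is only a sketch under simplifying assumptions that are not stated in the text: every instruction is taken to cost exactly one clock cycle, and pipeline effects are ignored. The function names and parameters are hypothetical.

```python
def cycles_separate_decrement_and_branch(iterations, body_cycles):
    """Loop overhead is two instructions: a counter decrement and a branch."""
    overhead_per_iteration = 2  # assumed: 1 cycle decrement + 1 cycle branch
    return iterations * (body_cycles + overhead_per_iteration)


def cycles_special_loop_instruction(iterations, body_cycles):
    """Loop overhead is one fused decrement-and-branch instruction."""
    overhead_per_iteration = 1  # assumed: 1 cycle for the fused instruction
    return iterations * (body_cycles + overhead_per_iteration)


# Example: 100 iterations of a loop body assumed to take 4 cycles.
separate = cycles_separate_decrement_and_branch(100, 4)
fused = cycles_special_loop_instruction(100, 4)
saved = separate - fused
```

Under these assumptions the saving equals the iteration count, which matches the one clock cycle per iteration described above.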
Yet another prior art method incorporates a loop mode along with special loop instructions. Loop mode is an instruction that indicates that the preceding instruction is to be repeated a specified number of times. Such an implementation avoids repeatedly fetching an instruction that is to be executed many times in succession. However, this technique requires the existence of the special loop instructions, and limits the loop body to a single instruction.
Therefore, a more versatile method of branching which minimizes the number of clock cycles needed to execute program loops would be desirable.