The present invention relates to determining destinations of a dynamic branch in compiled computer code (i.e., machine code). More particularly, the present invention relates to determining every possible destination of such a dynamic branch so that the control flow of such computer code with respect to such dynamic branch may be accurately determined.
In certain circumstances, it is useful to translate compiled machine code from a first state corresponding to a first instruction set to a second state corresponding to a second instruction set. As is known, such translating is preferable to the alternative: locating the source code from which the machine code in the first code state was derived; and writing and debugging a compiler to compile such located source code directly into the machine code in the second code state.
As is also known, such translating from the first code state to the second code state is performed by way of an appropriate re-compiler or translator on an appropriate processor with attached memory. However, a problem arises in the situation where the machine code in the first state includes branch or destination address information, where such branch address information would be different for the machine code in the second state, and where explicit address information from the original compiler is absent. The problem is exacerbated when branch address information does not translate in a fairly direct manner.
As is known, in computing in general, a dynamic branch is a transfer from one part of computer code to one of several other parts of computer code based on a potentially complex calculation of a branch or destination address. A translator must accurately determine all possible potential destinations of a dynamic branch in the absence of explicit information from the original compiler so that to control flow with respect to such dynamic branch is accurately translated.
In order to translate code from one instruction set to another, it is necessary to identify the destination of every branch. This is a prerequisite to the generation of a complete control flow graph (CFG) for the code. As should be understood, such CFG is needed for full semantic analysis of the code, and, more narrowly, to allow for adjustment to the branches or their arguments to changed addresses or new address forms in the target (i.e., second) instruction set. For a static branch (i.e., a branch to only one destination), whether code segment based or self-relative, this is a straightforward task, as the address of the destination is encoded in the branch instruction itself. For a dynamic branch, however, the task becomes more complex.
Dynamic branches rely, in whole or in part, on results of calculations made prior to the branch, and thus present a more difficult problem for translation. A dynamic branch naturally implicates a set of potential destination addresses, rather than a single destination. A need exists, then, for a translator that evaluates the calculations that preface the branch, in order to adjust for relocated potential destinations or different addressing structure in the target instruction set.
The present invention satisfies the aforementioned needs by providing a method, a translator, and a computer-readable medium for translating compiled programming code from a first code state to a second code state. The programming code in the first code state has a plurality of basic blocks, where each basic block has a set of instructions. At least one basic block ends in a dynamic branch, the dynamic branch being a transfer to one of a set of destinations based on a calculation of a destination address.
The plurality of basic blocks in the first code state of the programming code are identified, as are links between the identified basic blocks. A control flow graph/representation (CFG) of the programming code is then constructed based on the identified basic blocks and identified links, where the CFG is in a preliminary form. At least one basic block ending in a dynamic branch is identified, and all identified basic blocks that lead to the dynamic branch are explored, based on the CFG, as far back as is necessary to fully determine a set of destination addresses for the dynamic branch.
The set of destination addresses defines the set of destinations from the dynamic branch. Such set of destinations is examined to identify a branch table, and the CFG is updated to reflect the set of destinations and the identified branch table. The programming code is then translated from the first code state to the second code state based at least in part on the updated CFG.