Many program analysis tools operate on the control flow graphs (CFGs) of the procedures in a program. For example, a CFG can reveal optimization opportunities or programming errors. A control flow graph consists of nodes and directed edges. The nodes represent the fundamental executable elements of the procedure, for example, basic blocks. The directed edges usually represent non-linear execution flows between the elements, caused by instructions such as branches and jumps.
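As an illustration of the kind of analysis a CFG supports, the following Python sketch stores a CFG as successor sets keyed by basic-block label and reports blocks that cannot be reached from the entry block, one simple way a CFG can expose dead code. The class and method names are hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict, deque


class CFG:
    """Minimal control flow graph: nodes are basic-block labels, edges are directed."""

    def __init__(self, entry):
        self.entry = entry
        self.nodes = {entry}
        self.succs = defaultdict(set)  # node -> set of successor nodes

    def add_node(self, node):
        self.nodes.add(node)

    def add_edge(self, src, dst):
        self.nodes.update((src, dst))
        self.succs[src].add(dst)

    def reachable(self):
        """Breadth-first search over successor edges, starting at the entry node."""
        seen = {self.entry}
        work = deque([self.entry])
        while work:
            node = work.popleft()
            for nxt in self.succs[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
        return seen

    def unreachable(self):
        """Nodes with no path from the entry: candidate dead code."""
        return self.nodes - self.reachable()


# Example: B4 has no incoming edge, so it is reported as unreachable.
cfg = CFG("B0")
cfg.add_edge("B0", "B1")
cfg.add_edge("B0", "B2")
cfg.add_edge("B1", "B3")
cfg.add_edge("B2", "B3")
cfg.add_node("B4")
```

The result is only as good as the edge set: if B4 were in fact the target of a computed jump whose destinations were never resolved, the analysis would wrongly report it as dead.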
An exact CFG, one that accurately represents a procedure, is hard to generate when the procedure includes computed jumps. A computed jump is typically a branch instruction whose target address is not known until the program executes. The target address is usually the result of a computation that may depend heavily on the dynamic state of the machine. Such jumps arise frequently, e.g., in the implementation of switch or case statements in a programming language.
Generally, the CFG for a procedure is built by first identifying the basic blocks of the procedure and then adding edges from each block to the blocks that can execute immediately after it. A basic block is a linear sequence of instructions such that all branches into the basic block go to its first instruction, and only its last instruction branches out of it. Typically, a basic block ends with a branch instruction that can redirect execution to a new target address, from which sequential fetching continues.
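The construction described above can be sketched in Python over a hypothetical toy instruction set (the opcodes `op`, `br`, `beq`, `ret`, and `jmp` are illustrative, not any real ISA). A "leader" (the first instruction of a basic block) is the procedure entry, any direct branch target, or any instruction that follows a branch; each block then runs up to the next leader. Blocks ending in a computed jump are collected separately because their successors cannot be determined from the instruction alone.

```python
# Toy instruction set (hypothetical): each instruction is (opcode, target)
#   "op"  - ordinary instruction, falls through
#   "br"  - unconditional direct branch to instruction index `target`
#   "beq" - conditional direct branch: to `target` or falls through
#   "ret" - return, no successor inside the procedure
#   "jmp" - computed jump, target unknown until run time
BRANCHES = {"br", "beq", "ret", "jmp"}


def find_leaders(code):
    """Indices that start a basic block (assumes a non-empty procedure)."""
    leaders = {0}
    for i, (opcode, target) in enumerate(code):
        if opcode in ("br", "beq"):
            leaders.add(target)      # a direct branch target starts a block
        if opcode in BRANCHES and i + 1 < len(code):
            leaders.add(i + 1)       # the instruction after a branch starts a block
    return sorted(leaders)


def build_cfg(code):
    leaders = find_leaders(code)
    # Each block runs from its leader up to (not including) the next leader.
    blocks = {start: range(start, end)
              for start, end in zip(leaders, leaders[1:] + [len(code)])}
    edges = set()
    unresolved = []                  # blocks ending in a computed jump
    for start, insns in blocks.items():
        last = insns[-1]
        opcode, target = code[last]
        if opcode in ("br", "beq"):
            edges.add((start, target))
        if opcode == "jmp":
            unresolved.append(start)
        if opcode not in ("br", "ret", "jmp") and last + 1 < len(code):
            edges.add((start, last + 1))  # fall-through edge to the next block
    return blocks, edges, unresolved


code = [
    ("op", None),   # 0
    ("beq", 4),     # 1
    ("op", None),   # 2
    ("br", 5),      # 3
    ("op", None),   # 4
    ("jmp", None),  # 5  computed jump
    ("op", None),   # 6  reached only through the computed jump
    ("ret", None),  # 7
]
blocks, edges, unresolved = build_cfg(code)
```

For this example the blocks start at indices 0, 2, 4, 5, and 6; the block at 6 receives no edge at all, because the only way to reach it is through the unresolved computed jump at index 5.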
In most modern processors, such as the Digital Alpha processor, two kinds of branches are supported, direct and indirect, which differ in how their target addresses are computed. The destination of a direct branch is trivial to compute: a displacement encoded in the instruction is added to the address of the branch instruction (on the Alpha, to the address of the instruction that follows it); see, for example, U.S. Pat. No. 5,428,786, "Branch execution via backward symbolic execution," issued to Sites on Jun. 27, 1995.
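For concreteness, the following Python sketch computes the target of an Alpha branch-format instruction, assuming the layout of that format as commonly documented: opcode in bits 31..26, register Ra in bits 25..21, and a signed 21-bit displacement in bits 20..0, counted in 4-byte instructions and taken relative to the updated program counter. The function names are illustrative only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def alpha_branch_target(pc, insn):
    """Target address of an Alpha branch-format instruction located at `pc`.

    Assumed encoding: the low 21 bits hold a signed displacement in units of
    4-byte instructions, relative to the address of the next instruction.
    """
    disp = sign_extend(insn & 0x1FFFFF, 21)
    return pc + 4 + 4 * disp


# Example: an unconditional BR (opcode 0x30, Ra = 31) with displacement 3.
BR_OPCODE = 0x30
insn = (BR_OPCODE << 26) | (31 << 21) | (3 & 0x1FFFFF)
target = alpha_branch_target(0x1000, insn)  # 0x1000 + 4 + 12
```

The target depends only on the instruction's own address and encoding, so a static tool can compute it without executing anything. That is precisely what is not possible for indirect branches.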
An indirect computed branch jumps to an address held in a processor register. That address is typically computed by previously executed instructions, which use an index value to read the address out of a jump table that stores a destination address for each possible index value. Alternatively, the destination is computed by adding a small multiple of the index value to a base address. Such execution flow is more difficult to unravel, particularly when the range of index values is known with certainty only at run time and the location (base address) and structure of the jump table are also unknown.
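The two dispatch patterns described above can be modeled in a few lines of Python. This is a minimal sketch, assuming that memory is represented as a dictionary from addresses to 8-byte table entries and that each entry holds one destination address; `ENTRY_SIZE` and the function names are hypothetical.

```python
ENTRY_SIZE = 8  # assumed: one 64-bit destination address per jump-table entry


def table_jump_target(memory, table_base, index, num_entries):
    """Jump-table dispatch: bounds-check the index, then load the destination."""
    if not 0 <= index < num_entries:
        # Compiled code would typically branch to a default case instead.
        raise IndexError("index outside the jump table")
    return memory[table_base + ENTRY_SIZE * index]


def computed_offset_target(base, index, stride):
    """Alternative form: destination = base address + small multiple of the index."""
    return base + stride * index


def all_table_targets(memory, table_base, num_entries):
    """Every possible destination of a jump-table dispatch."""
    return {memory[table_base + ENTRY_SIZE * i] for i in range(num_entries)}


# A three-entry table at 0x2000; two indices share the same destination.
memory = {0x2000: 0x1100, 0x2008: 0x1140, 0x2010: 0x1100}
```

Note that `all_table_targets` can enumerate the destinations only when it is given `table_base` and `num_entries`, which are exactly the quantities that may be unknown when only the executable code is available.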
Prior approaches work on source programs when the possible destinations of a computed jump are made apparent by the structure of the language or by program annotations. Object code can be analyzed only when the compiler emits additional information, and many compilers do not provide it.
Therefore, it is desirable to provide a method for determining the target addresses of computed jumps directly from executable code, so that the method is entirely independent of the technique used to generate the code.