A structural analyser is a computer-implemented software tool for analysing the control flow of an executable program to be executed on a target processor, with the aim of checking whether the program will execute as expected and, if not, of making appropriate modifications. Before describing the operation of a structural analyser, it may be useful to introduce a few concepts.
A basic block is a sequence of statements, in this case processor instructions, that are executed in order from the first statement to the last statement in the block. A basic block contains no branch instruction except, possibly, as its last instruction, and therefore execution is never transferred to another basic block until the last instruction of the block has been executed.
A control-flow graph is a data structure used in static analysis to represent possible flows of execution through a program. A control-flow graph G = (N, E) has nodes n ∈ N representing basic blocks and directed edges (s, d) ∈ E representing transfer of execution from the source node s to the destination node d. Algorithms for identifying basic blocks and generating control-flow graphs are known in the art.
A simple example is illustrated in FIG. 1, which shows a control flow graph where each node labeled 1 to 5 represents a basic block, and the arrowed edges represent possible program flow between nodes.
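By way of illustration only, the definition G = (N, E) may be sketched in Python as follows. FIG. 1 is not reproduced here, so the edge list below is an assumption inferred from the worked example later in this description (node 1 branching to nodes 2 and 3, which rejoin at node 4, followed by node 5); the class name `ControlFlowGraph` and its methods are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """G = (N, E): nodes stand for basic blocks, edges for transfers of execution."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # set of (source, destination) pairs

    def add_edge(self, s, d):
        # Adding an edge implicitly adds both of its end points to N.
        self.nodes.update((s, d))
        self.edges.add((s, d))

    def successors(self, n):
        return sorted(d for (s, d) in self.edges if s == n)

    def predecessors(self, n):
        return sorted(s for (s, d) in self.edges if d == n)


# Assumed edges for the graph of FIG. 1 (inferred, not taken from the figure).
fig1 = ControlFlowGraph()
for s, d in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]:
    fig1.add_edge(s, d)

print(fig1.successors(1), fig1.predecessors(4))
```

Storing E as a set of pairs mirrors the mathematical definition directly; a practical analyser would more likely keep per-node successor lists for speed.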
In a control-flow graph, if every path of execution to some node n must pass through another node d, then d is said to dominate n. Also, every node dominates itself. There are efficient linear-time algorithms for computing dominance, also known in the art.
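The dominance relation can be illustrated with a short sketch. The code below uses the simple iterative set-intersection method rather than one of the linear-time algorithms mentioned above, purely because it is shorter; the function name `dominators` and the graph `FIG1_SUCC` (an assumed successor map for FIG. 1) are hypothetical.

```python
def dominators(succ, entry):
    """Return {n: set of nodes that dominate n} for a graph given as a
    successor map. Every node must appear as a key of `succ`."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for s, ds in succ.items():
        for d in ds:
            preds[d].add(s)

    # Start from the most pessimistic guess and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            # d dominates n if d is n itself, or d dominates every predecessor of n.
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Assumed successor map for FIG. 1 (inferred from the worked example).
FIG1_SUCC = {1: [3, 2], 2: [4], 3: [4], 4: [5], 5: []}
dom = dominators(FIG1_SUCC, entry=1)
print(dom[4])
```

Under the assumed edges, node 4 is dominated only by node 1 and itself, since it can be reached through either node 2 or node 3.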
The phrase “Depth First Search Post Order” refers to an algorithm for searching a graph whereby, beginning from the root of the graph, the algorithm explores as far as possible along each branch before backtracking. In order to make the search “post order”, the nodes of the graph are listed in the order in which they were last visited by the algorithm, that is, each node is listed only once the search has finished exploring that node's successors.
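A minimal sketch of such a traversal is given below. The successor map `FIG1_SUCC` is an assumption inferred from the worked example of FIG. 1, and the order in which the successors of node 1 are listed (3 before 2) is chosen so that the result agrees with the order 5, 4, 3, 2, 1 stated in that example; the function name is hypothetical.

```python
def dfs_post_order(succ, root):
    """List the nodes reachable from `root` in depth-first post order."""
    visited, order = set(), []

    def visit(n):
        visited.add(n)
        for d in succ[n]:
            if d not in visited:
                visit(d)
        # A node is emitted only after all of its successors are finished.
        order.append(n)

    visit(root)
    return order


# Assumed successor map for FIG. 1; node 1 explores node 3 before node 2.
FIG1_SUCC = {1: [3, 2], 2: [4], 3: [4], 4: [5], 5: []}
order = dfs_post_order(FIG1_SUCC, root=1)
print(order)
```

Listing the successors of node 1 in the opposite order would give a different but equally valid post order, which is one reason the traversal order needs to be fixed for the analysis to be deterministic.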
Structural analysis is a known technique that has the goal of identifying control flow structure. Typically the structural analyser begins by taking a binary executable program file, decoding the machine code instructions of the executable program, and generating a control flow graph based on the decoded instructions. The control flow graph comprises a plurality of low-level nodes each representing a basic block of machine code instructions, and a plurality of directional edges representing program flow between the nodes. To perform the structural analysis, the structural analyser then reduces the lower-level (more detailed) control flow graph to a higher-level (more abstracted) structural representation comprising higher-level structure nodes, each higher-level structure node having internal structure (i.e. each representing one or more lower-level nodes and one or more edges).
Typically, a predetermined set of higher-level structure node types appropriate for the language is chosen to be matched against a control-flow graph. The nodes of the control flow graph are traversed in depth first search post order. During the traversal, if a structure node pattern can be matched to the graph then the matching nodes are removed and replaced with a structure node. For example, consider the control-flow graph in FIG. 1 and a structural analysis algorithm that has just two structure node types: one for matching if-then-else patterns (Sa) and one for matching sequences of basic blocks (Sb). These are illustrated schematically in FIG. 2.
A pattern matching hierarchy is required if the algorithm is to be deterministic. For this example, let the if-then-else node have preference over the sequential pair node. The pattern matcher first tries to match the pattern at the top of the pattern hierarchy on every node in order, then it tries the next pattern and so on. If a match is made then the pattern matcher resets and the whole process is repeated. This continues until there is only a single node.
Referring to the example of FIG. 1, this would then lead to the algorithm behaving as follows. FIG. 3 illustrates the replacements made on the control-flow graph during the execution of the algorithm.
(i) Perform a depth first post order search on the graph to determine the order that the nodes will be traversed in. In this case the resulting order of the nodes would be: 5, 4, 3, 2, 1.
(ii) Try to match the if-then-else type structure node pattern starting from node 5. No match is found.
(iii) Continue to try to match the if-then-else node pattern starting from node 4, 3, 2 and then 1. No match is found until node 1 at which the pattern is found to match.
(iv) Replace the covered nodes with a new, higher-level structure node, node 6, and connect it to the edges of the removed nodes.
(v) Repeat from the start trying to match the if-then-else pattern again, this time the order of traversal is: 5, 4, 6.
(vi) When the if-then-else node has failed, try to match the sequential pair node by the same method.
(vii) Node 4 will make a match, consuming nodes 4 and 5 into a new structure node 7. The traversal order is now: 7, 6.
(viii) No further if-then-else matches are found, but Node 6 will make a match to a sequential pair, consuming nodes 6 and 7 to make a new, even higher level structure node 8.
This gives an example of the steps a structural analysis algorithm would take to reduce a control-flow graph to a single structure node.
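The steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the successor map `FIG1_SUCC` is inferred from the worked example, the two matchers are deliberately simplified versions of the patterns Sa and Sb, and all function names are hypothetical.

```python
def post_order(succ, root):
    seen, out = set(), []
    def visit(n):
        seen.add(n)
        for d in succ[n]:
            if d not in seen:
                visit(d)
        out.append(n)
    visit(root)
    return out

def preds_of(succ, n):
    return [s for s, ds in succ.items() if n in ds]

def match_if_then_else(succ, n):
    """n branches to two nodes, each entered only from n, that rejoin at one node."""
    if len(succ[n]) != 2:
        return None
    a, b = succ[n]
    if preds_of(succ, a) != [n] or preds_of(succ, b) != [n]:
        return None
    if len(succ[a]) != 1 or succ[a] != succ[b]:
        return None
    return [n, a, b]

def match_sequence(succ, n):
    """n has a single successor d, and d is entered only from n."""
    if len(succ[n]) != 1:
        return None
    (d,) = succ[n]
    if d == n or preds_of(succ, d) != [n]:
        return None
    return [n, d]

# Pattern hierarchy: if-then-else takes preference over the sequential pair.
PATTERNS = [("if-then-else", match_if_then_else), ("sequence", match_sequence)]

def replace(succ, covered, new, root):
    """Replace the covered nodes by one structure node, reconnecting the edges."""
    inside = set(covered)
    exits = []
    for c in covered:
        for d in succ[c]:
            if d not in inside and d not in exits:
                exits.append(d)
    for c in covered:
        del succ[c]
    for s in succ:
        succ[s] = list(dict.fromkeys(new if d in inside else d for d in succ[s]))
    succ[new] = exits
    return new if root in inside else root

def reduce_graph(succ, root, next_id):
    succ = {n: list(ds) for n, ds in succ.items()}  # work on a copy
    log = []
    while len(succ) > 1:
        order = post_order(succ, root)
        hit = None
        for name, matcher in PATTERNS:        # highest-priority pattern first
            for n in order:                   # every node, in post order
                covered = matcher(succ, n)
                if covered:
                    hit = (name, covered)
                    break
            if hit:
                break
        if hit is None:
            break                             # no pattern matches anywhere
        name, covered = hit
        root = replace(succ, covered, next_id, root)
        log.append((next_id, name, covered))
        next_id += 1                          # then reset and start over
    return log, succ

# Assumed successor map for FIG. 1 (inferred from the worked example).
FIG1_SUCC = {1: [3, 2], 2: [4], 3: [4], 4: [5], 5: []}
log, remaining = reduce_graph(FIG1_SUCC, root=1, next_id=6)
for new, name, covered in log:
    print(new, name, covered)
```

Under these assumptions the sketch creates structure nodes 6, 7 and 8 in the same order as steps (iv), (vii) and (viii). If the loop exits with more than one node remaining, no pattern matched anywhere in the graph.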
In some cases a control flow graph may contain irreducible regions which cannot be matched to any of the analyser's higher-level structure node patterns. A known solution to the problem of irreducible graphs is node splitting. This is the process of transforming an irreducible region into a reducible one by splitting (i.e. duplicating) some of the nodes. The aim of node splitting is that the graph after the transformation will still represent the same control flow but will be closer to matching one of the structure node patterns.
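As a hedged illustration of node splitting, the sketch below duplicates a chosen node once for each predecessor beyond the first, giving every copy the same outgoing edges as the original. The example graph is a commonly used irreducible region (a two-node loop that can be entered at either node) and is not taken from the figures; the function name `split_node`, the choice of which node to split, and the fresh identifier 4 are all assumptions.

```python
def split_node(succ, n, fresh_ids):
    """Return a copy of the graph in which node n has been split so that
    each predecessor of n reaches its own copy of n.

    The first predecessor keeps the original node; each further predecessor
    is redirected to a copy labelled with the next identifier in `fresh_ids`.
    Self-loops on n are left pointing at the original node in this sketch.
    """
    succ = {k: list(v) for k, v in succ.items()}
    preds = [s for s, ds in succ.items() if n in ds and s != n]
    for p, copy in zip(preds[1:], fresh_ids):
        succ[copy] = list(succ[n])                       # same outgoing edges
        succ[p] = [copy if d == n else d for d in succ[p]]
    return succ


# Irreducible region: the loop between nodes 2 and 3 has two entry points.
irreducible = {1: [2, 3], 2: [3], 3: [2]}
reducible = split_node(irreducible, n=3, fresh_ids=[4])
print(reducible)
```

After the split, the loop between node 2 and the copy (node 4) can only be entered at node 2, while every path through the original graph still has a corresponding path through the transformed one. Choosing which nodes to split so as to limit code growth is a separate problem that this sketch does not address.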