In compiler design, Intermediate Representation (IR) is important for analyzing and optimizing the data and control flow of source code. Source code can be converted into commonly known IR forms such as Static Single Assignment (SSA). Source code includes, but is not limited to, programming statements written in a computer language or any form of statements readable by a compiler or machine. FIG. 1 shows an example of source code 100.
In SSA IR form, every definition or assignment of a variable in the source code is represented as an assignment of a distinct instance (or version) of the variable. FIG. 2 shows an example of the SSA IR form 200 of the source code 100. Detailed explanation of the conversion of the source code 100 to SSA IR form 200 is not undertaken herein as it is apparent to one of ordinary skill in the relevant art.
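For orientation only, the versioning idea can be sketched in Python on a short straight-line program. The program below is hypothetical and is not the source code 100 of FIG. 1; the helper name to_ssa and the token-list encoding of expressions are assumptions made for this illustration, and the sketch deliberately ignores branches and phi (φ) placement.

```python
from collections import defaultdict

def to_ssa(statements):
    """Rename straight-line assignments so every definition gets a fresh version.

    statements: list of (target, expression_tokens) pairs (hypothetical encoding).
    """
    counter = defaultdict(int)  # next version number for each variable
    current = {}                # variable -> name of its latest version
    out = []
    for target, tokens in statements:
        # Each use refers to the most recent version of the variable it reads.
        renamed = [current.get(tok, tok) for tok in tokens]
        version = f"{target}{counter[target]}"
        counter[target] += 1
        current[target] = version
        out.append((version, renamed))
    return out

# Hypothetical program: X = 1; Y = X + 2; X = X * Y
program = [("X", ["1"]), ("Y", ["X", "+", "2"]), ("X", ["X", "*", "Y"])]
ssa = to_ssa(program)
for version, tokens in ssa:
    print(version, "=", " ".join(tokens))
# X0 = 1
# Y0 = X0 + 2
# X1 = X0 * Y0
```

Each assignment to X yields a distinct version (X0, X1), which is the property the SSA IR form relies on.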
In FIG. 2, the SSA IR form 200 is separated into six basic blocks A to F. A basic block is a basic unit of code sequence in the IR form that always executes sequentially without change in control flow from the first instruction of the block to the last instruction of the block whenever the first instruction is executed. The variable X in line 1 of FIG. 1 is defined by five new versions in FIG. 2, namely, X0 in line 1, X1 in line 4, X2 in line 13, X3 in line 16, and X4 in line 24 of the SSA IR form 200. The definitions of X0, X1, X2, X3 and X4 are denoted by D0, D1, D2, D3, and D4 respectively. The control flow condition predicates of the conditional branch statements in line 6, line 7 and line 18 of FIG. 2 are denoted as P, Q and S respectively.
FIG. 3 shows a Control Flow Graph (CFG) 300 of the SSA IR form 200. Nodes A 310, B 312, C 314, D 316, E 318, and F 320 represent the basic blocks A to F in FIG. 2. Detailed explanation of the creation of the CFG 300 from the SSA IR form 200 is not undertaken herein as it is apparent to one of ordinary skill in the relevant art. Paths 350, 352, 354, 356, 358, 360, 362, 364, and 366 show the data and control flow of the SSA IR form 200.
SSA IR form, although it can make several optimizations such as constant propagation, copy propagation and symbolic analysis more effective, provides only control-flow-insensitive data flow information. This is observed in lines 4, 16 and 24 of the SSA IR form 200, where phi (φ) functions introduce new versions of a variable to cover all possible versions reaching the merge point of different control flow paths. However, the phi (φ) function does not contain any control flow information: it does not show which version of the variable comes from a particular control flow path. In SSA IR form 200, the definitions of variable versions X3 and X4 are given in D3 and D4 respectively as
X3 = φ(X1, X2) and X4 = φ(X1, X3)
The phi (φ) function in D3 shows that X3 is assigned from either X1 or X2, but it carries no control flow information on how X1 and X2 reach D3.
Gated Single Assignment (GSA) and Thinned Gated Single Assignment (TGSA) are two previously proposed extensions to SSA IR form that provide additional control flow information. In GSA and TGSA forms, control flow condition predicates are integrated into the representation of SSA IR form. A gamma (γ) function is used instead of the phi (φ) function in SSA IR form 200 to represent the merging of different versions of variable X. In TGSA form, D3 and D4 define X3 and X4 respectively as
X3 = γ(P, X1, X2) and X4 = γ(P, γ(Q, X1, X3), X3)
In GSA form, D3 and D4 define X3 and X4 respectively as
X3 = γ(P, γ(Q, Ø, X1), X2) and X4 = γ(P, γ(Q, X1, γ(S, X3, Ø)), γ(S, X3, Ø))
The difference between GSA and TGSA form is that GSA form provides more accurate control flow information by showing all control predicates required to reach the current definition. A null (Ø) in the gamma (γ) function of GSA form shows a control flow path that does not reach the current definition.
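The way the two forms of D3 differ can be illustrated with a minimal Python sketch. It assumes that γ(pred, a, b) selects a when pred is true and b otherwise, which is consistent with the GSA form of D3 given above; the tuple encoding and the helper names gamma and select are hypothetical and are used only for this illustration.

```python
NULL = "Ø"  # marks a control flow path that does not reach the definition

def gamma(pred, if_true, if_false):
    """Build a gamma node as a plain tuple (hypothetical encoding)."""
    return ("gamma", pred, if_true, if_false)

def select(expr, env):
    """Return the variable version that expr selects under the predicate values in env."""
    while isinstance(expr, tuple):
        _, pred, if_true, if_false = expr
        # Assumed convention: the first value argument is taken when pred is true.
        expr = if_true if env[pred] else if_false
    return expr

# D3 as defined in TGSA form and in GSA form
X3_TGSA = gamma("P", "X1", "X2")
X3_GSA = gamma("P", gamma("Q", NULL, "X1"), "X2")

for p in (True, False):
    for q in (True, False):
        env = {"P": p, "Q": q}
        print(p, q, select(X3_TGSA, env), select(X3_GSA, env))
# True True X1 Ø
# True False X1 X1
# False True X2 X2
# False False X2 X2
```

Under these assumptions the two forms disagree only when P and Q are both true: the GSA form reports that no definition reaches D3 on that path, while the TGSA form still reports X1 because it does not consult Q.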
GSA and TGSA do not provide the explicit, accurate path information that is important in control-flow-sensitive analysis. For example, in GSA form, definition D4 cannot directly provide the path information from the definition of X1 (D1) to the definition of X4 (D4). An indirect way is to expand the pseudo assignment expressions of D4's gamma (γ) function. The expanded expression is given as
X4 = γ(P, γ(Q, X1, γ(S, γ(P, γ(Q, Ø, X1), X2), Ø)), γ(S, γ(P, γ(Q, Ø, X1), X2), Ø))
This expanded expression has unnecessary and redundant path predicates.
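The expansion can be reproduced mechanically by substituting the GSA definition of X3 into the GSA definition of X4. The following Python sketch does this on a hypothetical tuple encoding of gamma (γ) expressions; the helper names gamma, substitute and show are assumptions made for this illustration and do not come from GSA itself.

```python
NULL = "Ø"  # marks a control flow path that does not reach the definition

def gamma(pred, if_true, if_false):
    """Build a gamma node as a plain tuple (hypothetical encoding)."""
    return ("gamma", pred, if_true, if_false)

def substitute(expr, name, replacement):
    """Replace every occurrence of the version `name` inside expr with `replacement`."""
    if expr == name:
        return replacement
    if isinstance(expr, tuple):
        tag, pred, if_true, if_false = expr
        return (tag, pred,
                substitute(if_true, name, replacement),
                substitute(if_false, name, replacement))
    return expr

def show(expr):
    """Render an expression in the γ(...) notation used in the text."""
    if isinstance(expr, tuple):
        _, pred, if_true, if_false = expr
        return f"γ({pred}, {show(if_true)}, {show(if_false)})"
    return expr

# GSA definitions D3 and D4
D3 = gamma("P", gamma("Q", NULL, "X1"), "X2")
D4 = gamma("P", gamma("Q", "X1", gamma("S", "X3", NULL)), gamma("S", "X3", NULL))

expanded = show(substitute(D4, "X3", D3))
print(expanded)
# γ(P, γ(Q, X1, γ(S, γ(P, γ(Q, Ø, X1), X2), Ø)), γ(S, γ(P, γ(Q, Ø, X1), X2), Ø))
print(expanded.count("γ(P"))  # 3
```

The predicate P is tested three times in the expanded expression, once at the outermost level and once in each substituted copy of D3, which illustrates the redundancy noted above.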
In TGSA form, since some control path information is dropped in the gamma (γ) function, accurate path-sensitive analysis is not supported. For example, X1 can reach D3 only when the condition predicate Q is false, but the gamma (γ) function in D3 in TGSA form does not provide any information regarding the condition predicate Q.