Dataflow analysis is a technique for gathering information about possible sets of values calculated at various points during the execution of a computer program. One current approach to dataflow analysis tracks information flow at the level of small intermediate execution steps, statements, program states, or any of various combinations thereof. These execution steps, statements, and program states may be regarded as blocks. Usually it is sufficient to analyze information flow at the boundaries of these blocks. Thus, each statement in a programming language may be associated with a dataflow equation that describes how data flows due to execution of the statement. As an example, given x=y+z, the equation would relate y as well as z to x.
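By way of a non-limiting illustration, the dataflow equation for a statement such as x=y+z may be sketched as a transfer function. The sketch below assumes a hypothetical program state that maps each variable name to the set of sources that may have influenced it. The names transfer_assign, "input", and "config" are assumptions made for this example only.

```python
from typing import Dict, Iterable, Set

State = Dict[str, Set[str]]


def transfer_assign(state: State, target: str, operands: Iterable[str]) -> State:
    """Transfer function for an assignment `target = f(operands)`.

    The exit state equals the entry state, except that the target now
    depends on everything that any operand depended on.
    """
    new_state = {name: set(sources) for name, sources in state.items()}
    merged: Set[str] = set()
    for name in operands:
        # Assumption: a variable with no recorded sources depends on itself.
        merged |= state.get(name, {name})
    new_state[target] = merged
    return new_state


# Entry state before executing x = y + z (values are illustrative).
entry_state = {"y": {"input"}, "z": {"config"}}
exit_state = transfer_assign(entry_state, "x", ["y", "z"])
```

In this sketch, the exit state relates both y and z to x, while the entry state is left unmodified.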
The set of dataflow equations may be formulated by considering that an entry state of a block is a function of one or more respective exit states each associated with a corresponding predecessor block. However, in some situations, a given block may not be associated with any predecessor blocks, whereupon the entry state of such a block would generally be well defined at the start of the dataflow analysis procedure. In a forward flow analysis, the set of dataflow equations may be formulated by considering that the exit state of the block is a function of an entry state of the block.
For purposes of illustration, a control flow graph (CFG) may be used to determine those parts of a program to which a particular value assigned to a variable might propagate. Dataflow equations are formulated for each of a plurality of nodes in the CFG. These equations are solved by repeatedly calculating a local node output from a local node input at each of the plurality of nodes until the entire CFG stabilizes and reaches a fixpoint. A fixpoint (also called a fixed point) of a function is an element of the function's domain that is mapped to itself by the function. Thus, c is a fixpoint of the function f(x) if and only if f(c)=c.
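The iterative solution described above may be sketched as follows. This is a minimal example that assumes a reaching-definitions style of analysis over a hypothetical four-node CFG containing a single loop. The node names, the GEN and KILL sets, and the function solve are illustrative assumptions and are not taken from any particular implementation.

```python
from typing import Dict, List, Set

# Hypothetical CFG: entry -> cond, cond -> body, body -> cond, cond -> exit.
PREDECESSORS: Dict[str, List[str]] = {
    "entry": [],
    "cond": ["entry", "body"],
    "body": ["cond"],
    "exit": ["cond"],
}

# Definitions of x created (GEN) and overwritten (KILL) at each node.
GEN: Dict[str, Set[str]] = {
    "entry": {"x@entry"}, "cond": set(), "body": {"x@body"}, "exit": set(),
}
KILL: Dict[str, Set[str]] = {
    "entry": {"x@body"}, "cond": set(), "body": {"x@entry"}, "exit": set(),
}


def solve(preds, gen, kill):
    """Recompute each node's output from its input until nothing changes."""
    out = {node: set() for node in preds}
    changed = True
    while changed:
        changed = False
        for node in preds:
            # Entry state: union of the exit states of all predecessors.
            node_in: Set[str] = set()
            for pred in preds[node]:
                node_in |= out[pred]
            # Exit state: local transfer function applied to the entry state.
            node_out = gen[node] | (node_in - kill[node])
            if node_out != out[node]:
                out[node] = node_out
                changed = True
    return out


REACHING = solve(PREDECESSORS, GEN, KILL)
```

When the loop terminates, one further pass over the nodes would leave every output unchanged, which is the fixpoint condition f(c)=c applied to the whole CFG.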
Dataflow analysis techniques may be utilized in conjunction with information flow security, typestate monitoring, loop/code parallelization, just-in-time compilation optimization, or any of various combinations thereof. In the context of information flow security, a respective security level is assigned to each of a plurality of corresponding variables. A basic model of flow security may comprise two distinct levels: low for public observable information, and high for secret information. To ensure confidentiality, information flowing from high-level to low-level variables should not be allowed. On the other hand, to ensure integrity, flows from low-level to high-level variables should be restricted. More generally, security levels can be viewed as a lattice where dataflow analysis should indicate that information is flowing only in an upward direction through the lattice.
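The two-level confidentiality model described above may be sketched as a check on individual assignments. The sketch assumes a lattice having only the levels low and high. The variable names and the helper functions join, flow_allowed, and check_assignment are hypothetical and serve only to illustrate the upward-flow rule.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Level(IntEnum):
    LOW = 0   # public, observable information
    HIGH = 1  # secret information


def join(levels: Iterable[Level]) -> Level:
    """Least upper bound in the two-point lattice LOW <= HIGH."""
    return max(levels, default=Level.LOW)


def flow_allowed(source_levels: Iterable[Level], target_level: Level) -> bool:
    """Confidentiality rule: information may only flow upward in the lattice."""
    return join(source_levels) <= target_level


# Illustrative assignment of security levels to variables.
LEVELS: Dict[str, Level] = {
    "password": Level.HIGH,
    "vault": Level.HIGH,
    "banner": Level.LOW,
    "log": Level.LOW,
}


def check_assignment(target: str, sources: Iterable[str]) -> bool:
    """Return True if assigning from `sources` to `target` keeps secrets secret."""
    return flow_allowed([LEVELS[name] for name in sources], LEVELS[target])
```

Under these assumptions, an assignment from password to log would be rejected, whereas an assignment from banner and password to vault would be accepted. An integrity check would apply the same comparison with the ordering reversed.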
Typestate monitoring may be implemented in tandem with dataflow analysis. Typestate monitoring reflects how legal operations on one or more imperative objects can change at program runtime as the internal state of these objects changes. A typestate checker can statically ensure, for instance, that an object method is only called when the object is in a state for which the operation is defined.
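As a simplified illustration of typestate checking, the sketch below assumes a hypothetical file-like object having two states, closed and open, and three operations. It checks a single straight-line sequence of operations. A practical checker would typically combine such a transition table with dataflow analysis over the control flow graph. The table TRANSITIONS and the function check_typestate are assumptions made for this example.

```python
from typing import Dict, Optional, Sequence, Tuple

# (current state, operation) -> next state; missing entries are illegal.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("closed", "open"): "open",
    ("open", "read"): "open",
    ("open", "close"): "closed",
}


def check_typestate(operations: Sequence[str], initial: str = "closed") -> Optional[int]:
    """Return the index of the first illegal operation, or None if all are legal."""
    state = initial
    for index, operation in enumerate(operations):
        next_state = TRANSITIONS.get((state, operation))
        if next_state is None:
            # The operation is not defined for the object's current state.
            return index
        state = next_state
    return None
```

For example, the sequence open, close, read would be flagged at the read operation, because read is not defined once the object has returned to the closed state.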
Dataflow analysis may be used to provide loop/code parallelization. Loop/code parallelization refers to a conversion of sequential code into multi-threaded code, vectorized code, or both, in order to enable multiple processors to be used simultaneously in a shared-memory multiprocessor (SMP) machine. Loops represent a programming control structure which is strongly emphasized in the parallelization process. In general, a majority of the execution time of a program takes place when the program is executing instructions that are within a loop.
Just-in-time (JIT) compilation may be performed using dataflow analysis techniques. JIT compilation, also known as dynamic translation, is compilation that is performed during execution of a program (at run time) rather than prior to execution. The compilation process may include a translation of the program into another format such as machine code, which is then executed directly. JIT compilation is a combination of two traditional approaches to translation to machine code, namely ahead-of-time (AOT) compilation and interpretation, and combines some advantages and drawbacks of both. JIT compilation combines the speed of compiled code with the flexibility of interpretation, but incurs the overhead of an interpreter as well as the additional overhead of compiling (not just interpreting). In theory, JIT compilation may be able to provide faster execution times than static compilation, but existing dataflow analysis techniques have not enabled JIT compilation to reach its full potential.
Conventional dataflow analysis approaches have several limitations. In terms of overhead, the need to track dataflow through all intermediate program states and statements leads to severe performance bottlenecks, with slowdowns sometimes on the order of several hundred times. These bottlenecks create usability problems. Likewise, in certain cases, a program under test ceases to behave correctly. Such behavioral issues may arise in situations where the program under test uses timers or timed events. Moreover, many applications are written in multiple languages. For example, mobile applications for the Android™ operating system are often written in a combination of Java™, JavaScript™, and native code. Tracking dataflow across language boundaries is very difficult. No existing solution is equipped to handle the commonly occurring situation of transitioning between managed and unmanaged code.
Yet another issue that stems from local tracking of dataflow is a loss of precision. In some cases, a dataflow analysis reaches a conservative but inaccurate conclusion due to an overly myopic form of reasoning employed by the dataflow analysis procedure. An example is analysis of intermediate states within a linearizable method without accounting for atomicity guarantees. Atomicity refers to an indivisible and irreducible series of database operations such that either all of the operations occur, or nothing occurs. A guarantee of atomicity prevents updates to the database from occurring only partially, as a partial update can cause more problems than simply rejecting the whole series of updates outright. One illustrative example of an atomic transaction is a monetary transfer from a first bank account to a second bank account. This transaction consists of two operations: withdrawing the money from the first account, and depositing the money into the second account. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, such that money is neither lost nor created if one of the two operations fails.
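The bank transfer example may be sketched as follows. This is a minimal, single-threaded illustration that assumes the accounts are held in an in-memory dictionary rather than in a database. The function transfer, the exception InsufficientFunds, and the account names are hypothetical. The sketch shows only the all-or-nothing property, not concurrency control.

```python
from typing import Dict


class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the withdrawal."""


def transfer(accounts: Dict[str, int], src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst`, or leave `accounts` untouched."""
    snapshot = dict(accounts)  # state to restore if any step fails
    try:
        if accounts[src] < amount:
            raise InsufficientFunds(src)
        accounts[src] -= amount   # operation 1: withdraw
        accounts[dst] += amount   # operation 2: deposit (fails if dst is unknown)
    except Exception:
        # Roll back so that a partial update is never observable afterwards.
        accounts.clear()
        accounts.update(snapshot)
        raise


balances = {"first": 100, "second": 50}
transfer(balances, "first", "second", 30)
```

If the deposit step fails, for instance because the destination account does not exist, the withdrawal is undone and the total balance is unchanged. An analysis that reasons about the state between the two operations, without accounting for this guarantee, may report a flow or inconsistency that can never be observed outside the transaction.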
In view of the foregoing considerations, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.