Field of the Invention
The present invention generally relates to program analysis. More particularly, an alternative, probabilistic method for program tracking first performs a static analysis to determine a relatively small number of points at which profiling instructions are interwoven, and from these points the actual data flow during an execution can be inferred with high probability.
Description of the Related Art
Dataflow tracking is a fundamental form of program analysis. It has multiple applications of high practical value, including runtime security analysis and/or enforcement, runtime privacy analysis and/or enforcement, runtime detection of concurrency bugs, speculative parallelization (e.g., in the form of software transactional memory), and testing of refactoring transformations. Common use cases include code parallelization, information-flow security, and typestate checking, to name a few examples. Dataflow tracking requires local monitoring at the level of intermediate states and atomic program statements. In practice, this leads to severe complications arising from (i) native code, (ii) complex libraries, and (iii) scalability.
Existing solutions to the scalability challenge, such as TaintDroid, all rely on heavy engineering to achieve nontrivial performance optimizations of limited value. These optimizations often come at the price of accuracy loss, such as overly conservative modeling of the dataflow relation. Native code and complex libraries are typically accounted for via hand-written summaries. These summaries likewise require substantial time and effort to author, and are often approximate at best in representing the true dataflow behavior of their respective code.
Moreover, existing solutions for dataflow tracking are deterministic. They mandate the insertion of profiling instructions into the program at every code location to record and propagate the flow of information. As such, they introduce a significant performance slowdown (up to 700 times), scale poorly, and are unable to handle various real-world scenarios. One such scenario is the use of native code, where profiling instructions cannot be inserted, so that information-flow tracking is interrupted and the flow of interest is lost.
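By way of illustration only, the deterministic approach described above can be sketched in Python as follows. The sketch is not taken from TaintDroid or from any other existing tool. The names Tainted, native_upper, and DEVICE_ID are hypothetical and are introduced solely for this example. Under these assumptions, the sketch shows that every operation on a tracked value must carry extra bookkeeping, and that the tracking labels are lost once a value passes through code that cannot be instrumented.

```python
class Tainted:
    """A value paired with the labels of the sources it was derived from.

    Hypothetical helper for illustration; not part of any real tool.
    """

    def __init__(self, value, labels=frozenset()):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Instrumentation at the statement level: besides computing the
        # result, every operation must also merge and propagate the labels.
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value,
                           self.labels | other.labels)
        return Tainted(self.value + other, self.labels)


def native_upper(raw):
    # Stand-in for native code (an assumption of this sketch): it only
    # sees the raw value, so no profiling logic can run inside it.
    return raw.upper()


device_id = Tainted("imei-1234", {"DEVICE_ID"})

# Labels propagate through the instrumented operation.
message = Tainted("id=") + device_id

# The value crosses the uninstrumented boundary and comes back as a
# plain string, so the flow from DEVICE_ID is no longer recorded.
sent = native_upper(message.value)
```

In this sketch, message still carries the DEVICE_ID label, whereas sent is an ordinary string with no labels attached, which corresponds to the interrupted flow discussed above.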
The present inventors have recognized that currently there is no satisfactory solution for dataflow tracking at the low level of individual statements. At the same time, there are no alternative approaches in existence, and so the current practice is to invest ever more manual effort, all of it ad hoc and targeted at particular observed challenges, to enable practical applications of dataflow tracking, such as robust real-time security and privacy enforcement.