This invention relates generally to analysis of program code and, more specifically, relates to static and dynamic analysis of program code.
This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section.
Programs have become very complex and, with this complexity, have become vulnerable to attack or to errors. One way to prevent or reduce the occurrence of these vulnerabilities is by analyzing the program. Possible program analyses include the following: taint analysis, where “taint” is tracked from a source to some endpoint; buffer overflow analysis, which is useful in preventing buffer overflow attacks and includes checking that data written into a buffer does not exceed buffer size; and typestate analysis, which performs checking that correct use is made of an object given a current state of the object.
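As an illustration of the typestate analysis mentioned above, the following minimal sketch checks a sequence of operations against the states of an object. The state names, operation names, `TRANSITIONS` table, and `check_typestate` function are all hypothetical and chosen only for illustration; they are not taken from any particular analysis tool.

```python
# Hypothetical typestate automaton for a file-like object (illustrative only).
# Keys are (current state, operation); values are the resulting state.
TRANSITIONS = {
    ("closed", "open"): "open",
    ("open", "read"): "open",
    ("open", "close"): "closed",
}


def check_typestate(operations, state="closed"):
    """Return the index of the first operation that is not permitted in the
    object's current state, or None if every operation is permitted."""
    for index, operation in enumerate(operations):
        next_state = TRANSITIONS.get((state, operation))
        if next_state is None:
            # No transition exists: the object is misused in this state.
            return index
        state = next_state
    return None
```

Under these assumptions, the sequence open, read, close is accepted, whereas open, close, read is rejected at the final read because the object is already closed.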
These types of program analysis may be performed dynamically or statically. Dynamic analysis is performed by executing the program and determining results based on that execution. The program is typically modified, such as by instrumenting the program. Instrumenting the program refers to adding code that provides an ability, e.g., to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a program (for example, instructions may output logging information to appear on screen or may write trace information to a file).
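One possible form of such instrumentation is sketched below, assuming a Python program in which a wrapper records trace information each time a monitored function is entered and exited. The names `TRACE`, `instrument`, and `add` are hypothetical; a real system might instead write to a log file or to the screen.

```python
import functools

# Hypothetical in-memory trace log; a file or console could be used instead.
TRACE = []


def instrument(func):
    """Wrap func so that each call records entry and exit trace information."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", func.__name__, args))
        result = func(*args, **kwargs)
        TRACE.append(("exit", func.__name__, result))
        return result
    return wrapper


@instrument
def add(a, b):
    # Example component being monitored (illustrative only).
    return a + b
```

Calling `add(2, 3)` in this sketch returns 5 and leaves two entries in `TRACE`, one for the entry to the function and one for its exit.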
Static analysis involves examining the code of programs, such as Web programs, without executing that code. One or more models of the code are created to estimate what would happen when the code is actually executed.
Static security analysis typically takes the form of taint analysis, where the analysis is parameterized by a set of security rules, each rule being a triple <Src,San,Snk> denoting the following:
1) source statements (Src) reading untrusted user inputs;
2) downgrader statements (San) endorsing untrusted data by either validating or sanitizing the untrusted data; and
3) sink statements (Snk) performing security-sensitive operations.
There are a number of techniques for analyzing taint flow from sources to sinks. These techniques also consider whether flow passed through a downgrader (also called an endorser) that performs downgrading of the taint. One set of techniques includes graphs such as call graphs. Call graphs are directed graphs that represent calling relationships between methods in a computer program.
Using such techniques, given a security rule r, a flow from a source in Src_r to a sink in Snk_r that does not pass through a downgrader from San_r comprises a potential vulnerability. This reduces security analysis to a graph reachability problem.
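The reachability formulation can be sketched as follows, assuming the program has already been modeled as a directed graph whose nodes are statements and whose edges are data flows. The function `find_tainted_sinks`, the adjacency-dictionary representation, and the node labels are assumptions made for illustration; they do not describe any particular analysis engine.

```python
from collections import deque


def find_tainted_sinks(graph, sources, downgraders, sinks):
    """Return the sinks reachable from a source along a path that does not
    pass through a downgrader (a breadth-first graph traversal)."""
    queue = deque(s for s in sources if s not in downgraders)
    seen = set(queue)
    tainted = set()
    while queue:
        node = queue.popleft()
        if node in sinks:
            tainted.add(node)
        for successor in graph.get(node, ()):
            # Taint is endorsed at a downgrader, so do not traverse through it.
            if successor in downgraders or successor in seen:
                continue
            seen.add(successor)
            queue.append(successor)
    return tainted


# Hypothetical flow graph; node labels are illustrative only.
graph = {
    "getParameter": ["concat"],
    "concat": ["encode", "executeQuery"],
    "encode": ["println"],
}
result = find_tainted_sinks(
    graph,
    sources={"getParameter"},
    downgraders={"encode"},
    sinks={"executeQuery", "println"},
)
print(result)  # {'executeQuery'}
```

In this sketch only `executeQuery` is reported, because the sole path to `println` passes through the downgrader `encode`.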
For small programs, the graph reachability problem is manageable. However, for large programs such as those used in many Web-based environments, the programs can contain thousands or hundreds of thousands of lines of code. As programs get larger, the graphs of those programs tend to increase very quickly in size.
Similar problems can occur with buffer overflow analysis and typestate analysis.