Technical Field
The present invention relates to static security analysis and, more particularly, to representing source statements as hybrids of concrete and abstract representation.
Description of the Related Art
Static security analysis typically takes the form of taint analysis, where the analysis is parameterized by a set of security rules, each rule being a triple <Src, San, Snk>, where Src denotes source statements that read untrusted user inputs, San denotes downgrader statements that endorse untrusted data by validating and/or sanitizing it, and Snk denotes sink statements that perform security-sensitive operations. Given a security rule R, any flow from a source in Src_R to a sink in Snk_R that does not pass through a downgrader from San_R constitutes a potential vulnerability. This reduces security analysis to a graph reachability problem.
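The reduction to graph reachability described above can be illustrated with a minimal sketch. It assumes the program's data flow has already been abstracted into a directed graph whose nodes are statements. The function name find_taint_flows, the graph encoding, and the statement names are hypothetical and chosen only for illustration; they are not taken from any particular analysis tool.

```python
from collections import deque


def find_taint_flows(graph, sources, sinks, sanitizers):
    """Return (source, sink) pairs joined by a path that avoids every sanitizer.

    graph maps a statement to the statements its data flows into.
    sources, sinks, and sanitizers play the roles of Src, Snk, and San
    in a single security rule <Src, San, Snk>.
    """
    flows = []
    for src in sources:
        if src in sanitizers:
            continue
        seen = {src}
        queue = deque([src])
        # Breadth-first search that never traverses a sanitizer node.
        while queue:
            node = queue.popleft()
            if node in sinks:
                flows.append((src, node))
            for succ in graph.get(node, ()):
                if succ in seen or succ in sanitizers:
                    continue
                seen.add(succ)
                queue.append(succ)
    return flows


# Hypothetical data-flow graph: an untrusted input reaches a query sink
# directly, and an output sink only through an encoding downgrader.
graph = {
    "getParameter": ["concat", "encode"],
    "concat": ["executeQuery"],
    "encode": ["println"],
}
flows = find_taint_flows(
    graph,
    sources={"getParameter"},
    sinks={"executeQuery", "println"},
    sanitizers={"encode"},
)
print(flows)
```

In this sketch only the flow into executeQuery is reported, because the path to println passes through the encode downgrader. Note that the search tracks only whether data can flow between statements, not what values flow, which is the kind of abstraction the following paragraph discusses.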
However, one source of imprecision in taint analysis comes from the fact that untrusted values are not represented explicitly. Instead, vulnerabilities are reported based on data flow extending from the source to the sink. This has been a reasonable compromise, given that tracking fully concrete values during static analysis of a program yields an unbounded state space, in which case the static verifier is not guaranteed to converge on a fixpoint solution in finite time.
To increase precision, a family of algorithms collectively known as “string analysis” has been developed. In these algorithms, string values are modeled using regular representations, context-free languages, or logical formulae (e.g., in monadic second-order logic). None of these approaches has been shown to scale beyond a few lines of code due to the inherent computational complexity of representing string variables in these forms.