1. Technical Field
The present invention relates to static program analysis and, more particularly, to training classifiers for static program analysis.
2. Description of the Related Art
A significant limitation of static analysis tools is their inherent imprecision. This imprecision arises because the analysis operates on an abstract representation of the concrete program in order to avoid state-space explosion. If a concrete representation of a program were used, the analysis would need to consider every possible state of the program, resulting in a potentially infinite state space. Using abstract representations makes the analysis tractable, but creates the risk of reporting false positives when searching for potential vulnerabilities.
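By way of illustration only, the following minimal sketch shows how an abstraction can produce a false positive. It assumes a simple interval abstraction, which is chosen here for illustration and is not taken from any particular analysis tool. The names Interval, abstract_divisor, and concrete_divisor are hypothetical. The abstract analysis warns that a divisor may be zero, whereas an enumeration of the concrete values shows that it never is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical abstract value: all integers between lo and hi, inclusive."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # The bounds of a product are found among the products of the endpoints.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(products), max(products))

    def may_be_zero(self) -> bool:
        return self.lo <= 0 <= self.hi


def abstract_divisor(x: Interval) -> Interval:
    # Abstract version of y = x * x + 1. The interval domain does not track
    # that both factors are the same variable, so precision is lost here.
    return x * x + Interval(1, 1)


def concrete_divisor(x: int) -> int:
    # Concrete version of y = x * x + 1, which is at least 1 for every integer x.
    return x * x + 1


x_range = Interval(-5, 5)

# Abstract analysis: one cheap computation covers all inputs at once.
abstract_y = abstract_divisor(x_range)
reported = abstract_y.may_be_zero()

# Concrete check: enumerate every input in the (small) range.
actually_zero = any(
    concrete_divisor(x) == 0 for x in range(x_range.lo, x_range.hi + 1)
)

# The warning is a false positive if it is reported but never occurs concretely.
false_positive = reported and not actually_zero
print(abstract_y, reported, actually_zero, false_positive)
```

In this sketch the abstract divisor covers a range that includes zero, so a division-by-zero warning would be raised even though no concrete input can trigger it. The concrete enumeration is feasible only because the input range is tiny, which reflects the tractability trade-off described above.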
In one example, a class of JavaScript vulnerabilities was discovered that was due to a weakness in infrastructure code released by a major browser provider. Web developers were instructed to copy the vulnerable code into their web applications. Although the code is vulnerable in principle, the runtime configuration in which it can actually be exploited is so rare that, for all practical purposes, the code was deemed safe.
However, this source of false positives is highly prevalent in practice, because many web applications incorporate the relevant infrastructure code. As a result, static analysis tools report a prohibitive number of false positives on virtually every JavaScript application.