The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Vulnerabilities may be introduced into an application program when untrusted data flows through the application program from an input to an output without the application program taking sufficient action to prevent potential cyber-security attacks. For example, an application program may use a uniform resource locator (URL) to receive data that the application program subsequently outputs as data accessed by a web browser, and a web browser that displays a web page based on such untrusted data may enable an attacker to gain elevated access privileges to sensitive web page content. An input for untrusted data is referred to as a taint source, and an output for untrusted data is referred to as a taint sink. Static analysis that tests an application program for security vulnerabilities may produce a high rate of false positive results. Dynamic analysis has gained popularity because it produces fewer false positive results. Therefore, if dynamic analysis identifies a vulnerability in an application program, the vulnerability is more likely to be an actual vulnerability, which justifies expending the resources needed to analyze the application program's associated dataflow and correct the vulnerability. Dynamic analysis typically identifies application program vulnerabilities such as cross-site scripting (XSS) and SQL injection (SQLi).
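As a minimal illustrative sketch of the taint source and taint sink described above, the following Python fragment reads a value from a URL query string and writes it into a web page. The function names, the `name` parameter, and the example URL are hypothetical and are chosen only for illustration; the escaped variant shows one common mitigation and is not asserted to be the approach of any particular system.

```python
from html import escape
from urllib.parse import parse_qs, urlparse


def read_name_from_url(url: str) -> str:
    """Taint source (hypothetical): returns attacker-controllable query data."""
    query = parse_qs(urlparse(url).query)
    return query.get("name", [""])[0]


def render_greeting_unsafe(name: str) -> str:
    """Taint sink (hypothetical): places the value into HTML without escaping."""
    return f"<p>Hello, {name}!</p>"


def render_greeting_escaped(name: str) -> str:
    """Same sink, but the value is HTML-escaped before it is output."""
    return f"<p>Hello, {escape(name)}!</p>"


# The query string carries a percent-encoded script element as the "name" value.
url = "https://example.com/greet?name=%3Cscript%3Ealert(1)%3C%2Fscript%3E"
tainted = read_name_from_url(url)

unsafe_page = render_greeting_unsafe(tainted)    # script element reaches the page
escaped_page = render_greeting_escaped(tainted)  # script element is neutralized
```

In this sketch, the untrusted value flows unchanged from the source to the unescaped sink, which is the kind of flow associated with cross-site scripting.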
Although dynamic analysis can identify application program vulnerabilities by their data sources (taint sources) and data sinks (taint sinks), dynamic analysis does not keep track of the complete flow of tainted data, because of problems with recording information between the data source and the data sink. Without information about how data flows in an application program, correcting vulnerabilities is difficult, because identifying only a data source and a data sink does not provide any clear indication of the nature of any vulnerabilities between the data source and the data sink. Such difficulties become greater for large application programs, where manually searching source code to review the detailed data flow of possible vulnerabilities is extremely time-consuming, and manually identifying the detailed data flow of actual vulnerabilities in the source code is nearly impossible.
Additionally, an application program may have multiple possible paths from the same data source to the same data sink. Because dynamic analysis can identify the same vulnerability many times, dynamic analysis typically executes a de-duplication process based on the information describing the vulnerability. Because only the data sources and data sinks are identified, the de-duplication process can mistakenly treat multiple different vulnerabilities that share the same data source and the same data sink as a single vulnerability. Therefore, identifying all of the actual vulnerabilities becomes more difficult without identifying the specific dataflow between a data source and a data sink. Accordingly, it is desirable to provide techniques that enable a system to improve the performance, efficiency, and ease of use of dynamic analysis of dataflow in application programs.
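The de-duplication problem described above can be illustrated with a short Python sketch. The `Finding` record, its `path` field, and the function and source/sink names are hypothetical; in particular, the `path` field stands in for intermediate dataflow information that, as noted above, dynamic analysis typically does not record.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple


@dataclass(frozen=True)
class Finding:
    source: str            # data source (taint source)
    sink: str              # data sink (taint sink)
    path: Tuple[str, ...]  # hypothetical: intermediate steps between the two


findings = [
    Finding("request.url", "response.write", ("parse_query", "build_header")),
    Finding("request.url", "response.write", ("parse_query", "build_footer")),
    # A true duplicate of the first finding.
    Finding("request.url", "response.write", ("parse_query", "build_header")),
]


def dedupe(items: List[Finding], key: Callable[[Finding], Hashable]) -> List[Finding]:
    """Keep the first finding seen for each distinct key."""
    seen = {}
    for item in items:
        seen.setdefault(key(item), item)
    return list(seen.values())


# Keyed only on source and sink: two different flows collapse into one finding.
by_endpoints = dedupe(findings, key=lambda f: (f.source, f.sink))

# Keyed on the assumed path as well: only the true duplicate is removed.
by_dataflow = dedupe(findings, key=lambda f: (f.source, f.path, f.sink))
```

Under these assumptions, the key based only on the data source and data sink reports one vulnerability, while the key that includes the intermediate path reports two, which is the distinction the preceding paragraph draws.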