This specification relates to data flow analysis.
Data flow analysis derives information about how values are bound to variables of a program by examining the static code of the program. To derive this information, a data flow analysis system determines the software elements between which data in the program can flow. This information may be represented as a data flow graph.
A common use of data flow analysis is to mark some software elements of the program as tainted. In some cases, the tainted software elements of the program are those that can hold a value supplied by an external user, and are therefore potential security weaknesses. For example, a user can perform an injection attack by providing the software program with a malicious Structured Query Language (SQL) query. If this “tainted” query is executed without first being cleansed, the database may be compromised, for example by dropping tables or by disclosing confidential information to the user.
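As an illustration of the injection scenario described above, the following is a minimal sketch using Python's standard sqlite3 module. The table layout, the function names lookup_unsafe and lookup_safe, and the payload string are hypothetical and chosen only for this example. The first function concatenates user-supplied text into the query, so the value is "tainted" when it reaches the database; the second passes the same text as a bound parameter.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def lookup_unsafe(conn, name):
    # Tainted: user-supplied text becomes part of the SQL statement itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, name):
    # The user-supplied text is bound as a parameter, not parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
payload = "x' OR '1'='1"               # hypothetical malicious input
leaked = lookup_unsafe(conn, payload)  # condition is always true: all rows match
safe = lookup_safe(conn, payload)      # no user is literally named like the payload
```

In this sketch the unsafe lookup returns every stored secret, while the parameterized lookup returns nothing, which is the kind of difference a taint analysis aims to flag statically.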
Some data flow analysis approaches do not consider calling contexts. A calling context represents some aspect of an individual call to a function, such as the tainted or untainted status of each argument to the function. Without calling contexts, data flow analysis can produce many false positives: if the return value of the function can ever receive tainted data, a system may treat every call to the function as tainted.
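The false-positive effect described above can be sketched with a toy model. The code below is an assumption-laden illustration, not an implementation of any particular analysis: it models a single hypothetical function whose return value is tainted whenever any argument is tainted, and it represents each call site as a tuple of booleans (True meaning a tainted argument). The names analyze_without_contexts and analyze_with_contexts are invented for this example.

```python
def analyze_without_contexts(call_sites):
    # One merged summary for the function: if any call site can pass
    # tainted data, the return value is treated as tainted everywhere.
    merged = any(any(args) for args in call_sites.values())
    return {site: merged for site in call_sites}

def analyze_with_contexts(call_sites):
    # Each call site is judged by the taint status of its own arguments.
    return {site: any(args) for site, args in call_sites.items()}

# Hypothetical call sites: call_a passes user input, call_b passes a constant.
call_sites = {"call_a": (True,), "call_b": (False,)}

insensitive = analyze_without_contexts(call_sites)
sensitive = analyze_with_contexts(call_sites)
```

Under these assumptions, the analysis without calling contexts reports call_b as tainted even though it never receives tainted data, whereas the context-aware variant reports only call_a.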
False positives can be reduced by using Cartesian Product calling contexts, in which the function is analyzed separately for each distinct combination of tainted or untainted arguments to the function. However, considering each combination separately makes the cost of data flow analysis exponential in the number of arguments, because a function with n arguments has up to 2^n such combinations.
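To make the growth concrete, the following sketch enumerates Cartesian Product calling contexts for a function with a given number of arguments. It assumes each argument is simply tainted (True) or untainted (False); the helper names cartesian_contexts and analyze_per_context, and the summary callback, are hypothetical stand-ins for a real per-context analysis.

```python
from itertools import product

def cartesian_contexts(num_args):
    # Every combination of untainted (False) / tainted (True) arguments.
    return list(product((False, True), repeat=num_args))

def analyze_per_context(num_args, summary):
    # Run the (hypothetical) per-context analysis once for each combination.
    return {ctx: summary(ctx) for ctx in cartesian_contexts(num_args)}

# Toy summary: the return value is tainted if any argument is tainted.
results = analyze_per_context(3, any)

# The number of contexts doubles with each additional argument.
counts = [len(cartesian_contexts(n)) for n in range(1, 6)]
```

Here a three-argument function already requires eight separate analyses, and the list of counts doubles at each step, which is the exponential cost noted above.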