1. Field of the Invention
This invention relates to string analysis, and particularly to systems, methods and computer program products for string analysis with security labels for vulnerability detection.
2. Description of Background
String analysis and taint analysis are complementary techniques for detecting vulnerabilities such as SQL injection and cross-site scripting (XSS). In string analysis, the strings that a program may construct are inferred, and vulnerabilities are detected by comparing the inferred strings with the expected strings. In taint analysis, the taintedness of each datum is calculated from how that datum is constructed from other data, and vulnerabilities are detected by checking whether the datum includes tainted (untrusted) fragments.
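The two techniques described above can be illustrated with a minimal sketch. It is not taken from any particular analyzer: the names `Value`, `concat`, `all_match`, and `EXPECTED` are hypothetical, the set of inferred strings is approximated by a small finite set rather than a grammar, and taint is tracked at run time on concrete values rather than statically.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern describing the queries the program is expected to issue.
EXPECTED = r"SELECT name FROM users WHERE id = [0-9]+"


def all_match(inferred_strings, pattern):
    """String-analysis style check: every inferred string must match the pattern."""
    return all(re.fullmatch(pattern, s) is not None for s in inferred_strings)


@dataclass(frozen=True)
class Value:
    """A string paired with a single whole-value taint flag."""
    text: str
    tainted: bool


def constant(text):
    return Value(text, False)   # program constants are trusted


def user_input(text):
    return Value(text, True)    # user input is untrusted


def concat(a, b):
    """Taint-analysis style rule: the result is tainted if any operand is."""
    return Value(a.text + b.text, a.tainted or b.tainted)


query = concat(constant("SELECT name FROM users WHERE id = "),
               user_input("42 OR 1=1"))
```

In this sketch, `all_match({query.text}, EXPECTED)` reports that the string deviates from the expected form, while `query.tainted` reports only that some part of it came from an untrusted source.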
However, neither technique alone can check a property such as “user input strings in the query consist only of numeric characters”. String analysis cannot determine whether a string was input by the user or defined as a constant within the program, and taint analysis cannot determine whether a string consists only of numeric characters. Moreover, even if string analysis and taint analysis are performed separately and their results are combined, the combination can only determine whether or not the entire string obtained through string analysis is tainted. For this reason, it is difficult to detect the substrings that originate from a particular tainted source, such as the “user input”.
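The information that a whole-string taint flag discards can be shown with a small illustration. This is only a sketch under assumptions: each character is given a hypothetical origin label (`"const"` or `"user"`), and the helper names `labeled` and `user_fragments` are invented for this example; it is not presented as the technique of any existing analyzer.

```python
def labeled(text, label):
    """Attach the same origin label to every character of text."""
    return [(ch, label) for ch in text]


def user_fragments(chars):
    """Return the maximal runs of characters whose label is "user"."""
    runs, current = [], []
    for ch, label in chars:
        if label == "user":
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def user_parts_numeric(chars):
    """Check that every user-originated fragment consists of ASCII digits only."""
    return all(run.isascii() and run.isdigit() for run in user_fragments(chars))


prefix = labeled("SELECT * FROM t WHERE id = ", "const")
safe_query = prefix + labeled("42", "user")
unsafe_query = prefix + labeled("4 OR 1=1", "user")

# A whole-string flag gives the same answer for both queries.
safe_whole_tainted = any(label == "user" for _, label in safe_query)
unsafe_whole_tainted = any(label == "user" for _, label in unsafe_query)
```

Both queries are tainted as a whole, so the flag alone cannot separate them; only the per-character labels allow `user_parts_numeric` to accept the first and reject the second.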
Additionally, these analysis techniques often cause false detections because, in many cases, the analyses are performed without considering conditional statements in the program. Programs usually use conditional statements to detect and filter out insecure data that would cause SQL injection or XSS, and if insecure data are found, the program corrects (sanitizes) the data to avoid security flaws. Therefore, analysis techniques that do not take conditional statements and data sanitization into consideration yield a significant number of false detections, and thus are often not practically useful.
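A false detection of this kind can be sketched as follows. The function `build_query` is a hypothetical program under analysis, and the two `*_reports_vulnerability` functions are deliberately simplified stand-ins for an analysis that ignores the guard and one that accounts for it; neither models a real tool.

```python
def build_query(user_id):
    """Hypothetical program: the conditional rejects non-numeric input."""
    if not (user_id.isascii() and user_id.isdigit()):
        raise ValueError("id must be numeric")
    # Only digit-only input reaches the concatenation below.
    return "SELECT name FROM users WHERE id = " + user_id


def naive_reports_vulnerability(source_is_user_input):
    """Ignores conditionals: any flow from user input to the query is flagged."""
    return source_is_user_input


def guard_aware_reports_vulnerability(source_is_user_input, guarded_by_numeric_check):
    """Flags the flow only when no numeric guard dominates the concatenation."""
    return source_is_user_input and not guarded_by_numeric_check


false_alarm = naive_reports_vulnerability(True)
refined_result = guard_aware_reports_vulnerability(True, True)
```

Here the naive check reports a vulnerability even though the guard makes the query safe, whereas the guard-aware check does not.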