This invention relates generally to analysis of application code and, more specifically, relates to analysis of programs for security vulnerabilities using rule matching, either for languages with no types or as an adjunct to current analyses.
This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section. Acronyms that appear in the text or drawings are defined below, prior to the claims.
Information-flow violations comprise the most serious security vulnerabilities in today's Web applications. Such information-flow violations may include the following: cross-site scripting (XSS) attacks, which occur when a Web application accepts data originating from a user and sends the data to another user's browser without first validating or encoding the data; injection flaws, the most common of which is Structured Query Language injection (SQLi), which arise when a Web application accepts input from a user and sends the input to an interpreter as part of a command or query, without first validating the input; malicious file executions, which happen when a Web application improperly trusts input files or uses unverified user data in stream functions, thereby allowing hostile content to be executed on a server; and information leakage and improper error-handling attacks, which take place when a Web application leaks information about its own configuration, mechanisms, and internal problems. Each of these vulnerabilities can be cast as a problem in which tainted information from an untrusted “source” propagates, through data and/or control flow, to a high-integrity “sink” without being properly endorsed (i.e., corrected or validated) by a “sanitizer”.
Automatically detecting such vulnerabilities in real-world Web applications may be difficult. However, static analysis may be used to analyze Web applications. Static analysis examines the code of an application, such as a Web application, without executing that code. One or more models of the code are created to estimate what would happen if the code were actually executed. One part of a static analysis for these vulnerabilities is a taint analysis, which tracks “taint” from sources to sinks (or to and through sanitizers).
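By way of illustration only, the following is a minimal sketch of taint tracking over a toy straight-line program model, written in Python. The statement representation (`Call`), the function names `get_parameter`, `encode_html`, `concat`, and `write_response`, and the rule sets are all hypothetical and are assumed solely for this example; a practical analysis would also have to model branches, loops, heap objects, and calls between methods.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical rule sets, assumed for this sketch only.
SOURCES = {"get_parameter"}      # return value is considered tainted
SANITIZERS = {"encode_html"}     # return value is considered taint-free
SINKS = {"write_response"}       # tainted arguments are reported here


@dataclass(frozen=True)
class Call:
    """One statement of the modeled program: target = func(*args)."""
    target: Optional[str]        # variable assigned, or None if result is unused
    func: str
    args: Tuple[str, ...]


def find_tainted_flows(program):
    """Walk the model (the analyzed code is never executed) and report sinks
    that receive a tainted variable, as (statement index, sink name) pairs."""
    tainted = set()
    findings = []
    for index, stmt in enumerate(program):
        arg_tainted = any(arg in tainted for arg in stmt.args)
        if stmt.func in SINKS and arg_tainted:
            findings.append((index, stmt.func))
        if stmt.target is None:
            continue
        if stmt.func in SOURCES:
            tainted.add(stmt.target)         # a tainted flow starts here
        elif stmt.func in SANITIZERS:
            tainted.discard(stmt.target)     # the flow is invalidated here
        elif arg_tainted:
            tainted.add(stmt.target)         # taint propagates through data flow
        else:
            tainted.discard(stmt.target)     # overwritten with untainted data
    return findings


PROGRAM = [
    Call("name", "get_parameter", ()),
    Call("greeting", "concat", ("name",)),
    Call(None, "write_response", ("greeting",)),   # unsanitized flow
    Call("safe", "encode_html", ("greeting",)),
    Call(None, "write_response", ("safe",)),       # sanitized flow
]

print(find_tainted_flows(PROGRAM))
```

In this sketch only the first call to `write_response` is reported, because the value passed to the second call has passed through the assumed sanitizer.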
Taint analyses use rules to configure where to start tracking tainted flows, where to stop tracking tainted flows, and where to report vulnerabilities. Traditionally, rules are expressed using types of objects, e.g., the method getText from the type UserContent returns (potentially) malicious data; this method would be a source, which is where tainted flows start. A source is a method whose return value is considered tainted (e.g., untrusted) or an assignment from a tainted field of an object. A rule for this source might indicate that “objects of type UserContent are sources of potential taint”. A taint analysis therefore examines objects based primarily on type. Tainted flows are typically invalidated at sanitizers and terminated at sinks, although these actions may depend on the implementation of the analysis. A sanitizer is a method that manipulates its input to produce taint-free output. For instance, a sanitizer such as SqlSanitizer.sanitize can be considered to produce taint-free output for the vulnerability of SQLi. Tainted flows are reported as vulnerabilities when the flows reach sinks, such as PrintStream.println. A sink is a pair (m, P), where m is a method that performs security-sensitive computations and P contains those parameters of m that are vulnerable to attack via tainted data. For the definitions of sink, source, and sanitizers and additional information, see, e.g., Tripp et al., “TAJ: Effective Taint Analysis of Web Applications”, PLDI'09, Jun. 15-20, 2009, Dublin, Ireland.
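As a non-limiting illustration, type-based rules of the kind described above might be represented as in the following Python sketch. The method names UserContent.getText, SqlSanitizer.sanitize, and PrintStream.println are taken from the examples above; the classes `Method` and `Sink`, the helper functions, and the choice of parameter position 0 as the vulnerable parameter of println are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Method:
    """A method identified by its declaring type and name."""
    declaring_type: str
    name: str


@dataclass(frozen=True)
class Sink:
    """A sink as a pair (m, P): method m and its vulnerable parameter positions P."""
    method: Method
    vulnerable_params: FrozenSet[int]


# Rules keyed on types, as in a traditional taint analysis.
SOURCES = {Method("UserContent", "getText")}
SANITIZERS = {Method("SqlSanitizer", "sanitize")}
# Assumption for this sketch: parameter 0 of println is the vulnerable one.
SINKS = {Sink(Method("PrintStream", "println"), frozenset({0}))}


def classify(method):
    """Return the role that the rules assign to a method."""
    if method in SOURCES:
        return "source"
    if method in SANITIZERS:
        return "sanitizer"
    if any(sink.method == method for sink in SINKS):
        return "sink"
    return "none"


def is_violation(method, tainted_arg_positions):
    """Report a vulnerability only if taint reaches a parameter listed in P."""
    for sink in SINKS:
        if sink.method == method and sink.vulnerable_params & set(tainted_arg_positions):
            return True
    return False


println = Method("PrintStream", "println")
print(classify(Method("UserContent", "getText")))
print(is_violation(println, {0}))
```

Keeping P as part of the sink means that tainted data reaching a parameter outside P is not reported, which is one way such a rule could limit spurious findings.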
In languages without a strong type system, it is difficult to dictate which objects in the program are of interest (e.g., as being sources, sinks, and sanitizers). A type (also called “data type”) of an object is, e.g., a classification identifying one of various types of data that determines the possible values for that type, the operations that can be done on values of that type, the meaning of the data, and the way values of that type can be stored. It is noted that this is only one definition of type of an object, and other definitions may also be suitable. Furthermore, even with a type system, it is difficult to differentiate between a harmlessly created object of a specific type and one constructed through malicious means. For example, TextBox.getText should be a method that returns source data when the textbox is retrieved from the application, but if the textbox is programmatically created and never interacts with the user, the method should not be a source of taint.
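The TextBox example may be sketched as follows, in Python, to show why a rule keyed only on type cannot separate the two cases. The `TextBox` class, its `origin` field, and the function `is_source_by_type` are hypothetical and exist only for this illustration; the `origin` label stands for provenance information that a purely type-based rule does not consult.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    # Hypothetical provenance label, used here only to make the two cases visible.
    origin: str


def is_source_by_type(receiver, method_name):
    """A type-based rule: getText on any TextBox is treated as a source."""
    return type(receiver).__name__ == "TextBox" and method_name == "getText"


from_form = TextBox(origin="user_form")         # retrieved from the application
built_in_code = TextBox(origin="programmatic")  # never interacts with the user

# The rule gives the same answer for both objects, although only the first
# can carry user-controlled data.
print(is_source_by_type(from_form, "getText"))
print(is_source_by_type(built_in_code, "getText"))
```

Both calls are classified as sources under the type-based rule, so the programmatically created textbox would be reported even though it cannot introduce taint.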