The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for automated detection of flaws and incompatibility problems in information flow downgraders, also referred to as security downgraders, or simply downgraders.
The Information-Flow Security principle establishes that no “illicit flow” of information is allowed in a program. A flow is illicit if it allows untrusted information to be used in a trusted computation (an integrity violation) or if it allows secret information to be entirely or partly revealed to unauthorized users (a confidentiality violation). Integrity and confidentiality can be treated as dual problems by stating that, in both cases, there should be no flow of information from “high” to “low”, where “high” means “untrusted” in integrity and “secret” in confidentiality, and “low” means “trusted” in integrity and “public” in confidentiality.
Information can be tagged with information flow labels. Typically, information flow labels form a partially ordered set or a lattice. If information-flow security were strictly enforced and no illicit flow of information were allowed, most programs would not work. To be “information-flow secure,” a program would have to be “partitioned” so that information tagged with a certain label “X” can only flow to program points that have been tagged with labels higher than or equal to “X”.
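The partitioning rule reduces to a single partial-order check. The sketch below assumes a powerset lattice of hypothetical confidentiality categories (“account”, “ssn”) ordered by set inclusion, with union as the join; it is an illustration of a label lattice, not a label model prescribed by this application.

```python
from typing import FrozenSet

# A label is a set of (hypothetical) secrecy categories. Ordering labels by set
# inclusion makes the set of all labels a lattice whose join is set union.
Label = FrozenSet[str]

PUBLIC: Label = frozenset()
ACCOUNT: Label = frozenset({"account"})
SSN: Label = frozenset({"ssn"})


def leq(a: Label, b: Label) -> bool:
    """Partial order: a <= b iff every category in a also appears in b."""
    return a <= b


def join(a: Label, b: Label) -> Label:
    """Least upper bound: data derived from both inputs carries both sets."""
    return a | b


def may_flow(data_label: Label, point_label: Label) -> bool:
    """Information labeled X may only flow to program points labeled >= X."""
    return leq(data_label, point_label)


if __name__ == "__main__":
    print(may_flow(PUBLIC, ACCOUNT))              # True
    print(may_flow(SSN, ACCOUNT))                 # False: labels are incomparable
    print(may_flow(join(SSN, ACCOUNT), ACCOUNT))  # False
    print(may_flow(ACCOUNT, join(SSN, ACCOUNT)))  # True
```

Because “ssn” and “account” are incomparable, neither may flow to a point carrying only the other label, which is why the labels form a partial order rather than a simple ranking.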
A program with these restrictions is very unlikely to be useful. For example, from an integrity point of view, a Web application is supposed to accept inputs from potentially untrusted users and use those inputs in trusted computations. For instance, an online banking program takes as input the account number and the password of a user (potentially untrusted or malformed information) and passes them to the backend database system, where they are used in a trusted setting. In another example, an online bookstore takes as input the user ID and password of the customer and the title of the book that the customer wants to buy (all potentially untrusted or malformed information), and uses them to complete a transaction.
From a confidentiality point of view, a Web application often releases data that has been computed based on secret information and, as such, should be considered secret as well. For example, a banking application may reveal to any teller the last four digits of the social security number of any user, and an online bookstore may reveal to any shop assistant the last four digits of any customer's credit card number. Given that all these programs exhibit flows that allow “high” information to flow to “low” program points, all these programs would be rejected if information-flow security were strictly enforced. To permit these programs to function, “high” information can be “downgraded” and become “low” enough to be used in “low” program points.
Downgrading takes the form of “endorsement” in integrity and “declassification” in confidentiality. For example, once a program has verified that the user-provided input to a Web application is a properly formatted string, the program can endorse that input, which then becomes trusted enough to be used in a trusted computation. Similarly, once a program has verified that the information extracted from a secret is not sufficient to reveal the secret itself, the program can declassify the extracted information, which then becomes public enough to be revealed to a public listener.
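The two forms of downgrading can be sketched as functions that check a value and only then relabel it from “high” to “low”. This is a minimal sketch under stated assumptions: the `Labeled` type, the 8-digit account-number format, and the 9-digit social security number format are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value tagged with a two-point label: "high" or "low"."""
    value: str
    label: str


# Hypothetical format rule: a well-formed account number is exactly 8 digits.
_ACCOUNT_RE = re.compile(r"[0-9]{8}")


def endorse_account_number(x: Labeled) -> Labeled:
    """Integrity downgrader: relabel untrusted input only if it is well formed."""
    if _ACCOUNT_RE.fullmatch(x.value) is None:
        raise ValueError("malformed account number; refusing to endorse")
    return Labeled(x.value, "low")


def declassify_last_four(ssn: Labeled) -> Labeled:
    """Confidentiality downgrader: release only the last four digits of an SSN."""
    digits = re.sub(r"[^0-9]", "", ssn.value)
    if len(digits) != 9:
        raise ValueError("unexpected SSN format; refusing to declassify")
    return Labeled(digits[-4:], "low")


if __name__ == "__main__":
    print(endorse_account_number(Labeled("12345678", "high")))
    # Labeled(value='12345678', label='low')
    print(declassify_last_four(Labeled("123-45-6789", "high")).value)  # 6789
```

In both functions the relabeling happens only on the success path; any input that fails the check never acquires the “low” label.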
A program can implement many different types of downgraders. Downgraders are needed because a program should not pass any “high” input to a “low” function unless that “high” input has first been downgraded. Each downgrader is suited only to a specific subset of the program's “low” functions, so a program may be required to implement a plurality of different types of downgraders.
Consider, for example, an integrity “low” function that accepts an input in the form of a string, concatenates that string into a Structured Query Language (SQL) query, and then submits the query to a database. This function will require its input not to contain semicolons or apostrophes, since the database may interpret such characters as SQL syntax rather than data (an apostrophe can terminate a string literal, and a semicolon can begin a new statement). Therefore, any input to this “low” function should first have undergone sanitization (i.e., transformation of an illegal input by removing or replacing its suspect parts) or endorsement, to ensure that such illicit characters are absent. Only if a trusted sanitizer has verified the absence of such illicit characters will that initially untrusted string be accepted for use in the SQL query.
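A sanitizer for this SQL-building “low” function can be sketched as follows. The names `sanitize_for_sql` and `build_query` and the `accounts` table are hypothetical; the sanitizer simply removes the two characters named above.

```python
# Characters that the hypothetical SQL-building "low" function must never receive.
SQL_ILLICIT = (";", "'")


def sanitize_for_sql(untrusted: str) -> str:
    """Sanitizer: remove every semicolon and apostrophe from the input."""
    for ch in SQL_ILLICIT:
        untrusted = untrusted.replace(ch, "")
    return untrusted


def build_query(account: str) -> str:
    """The "low" function: concatenates its (sanitized) input into an SQL string."""
    return "SELECT balance FROM accounts WHERE id = '" + account + "'"


if __name__ == "__main__":
    attack = "42'; DROP TABLE accounts; --"
    print(build_query(sanitize_for_sql(attack)))
    # SELECT balance FROM accounts WHERE id = '42 DROP TABLE accounts --'
```

After sanitization the injected text stays inside the string literal instead of terminating it. The sketch mirrors the concatenation scenario described above; in practice, parameterized queries avoid building SQL by string concatenation altogether.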
However, if the “low” function is not responsible for performing SQL queries, but rather for concatenating its string input value into HyperText Markup Language (HTML) code, then a different sanitization is necessary. The issue here is no longer to prevent SQL injections, but rather to prevent what are known as Cross-Site Scripting (XSS) attacks. In this case, the sanitization function must check for the absence of specific tags that introduce JavaScript, such as <script> and </script>.
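The following sketch shows why a downgrader for one “low” function does not protect another. It assumes a minimal SQL sanitizer (removing semicolons and apostrophes, as in the SQL example above) and a hypothetical HTML validator, `validate_for_html`, that rejects opening or closing script tags regardless of case.

```python
import re

# Opening or closing <script> tag, case-insensitive, allowing attributes
# (e.g. "<SCRIPT src=x>") or whitespace before the tag name.
_SCRIPT_TAG = re.compile(r"</?\s*script\b[^>]*>", re.IGNORECASE)


def sanitize_for_sql(untrusted: str) -> str:
    """Minimal SQL sanitizer: remove semicolons and apostrophes."""
    return untrusted.replace(";", "").replace("'", "")


def validate_for_html(untrusted: str) -> str:
    """Validator for the HTML "low" function: reject input containing script tags."""
    if _SCRIPT_TAG.search(untrusted):
        raise ValueError("script tag found; input rejected")
    return untrusted


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    # The SQL sanitizer leaves this payload untouched: it has no ';' or "'".
    print(sanitize_for_sql(payload) == payload)  # True
    try:
        validate_for_html(payload)
    except ValueError as err:
        print(err)  # script tag found; input rejected
```

Checking only for script tags is a narrow policy: HTML event-handler attributes such as `onerror` can also execute JavaScript, so a production XSS downgrader would need a broader rule.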
Downgraders are often available in libraries, and are categorized based on the specifications of the corresponding “low” functions. Often, however, Web applications implement their own downgrading functions, which makes static security analysis of Web applications considerably more difficult. A static analysis for information-flow security must receive as input the signatures of the downgrading functions as well as rules that map downgrading functions to the corresponding “low” functions. Given these, the static analysis can verify that the input to a “low” function has always undergone proper downgrading, i.e., that no path reaches a “low” function with inputs that have not been properly downgraded. Unfortunately, when Web applications implement their own downgraders, it is very difficult to detect those downgraders and categorize them in a way that the static analysis for information-flow security can subsequently account for them.
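The rules that map downgraders to “low” functions, and the path check built on them, can be sketched as below. This is a simplified illustration, not a real analyzer: the rule table and all function names are hypothetical, and a real static analysis reasons over a call graph or data-flow graph rather than an explicit list of paths. Each path is a non-empty call sequence from an untrusted source to a final function.

```python
from typing import Dict, List, Set

# Hypothetical rule table: for each "low" function (sink), the downgraders
# that the analysis accepts as adequate for it.
RULES: Dict[str, Set[str]] = {
    "execute_sql": {"sanitize_for_sql"},
    "write_html": {"validate_for_html", "html_escape"},
}


def path_is_safe(path: List[str]) -> bool:
    """Safe if the final function is not a sink, or a mapped downgrader precedes it."""
    *before_sink, sink = path
    required = RULES.get(sink)
    if required is None:
        return True  # not a "low" function under these rules
    return any(step in required for step in before_sink)


def unsafe_paths(paths: List[List[str]]) -> List[List[str]]:
    return [p for p in paths if not path_is_safe(p)]


if __name__ == "__main__":
    paths = [
        ["get_param", "sanitize_for_sql", "execute_sql"],  # correct downgrader
        ["get_param", "sanitize_for_sql", "write_html"],   # wrong downgrader
        ["get_param", "my_custom_clean", "execute_sql"],   # unknown custom one
    ]
    for p in unsafe_paths(paths):
        print(p)
```

The third path is reported even if `my_custom_clean` happens to be correct, because the rule table does not know about it; this is the difficulty with application-specific downgraders described above.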
Web applications are particularly vulnerable to security attacks because they feed on user input and are typically accessible by a large number of users. According to the Web Application Security Consortium (WASC), approximately 100,000 security vulnerabilities were found and fixed in 2008, with 52,587 of these vulnerabilities being either urgent or critical. This illustrates the importance of protecting Web applications against malicious inputs. This protection is typically implemented using the endorsement/downgrader mechanisms described above, which either sanitize the user's input (i.e., transform the input by removing or replacing its suspect parts) or validate the user's input (i.e., reject the user's input if it is judged to be illegal).
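The difference between sanitizing and validating can be seen by applying one policy both ways. The sketch assumes a hypothetical policy under which a user name may contain only ASCII letters and digits; `sanitize_username` and `validate_username` are illustrative names.

```python
import string

# Hypothetical policy: a user name may contain only ASCII letters and digits.
ALLOWED = frozenset(string.ascii_letters + string.digits)


def sanitize_username(untrusted: str) -> str:
    """Sanitizer: transform the input by removing every disallowed character."""
    return "".join(ch for ch in untrusted if ch in ALLOWED)


def validate_username(untrusted: str) -> str:
    """Validator: return the input unchanged, or reject it as illegal.

    An empty name is also rejected (a choice made for this sketch).
    """
    if not untrusted or any(ch not in ALLOWED for ch in untrusted):
        raise ValueError("illegal user name")
    return untrusted


if __name__ == "__main__":
    print(sanitize_username("alice';--"))  # alice
    try:
        validate_username("alice';--")
    except ValueError as err:
        print(err)  # illegal user name
```

The sanitizer always produces a value, possibly one the user did not intend, while the validator surfaces the problem to the caller by refusing the input.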
Sanitizers and validators can be thought of as the last (and most application-specific) line of defense against attacks. These mechanisms usually embody subtle reasoning meant to distinguish between legal and illegal inputs in various contexts. Moreover, these mechanisms are themselves the interface between the security experts and the application developers. Writing them correctly is not a standard coding task, because it requires a thorough understanding of security threats (in the form of the long catalogue of known security attacks). Best practices, guidelines, and policies on how to create sanitization and validation mechanisms are often found in security documents. The challenge is to check whether these guidelines are followed in the code of the Web application. There is currently no automated mechanism to carry out this check.
Moreover, since downgraders are typically written by software engineers, whose expertise lies in developing software rather than in understanding the security implications of their design and engineering choices, the number of attacks that are due to incorrect input downgrading is alarmingly high. Most commonly, certain end cases related to removing illegal characters or sequences of characters are left out or addressed incorrectly in the implementation of the downgrader. However, there are also cases where correct downgrading is sensitive to the concrete implementation of the server-side components. For example, a downgrader protecting against SQL injection (SQLi) attacks should apply a different transformation for each type of database server, because different servers use different meta-characters when parsing SQL commands: an MS SQL Server interprets a double hyphen (--) as the beginning of a comment, whereas another database server might interpret the pound symbol (#) as the start of a comment.
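The compatibility problem can be sketched with a per-dialect table of comment-start sequences. The table is illustrative rather than exhaustive: `--` and `/*` are accepted by MS SQL Server, and MySQL additionally treats `#` as a comment start. The function names and dialect keys are hypothetical.

```python
from typing import Dict, Tuple

# Illustrative comment-start sequences per database dialect (not exhaustive).
COMMENT_TOKENS: Dict[str, Tuple[str, ...]] = {
    "mssql": ("--", "/*"),
    "mysql": ("--", "#", "/*"),
}


def strip_comment_tokens(untrusted: str, dialect: str) -> str:
    """Remove the comment-start sequences of one dialect.

    Removal repeats until nothing changes, so deleting one token cannot splice
    the surrounding characters into a new one (e.g. "-/*-" would become "--").
    """
    tokens = COMMENT_TOKENS[dialect]
    previous = None
    while previous != untrusted:
        previous = untrusted
        for token in tokens:
            untrusted = untrusted.replace(token, "")
    return untrusted


def contains_comment(s: str, dialect: str) -> bool:
    return any(token in s for token in COMMENT_TOKENS[dialect])


if __name__ == "__main__":
    # A downgrader written for MS SQL Server, deployed in front of MySQL:
    out = strip_comment_tokens("admin#", "mssql")
    print(out, contains_comment(out, "mysql"))  # admin# True
    out = strip_comment_tokens("admin#", "mysql")
    print(out, contains_comment(out, "mysql"))  # admin False
```

The same downgrader is correct for one backend and incomplete for another, which is why compatibility with the protected system has to be checked separately from the downgrader's own logic.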
Attackers can easily and effectively identify instances where incorrect sanitization is applied by the application by employing fuzzing techniques. This makes matters even worse: in discovering that a program's protection layer is broken, the attacker also learns which inputs are used in security-sensitive areas of the code, which facilitates the subsequent steps of the attack. Determining whether a downgrader is compatible with the systems it is intended to protect therefore poses additional challenges.
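A tiny fuzzing loop illustrates how quickly a flawed downgrader is exposed. This is a simplified sketch: an attacker would fuzz a running application and observe its responses, whereas here the downgrader is called directly, and the mutations are fixed rather than random so that the result is reproducible. `naive_strip_script` is a hypothetical flawed downgrader that removes the lowercase tag in a single pass; `html.escape` from the Python standard library serves as a contrast.

```python
import html
import re
from typing import Callable, Iterable, List

# A script-tag opening that survives in the downgrader's output is a finding.
_SCRIPT = re.compile(r"<\s*script", re.IGNORECASE)


def naive_strip_script(s: str) -> str:
    """Flawed downgrader: single-pass, case-sensitive removal of "<script>"."""
    return s.replace("<script>", "")


def mutations(seed: str) -> Iterable[str]:
    """Deterministic mutations of a seed payload."""
    yield seed
    yield seed.upper()
    yield seed.replace("script", "ScRiPt")
    # Nest the seed inside itself to probe single-pass removal.
    mid = len(seed) // 2
    yield seed[:mid] + seed + seed[mid:]


def fuzz(downgrader: Callable[[str], str], seeds: List[str]) -> List[str]:
    """Return every mutated input whose downgraded output still opens a script tag."""
    findings = []
    for seed in seeds:
        for candidate in mutations(seed):
            if _SCRIPT.search(downgrader(candidate)):
                findings.append(candidate)
    return findings


if __name__ == "__main__":
    print(fuzz(naive_strip_script, ["<script>"]))
    # ['<SCRIPT>', '<ScRiPt>', '<scr<script>ipt>']
    print(fuzz(html.escape, ["<script>"]))  # []
```

The nested input `<scr<script>ipt>` is exactly the kind of end case mentioned above: removing the inner tag once reassembles the outer one.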