1. Technical Field
The present invention relates to static program analysis and, more particularly, to string analyses that infer, without executing a program, the string values arising at runtime, in order to provide a function analysis for automatic detection and categorization of information-flow downgraders.
2. Description of the Related Art
The Information-Flow Security principle establishes that no “illicit flow” of information be allowed in a program. A flow is illicit if it allows untrusted information to be used in a trusted computation (an integrity violation) or if it allows secret information to be entirely or partly revealed to unauthorized users (a confidentiality violation). Integrity and confidentiality can be seen as dual problems by simply stating that there should not be any flow of information from “high” to “low”, where “high” means “untrusted” in integrity and “secret” in confidentiality, and “low” means “trusted” in integrity and “public” in confidentiality.
Information can be tagged with information-flow labels. Typically, information-flow labels form a partially ordered set or even a lattice. If information-flow security were strictly enforced and no illicit flow of information were allowed, most programs would not work. To be information-flow secure, a program would have to be “partitioned” so that information tagged with a certain label “l” can only flow to program points that have been tagged with labels higher than or equal to “l”. A program with these restrictions is very unlikely to be useful. For example, from an integrity point of view, a Web application is supposed to accept inputs from potentially untrusted users and use those inputs in trusted computations. For instance, an online banking program takes as input the account number and the password of a user (potentially untrusted or malformed information) and passes them to the backend database system, where they are used in a trusted setting. In another example, an online bookstore takes as input the user ID and password of the customer and the title of the book that the customer wants to buy (all potentially untrusted or malformed information), and uses them to complete a transaction.
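By way of illustration only, the ordering constraint on labels described above can be sketched for the simplest case of a two-point lattice. The names Label and flow_allowed are hypothetical and are introduced solely for this sketch; an actual label lattice may contain many more elements and need not be totally ordered.

```python
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical two-point lattice in which LOW is below HIGH."""
    LOW = 0   # "trusted" in integrity, "public" in confidentiality
    HIGH = 1  # "untrusted" in integrity, "secret" in confidentiality


def flow_allowed(source: Label, target: Label) -> bool:
    """Return True if information labeled `source` may reach a program
    point labeled `target`, i.e. the target label is higher than or
    equal to the source label."""
    return target >= source
```

Under this sketch, flow_allowed(Label.HIGH, Label.LOW) is False, which corresponds to the flows that a strictly enforced policy would reject, such as an untrusted account number reaching the backend database.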
From a confidentiality point of view, a Web application often releases data that has been computed based on secret information and, as such, should be considered secret as well. E.g., a banking application may reveal to any teller the last four digits of the social security number of any user, an online bookstore may reveal to any shop assistant the last four digits of any customer's credit card number, etc. Given that all these programs exhibit flows that allow “high” information to flow to “low” program points, all these programs would be rejected if information-flow security were simply enforced. To permit these programs to function, “high” information can be “downgraded” and become “low” enough to be used in “low” program points.
Downgrading takes the form of “endorsement” in integrity and “declassification” in confidentiality. For example, once a program has verified that the user-provided input to a Web application is a properly formatted string, the program can endorse that input, which now becomes trusted enough to be used in a trusted computation. Similarly, once a program has verified that the information extracted from a secret is not sufficient to reveal the secret itself, the program can declassify the extracted information, which now becomes public enough to be revealed to a public listener.
TABLE 1
Information-Flow Security    Integrity      Confidentiality
High                         Untrusted      Secret
Low                          Trusted        Public
Downgrading                  Endorsement    Declassification
A program can implement many downgraders. A program should not accept any “high” input to a “low” function unless that “high” input has been previously downgraded. Furthermore, a downgrader is specific to just a subset of the set of “low” functions. For example, consider an integrity “low” function that accepts an input in the form of a string, concatenates that string into a Structured Query Language (SQL) query, and then submits the query to a database. The function will require its input not to contain semicolons and apostrophes, since the database may interpret such characters as part of an SQL command. Therefore, any input to this “low” function should have undergone sanitization or endorsement to make sure that such illicit characters are not present.
Only if a trusted sanitizer has verified the absence of such illicit characters will that initially untrusted string be accepted for use in the SQL query. However, if the “low” function is not responsible for performing SQL queries, but rather for concatenating its string input value into HyperText Markup Language (HTML) code, then a different sanitization is necessary. The issue here is no longer to prevent SQL injections, but rather to prevent what are known as Cross-Site Scripting (XSS) attacks. In this case, the sanitization function must check for the absence of specific tags that introduce JavaScript code, such as <script> and </script>.
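By way of illustration only, the two kinds of sanitization discussed above can be sketched as two separate checks. The function names endorse_for_sql and endorse_for_html are hypothetical, and the checks are deliberately simplified to the characters and tags named in the text; they are not offered as a sufficient defense against SQL injection or XSS in practice.

```python
import re

# Matches an opening or closing script tag, e.g. "<script>" or "</script>".
_SCRIPT_TAG = re.compile(r"<\s*/?\s*script\b", re.IGNORECASE)


def endorse_for_sql(value: str) -> str:
    """Hypothetical downgrader for a "low" function that builds an SQL query.
    Rejects semicolons and apostrophes; returns the value unchanged otherwise."""
    if ";" in value or "'" in value:
        raise ValueError("input contains characters that are illicit in SQL")
    return value


def endorse_for_html(value: str) -> str:
    """Hypothetical downgrader for a "low" function that emits HTML.
    Rejects script tags; returns the value unchanged otherwise."""
    if _SCRIPT_TAG.search(value):
        raise ValueError("input contains a script tag")
    return value
```

In this sketch, the string <script>alert(1)</script> is accepted by endorse_for_sql but rejected by endorse_for_html, which is why an endorsement suitable for one “low” function cannot be assumed to be suitable for another.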
Downgraders are often available in libraries, and are categorized based on the specifications of the corresponding “low” functions. Often, however, Web applications implement their own downgrading functions. This makes security static analysis of Web applications very complex. In fact, a static analysis for information-flow security should receive as input the signatures of the downgrading functions as well as rules that map downgrading functions to the corresponding “low” functions. At that point, the static analysis can verify that the input to a “low” function has always undergone proper downgrading, that is, that no path leads to a “low” function unless its inputs have been properly downgraded. Unfortunately, when Web applications implement their own downgraders, it is very difficult to detect those downgraders and categorize them in a way that the static analysis for information-flow security can subsequently account for them.
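By way of illustration only, the rules that map downgrading functions to “low” functions, and the check applied to each path, can be sketched as follows. The rule table, the function names in it, and the representation of a path as a sequence of function names are all assumptions made for this sketch; an actual static analysis would derive paths from the program rather than receive them as lists.

```python
from typing import Dict, Iterable, Sequence, Set

# Hypothetical rules: each "low" function is mapped to the set of
# downgraders that are accepted for its input.
RULES: Dict[str, Set[str]] = {
    "execute_query": {"sanitize_sql"},
    "write_html": {"encode_html"},
}


def path_is_safe(path: Sequence[str], rules: Dict[str, Set[str]] = RULES) -> bool:
    """`path` lists the functions a "high" value flows through, ending at a sink.
    The path is safe if the sink is not a registered "low" function, or if
    a downgrader accepted for that sink appears earlier on the path."""
    if not path:
        return True
    sink = path[-1]
    accepted = rules.get(sink)
    if accepted is None:
        return True
    return any(step in accepted for step in path[:-1])


def all_paths_safe(paths: Iterable[Sequence[str]]) -> bool:
    """Every path into a "low" function must pass through a proper downgrader."""
    return all(path_is_safe(p) for p in paths)
```

The sketch presupposes that the rule table is complete. When an application defines its own downgrader that is missing from the table, a path through that downgrader is reported as unsafe, which reflects the difficulty described above.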
The difficulties may include the following:
1. If manual code inspection is adopted, not all the source code may be available. Some code may have been produced and purchased by a third party. Therefore, manual code inspection for detection and categorization of information-flow downgraders may not be possible. Even if all the source code is available, manual code inspection may not be a feasible option given the large amount of code that needs to be inspected. In general, manual code inspection is error-prone, difficult, time-consuming, and unreliable.
2. Dynamic analysis or testing could be used. However, the coverage of a dynamic analysis depends on the completeness of the test-case suite in use. In the absence of a complete suite of test cases, a dynamic analysis is not guaranteed to detect all the possible downgraders used by an application, and the categorization of the downgraders will be incomplete.