It is known that distributed applications, particularly applications running in an open network environment like the Internet, are vulnerable to attacks by malicious users or viruses. In particular, web applications such as home banking or online shops accepting data values provided over a data network interface can be manipulated by sending a maliciously crafted data value to the program.
A well-known example of such an attack is the exploitation of buffer overflows. When a very large or non-terminated data value is provided to a program, the program often fails upon evaluation of the data value, sometimes resulting in a crash of security critical programs or systems. This can be exploited as a means of disabling security measures, among other things.
A second, related attack scenario is to provide a data value that will result in the execution of control statements provided as part of the data value. Such an attack is commonly referred to as an injection attack, in which a foreign, typically user-provided control statement is injected into a typically programmer-provided statement.
FIG. 2 shows a schematic data flow diagram for a request to a computer system used for an injection attack. A computer program assembles an SQL query using a template comprised in a first data value 8 and user data provided as a second data value 9. The first data value 8 is provided in the form of a constant of the program by a programmer. The intended meaning of the template provided as first data value 8 is to select the identity “id” of a user from a database table “users”, which is identified by a given user name and password.
The second data value 9, received from a user computer acting as a second, untrustworthy data source, is maliciously crafted: in addition to the requested input parameters, it comprises a control statement. When the expression “jan' OR '1'='1” is provided as a first input parameter, a third data value 10, computed based on the first data value 8 and the second data value 9, comprises a query whose semantics differ from those intended by the programmer of the computer program.
The second data value 9 comprises so-called escape sequences, in this particular case the single quote characters, which indicate the end of the data provided as user name and thus lead a database system to interpret the following OR-expression as part of the control data. If no further checking is performed and the third data value 10 is passed on unmodified to the database system, a database query processor will decompose the third data value 10 into a control part 18 and a data part 19, as indicated in the lower part of FIG. 2.
Note that parts of the second data value 9 are contained in the control part 18. Due to the order of the execution of the query contained in the control part 18, a valid user id is returned to the computer system 1, even if the password provided as part of the second data value 9 and decoded as data part 19 is incorrect. This is due to the fact that only one part of the OR-expression needs to evaluate to true in order for the database query processor 13 to return a valid result to the computer system 1.
In the given example, the provision of a valid user name suffices to return a valid user id from the database. Because the injected control statement comprises an OR-operator and the AND-operator has precedence over the injected OR-operator, the password comparison is grouped with the expression to the right of the OR-operator, and the parts left and right of the OR-operator are evaluated independently. In consequence, the password provided as part of the user data value is irrelevant for the successful completion of the query, as the part of the query to the left of the OR-operator alone can produce a result.
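By way of illustration only, the attack described above can be reproduced with the following minimal sketch. The template text, the table layout, the second user record and the helper names `build_query` and `lookup` are assumptions made for this example and are not taken from FIG. 2; an in-memory SQLite database stands in for the database system.

```python
import sqlite3

# Assumed template standing in for the first data value 8 (programmer-provided).
TEMPLATE = "SELECT id FROM users WHERE name = '{name}' AND password = '{password}'"


def build_query(name, password):
    """Naively combine the template with user data (the third data value 10)."""
    return TEMPLATE.format(name=name, password=password)


def lookup(conn, name, password):
    """Pass the assembled query on to the database unmodified."""
    return conn.execute(build_query(name, password)).fetchall()


# Hypothetical database content used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'jan', 'secret')")
conn.execute("INSERT INTO users VALUES (2, 'eva', 'other')")

# A wrong password with an ordinary user name yields no result.
honest = lookup(conn, "jan", "wrong")

# The crafted user name closes the string literal and injects an OR-operator.
# AND binds tighter than OR, so the condition is evaluated as
#   name = 'jan' OR ('1'='1' AND password = 'wrong')
attack = lookup(conn, "jan' OR '1'='1", "wrong")

print(build_query("jan' OR '1'='1", "wrong"))
print("honest:", honest)
print("attack:", attack)
```

In this sketch the crafted request returns the id of the user “jan” although the password is incorrect, while the record of the other user is not returned, which matches the observation that a valid user name alone suffices.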
A method known as “variable tainting” from the programming language Perl (pages 558-561 of Wall, L., Christiansen, T., and Orwant, J.: Programming Perl, 3rd edition, O'Reilly, 2000) is aimed at preventing injection attacks. Data values received from an untrustworthy source, such as an HTTP (Hypertext Transfer Protocol) request, are marked or “tainted” upon reception. The programmer then adds validation code, which checks the received data value for validity and removes the taint. If the program attempts to use a tainted variable without such a previous check, an error message is generated, warning the programmer to include a suitable validation mechanism.
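The principle of marking, validating and checking can be sketched as follows. This is a simplified analogue written in Python, not the mechanism actually implemented by Perl; the names `Tainted`, `TaintError`, `untaint`, `use_in_query` and `is_plain_name` are hypothetical and chosen for this example only.

```python
import re


class TaintError(Exception):
    """Raised when a tainted value is used without prior validation."""


class Tainted:
    """Marks a value as received from an untrustworthy source."""

    def __init__(self, value):
        self.value = value


def untaint(tainted, validator):
    """Run programmer-provided validation code and remove the taint."""
    if not validator(tainted.value):
        raise TaintError("validation failed")
    return tainted.value


def use_in_query(value):
    """Hypothetical sensitive operation that refuses tainted input."""
    if isinstance(value, Tainted):
        raise TaintError("tainted value used without validation")
    return "name = '" + value + "'"


def is_plain_name(value):
    """Example validator: accept only letters, digits and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", value) is not None


received = Tainted("jan")  # marked upon reception
checked = untaint(received, is_plain_name)
print(use_in_query(checked))

try:
    use_in_query(received)  # use without a previous check
except TaintError as err:
    print("error:", err)
```

Note that the quality of the protection in this sketch depends entirely on the validator supplied by the programmer.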
There are problems relating to this approach, however. The code provided for the test has to be written by the application programmer, who may not be aware of all present and future vulnerabilities. If he or she only checks for some common vulnerabilities, for example buffer overflows, while ignoring others, like the injection of an SQL statement, the program may still be successfully attacked without raising an error or warning.
Another problem is the mixing of data values received from different sources, some of which are trustworthy while others are not, as was illustrated by the example given in FIG. 2. Because only those parts originating from an untrustworthy source should be checked for the presence of control statements, the check should be performed as early as possible, for example upon first reception of a data value, before it is combined with data values from other sources.
The knowledge of what to check for, SQL escape characters in an SQL expression in the case of FIG. 2, might not be available upon reception of the second data value 9, however. This knowledge, i.e. the context in which the data value 9 is used, is not fully known until the data value is output to an external database system 12, for example. Yet, whether or not a data value 9 may be harmful depends on this context, i.e. on the way the data value 9 is used, for example by combining it with the first data value 8. Consequently the check should be performed after this context is known, i.e. as late as possible.
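The dependence on the context of use can be illustrated by the following sketch, in which the same data value is classified differently depending on where it is output. The context names and the character sets are assumptions chosen for illustration and are deliberately incomplete; they do not constitute a sufficient check for any real output format.

```python
# Illustrative, non-exhaustive sets of characters with a control meaning
# in two hypothetical output contexts.
SPECIAL_CHARS = {
    "sql_string_literal": {"'"},
    "html_text": {"<", ">", "&"},
}


def has_control_chars(value, context):
    """Return True if value contains a character that is special in context."""
    return any(ch in SPECIAL_CHARS[context] for ch in value)


user_name = "jan' OR '1'='1"
markup = "<b>jan</b>"

for value in (user_name, markup):
    for context in SPECIAL_CHARS:
        print(repr(value), context, has_control_chars(value, context))
```

Under these assumptions, the crafted user name is flagged only when it is to be embedded in an SQL string literal, whereas the markup fragment is flagged only when it is to be output as HTML text, so a check performed upon reception cannot know which of the two tests applies.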
These two requirements conflict, resulting in checks that are either too restrictive, severely limiting the functionality of developed applications, or too lax to satisfy stringent security requirements.
Consequently, it is a challenge to provide improved methods and systems for checking a data value.