Web applications have become a popular way to provide services over the Internet. Common applications include activities such as reading news and emails, shopping online and paying bills. As the use of these applications grows, we witness an increase in their vulnerabilities to attacks via the Internet. One of the most dangerous attacks is “SQL injection”, performed by malicious insertion of crafted SQL queries into a vulnerable web page. Through SQL injection, an attacker gains unrestricted and unauthorized access to the underlying database. This may result in stealing of confidential financial information such as credit card numbers, modification of sensitive and personal data records, and more.
The challenge for a security system facing these types of attacks is to perform foolproof intrusion detection, with neither missed detections nor false alarms. To achieve this, most security systems use signatures developed and gathered manually. This approach is problematic, because signature-based security systems can detect only attacks that are already known; they cannot detect new attacks or known attacks with slight modifications. Known anomaly detection based approaches, in turn, either fail to address the full range of SQL injection attacks or have technical limitations that prevent their adoption and deployment in real world installations.
SQL Injection Attacks
SQL is a textual language used to interact with relational databases. It is a standard interactive and programming language for querying, modifying and managing databases. A “query” is a typical execution unit and includes a collection of SQL statements. SQL statements can modify the structure of a database, add or remove schemas and manipulate database content.
A SQL attack is performed by embedding SQL statements and meta-characters into a query. To launch an attack, a malicious user needs to craft input strings and to send them to an application. The malicious user may then gain unauthorized access to the database, observe sensitive and confidential data, leak the data out of the web site, or even destroy the data in the database. Web applications that read inputs from users (e.g. through web forms) and use these inputs to compose a query to the underlying database are vulnerable. A SQL attack is made possible by insufficient input validation, or by the inability to perform such validation at all. Hackers have developed new methods to bypass these validations and to hack into applications. Moreover, the use of input validation techniques is labor intensive, which makes them impractical for use.
Even though the vulnerabilities that lead to SQL attacks are well known and well understood, such attacks continue to emerge due to lack of effective techniques for detecting and preventing them. Programming techniques which utilize sophisticated input validation may prevent some of these attacks, but are usually ineffective.
SQL Injection Attacks Examples.
We show several ways in which an attack can exploit known vulnerabilities. Three different examples of attacks are given. These attacks show how a potential attacker can modify the original intention of the query as designed by its programmer. The examples are based upon the following typical query:
SELECT * FROM employeelist WHERE firstName='".$firstName."' AND lastName='".$lastName."'
The query performs a search for an employee record in a database table called employeelist according to given first and last names. If such a record exists, then it is returned. Otherwise, nothing is returned. Parameters such as first and last names are supplied by the application user through, for example, a web form. The following examples demonstrate that by entering specific meta-characters and crafted strings as parameters, the original behavior of the query changes. As a result, a complete employeelist database table is retrieved (instead of the one searched employee record).
Example 1 demonstrates a tautology-based attack. The user submits for firstName and lastName the values some_string and ' OR 'b'='b, respectively. Because AND binds more tightly than OR, the WHERE clause reduces to a condition of the form (A AND B) OR 'b'='b', which is always true. The constructed query looks as follows:
SELECT * FROM employeelist WHERE firstName='some_string' AND lastName='' OR 'b'='b'
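The effect of Example 1 can be reproduced with a minimal Python sketch. It is an illustration only: the in-memory SQLite database, the sample rows and the helper names `build_query` and `make_demo_db` are assumptions made for the demonstration and do not appear in the original application, which builds its query in a PHP-style script.

```python
import sqlite3

def build_query(first_name: str, last_name: str) -> str:
    # Deliberately vulnerable: user input is concatenated straight into the SQL text,
    # mirroring the typical query shown above.
    return (
        "SELECT * FROM employeelist WHERE firstName='" + first_name
        + "' AND lastName='" + last_name + "'"
    )

def make_demo_db() -> sqlite3.Connection:
    # Hypothetical table contents, used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employeelist (firstName TEXT, lastName TEXT)")
    conn.executemany(
        "INSERT INTO employeelist VALUES (?, ?)",
        [("Alice", "Smith"), ("Bob", "Jones"), ("Carol", "White")],
    )
    return conn

conn = make_demo_db()

# A legitimate search matches a single employee record.
legit_rows = conn.execute(build_query("Alice", "Smith")).fetchall()

# The tautology payload from Example 1 makes the WHERE clause always true.
attack_query = build_query("some_string", "' OR 'b'='b")
attack_rows = conn.execute(attack_query).fetchall()

print(attack_query)
print(len(legit_rows), "row for the legitimate search;",
      len(attack_rows), "rows for the attack")
```

In this sketch the legitimate search returns one record, while the crafted input returns every row of the table.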
Example 2 demonstrates a tautology-based attack combined with a “commenting” technique. The user submits for firstName and lastName the values ' OR 1=1# and some_string, respectively. In some SQL dialects (MySQL, for example), the character '#' denotes the beginning of a comment. Therefore, the part of the WHERE clause up to the '#' character evaluates to true, while the rest of it is ignored because it follows the comment sign. The constructed query looks as follows:
SELECT * FROM employeelist WHERE firstName='' OR 1=1#' AND lastName='some_string'
Example 3 demonstrates the use of a “UNION SELECT” attack combined with a “commenting” technique. The SQL UNION command combines the results of two queries. The attacker submits for firstName and lastName the values ' union select * from employeelist # and some_string, respectively. By doing that, the attacker appends a second, injected query, which the attacker fully controls. The returned result is the union of the results of the original query and of the injected query. The constructed query looks as follows:
SELECT * FROM employeelist WHERE firstName='' union select * from employeelist #' and lastName='some_string'
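The comment-based attacks of Examples 2 and 3 can be sketched in the same way. The sketch below is a hedged illustration, not the original application: it uses an in-memory SQLite database, and because SQLite does not treat '#' as a comment marker, the payloads use the '--' comment marker as a stand-in. The table contents and the helper name `build_query` are assumptions.

```python
import sqlite3

def build_query(first_name: str, last_name: str) -> str:
    # Deliberately vulnerable string concatenation, as in the typical query above.
    return (
        "SELECT * FROM employeelist WHERE firstName='" + first_name
        + "' AND lastName='" + last_name + "'"
    )

# Hypothetical demonstration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employeelist (firstName TEXT, lastName TEXT)")
conn.executemany(
    "INSERT INTO employeelist VALUES (?, ?)",
    [("Alice", "Smith"), ("Bob", "Jones")],
)

# Analogue of Example 2: tautology plus comment ('--' replaces '#' for SQLite).
comment_attack = build_query("' OR 1=1 --", "some_string")

# Analogue of Example 3: UNION SELECT plus comment.
union_attack = build_query("' UNION SELECT * FROM employeelist --", "some_string")

comment_rows = conn.execute(comment_attack).fetchall()
union_rows = conn.execute(union_attack).fetchall()

print(comment_attack)
print(union_attack)
print(len(comment_rows), len(union_rows))
```

In both cases everything after the comment marker, including the lastName condition, is ignored by the database, so the whole table is returned.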
Related Work
A wide range of solutions that address the SQL injection phenomenon have been proposed over the years. These solutions range from development of new programming techniques to fully automated frameworks for detection and prevention of these attacks. Some of the latest methods which handle SQL injections are reviewed next.
a) AMNESIA (see W. G. Halfond and A. Orso, “AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks”, Proceedings of the IEEE and ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, Calif., USA, November 2005) is a model based technique which uses static analysis and runtime monitoring. It is based on the assumption that it is possible to describe a model for legitimate SQL queries by analyzing the source code that generates them. In the static analysis part, AMNESIA uses offline program analysis to build a model for the legitimate and expected queries that can be generated by the application. It scans the source code in order to find the points where SQL queries are constructed. It then builds a model for each point. In the dynamic part, it monitors the dynamically generated queries at runtime and checks their compliance with the statically generated model. Queries that violate the model represent potential hazard and are thus prevented from being executed on the database while being reported.
b) CSSE (see T. Pietraszek and C. V. Berghe, “Defending Against Injection Attacks through Context-Sensitive String Evaluation”, Proceedings of Recent Advances in Intrusion Detection (RAID2005), 2005) is a technique for defending against SQL injection attacks by tracking the origin of each query fragment, together with taint information where it exists. This technique uses a context sensitive analysis to detect and reject queries which include untrusted inputs. In the first step, it marks all user-originated data with metadata in order to keep track of the fragments' origin. This is done by overriding functions of the PHP interpreter (PHP, originally “Personal Home Page”, is a scripting language originally designed for producing dynamic web pages). The metadata makes it possible to distinguish between developer-provided and user-provided strings. Then, it intercepts all the application programming interface (API) calls to the database layer. CSSE checks if there is any metadata associated with the SQL expression and then performs the necessary checks on the untrusted parts.
c) Parse-Tree (see G. T. Buehrer, B. W. Weide, and P. A. G. Sivilotti, “Using Parse Tree Validation to Prevent SQL Injection Attacks”, International Workshop on Software Engineering and Middleware (SEM), 2005) is based on comparing, at runtime, the grammatical structure of a SQL query with that of an expected query model. Two queries are involved. The first one is the original query, which does not include the user's input tokens. The second one is the resulting query after incorporating the user's input. The comparison is done between the parse trees of these queries: the technique determines whether the two queries are equal by comparing their tree structures. It uses an API which provides parsing and string building capabilities. Concatenation of SQL query fragments is done using this API.
d) SQLRand (see S. W. Boyd and A. D. Keromytis, “SQLrand: Preventing SQL Injection Attacks”, Proceedings of the 2nd Applied Cryptography and Network Security (ACNS) Conference, pages 292-302, June 2004) performs instruction-set randomization of SQL keywords. It provides a framework which allows developers to create SQL queries using encoded keywords instead of normal ones. The SQL standard keywords are manipulated by appending thereto a random integer, which cannot easily be guessed by an attacker. A proxy filter intercepts these queries on their way to the database. Its primary task is to validate the randomized SQL query, de-randomize the keywords and then forward the SQL query to the database. A query that includes a user attack is evaluated as an invalid expression, because the hard-coded keywords are randomized while the keywords in the user's input are not. The system design includes a library for the developer to rewrite the keywords.
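The keyword randomization idea described in item d) can be illustrated with a minimal Python sketch. It is a simplified approximation under stated assumptions, not the SQLrand implementation: the keyword list, the fixed key "123" and the helper names `randomize`, `derandomize` and `build_query` are hypothetical, and a regular expression stands in for the proxy's parser.

```python
import re

# Hypothetical subset of SQL keywords and a hypothetical key; a real deployment
# would use a random integer that is kept secret from users.
KEYWORDS = ("SELECT", "FROM", "WHERE", "AND", "OR", "UNION")
KEY = "123"
BARE = r"\b(" + "|".join(KEYWORDS) + r")\b"
KEYED = r"\b(" + "|".join(KEYWORDS) + r")" + re.escape(KEY) + r"\b"

def randomize(template: str) -> str:
    """Developer side: append the key to each keyword in trusted query text."""
    return re.sub(BARE, lambda m: m.group(1) + KEY, template, flags=re.IGNORECASE)

def derandomize(query: str) -> str:
    """Proxy side: reject any bare keyword, then strip the key from the rest."""
    if re.search(BARE, query, flags=re.IGNORECASE):
        raise ValueError("unrandomized SQL keyword found: possible injection")
    return re.sub(KEYED, lambda m: m.group(1), query, flags=re.IGNORECASE)

def build_query(first_name: str, last_name: str) -> str:
    # Only the developer-written fragments are randomized; user input is not.
    return (
        randomize("SELECT * FROM employeelist WHERE firstName='") + first_name
        + randomize("' AND lastName='") + last_name + "'"
    )

benign = build_query("Alice", "Smith")
attack = build_query("some_string", "' OR 'b'='b")

print(derandomize(benign))
try:
    derandomize(attack)
except ValueError as err:
    print("rejected:", err)
```

The benign query is restored to standard SQL and could be forwarded to the database, whereas the injected OR carries no key and causes the query to be rejected. Note that this simplified check would also reject legitimate input that happens to contain a keyword as a separate word, a limitation of the regular-expression stand-in.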
All the above proposed solutions suffer from the same deployment problem. Since every solution intercepts a SQL query after incorporating the user's input, the detection system cannot be installed physically before the web server itself. Since large organizations have many web servers, these systems have to be duplicated. This makes them less suited for deployment.
Yet another anomaly detection solution for the SQL injection attack problem is provided in U.S. patent application Ser. No. 12/263,473 by Averbuch et al., filed Nov. 2, 2008. In this solution, multidimensional data is reduced in dimension to form clusters of normal data, with abnormal data points residing outside the clusters.
FIG. 1 illustrates schematically an exemplary organizational network architecture. The network architecture may consist of several web servers (left side of the figure) where each server connects to a different database (right side of the figure). All web servers are connected through a main switch. In this architecture, the proposed solutions need to be located at segments C or D. This constraint imposes system duplication, with one solution for each web server. Another drawback of these solutions relates to the effort needed for integration and to the required modifications to existing infrastructure. Integrating these solutions into a commercial network demands considerable managerial effort. For example, AMNESIA requires access to all source code (old or new) which accesses the database. CSSE overrides the PHP interpreter functions. The Parse-Tree and SQLRand methods also dictate a revision and update of all previously written source code. In addition, some of the solutions are not transparent to the developer. With the Parse-Tree method, the developer needs to adapt to a new programming method. With SQLRand, the developer has to use a tool that rewrites all the SQL keywords. To summarize, the reviewed solutions are impractical for handling SQL injection attacks efficiently. They suffer from problems of deployment, integration and transparency to the developer.