The ease of use and general applicability of relational databases has resulted in their extensive use in many different environments. Business, especially, have found relational databases appropriate for their needs. For example, businesses often employ accounting applications that operate on underlying relational databases. The present invention will be described with reference to relational databases, it being understood that the present invention may find use in conjunction with other types of databases.
A relational database stores its data in tables. A table contains a set of rows and columns. Each row has a set of columns, and each column in a row has only one value or entry. All rows in the same table have the same set of columns. A row from a relational table is often referred to as a record, and a column to a field.
Consider for example, the relational database “Employees” shown below:
EmployeesNameSalaryPositionAgeMark Ryan42000Manager44Dean Reagan40000Engineer28Pam McClelland45000President34Jim Tressel95000Vice President55Vince Young75000Chief Financial Officer21The Employees table contains five records, each record containing four fields. Each field of a record contains a value or entry. For example the field designated “Age” in the first record stores “44” as its value.
Structured Query Language (SQL) is the most popular computer language used with relational databases. Operations on relational databases including operations to query, insert, update, and delete data from tables can be performed using SQL commands or statements. For example, commands called SQL queries can be used to retrieve data from one or more tables in a relational database for subsequent processing by an application executing on a computer system (e.g., a server). When a SQL query is generated, there is no need to explicitly tell the database where to look for the data sought by the SQL query. It is enough to simply identify the data sought in the SQL query. Database management systems will take care of locating the data sought in response to receiving and implementing the SQL query.
Relational databases operations, including data queries, are performed by implementing SQL queries on tables stored in the database. Two of the more basic query operations include retrieving a subset of columns from a table and retrieving a subset of rows from a table. To illustrate, the SQL query:
SELECT*FROM Employees WHERE Salary <45000; Query 1
can be used retrieve all records in the Employees table above whose “Salary” field contains a value less than 45000. In response to implementing Query 1, a database management system operating in conjunction with the relational database storing the Employees, would return the following records:
TABLE 1NameSalaryPositionAgeMark Ryan42000 Manager44Dean Reagan40000 Engineer28Retrieving a subset of columns from a table is another important query operation that can be performed on a relational table. To illustrate, the SQL query:
SELECT Name, Age FROM Employees; Query 2
can be used to retrieve all names and corresponding ages from the Employees table. In response to implementing Query 2, a database management system, operating in conjunction with the relational database storing the Employees table, would return:
TABLE 2NameAgeMark Ryan44Dean Reagan28Pam McClelland34Jim Tressel55Vince Young21
An application, and its underlying relational database, can be accessed by users after they login via, for example, a client computer system. SQL queries are often used during user login. Users, during login, provide credentials by typing their username (e.g., RMS) and password (e.g., Soccer) into respective fields of an interface displayed on the client computer system. This information is passed to an application, which in turn may generate the following SQL query from the user supplied input values:
SELECT*FROM Users WHERE Username=‘RMS’ AND Pass=‘Soccer’; Query 3
In response to implementing Query 3, the database management system, operating in conjunction with the relational database storing the “Users” table, would return a record back to the application. The application manipulates data of the record, creates a session, and then passes a subset to the client computer system of the user, which in turn displays a menu of options and personalized information. Notice the application generates an SQL query based on input values provided by the user.
Applications executing on servers (i.e., server applications) generate SQL queries in response to receiving requests from client computer systems. The present invention will be described with reference to a server-client data processing system, it being understood that the present invention should not be limited thereto. In many application programming languages, the standard technique for generating SQL queries is to insert specialized statements called SQL query templates into the source code. Each SQL query template can be used at runtime to generate one of several predetermined SQL queries as will be more fully described below. Server applications can also generate SQL queries dynamically in response to receiving a request from a client computer system. W. Halfond and A. Orso, Combining Static Analysis and Runtime Monitoring to Counter SQL-Injection Attacks; In proceedings of the Workshop on Dynamic Analysis (WODA 2005), gives an example of dynamically generated SQL queries. For the purposes of explanation only, the present invention will be described with reference to server applications generating SQL queries using embedded SQL query templates only, it being understood that the present invention can be used in conjunction with server applications that also generate SQL queries dynamically.
Once SQL queries are generated by a server application, the SQL queries are transmitted to a database management system as noted above. The database management system, in turn, implements an SQL query by generating one or more database transactions for retrieving data from the underlying database. Ultimately, the database management system returns database data to the requesting server application for subsequent processing. To illustrate, FIG. 1 shows relevant components of an exemplary data processing system employed by a business. The system shown in FIG. 1 contains a database server 10 coupled between an application server 12 and a relational database 14. A server application (Application A, not shown) executes on one or more processors of application server 12. The relational database 14 contains tables, including the “Users” and “Employees” tables described above, which are accessible by a database management system (not shown) executing on one or more processors of database server 10. The database management system receives SQL queries from Application A deployed on server 12. Application A can be considered a client to the database management system executing on server 10. Application A never accesses the data contents of database 10 directly, but only generates SQL queries to be implemented by the database management system. This allows sophisticated features, such as transaction processing, recovery, backup, access control, etc., to be added to database server 10 without increasing the complexity of Application A or other applications that generate the SQL queries. Data corruption risk is also reduced since the database server is the only device with access to the underlying database 14. The relational database 14 may be stored on one or more memory devices (e.g., disc arrays, not shown). A volume manager or other software (not shown) may logically aggregate the memory devices to form a data storage object (e.g., a volume) on which relational database 14 is stored.
Application server 12 is also coupled to client computer systems 16 or 18 via network 20, which may take form in a LAN, WAN, the Internet, etc., or a combination thereof. Application A generates one or more SQL queries in response to receiving a request from a user via client computer system 16 or 18. For example, a user of client computer system 18, after logging into Application A, may seek a list of all employees of the business that have a salary less than $45,000. The request is transmitted to application server 12, which in turn generates Query 1 shown above in accordance with the code of Application A. The database management system executing on server 10 receives and implements Query 1, and in turn generates one or more database transactions for retrieving data from relational database 14. The data retrieved from database 14 is provided to Application A executing on server 12 for subsequent presentation to the user via client computer system 18. In one embodiment, Table 1 above is presented to the user of client computer system 18.
Server applications, such as Application A executing on server 12, can generate various types of SQL queries for subsequent implementation by database management systems. These server applications typically provide a limited set of functional features (e.g., identify all employees who make less than a certain amount of money) to users of client computer systems, which means the server applications can generate a limited set of SQL queries. In other words, the possible SQL queries that can be generated by a server application is limited to that envisaged by the developer of the server application.
Compiled server applications, such as Application A executing on server 12, consist of binary files (e.g., executable files, libraries, etc) that contain embedded SQL query templates. For example, Application A executing on server 12 contains executable files or libraries, which in turn contain the following exemplary SQL query templates:
SELECT*FROM Employees WHERE Salary <?; Template 1
SELECT Info FROM Users WHERE login=‘?’ AND pass=‘?’; Template 2
Embedded SQL query templates, such as Template 1 and Template 2, represent the possible SQL queries envisaged by the application developer to be generated. The SQL query templates contain input markers (e.g., “?”) that are replaced at runtime by, for example, input values as is more fully described below. It is noted that SQL query templates are inserted in the source code of applications. When the source code is compiled, the strings comprising the SQL query templates are usually stored as is, in specific binary files.
SQL query templates are used at runtime to create respective SQL queries. To illustrate, when Application A receives a request from a client computer system, and the request includes user input values, an SQL query is generated by replacing the input markers (i.e., “?”) of the appropriate SQL query template with the user input values contained in the request. For example, when Application A executing on server 12 receives the request to identify all employees who make less than $45,000 as described above, the accounting application replaces the input markers “?” of Template 1 with “45000,” thereby generating Query 1. The resulting SQL query is then sent to and implemented by database server 10 as described above.
As noted, database-driven server applications have become widely employed by businesses. These applications provide a broad range of services to their users. However, these applications, and their underlying relational databases, often contain confidential or even sensitive information, such as financial records. This information can be valuable and makes database-driven server applications an ideal target for attacks. One type of attack in particular, SQL-Injection attacks (SQLIA's) is especially harmful. SQLIA's can give attackers direct access to an underlying database, and, with that, the power to leak, modify, or even delete information. SQLIA's generally occur when input provided by a user is not properly validated and is inserted directly into a template to generate an SQL query.
To illustrate, as noted above, before a user can access Application A executing on server 12, and its underlying database 14, the user must provide his credentials by entering his username (e.g., RMS) and password (e.g., Soccer) into the appropriate fields of the user interface displayed on client computer system 16. Once the user's login name and password are entered, the user activates the submit button, and client computer system generates and sends a login request to application server 12 via network 20. The login request includes the username and password input values entered by the user. Application A executing on application server 12 generates a corresponding SQL query in response to receiving the login request from client computer system 16. More particularly, Application A replaces the first and second input markers of Template 2 above with the login name and password of the login request, respectively, thereby generating Query 3 above. After Query 3 is generated, it is transmitted to database server 10 for implementation. The database management system operating on server 10 returns a record from the “Users” table to application server 12. Application A manipulates data of the record, creates a session, and then passes a subset to client computer system 16, which in turn displays a menu of options and personalized information.
However, if instead of inserting RMS and Soccer as input values into the username and password fields of the interface displayed on client computer system 16, the user maliciously enters “’ OR 1=1--” and “ ” and activates the submit button, client computer system 16 will send a login request containing “’ OR 1=1--” and “ ” to server 12. Application A would replace the input markers of Template 2 with “’ OR 1=1--” and “ ” thereby generating the following SQL query:
SELECT Info FROM Users WHERE login=‘ ’ OR 1=1--’ AND pass=‘ ’; Query 5
Query 5 is subsequently sent to the database management system, which interprets everything after the WHERE keyword token as a conditional statement. The “’ OR 1=1--” clause turns this conditional into a tautology. As a result of implementing Query 5, the database management system 10 and database 14, operating collectively, returns all records in the Users table, which in turn is forwarded by Application A to client computer system 16 for display thereon. As can be appreciated, an attacker could insert a wide range of SQL commands via this mechanism, including commands to modify or destroy database tables.
Database audit and security (DBAS) systems have been developed to protect database systems from malicious attacks, such as the SQLIA described above. In general, DBAS systems analyze all SQL queries before they are forwarded to a database management system. If a SQL query is deemed abnormal by the DBAS system, the DBAS system alerts a system administrator before the abnormal SQL command is forwarded. FIG. 2 illustrates the system shown in FIG. 1 with a DBAS system 22 interposed between application server 12 and database server 10. DBAS system 22 may take form in application code executing on a computer system. In operation, DBAS system 22 receives an SQL query from application server 12. The received SQL query is parsed into constituent elements using known SQL query string parsing technology. The elements of the parsed SQL query are then identified as SQL keywords (e.g., SELECT, FROM, WHERE, etc.), operators (e.g., =, <, >, etc.), delimiters (e.g., commas or semicolons), attributes (e.g., table names such as “Users,” column names such as “Pass,” (see, Query 3 above) etc.), and input values (e.g., numbers or text that immediately follow an operator). The input values of received SQL query are replaced by input markers (i.e., “?”) to generate an SQL query template, which in turn is compared with known, acceptable SQL query templates stored in a knowledge base. If the SQL query template corresponding to the received SQL query compares equally to a known, acceptable SQL query template in the knowledge base, the received SQL query is forwarded to database server 10. Otherwise, the received SQL query is deemed abnormal or malicious.
Before DBAS system 22 can begin to operate for its intended purpose, the knowledge base must be populated with known, acceptable SQL query templates. DBAS system 22 learns about typical, normal SQL queries by observing network-level traffic sent to database server 10 during a training phase. The knowledge base can be populated with normal SQL query templates, which are created based on the typical, normal SQL queries received during the training phase. As can be appreciated, a substantial amount of time might be needed during the training phase to populate the DBAS systems' knowledge base with normal SQL query templates. The time needed for the training phase may be unacceptable to a business that needs rapid deployment of DBAS system 22.