1. Field of the Invention
This invention pertains in general to computer security and in particular to detecting database intrusion and data theft attempts.
2. Description of the Related Art
Databases are widespread in modern computing environments. Companies and other enterprises rely on databases to store both public and private data. Many enterprises provide publicly-accessible interfaces to their databases. For example, an electronic commerce web site typically includes a “search” field that accepts search terms and allows an end-user to search items for sale on the site. This search field is a publicly-accessible interface to a database that stores data describing the items for sale. Similarly, an enterprise can maintain a private database that is accessible to only employees of the enterprise.
At a technical level, many of these database interfaces work by having a web server provide a web browser executing on the client with an HTML and/or JAVASCRIPT™-based form. The web browser displays this form on the client, and the end-user provides values for the fields in the form. The end-user performs an action, such as pressing a “Submit” button, that causes the web browser to send the entered values to the server. At this point, back-end logic at the server constructs a query to the database using the user-supplied values. This query executes on the database and the server returns the results to the client web browser.
Malicious end-users can exploit the web interface to the database to perform malicious actions such as obtaining access to confidential information. For example, in an SQL (Structured Query Language) injection attack, the attacker fills out the form using specially-crafted data. These data, when used by the server to generate a query to the database, result in a malicious query being sent to the database on behalf of the attacker.
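The following is a minimal sketch of such an attack, written in Python against an in-memory SQLite database. It is an illustration only: the table names (`items`, `secrets`), the helper functions `build_query` and `run_search`, and the sample data are all hypothetical and do not come from any particular web site. The sketch assumes back-end logic that builds the query by pasting the form value directly into the SQL text.

```python
import sqlite3


def build_query(search_term: str) -> str:
    # Hypothetical back-end logic: the user-supplied value is concatenated
    # into the SQL text without any escaping or parameter binding.
    return "SELECT name FROM items WHERE name = '" + search_term + "'"


def run_search(conn: sqlite3.Connection, search_term: str) -> list:
    # Execute the constructed query and return the first column of each row.
    return [row[0] for row in conn.execute(build_query(search_term))]


# Illustrative data: a public table of items and a confidential table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("CREATE TABLE secrets (value TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("lamp",), ("chair",)])
conn.execute("INSERT INTO secrets VALUES ('card-number-1234')")

# A legitimate search term yields the intended query.
benign = run_search(conn, "lamp")

# A specially-crafted term closes the string literal, appends a UNION that
# reads the confidential table, and comments out the trailing quote.
malicious = run_search(conn, "x' UNION SELECT value FROM secrets --")

print(benign)     # ['lamp']
print(malicious)  # ['card-number-1234']
```

In this sketch the crafted value changes the structure of the query, not merely the value being compared, which is the property that a DIDS can look for.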
A database intrusion detection system (DIDS) attempts to detect malicious queries. One type of DIDS works by observing legitimate database queries during a training period and generating a set of templates describing those queries. After the training period, queries that match the templates are allowed to execute while queries that do not match are treated as potentially malicious. This technique works well if the queries encountered during the training period are representative of legitimate queries, the types of queries issued on the database are relatively static, and the database itself is relatively static.
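The template-matching approach described above can be sketched as follows. This is one simplified way to do it, not the implementation of any particular DIDS: the `to_template` function and the `TemplateDIDS` class are hypothetical names, and the sketch assumes that a template is simply the query text with string and numeric literals replaced by a placeholder.

```python
import re


def to_template(query: str) -> str:
    # Replace quoted string literals, then standalone numeric literals,
    # with a "?" placeholder so that only the query structure remains.
    text = re.sub(r"'(?:[^']|'')*'", "?", query)
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)
    # Normalize whitespace and case so trivial differences do not matter.
    return " ".join(text.split()).lower()


class TemplateDIDS:
    """Hypothetical detector: learn templates, then match new queries."""

    def __init__(self) -> None:
        self.templates = set()

    def train(self, query: str) -> None:
        # During the training period every observed query is assumed
        # legitimate and its template is recorded.
        self.templates.add(to_template(query))

    def is_allowed(self, query: str) -> bool:
        # After training, a query is allowed only if its template was seen.
        return to_template(query) in self.templates


dids = TemplateDIDS()
dids.train("SELECT name FROM items WHERE name = 'lamp'")

same_shape = "SELECT name FROM items WHERE name = 'chair'"
injected = "SELECT name FROM items WHERE name = 'x' UNION SELECT value FROM secrets --'"

print(dids.is_allowed(same_shape))  # True
print(dids.is_allowed(injected))    # False
```

Under these assumptions, a query that differs from a training query only in its literal values is allowed, while a query whose structure was never observed is treated as potentially malicious.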
However, training a DIDS is difficult. There are often queries that run at only certain times or dates. For example, a query for generating a quarterly report might run only once every three months. These queries might not be encountered during the training period. Moreover, training the DIDS more frequently or over longer intervals increases the risk that a malicious query will get incorporated into the set of template queries.
Further, queries and databases in the real world are rarely static. It is common for new tables and fields to be added to a database. Likewise, new queries will be issued that access these new areas. The new queries will not match the template queries and, as a result, will cause the DIDS to generate false positive intrusion detections.
Therefore, there is a need in the art for a way to train a DIDS and generate templates for legitimate queries that does not suffer from the deficiencies mentioned above.