1. Statement of the Technical Field
The present invention relates to the field of database security and more particularly to the remediation of data leak conditions in an information system.
2. Description of the Related Art
Information systems such as database systems have fulfilled a substantial role in computing from the start. From the most basic data driven application, to complex database management systems, end users have always benefited from the ability to cull a subset of desired data from a large corpus of data based upon one or more search terms. Largely due to the efficiency and speed of database systems, whole industries have experienced dramatic gains in efficiency based upon the ability to retrieve desired record sets from vast collections of data.
The advent of the Internet has further accelerated the adoption of information systems among the consuming public. Prior to the wide-scale adoption of Internet based computing, database systems could be accessed and utilized only by a select group of users, namely insiders to the managing organization. Accordingly, concerns relating to the security of the data in the database could be limited to the few having access to the database system and the few having access to the physical plant hosting the computing systems which support the database system. Publicly accessible database systems, however, and particularly those employing a Web based interface, have increased the vulnerability of database systems to unauthorized intrusions and data leaks.
Generally, to combat the enhanced threat of unauthorized intrusions in a computer communications network, information technologists utilize intrusion detection system (IDS) technology. IDS technology can detect network intrusions dynamically as they occur or post-mortem after they have occurred. A typical dynamic network IDS can include a monitoring component able to capture network packets as the packets pass through the IDS, an inference component for determining whether the captured traffic indicates any malicious activity or usage, and a response component able to react appropriately to the detection of a malicious intrusion. While the response can include the generation and transmission of a simple e-mail message to a system administrator, the response also can include more complex actions, for instance temporarily blocking traffic flowing from an offender's Internet protocol (IP) address.
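By way of illustration only, the three components described above can be sketched as follows. This is a minimal sketch and not a description of any particular IDS product: the names `Packet`, `SimpleIDS`, `is_malicious` and `block_seconds` are hypothetical, the inference component is reduced to a caller-supplied predicate, and the e-mail notification is stood in for by an in-memory list of alert strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Packet:
    """Hypothetical stand-in for a captured network packet."""
    src_ip: str
    payload: bytes


@dataclass
class SimpleIDS:
    # Inference component: a caller-supplied predicate (an assumption of this sketch).
    is_malicious: Callable[[Packet], bool]
    # How long an offending source address stays blocked, in seconds.
    block_seconds: float = 60.0
    # Maps a blocked source address to the time at which the block expires.
    blocked_until: Dict[str, float] = field(default_factory=dict)
    # Stand-in for e-mail messages sent to a system administrator.
    alerts: List[str] = field(default_factory=list)

    def monitor(self, packet: Packet, now: float) -> bool:
        """Monitoring component: return True if the packet is forwarded."""
        expiry = self.blocked_until.get(packet.src_ip)
        if expiry is not None and now < expiry:
            return False  # source address is still temporarily blocked
        if self.is_malicious(packet):
            self.respond(packet, now)
            return False
        return True

    def respond(self, packet: Packet, now: float) -> None:
        """Response component: record an alert and block the source temporarily."""
        self.alerts.append(f"suspicious traffic from {packet.src_ip}")
        self.blocked_until[packet.src_ip] = now + self.block_seconds
```

In this sketch the current time is passed in explicitly as `now` so that the temporary block can be exercised deterministically; a deployed system would more likely read a clock.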
Conventional IDS technology can incorporate a variety of methodologies for determining within the inference component whether malicious activity has occurred or is occurring. Examples of these “detection methodologies” can include simple pattern matching, stateful pattern matching, protocol decode-based signatures, heuristic-based signatures, and anomaly detection. Stateful pattern matching is an enhanced, more mature version of simple pattern matching based upon the notion that a stream of network traffic includes more than mere stand-alone packets. Protocol decode-based analysis, in turn, has been considered to be an intelligent extension to stateful pattern matching. In protocol decode-based analysis, traffic first is decoded in real-time according to a specified protocol such as HTTP in order to identify the pertinent fields of the protocol. Once the fields of the traffic specified by the protocol have been decoded, pattern matching can be applied to the decoded fields.
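The decode-then-match sequence described above can be illustrated with a short sketch. The function names, the `SIGNATURES` table and the single path-traversal pattern are assumptions made for illustration; the decoder handles only the request line and headers of a well-formed HTTP request and is far simpler than a production protocol decoder.

```python
import re
from typing import Dict, Optional
from urllib.parse import unquote


def decode_http_request(raw: bytes) -> Optional[Dict[str, str]]:
    """Decode the request line and headers of an HTTP request into named fields."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None  # not a recognizable request line
    method, target, version = parts
    # Percent-decoding the target is part of the protocol decode step.
    fields = {"method": method, "target": unquote(target), "version": version}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            fields["header:" + name.strip().lower()] = value.strip()
    return fields


# Hypothetical signature table: decoded field name -> pattern applied to that field.
SIGNATURES = {"target": re.compile(r"\.\./")}


def matches_signature(raw: bytes) -> bool:
    """Apply pattern matching to the decoded fields rather than to the raw bytes."""
    fields = decode_http_request(raw)
    if fields is None:
        return False
    return any(
        name in fields and pattern.search(fields[name]) is not None
        for name, pattern in SIGNATURES.items()
    )
```

Under these assumptions, a request whose target is percent-encoded (for example `/a/%2e%2e/%2e%2e/secret`) is matched after decoding even though the literal bytes `../` never appear in the raw traffic, while the same characters appearing only in a message body are ignored because the pattern is bound to the target field.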
Unlike intrusion detection techniques which rely directly upon pattern matching, a heuristic-based analysis employs algorithmic logic upon which intrusion detection signatures can be based. Typically, the algorithmic logic can analyze traffic patterns in order to match a particular traffic pattern with a known “signature”. Of course, any heuristic-based analysis can report false positives where a pattern of legitimate access to a network device satisfies the algorithmic logic. Hence, the use of a heuristic-based analysis requires extensive and frequent tuning to limit such false positives. Similar to the heuristic-based analysis, in an anomaly-based analysis, traffic can be dynamically inspected as the traffic passes through the IDS. In an anomaly-based analysis, however, traffic patterns can be analyzed to detect anomalous behavior.
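As a simple illustration of the anomaly-based approach, the following sketch compares an observed traffic measurement against a baseline of earlier measurements. The choice of a standard-score test, the function name `is_anomalous` and the default `threshold` of 3.0 are assumptions of this sketch; the text above does not prescribe any particular statistic.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag an observation that deviates strongly from the baseline measurements."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any departure from it is treated as anomalous.
        return observed != mu
    # Standard score: distance from the baseline mean in units of standard deviation.
    return abs(observed - mu) / sigma > threshold
```

The `threshold` parameter plays the tuning role noted above: lowering it flags more traffic, at the cost of more false positives on legitimate access patterns.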
Despite the advancement of IDS technologies, IDS technologies alone cannot account for data leak vulnerabilities. A data leak refers to the unintentional dissemination of data in a database system through the failure of the database system to secure the data for viewing only by authorized parties. For example, simple queries using widely accessible search engine Web sites can produce references to a handful of Web sites that have posted credit card information to the Web. The posted lists of financial information include the names, addresses and phone numbers of hundreds of card holders, as well as credit card data. Some news media outlets have referred to this security breach as an example of “Google hacking”. As will be apparent from this example, knowledgeable net surfers can obtain sensitive information simply by mining the world's best-known search engine.
There is no shortage of ways to query popular search engines for sensitive data. Entire Web sites explain how to search for financial information and describe software vulnerabilities and vulnerable configurations in Web servers and database systems. Popular search engines remain the tool of choice because of the powerful search options they provide, such as the ability to search for a range of numbers, which can be useful in finding credit card data. As a general pattern, however, malicious hackers can simply cast a wide net into the sea of data by generating search queries aimed at producing large result sets most likely to contain rich quantities of sensitive data.