A modern organization typically maintains a data storage system to store and deliver records concerning various significant aspects of its business. Stored records may include data on customers (or patients), contracts, deliveries, supplies, employees, manufacturing, etc. An organization's data storage system usually relies on a table-based storage mechanism, such as relational databases, client/server applications built on top of relational databases (e.g., Siebel, SAP), object-oriented databases, object-relational databases, document stores and file systems that store table-formatted data (e.g., CSV files, Excel spreadsheet files), password systems, single sign-on systems, etc.
Table-based storage systems typically run on a computer connected to a local area network (LAN). This computer is usually made accessible to the Internet via a firewall, router, or other packet-switching device. Although connecting a table-based storage system to the network allows the information it maintains to be used more efficiently, it also poses security problems because of the highly sensitive nature of this information. In particular, because access to the contents of the table-based storage system is essential to the job functions of many employees in the organization, there are many possible points of theft or accidental distribution of this information. Theft of information represents a significant business risk, both in terms of the value of the intellectual property and in terms of the legal liabilities related to regulatory compliance. To prevent malicious and unintentional data breaches, commercial and government regulations often impose restrictions on how confidential data may be stored, the format of confidential data, who can access that confidential data, and whether confidential data may be transmitted (e.g., by email). To comply with these regulations, companies create policies that govern how confidential data is stored in the various applications, in what format it is stored, who can access it, and whether it may be transmitted. To enforce these policies, conventional systems can detect policy violations; however, each policy violation is treated as an individual incident and recorded individually.
For example, for each recorded policy violation, an administrator typically must manually determine what caused the violation and then perform the remediation required by the policy for that violation. Although the administrator can correlate multiple policy violations by manually identifying similarities among them, this process can be very inefficient, especially when there are many policy violations. Manual correlation also makes it very difficult to recognize policy violations that occur as part of a related set of events. For example, an email exchange involving many emails could generate multiple policy violations over time; unless the administrator manually recognizes that these policy violations belong to a single set of events, the administrator may attempt to remediate each of them individually. Because administrators need to identify abnormal patterns of policy-violating behavior, manual correlation is a cumbersome process in which identifying similarities among policy violations consumes a great deal of time. This problem may be compounded by the time elapsed between possibly related policy violations: it may be very difficult to manually correlate policy violations that occur far apart in time, such as twenty days apart. In addition, in cases that require immediate remediation, manual correlation by an administrator may not be fast enough to identify the policy violations as related. Manual correlation may be particularly inefficient, for instance, when a given user commits many policy violations within a short time period.
Some conventional security-oriented network monitoring products attempt to address event correlation; however, these solutions tend to focus on correlating repeated sequences of events, such as non-policy-violating events, rather than on finding multiple incidents with similar attributes. Moreover, none of these conventional solutions addresses policy violations, such as violations of data loss prevention policies.