A modern organization typically maintains a data storage system to store and deliver sensitive information concerning various significant business aspects of the organization. Sensitive information may include data on customers (or patients), contracts, deliveries, supplies, employees, manufacturing, or the like. In addition, sensitive information may include intellectual property (IP) of an organization such as software code developed by employees of the organization, documents describing inventions conceived by employees of the organization, etc.
Data loss prevention (DLP) technologies apply configurable rules to identify objects, such as files, that contain sensitive data and should not be found outside of a particular enterprise or specific set of host computers or storage devices. Even when these technologies are deployed, it is possible for sensitive objects to ‘leak’. Leakage is sometimes deliberate and malicious, but it is often accidental. For example, in today's global marketplace environment, a user of a computing system transmits data, knowingly or unknowingly, to a growing number of entities outside a computer network of an organization or enterprise. Previously, the number of such entities was very limited, and communications took place within a very safe environment. For example, each person in an enterprise would have just a single desktop computer, with a limited number of software applications installed on the computer, each with predictable behavior. More recently, communications between entities have become complex and difficult for a human to monitor. Furthermore, these complex communications often occur using different identities, such as communications associated with identifiers that are not assigned by the entity (referred to herein as external identifiers).
A typical user may have more than one identity in computing environments. For example, an entity, such as a corporate enterprise system, may assign a unique internal identifier to each user for using the computing resources and computing services of the entity. These unique identifiers can be used for logging into computing systems of the entity's networks, accessing computing resources on the network, and controlling access to resources within the enterprise. Often, users also access external resources and services that are publicly available via the Internet, such as by communicating with an external service over the Internet. These external entities may also assign an identifier to the same user for accessing, or for identifying the user with, these external services. These identifiers are considered external identifiers because they are not assigned by the entity, as contrasted with internal identifiers assigned by the entity. The same user may have many external identifiers, such as instant messenger identifiers (e.g., Yahoo Messenger identifier (ID), MSN Messenger ID, Google Chat ID, etc.), mail identifiers (e.g., Google Mail ID, etc.), social networking identifiers (e.g., Facebook identifier), or other types of identifiers. These identifiers are often cryptic, and an entity cannot easily use them to identify the particular user that is accessing or using the entity's computing resources or services.
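The distinction between internal and external identifiers can be illustrated with a minimal Python sketch. The class names, field names, service labels, and identifier values below are all hypothetical and chosen only for illustration; they are not drawn from any particular DLP product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExternalIdentifier:
    """An identifier assigned by a party other than the entity (hypothetical model)."""
    service: str  # e.g. an instant messenger, mail, or social networking service
    value: str    # the identifier as the external service knows it


@dataclass
class UserIdentities:
    """One user: a single internal identifier plus any number of external ones."""
    internal_id: str  # assigned by the entity itself
    external_ids: List[ExternalIdentifier] = field(default_factory=list)


def all_identifiers(user: UserIdentities) -> List[str]:
    """Return every identifier string under which this user may appear."""
    return [user.internal_id] + [ext.value for ext in user.external_ids]


# Hypothetical example: one employee with three external identifiers.
employee = UserIdentities(
    internal_id="CORP\\asmith",
    external_ids=[
        ExternalIdentifier("instant_messenger", "sk8r_42"),
        ExternalIdentifier("mail", "a.smith.home@example.com"),
        ExternalIdentifier("social_network", "100004512"),
    ],
)

identifiers = all_identifiers(employee)
```

Only the first entry in `identifiers` is assigned by the entity; the remaining values carry no obvious link to the employee, which is why they are difficult to interpret on their own.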
Existing DLP technologies can detect violations of different DLP policies and generate incident records for different events, such as data leaks through instant messaging (IM) events, Universal Serial Bus (USB) data transfer events, file transfer events, electronic message (email) events, and printing events. However, in the case of an IM event, the messenger ID is reported with the incident. Similarly, in the case of a USB event, the operating system's user identifier is reported. Both of these identifiers can be cryptic and difficult to correlate with the user of the entity (e.g., an employee of a corporation).
Existing security techniques fail to provide efficient solutions that can protect organizations in the situations described above. Existing DLP technologies have no way to correlate all of a user's identities to that single user, and thus, when looking at an incident record, it is not easy to identify the user associated with that incident record. In addition, it is difficult to find all activities performed by a single user when the user uses different identities for different operations.
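The correlation gap described above can be illustrated with a short Python sketch. It assumes a simplified, hypothetical incident record that carries only the identifier reported by the detecting channel; the record fields, policy names, and identifier values are illustrative assumptions, not the format of any actual DLP system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record, reduced to the fields relevant here."""
    channel: str      # e.g. "im", "usb", "email"
    reported_id: str  # identifier as seen by the channel that raised the incident
    policy: str       # name of the violated policy (illustrative)


def group_by_reported_id(records: List[Incident]) -> Dict[str, List[Incident]]:
    """Group incidents by the only key available: the reported identifier."""
    groups: Dict[str, List[Incident]] = defaultdict(list)
    for record in records:
        groups[record.reported_id].append(record)
    return dict(groups)


# Three incidents caused by the same (hypothetical) person, each reported
# under a different identity.
incidents = [
    Incident("im", "sk8r_42", "source-code"),
    Incident("usb", "CORP\\asmith", "customer-data"),
    Incident("email", "a.smith.home@example.com", "contracts"),
]

groups = group_by_reported_id(incidents)
```

Because no shared key ties the three identifiers together, `groups` contains three separate entries with one incident each, and the activity of a single user appears to belong to three unrelated users.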