Advances in electronic and digital communication have increased the speed and efficiency with which information can be transmitted or shared. One issue that has arisen due to the speed and ease with which electronic information can be shared involves accidental leaks or distribution of personal or proprietary information to unauthorized or unintended recipients. Accidentally sharing or leaking personal or proprietary information can occur in corporate network environments, as well as in social networking environments.
In the context of a corporate network environment, employees can accidentally release or send proprietary information to other employees, contractors, or outside users who may not be authorized to view or possess such information. For example, a user might use an email client application to address and compose email messages. Such email clients often include auto-complete features that predict and complete the email address of a known contact when only a few letters of the name or address are entered into an address field. When such an email client incorrectly predicts an address, or a user inadvertently selects the wrong address from several predicted email addresses, the email message can end up being addressed and sent to an unintended or unauthorized recipient. In the context of a social networking environment, users may unknowingly publish or share potentially embarrassing postings or personal content with other users of the social network who are not close friends or trusted contacts.
Various systems exist for detecting potential leaks of sensitive information and then either generating a warning or alert that informs the user of the potential leak or preventing the communication altogether. Such systems typically classify information according to various levels of secrecy and designate the users who are authorized to receive information at each level. In such systems, it is possible to screen documents before they are shared; however, not all documents may be classified, for example newly created or authored documents. In addition, for many networks, such as social networks, there may be no existing systematic method for classifying documents and recipients.
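The classification-based screening described above can be illustrated with a minimal sketch. The secrecy levels, the two lookup tables, and the `screen` function are all hypothetical names chosen for illustration; they are not taken from any particular system. The sketch also shows the gap noted above: a document with no assigned level cannot be screened.

```python
from enum import IntEnum


class Secrecy(IntEnum):
    """Hypothetical secrecy levels, ordered from least to most restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Assumed, manually maintained tables (illustrative data only).
DOCUMENT_LEVELS = {
    "roadmap.pdf": Secrecy.CONFIDENTIAL,
    "newsletter.txt": Secrecy.PUBLIC,
}
USER_CLEARANCE = {
    "alice@example.com": Secrecy.CONFIDENTIAL,
    "bob@example.com": Secrecy.INTERNAL,
}


def screen(document: str, recipient: str) -> str:
    """Return 'allow', 'block', or 'unclassified' for a proposed share."""
    level = DOCUMENT_LEVELS.get(document)
    if level is None:
        # Newly authored documents have no level yet, so no decision is possible.
        return "unclassified"
    # Recipients without an entry are treated as cleared for PUBLIC only.
    clearance = USER_CLEARANCE.get(recipient, Secrecy.PUBLIC)
    return "allow" if clearance >= level else "block"
```

Under these assumptions, `screen("roadmap.pdf", "bob@example.com")` blocks the share, while `screen("draft.docx", "alice@example.com")` cannot reach a decision because the draft was never classified.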
Various solutions address these issues by implementing a document similarity test to prevent leaks. In such systems, users are warned when they are about to share documents or other information with recipients who apparently have never been privy to or included on communications involving similar information. Such systems have problems of their own. For example, they warn users only that a leak may occur, without indicating which information in the communication might be sensitive.
Conventional systems also do not account for the scenario in which information that one user is about to share with another is not actually sensitive. When information is about to be shared between two users who have not previously shared similar information, that information is not necessarily sensitive if both users already know about or possess it. As such, conventional information control systems have a potentially high false alarm rate because they flag as “sensitive” information that may already be widely known across the network or organization. Individuals who never communicate with one another about such content may already be aware of it due to its widespread distribution.
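A document similarity test of the kind discussed above, together with the false alarm it can produce, might be sketched as follows. Jaccard similarity over word sets, the 0.5 threshold, and the function names are assumptions made for illustration; the systems described here may use any other similarity measure.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word set of a text (a deliberately crude representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts' word sets, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def should_warn(message: str, pair_history: list, threshold: float = 0.5) -> bool:
    """Warn when nothing previously exchanged between this sender and
    recipient is similar to the outgoing message (assumed threshold)."""
    return all(jaccard(message, past) < threshold for past in pair_history)


# A widely distributed announcement that the recipient already received
# from other users, but never from this particular sender.
announcement = "The office will close early on Friday for the holiday"
pair_history = ["Lunch menu for next week"]

flagged = should_warn(announcement, pair_history)
```

Here `flagged` is `True` even though the announcement is already known throughout the organization, which is the false alarm scenario described above. The warning also says nothing about which part of the message triggered it.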
One common underlying problem for the various systems developed to address the unintentional or unauthorized sharing of sensitive information is that they require a large amount of human intervention to identify the information that must be controlled. When new content is introduced into a network, a network administrator, or another user responsible for or interested in managing information flow in the network, must determine the sensitivity level of each content item and then identify the users who will be authorized to access, view, or receive such content. Such conventional systems and techniques rely on slow manual processes that are often either overly restrictive or allow too much information to be leaked.