With the rapid growth of, and advances in, digital documentation services and document management systems, organizations are increasingly storing important, confidential, and secure information in the form of digital documents. Unauthorized dissemination of this information, either by accident or by deliberate means, presents serious security risks to these organizations. Therefore, it is imperative for the organizations to protect such secure information, and to detect and react to any disclosure of secure information (or derivatives thereof) beyond the perimeters of the organization.
Additionally, the organizations face the challenge of categorizing and maintaining the large corpus of digital information across potentially thousands of data stores, content management systems, end-user desktops, etc. One solution to this challenge is to generate fingerprints from all of the digital information that the organization seeks to protect. These fingerprints tersely and securely represent the organization's secure data, and can be maintained in a database for later verification against the information that a user desires to disclose. When the user wishes to disclose any information outside of the organization, fingerprints are generated for the user's information, and these fingerprints are compared against the fingerprints stored in the fingerprint database. If the fingerprints of the user's information match fingerprints contained in the fingerprint database, suitable security actions are performed.
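As an illustration only, the register-then-compare flow described above can be sketched in Python. The text does not specify how fingerprints are computed. This sketch assumes, hypothetically, that a fingerprint is a SHA-256 hash of each run of five consecutive normalized words. The names `fingerprints`, `FingerprintDatabase`, `register`, and `matches` are invented for the example and do not come from any particular product.

```python
import hashlib
import re


def fingerprints(text, k=5):
    """Return a set of hashes, one per run of k consecutive words.

    Assumption: word k-grams hashed with SHA-256 stand in for whatever
    fingerprinting scheme an organization actually uses.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return set()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


class FingerprintDatabase:
    """In-memory stand-in for the organization's fingerprint database."""

    def __init__(self):
        self._known = set()

    def register(self, document):
        """Store the fingerprints of a document the organization protects."""
        self._known |= fingerprints(document)

    def matches(self, outgoing):
        """Return the fingerprints of outgoing text that are already stored."""
        return fingerprints(outgoing) & self._known


db = FingerprintDatabase()
db.register("The quarterly acquisition plan targets three regional suppliers in Ohio.")

# Text a user wants to send outside the organization (made-up examples).
leak = "FYI: the quarterly acquisition plan targets three regional suppliers, keep quiet."
harmless = "Lunch is at noon in the main cafeteria on the second floor today."

leak_hits = db.matches(leak)          # non-empty: overlapping word runs were found
harmless_hits = db.matches(harmless)  # empty: nothing in common with protected data
```

A non-empty result from `matches` is the point at which a security action (blocking, logging, alerting) would be triggered. Because only hashes are stored, the database does not hold the protected text itself.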
However, the user has at his disposal myriad options to disclose the information outside of the organization's protected environment. For example, the user could copy the digital information from his computer to a removable storage medium (e.g., a floppy drive, a USB storage device, etc.), or the user could email the information from his computer through the organization's email server, or the user could print out the information by sending a print request through the organization's print server, etc. Therefore, it is imperative to monitor the user's activity through each of these egress points.
In order to effectively protect the organization's secure information, the information that is transmitted through any of the organization's egress points needs to be converted to fingerprints and compared against the fingerprints contained in the organization's fingerprint database. One way of achieving this would be to replicate and maintain a plurality of fingerprint databases at the locations containing egress points (e.g., at the print server, at the email server, at the user's desktop computer, etc.). This can be achieved by means of database replication, agent polling, diff sync pushes from a central fingerprint server, etc. Another way of doing this would be to maintain a remote server containing the fingerprint database and to query this remote server over the network every time a user's input information needs to be verified.
However, both of these fingerprinting solutions suffer from several disadvantages. In the case of maintaining local fingerprint databases at every egress point, the cost and inefficiency of maintaining large-memory databases become prohibitive. This is especially true in organizations that maintain hundreds (or thousands) of systems that function as egress points. Here, as the number of egress points increases, the number of individual fingerprint databases that need to be created, maintained, and refreshed periodically becomes excessively large. In addition, the fingerprints in the fingerprint database invariably contain additional metadata (e.g., to indicate the location of the fingerprint within a document, to indicate the origin information of the document, etc.), which further increases the size of the individual fingerprint databases and exacerbates the cost and difficulties associated with maintaining a plethora of individual fingerprint databases.
On the other hand, maintaining one or more remote fingerprint servers that serve protect agents over the network presents a different set of problems. One problem is the offline scenario, in which a user is not connected to the network. In this case, the egress point cannot be monitored because fingerprint lookups cannot be done without the network. Therefore, the user is either prohibited from accessing the data, or the data is susceptible to unauthorized disclosure. Other problems with this method include scalability and user experience. Specifically, making a remote request to the fingerprint servers will inevitably introduce latency at the egress point being protected. Also, as the number of protect agents increases, an increasing number of fingerprint servers will be needed to handle the increasing load, thus further affecting the latency and increasing the cost of performing fingerprint lookups.
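The offline dilemma described above can be made concrete with a minimal sketch. It assumes a hypothetical protect agent whose only source of fingerprints is a remote lookup; `check_egress`, `online_server`, `offline_server`, and the `fail_closed` flag are illustrative names, and the fingerprint strings are placeholders rather than real hashes.

```python
def check_egress(fps, remote_lookup, fail_closed=True):
    """Decide whether to allow an egress action ("allow" or "block").

    remote_lookup is a hypothetical callable that returns the subset of
    fps known to the fingerprint server, or raises ConnectionError when
    the network is unavailable.
    """
    try:
        hits = remote_lookup(fps)
    except ConnectionError:
        # Offline: no lookup is possible, so the policy alone decides.
        return "block" if fail_closed else "allow"
    return "block" if hits else "allow"


def online_server(fps):
    """Simulated reachable server that knows one protected fingerprint."""
    return fps & {"fp-secret"}


def offline_server(fps):
    """Simulated unreachable server."""
    raise ConnectionError("fingerprint server unreachable")


online_secret = check_egress({"fp-secret"}, online_server)
online_public = check_egress({"fp-public"}, online_server)
# Fail closed: harmless data is blocked because it cannot be verified.
offline_closed = check_egress({"fp-public"}, offline_server, fail_closed=True)
# Fail open: protected data leaves because it cannot be verified.
offline_open = check_egress({"fp-secret"}, offline_server, fail_closed=False)
```

Neither offline policy is satisfactory: failing closed blocks legitimate work, and failing open leaves the egress point unmonitored.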
Other solutions exist in the prior art to protect digital information in such porous environments. These solutions include encrypting the files, or applying digital rights management or watermarks directly to the files. These solutions do not typically employ fingerprint lookups, and therefore do not require fingerprint databases to be maintained. However, they present other disadvantages. For example, the digital information itself needs to be converted, and unprotected versions of the information need to be identified and managed (or destroyed) to ensure the security of the information. Additionally, the presence of the watermarking or the digital rights management information does not preclude the information from being disclosed outside of the organization. In most cases, the watermarks only serve as a security awareness or deterrent feature and do not actually prevent the information from being disclosed.