With rapid advances in digital documentation services and document management systems, organizations increasingly store important, confidential, and secure information in the form of digital documents. Unauthorized dissemination of this information, whether accidental or deliberate, presents serious security risks to these organizations. It is therefore imperative for organizations to protect such information, and to detect and prevent the disclosure of any secure information (or derivatives thereof) beyond the organization's perimeter.
Additionally, organizations face the challenge of categorizing and maintaining a large corpus of digital information spread across potentially thousands of data stores, content management systems, end-user desktops, etc. One solution to this challenge is to generate fingerprints from all of the digital information that the organization seeks to protect. These fingerprints represent the organization's secure data tersely and securely, and can be maintained in a fingerprint database for later verification against any information that a user wishes to disclose. When the user attempts to disclose information outside of the organization, fingerprints are generated for that information and compared against the fingerprints stored in the fingerprint database. If the fingerprints of the user's information match fingerprints contained in the database, suitable security actions are performed.
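The generate-and-compare flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the passage does not specify how fingerprints are computed, so the sketch assumes a simple scheme that hashes every run of `k` consecutive words (a "shingle") with SHA-256. The names `fingerprints`, `FingerprintDatabase`, and `should_block` are hypothetical, and the in-memory set merely stands in for a real fingerprint database.

```python
import hashlib
import re


def fingerprints(text: str, k: int = 5) -> set[str]:
    """Assumed scheme: SHA-256 hash of every k-word shingle in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle made of all their words.
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}


class FingerprintDatabase:
    """Hypothetical in-memory stand-in for the organization's fingerprint database."""

    def __init__(self) -> None:
        self._store: set[str] = set()

    def register(self, document: str) -> None:
        # Fingerprint a protected document and remember its hashes.
        self._store |= fingerprints(document)

    def matches(self, text: str) -> int:
        # Count how many of the outgoing text's fingerprints are known.
        return len(fingerprints(text) & self._store)


def should_block(text: str, db: FingerprintDatabase, threshold: int = 1) -> bool:
    """Flag outgoing text whose matching fingerprints reach the (assumed) threshold."""
    return db.matches(text) >= threshold
```

For example, after `db.register("The quarterly revenue forecast for project Falcon is confidential.")`, an outgoing message that quotes that sentence verbatim is flagged, while an unrelated message such as `"Lunch menu for Friday"` is not. Because shingles overlap, even a partial quotation of the protected text produces some matching fingerprints, which is what lets this kind of scheme catch derivatives rather than only exact copies.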
However, the user has myriad options at his disposal for disclosing information outside of the organization's protected environment. For example, the user could copy the digital information from his computer to a removable storage medium (e.g., a floppy disk, a USB storage device, etc.), email the information through the organization's email server, or print the information by sending a print request through the organization's print server. It is therefore imperative to monitor the user's activity at each of these egress points.
To protect the organization's secure information effectively, any information transmitted through one of the organization's egress points needs to be converted to fingerprints and compared against the fingerprints contained in the organization's fingerprint database. One way to achieve this is to replicate and maintain a separate fingerprint database at each location containing an egress point (e.g., at the print server, at the email server, at the user's desktop computer, etc.). Such replicas can be kept current by means of database replication, agent polling, differential ("diff") sync pushes from a central fingerprint server, etc.
However, most organizations have many desktop computers and maintain arrays of systems that together represent a large number of egress points. As the number of egress points grows, the number of individual fingerprint databases that must be created, maintained, and periodically refreshed becomes prohibitively large. In addition, the fingerprints in a fingerprint database may carry additional metadata (e.g., indicating the location of the fingerprint within a document, or the origin of the document), which increases the size of each individual database and thus exacerbates the cost and difficulty of maintaining a plethora of them.
The prior art offers other solutions for protecting digital information in such porous environments. These include encrypting the files, or applying digital rights management or watermarks directly to the files. Such solutions do not typically rely on fingerprint lookups, and therefore do not require fingerprint databases to be maintained. However, they present other disadvantages. For example, the digital information itself must be converted, and unprotected versions of the information need to be identified and managed (or destroyed) to ensure its security. Additionally, the presence of watermarks or digital rights management information does not preclude the information from being disclosed outside of the organization. In most cases, watermarks serve only as a security-awareness or deterrent feature and do not actually prevent the information from being disclosed.