The degree of sensitivity of a document is the degree to which it will, or is assumed to, hurt the organization if the information it contains becomes accessible to unauthorized parties. In many organizations this degree of sensitivity is indicated by explicitly classifying documents, but a document may also be sensitive without such classification.
Organizations are concerned about internal information becoming available to people or entities outside the group trusted with the information. This may take many forms, one of which is when someone internal to the organization, wilfully or by accident, sends classified documents out across the organization perimeter using an electronic mail system. This process is called “outtrusion”. Outtrusion can be very costly to an organization, since disclosure of company secrets, intellectual property, or operations status information may lead to loss of money, trust, or ability to execute. There is also an increasing amount of legislation enforcing procedures that companies must adhere to.
To detect and/or prevent outtrusion, the mail system transporting mail across the organization perimeter will often use a “compliance monitoring” system, whose purpose is to detect when someone is not compliant with the policies of the organization. Such policies will typically define classes of documents and describe how each such class should be treated. An example would be to state that patent applications must not be sent outside the company.
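As a minimal sketch of such a policy, the following Python fragment maps document classes to a treatment and applies it only to mail that crosses the organization perimeter. All names here (`OutgoingMail`, `POLICY`, `INTERNAL_DOMAIN`, `check_compliance`) are hypothetical and chosen for illustration; they do not describe any particular compliance monitoring product, and the document class is assumed to be already known.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass(frozen=True)
class OutgoingMail:
    sender: str
    recipients: Tuple[str, ...]
    document_class: str  # assumed to be determined elsewhere


# Hypothetical policy: each document class maps to a treatment.
POLICY = {"patent_application": Action.BLOCK}

# Hypothetical definition of the organization perimeter.
INTERNAL_DOMAIN = "example.com"


def crosses_perimeter(mail: OutgoingMail) -> bool:
    """True if at least one recipient is outside the internal domain."""
    suffix = "@" + INTERNAL_DOMAIN
    return any(not r.lower().endswith(suffix) for r in mail.recipients)


def check_compliance(mail: OutgoingMail) -> Action:
    """Apply the policy only to mail leaving the organization."""
    if not crosses_perimeter(mail):
        return Action.ALLOW
    # Classes without an explicit rule are allowed in this sketch.
    return POLICY.get(mail.document_class, Action.ALLOW)
```

In this sketch a patent application addressed to an external recipient is blocked, while the same document sent to an internal recipient is allowed.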
In order to detect when a policy is being violated, the compliance monitoring system must be able to detect how sensitive a document is. This is today typically done by one of the following methods:

Explicit identification of all classified documents. In this case, there exists an inventory of documents that must be treated in a special way.

Form-based identification. In this case, the system would for instance look for the word “confidential” in given locations in the document, or for documents based on a specified document template.

Content-based identification. In this case, the document would be analyzed and matched against a dictionary of words indicative of sensitivity, or a more advanced matching would be performed against a taxonomy of classified documents.
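The three methods above can be sketched as follows. This is an illustrative simplification under stated assumptions, not an implementation of any existing system: the inventory is assumed to hold SHA-256 digests of classified texts, the "given location" for the form-based check is assumed to be the first few lines, and the dictionary in `SENSITIVE_TERMS` and the default threshold are invented for the example.

```python
import hashlib
import re

# Assumption: the inventory stores SHA-256 digests of classified documents.
CLASSIFIED_INVENTORY = {
    hashlib.sha256(b"Draft patent claims, rev 3").hexdigest(),
}

# Assumption: a small, hypothetical dictionary of indicative words.
SENSITIVE_TERMS = {"confidential", "patent", "unreleased", "merger"}


def explicitly_identified(text: str) -> bool:
    """Explicit identification: exact lookup in the inventory."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest in CLASSIFIED_INVENTORY


def form_identified(text: str, header_lines: int = 3) -> bool:
    """Form-based identification: 'confidential' in the first lines."""
    header = "\n".join(text.splitlines()[:header_lines])
    return re.search(r"\bconfidential\b", header, re.IGNORECASE) is not None


def content_score(text: str) -> float:
    """Content-based identification: share of words found in the dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in SENSITIVE_TERMS for w in words) / len(words)


def is_sensitive(text: str, threshold: float = 0.1) -> bool:
    """Flag a document if any of the three methods matches."""
    return (
        explicitly_identified(text)
        or form_identified(text)
        or content_score(text) >= threshold
    )
```

Note that the exact hash lookup in `explicitly_identified` stops matching as soon as the text is altered, and that `form_identified` only sees a marker placed where it expects one.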
All these methods have weaknesses.

Explicit and form-based identification both require a process to be in place, and are costly due to the manual work involved. The process is also prone to error, through misclassification or failure to comply with the process. Also, if the content is taken out of the document and copied into another, these detection methods will not be accurate.

Content-based identification is hard, since the difference between a very sensitive document and a public one may not be easy to spot automatically. Financial statements, for instance, are very sensitive before they are published and public knowledge afterwards. No linguistic process based on the content would spot that difference, short of looking for the publication date and comparing it to the time of sending (which would be a variant of form-based identification).
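The publication-date comparison mentioned for financial statements can be illustrated with a short sketch. The function name and the handling of a missing date are assumptions made for the example; the text above does not specify how the publication date would be extracted from a document.

```python
from datetime import datetime
from typing import Optional


def sensitive_at_send_time(
    publication_time: Optional[datetime], send_time: datetime
) -> bool:
    """Treat a statement as sensitive until its publication time has passed."""
    if publication_time is None:
        # Assumption: a document without a known publication date
        # is treated as unpublished, and therefore sensitive.
        return True
    return send_time < publication_time
```

The same document therefore changes status over time without any change to its content, which is why a purely linguistic analysis cannot make this distinction.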
An illustrative prior art example of explicit identification is disclosed in U.S. Pat. No. 6,898,636 B1 (Adams & al.), where intended recipients of documents are selected and provided with an identifier, which need be no more than an email address, and where a security designation is then added to the identifier. Documents that are to be sent are also provided with a corresponding security designation and uploaded to a server, which notifies the recipient having the same security designation as the documents concerned so that the documents can be downloaded. If a request for downloading is received, the documents may be encrypted and sent to the recipient.
Further, US published application No. 2007/0261099 A1 (Broussard & al.) discloses how a search engine is used for security compliance and for identifying confidential content and security violations with regard to this content on a document level. The results can be reported to a user and corrective actions proposed; for instance, encryption may be undertaken. If the documents are to be sent as electronic mail, the list of recipients can be modified automatically using a security mechanism in accordance with the proposed corrective actions, and on the same basis changes may be made to the content of the document.