1. Field of the Invention
The present invention relates generally to controlling access to an electronic document that has unstructured data, and more specifically to classifying the electronic document based on information within the electronic document, associated structured data, and a particular end-user that is attempting to access the electronic document.
2. Description of the Related Art
People within organizations such as banks, hospitals, and large companies often have access to various electronic documents that contain unstructured data. The unstructured data may include personal addresses, dates, social security numbers, credit card numbers, and other sensitive or non-sensitive information. Moreover, an electronic document containing sensitive information can pass through a workflow management system or can be stored in a repository, wherein multiple people having different roles obtain access to the electronic document at different times. Therefore, an electronic document containing sensitive information can be vulnerable to unauthorized use if access to the electronic document is not properly controlled.
It is known to protect sensitive information within an electronic document by redacting part of the document or by blocking access to the document altogether. For example, a computer program can search a document and utilize a regular expression to identify sensitive information having an expected pattern that corresponds to a person's social security number, medical history, and/or salary. Subsequently, the sensitive information that was identified using the regular expression can be redacted.
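The regular-expression approach described above can be illustrated with a minimal sketch. It assumes social security numbers are written in the NNN-NN-NNNN form; the pattern, the function name `redact_ssns`, and the `[REDACTED]` mask are hypothetical choices made for illustration and are not taken from any particular prior-art system.

```python
import re

# Assumed pattern: social security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, mask: str = "[REDACTED]") -> str:
    """Replace every substring matching the assumed SSN pattern with the mask."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    print(redact_ssns("Applicant SSN: 123-45-6789 on file."))
```

Note that the redaction in this sketch is applied identically no matter who reads the result, since the function takes no information about the end-user.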
It is also known to utilize an electronic dictionary having an array of sensitive words to identify information within an electronic document that is potentially sensitive. Particularly, a program can utilize the electronic dictionary to perform comparisons that identify information within the electronic document that matches at least one of the sensitive words in the array. Subsequently, the information identified can be redacted.
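The dictionary-based approach can be sketched in a similar way. The word list, the function name `redact_dictionary_terms`, and the choice of case-insensitive whole-word matching are all assumptions made for illustration; an actual electronic dictionary could be far larger and could use a different comparison rule.

```python
import re

# Hypothetical array of sensitive words; a real dictionary would be much larger.
SENSITIVE_WORDS = ["salary", "diagnosis", "password"]


def redact_dictionary_terms(text, words=SENSITIVE_WORDS, mask="[REDACTED]"):
    """Replace whole-word, case-insensitive matches of any listed word with the mask."""
    if not words:
        return text
    # Escape each word so it is compared literally, then join into one alternation.
    alternation = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(mask, text)


if __name__ == "__main__":
    print(redact_dictionary_terms("Her Salary and diagnosis are private."))
```

Whole-word matching is used here so that a longer word merely containing a listed word is left untouched; that is a design choice of this sketch, not something the description above specifies.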
However, identifying sensitive information with a regular expression and/or an electronic dictionary, and subsequently redacting that information within the electronic document, does not satisfy the needs of all end-users. Certain end-users may need access to the redacted information, even though other end-users should not have access to it for security reasons. Accordingly, it is desirable to classify an electronic document in order to selectively control access to the electronic document based on the particular end-user attempting to access it.