1. Field of the Invention
The field of the invention relates to systems and methods for performing data classification operations.
2. Description of the Related Art
As modern enterprise environments trend towards a paperless workplace, electronic data is often created at a high rate. This electronic data takes a variety of forms which may include emails, documents, spreadsheets, images, databases, etc. Businesses have a need to effectively classify and organize all of this electronic data.
However, it can be extremely difficult to accurately classify large amounts of data in ways which are time and cost effective. Existing solutions have typically allowed a user to classify files in at least one of two ways. The user can manually view each file and determine the appropriate classification. While this can be a relatively accurate method of categorizing data, it quickly becomes expensive and impractical as the volume of data-to-be-classified increases.
Alternatively, files can be classified using an explicit set of rules defined by the user. For example, a data classification rule may be based on inclusion of a keyword or a small set of keywords. With this approach, the classification of files can be done by machine, but the use of explicit rules tends to be a relatively inaccurate method of classifying non-homogeneous files and can result in many false classifications.