Enterprises are subject to a variety of business rules for document management and retention. In modern business environments, large volumes of transactions generate large numbers of documents. These documents are usually kept in document management systems, which apply a multitude of business rules to maintain documents in compliance with business requirements. Privacy rules are applied across organizations and their documents. Document retention rules are applied to comply with company policies and legal responsibilities. Accounting analysis systems are integrated with document management solutions to analyze transaction processes. In addition, marketing solutions may be integrated with document management systems to extract customer and business information.
Document classification is at the core of data loss prevention (DLP) technologies. In diverse environments, business rules require entities to secure their document systems. In sensitive environments, it may be necessary to identify whether a document contains sensitive information in order to take an action to secure it (e.g., encrypt, audit, or delete). Modern document classification approaches attempt to address three key issues: false positive rates, false negative rates, and optimization. Modern optimization techniques rely on testing or customer feedback. Existing solutions rely on “weight-based” systems for document classification. Weight-based systems are cumulative in nature and cannot be easily optimized over time.
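By way of illustration only, a weight-based classifier of the kind referred to above can be sketched as follows. The terms, weights, threshold, and function names are hypothetical and are not taken from any particular existing solution; the sketch merely shows how per-term weights accumulate into a single score that is compared against a threshold.

```python
import re

# Hypothetical term weights and threshold, chosen only for illustration.
WEIGHTS = {
    "confidential": 5,
    "ssn": 4,
    "salary": 2,
}
THRESHOLD = 6


def weight_score(text, weights=WEIGHTS):
    """Add up the weight of every matching token; repeated terms accumulate."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weights.get(token, 0) for token in tokens)


def is_sensitive(text, weights=WEIGHTS, threshold=THRESHOLD):
    """Flag a document when its accumulated score reaches the threshold."""
    return weight_score(text, weights) >= threshold


if __name__ == "__main__":
    routine = "Salary review: salary bands and salary history"
    memo = "Confidential memo"
    print(weight_score(routine), is_sensitive(routine))
    print(weight_score(memo), is_sensitive(memo))
```

In this sketch, a routine document that repeats a low-weight term three times reaches the threshold, while a document containing a single high-weight term does not. Because every weight contributes to the same running total, adjusting one weight or the threshold to correct one of these outcomes may change the outcome for other documents.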