Document security is a generic term for protecting documents from unauthorized users. Traditionally, a document creator may password-protect a document as a simple way to prevent unauthorized viewing. Under this traditional approach, a user who enters the correct password may view the entire document; otherwise, the user is prohibited from viewing any portion of it.
Information retrieval and question answering systems ingest documents from many sources to create a knowledge base from which to obtain results. The documents may have varying levels of classification depending upon the domain of the knowledge base. For example, a corporate or military knowledge base may include confidential, secret, and top secret documents. In another example, a medical knowledge base may include medical documents with sensitive patient information such as social security numbers and insurance information.
Information retrieval and question answering systems add annotations to documents as a way to incorporate metadata, entity information, or additional knowledge into searches to improve information recall and answering precision. Software developers may link annotations to documents using a variety of approaches, such as by storing the annotations as metadata at the document level, storing the annotations in separate structured resources, or modifying the document by embedding the annotations directly into it.
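The three linking approaches described above can be illustrated with a minimal Python sketch. Every name here (`Annotation`, `Document`, `attach_as_metadata`, `store_separately`, `embed_inline`) is hypothetical and chosen only for illustration, the inline tag syntax is an assumed format, and the embedding helper assumes annotation spans do not overlap. It is not taken from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset just past the end of the span
    label: str   # e.g. an entity type such as "person"


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def attach_as_metadata(doc, annotations):
    """Approach 1: keep annotations as document-level metadata."""
    doc.metadata.setdefault("annotations", []).extend(annotations)


def store_separately(store, doc, annotations):
    """Approach 2: keep annotations in a separate resource keyed by document id."""
    store.setdefault(doc.doc_id, []).extend(annotations)


def embed_inline(doc, annotations):
    """Approach 3: return a copy of the document with annotations embedded as tags.

    Assumes spans do not overlap. Spans are processed from last to first so
    that earlier offsets remain valid while tags are inserted.
    """
    text = doc.text
    for ann in sorted(annotations, key=lambda a: a.start, reverse=True):
        span = text[ann.start:ann.end]
        tagged = f"<{ann.label}>{span}</{ann.label}>"
        text = text[:ann.start] + tagged + text[ann.end:]
    return Document(doc.doc_id, text, dict(doc.metadata))
```

In this sketch the first two approaches leave the document text unchanged, while the third produces modified text, so any character offsets computed against the original no longer apply to the embedded version.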