The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for providing guidance to viewers of documents by performing deep document analysis on the documents and then using various user interface techniques to provide the user with active knowledge guidance as to semantic content related to the document being viewed.
With the increased usage of computing networks, such as the Internet, humans are currently inundated and overwhelmed with the amount of information available to them from various structured and unstructured sources. However, information gaps abound as users try to piece together what they can find that they believe to be relevant during searches for information on various subjects. To assist with such searches, recent research has been directed to generating knowledge management systems which may take an input, analyze it, and return the results that are most probable for that input. Knowledge management systems provide automated mechanisms for searching through a knowledge base with numerous sources of content, e.g., electronic documents, and analyzing that content with regard to an input to determine a result and a confidence measure as to how accurate the result is in relation to the input.
One such knowledge management system is the IBM Watson™ system available from International Business Machines (IBM) Corporation of Armonk, N.Y. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering. The IBM Watson™ system is built on IBM's DeepQA technology used for hypothesis generation, massive evidence gathering, analysis, and scoring. DeepQA takes an input question, analyzes it, decomposes the question into constituent parts, generates one or more hypotheses based on the decomposed question and results of a primary search of answer sources, performs hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypotheses, and, based on trained models, performs a final merging and ranking to output an answer to the input question along with a confidence measure.
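By way of illustration only, the stages described above (question decomposition, hypothesis generation from a primary search, evidence scoring, and final merging and ranking with a confidence measure) can be sketched as a toy pipeline. This is not the DeepQA implementation: every function name, the stop-word list, the term-overlap scoring, and the sample data are assumptions introduced for this example. The synthesis step is omitted, and a simple score normalization stands in for the trained models.

```python
from dataclasses import dataclass

# Hypothetical stop-word list, chosen only for this illustration.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "which", "who"}


@dataclass
class Hypothesis:
    answer: str
    score: float = 0.0


def decompose(question):
    """Split the question into lowercase content terms (toy decomposition)."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]


def generate_hypotheses(terms, answer_sources):
    """Primary search: an answer is a candidate if its source text shares a term."""
    term_set = set(terms)
    return [
        Hypothesis(answer)
        for answer, text in answer_sources.items()
        if term_set & set(text.lower().split())
    ]


def score_hypotheses(hypotheses, terms, evidence_sources):
    """Evidence scoring: for each passage mentioning the answer, add the
    number of question terms that the passage also contains."""
    term_set = set(terms)
    for hyp in hypotheses:
        for passage in evidence_sources:
            lowered = passage.lower()
            if hyp.answer.lower() in lowered:
                hyp.score += len(term_set & set(lowered.split()))
    return hypotheses


def merge_and_rank(hypotheses):
    """Rank by score and normalize scores into confidences.
    Assumption: normalization replaces the trained ranking models."""
    total = sum(h.score for h in hypotheses)
    ranked = sorted(hypotheses, key=lambda h: h.score, reverse=True)
    return [(h.answer, h.score / total if total else 0.0) for h in ranked]


def answer_question(question, answer_sources, evidence_sources):
    """Run the toy stages in order and return (answer, confidence) pairs."""
    terms = decompose(question)
    hypotheses = generate_hypotheses(terms, answer_sources)
    hypotheses = score_hypotheses(hypotheses, terms, evidence_sources)
    return merge_and_rank(hypotheses)


# Made-up sample data for demonstration purposes.
sample_answers = {
    "Paris": "paris is the capital of france",
    "Lyon": "lyon is a city in france",
}
sample_evidence = [
    "Paris has been the capital of France for centuries",
    "Lyon is known for its cuisine",
    "The capital Paris hosts the French government",
]
print(answer_question("What is the capital of France?", sample_answers, sample_evidence))
```

In this sketch, the output is a ranked list of candidate answers, each paired with a confidence value, mirroring the answer-plus-confidence output described above. A real system would use far richer analysis at every stage.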
Historically, there have been two kinds of systems: structured data-based and document-based. Traditional enterprise user interfaces present an interface to linked, structured data; accordingly, screens and navigation between screens reflect the model and content of these objects. These systems present data in a manner designed to maximize informativeness and ease of use. Accordingly, information is judiciously selected so that information density is high while ease of use is also high. However, such traditional systems are closed-ended and are limited to the available structured content. In contrast, document-based systems offer lower information density, but can be more open-ended and provide access to more information than structured data interfaces. Traditional document-based systems generally provide an inferior user experience when compared with their structured data counterparts: in traditional document-based interfaces, both information density and ease of use are generally lower than with structured data interfaces.