This invention relates generally to methods for aligning and reconciling documents in the corpus of a cognitive computing system when differences and contradictions have been found between those documents and other documents and information elements in the corpus.
Approximately 80% of online and digital data today is “unstructured data”, such as news articles, research reports, social media posts, and enterprise system data. Unlike “structured data”, e.g., databases, configuration tables, etc., which is readily usable by traditional computer processing, unstructured data is not directly compatible with traditional computer processes.
Understanding and interpreting unstructured data, such as electronic documents expressed in Natural Language (NL), is beyond the capacities of traditional search engines. Traditional search engines find keywords and rank their findings according to how many times each keyword appears and how close the keywords are to one another. To use a keyword-based search engine effectively, a user must input the most effective keywords. If the user does not know the correct keywords, however, the search engine may be of little use.
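As a rough illustration of the keyword-based ranking described above, the following Python sketch scores each document by how often the query keywords occur and how close together they appear. The scoring formula (raw occurrence count plus a proximity bonus derived from the shortest span containing every keyword) and the helper names `tokenize`, `min_window`, `keyword_score`, and `rank` are hypothetical; actual search engines use their own weightings.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_window(tokens: list[str], keywords: set[str]) -> int | None:
    """Length of the shortest token span containing every keyword, or None if any is missing."""
    best = None
    last_seen: dict[str, int] = {}  # most recent position of each keyword
    for i, tok in enumerate(tokens):
        if tok in keywords:
            last_seen[tok] = i
            if len(last_seen) == len(keywords):
                # Shortest window ending at i starts at the oldest "most recent" keyword.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def keyword_score(text: str, keywords: list[str]) -> float:
    """Hypothetical score: total keyword occurrences plus a proximity bonus in (0, 1]."""
    tokens = tokenize(text)
    kw = {k.lower() for k in keywords}
    occurrences = sum(Counter(t for t in tokens if t in kw).values())
    window = min_window(tokens, kw)
    proximity = len(kw) / window if window else 0.0  # 1.0 when all keywords are adjacent
    return occurrences + proximity


def rank(docs: dict[str, str], keywords: list[str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [(doc_id, keyword_score(text, keywords)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the query `["abdominal", "pain", "nausea"]`, a document in which the three words sit close together outranks one in which they are scattered, even when both contain each word once. Counting and proximity are the only signals in this sketch; nothing in the score reflects what the surrounding text actually says about the keywords.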
Further, keyword-based search engines have no ability to assign “right” or “wrong” to their results because they do not interpret their findings, and thus they cannot detect disagreements between two or more search findings. For example, if a user is searching for a likely cause of a particular abdominal malady, he or she may input the symptoms (abdominal pain, nausea, etc.) as keywords into a keyword-based search engine. The search engine may return two documents that contain similar numbers of keyword occurrences (references to the symptoms) and are therefore ranked similarly to each other. However, the documents may depart radically from each other in their explanations of the potential cause (allergy, food poisoning, cancer, etc.) of the symptoms. The user must now try to make sense of these documents and determine which is correct, if either.
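To make this limitation concrete, the following Python sketch applies a simple keyword-occurrence count (a hypothetical stand-in for a keyword-based ranker, not any particular engine) to two invented snippets that attribute the same symptoms to different causes. Both snippets receive the same score, and nothing in the score records that they contradict each other.

```python
import re


def keyword_count(text: str, keywords: list) -> int:
    """Count tokens in text that match one of the keywords (case-insensitive)."""
    kw = {k.lower() for k in keywords}
    return sum(1 for tok in re.findall(r"[a-z]+", text.lower()) if tok in kw)


keywords = ["abdominal", "pain", "nausea"]

# Invented example documents: same symptoms, contradictory explanations.
doc_allergy = "Abdominal pain with nausea is most likely caused by a food allergy."
doc_poisoning = "Abdominal pain with nausea is most likely caused by food poisoning."

scores = {
    "allergy": keyword_count(doc_allergy, keywords),
    "poisoning": keyword_count(doc_poisoning, keywords),
}
print(scores)  # {'allergy': 3, 'poisoning': 3}
```

The tie reflects the situation described above: the ranking treats the two documents as equally relevant, and deciding which explanation, if either, is correct is left entirely to the user.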