The present disclosure is generally directed to knowledge graphs and, more specifically, to techniques for assigning confidence scores to relationship entries in a knowledge graph utilized in conjunction with data processing systems, such as question answering systems and/or information retrieval systems.
Watson is a question answering (QA) system (i.e., a data processing system) that applies advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning technologies to the field of open domain question answering. In general, conventional document search technology receives a keyword query and returns a list of documents, ranked in order of relevance to the query (often based on popularity and page ranking). In contrast, QA technology receives a question expressed in a natural language, seeks to understand the question in greater detail than document search technology, and returns a precise answer to the question.
The Watson system reportedly employs more than one hundred different algorithms to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses. The original Watson system implemented DeepQA™ software and the Apache™ unstructured information management architecture (UIMA) framework. Software for the original Watson system was written in various languages, including Java, C++, and Prolog, and ran on the SUSE™ Linux Enterprise Server 11 operating system using the Apache Hadoop™ framework to provide distributed computing. As is known, Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. The original Watson system employed the DeepQA software to generate hypotheses, gather evidence (data), and analyze the gathered data. The original Watson system was workload optimized and integrated massively parallel POWER7® processors. The original Watson system included a cluster of ninety IBM Power 750 servers, each of which included 3.5 GHz eight-core POWER7 processors, with four threads per core. In total, the original Watson system had 2,880 POWER7 processor cores and 16 terabytes of random access memory (RAM). Reportedly, the Watson system can process 500 gigabytes, the equivalent of one million books, per second. Sources of information for the Watson system include encyclopedias, dictionaries, thesauri, newswire articles, and literary works. The Watson system also uses databases, taxonomies, and ontologies.
Cognitive systems learn and interact naturally with people to extend what either a human or a machine could do on their own. Cognitive systems help human experts make better decisions by penetrating the complexity of ‘Big Data’. Cognitive systems build knowledge and learn a domain (i.e., language and terminology, processes and preferred methods of interacting) over time. Unlike conventional expert systems, which have required a human expert to hard-code rules into the system, cognitive systems can process natural language and unstructured data and learn by experience, similar to how humans learn, by implementing trained machine models, pattern recognition, and machine learning algorithms. Although cognitive systems have deep domain expertise, they do not replace human experts; instead, they act as decision support systems that help human experts make better decisions based on the best available data in various areas (e.g., healthcare, finance, or customer service).