In the field of artificially intelligent computer systems capable of answering questions posed in natural language, cognitive question answering (QA) systems (such as the IBM Watson™ artificially intelligent computer system or other natural language question answering systems) process questions posed in natural language to determine answers and associated confidence scores based on knowledge acquired by the QA system. In operation, users submit one or more questions through a front-end application user interface (UI) or application programming interface (API) to the QA system, where the questions are processed to generate answers that are returned to the user(s). The QA system generates answers from an ingested knowledge base corpus, including publicly available information and/or proprietary information stored on one or more servers, Internet forums, message boards, or other online discussion sites. Using the ingested information, the QA system can formulate answers using artificial intelligence (AI) and natural language processing (NLP) techniques, and can return those answers with associated evidence and confidence measures. However, the quality of the answer depends on the ability of the QA system to identify and process information contained in the knowledge base corpus.
Some traditional QA systems provide mechanisms for processing information in a knowledge base that use vectors to represent words, yielding a distributed representation of the words in a language. Such mechanisms include “brute force” learning by various types of Neural Networks (NNs), learning by log-linear classifiers, or various matrix formulations. Lately, word2vec, which uses classifiers, has gained prominence as a machine learning technique used in the natural language processing and machine translation domains to produce vectors that capture syntactic as well as semantic properties of words. Matrix-based techniques, which first extract a matrix from the text and then optimize a function over the matrix, have recently achieved functionality similar to that of word2vec in producing vectors. However, there is no mechanism in place to identify and/or process concepts in an ingested corpus which are more than merely a sequence of words. Nor are traditional QA systems able to identify and process concept attributes in relation to other concept attributes or in relation to changes in the concept relationships over time. Nor do such systems provide any mechanism for dynamically generating concept-based content based on concepts of potential interest to the user. Instead, existing attempts to deal with concepts generate vector representations of words that carry various probability distributions derived from simple text in a corpus, and therefore provide only limited capabilities for content authoring applications, such as NLP parsing, identification of analogies, and machine translation. As a result, efficiently identifying and applying concepts contained in a corpus is extremely difficult at a practical level with existing solutions.
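By way of illustration only, the first step of the matrix-based word vector techniques mentioned above (extracting a matrix from text) can be sketched as follows. This is a minimal sketch under stated assumptions, not any particular system's implementation: the function names `cooccurrence_vectors` and `cosine`, the window size, and the toy corpus are all hypothetical, and the subsequent step of optimizing a function over the matrix is omitted. Each word is represented by a sparse row of co-occurrence counts, and rows are compared by cosine similarity.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it.

    Each Counter is one sparse row of a word-by-word co-occurrence matrix.
    (Hypothetical helper for illustration; whitespace tokenization is an assumption.)
    """
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            context = vectors.setdefault(word, Counter())
            for j in range(start, end):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented purely for this example.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
vectors = cooccurrence_vectors(corpus)

# Words used in similar contexts ("cat", "dog") end up with more similar
# vectors than words that merely appear near each other ("cat", "milk").
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["milk"]))
```

Note that such vectors are keyed to individual word tokens, which illustrates the limitation described above: a concept spanning more than a single word has no representation of its own in this scheme.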