The present disclosure relates to collecting relational training data to train a relation-based classifier by extracting document segments corresponding to multi-entity co-occurrence anomalies within source documents.
A question answer system answers questions posed in a natural language format by applying advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine-learning technologies. Question answer systems differ from typical document search technologies: a document search technology returns a list of documents ranked in order of relevance to a word query, whereas a question answer system receives a question expressed in a natural language, seeks to understand the question in much greater detail, and returns a precise answer to the question.
Question answer systems may perform relations extraction during the process of answering a question. A relations extraction system parses sentences into subject-verb-object (SVO) form and may then add semantic information such as entity extraction, keyword extraction, sentiment analysis, and location identification. Relations extraction systems may also be used to automatically identify buying signals, key events, and other actions important to a user.
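As a minimal illustrative sketch only, and not the method of any particular relations extraction system, the subject-verb-object form described above might be represented and populated as follows. The `Relation` type, the `VERBS` lexicon, the `extract_svo` and `annotate` helpers, and the location list are all hypothetical names introduced for illustration; a practical system would use a full syntactic parser rather than a word lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """One subject-verb-object triple plus optional semantic annotations."""
    subject: str
    verb: str
    obj: str
    annotations: dict = field(default_factory=dict)


# Hypothetical verb lexicon, assumed for this sketch only.
VERBS = {"acquired", "hired", "opened"}


def extract_svo(sentence, verbs=VERBS):
    """Split a simple sentence around the first known verb.

    Returns None when no known verb has text on both sides of it.
    """
    tokens = sentence.rstrip(".").split()
    for i, token in enumerate(tokens):
        if token.lower() in verbs and 0 < i < len(tokens) - 1:
            return Relation(
                subject=" ".join(tokens[:i]),
                verb=token.lower(),
                obj=" ".join(tokens[i + 1:]),
            )
    return None


def annotate(relation, known_locations=frozenset({"Boston"})):
    """Attach one kind of added semantic information: location mentions."""
    words = relation.obj.split()
    relation.annotations["locations"] = [w for w in words if w in known_locations]
    return relation


relation = annotate(extract_svo("Acme acquired a startup in Boston."))
print(relation)
```

The annotation step stands in for the additional semantic information mentioned above; entity, keyword, and sentiment annotations could be attached to the same `annotations` mapping in the same way.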
Before a relations extraction system can be used, the classifiers within the system require training. The classifiers train on training data that eventually allows the classifiers to distinguish “yes” answers from “no” answers during real-time use.
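By way of a hedged example, the training of a yes/no classifier on labeled data might be sketched as below. The sketch assumes a simple perceptron over bag-of-words features, which is a stand-in chosen for brevity and is not stated in the disclosure; the function names and the four training sentences are hypothetical.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words features: the set of lowercased tokens in the text."""
    return set(text.lower().split())


def train_perceptron(examples, epochs=10):
    """Learn word weights from (text, label) pairs, where label is True ("yes") or False ("no")."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            score = bias + sum(weights[f] for f in feats)
            predicted = score > 0
            if predicted != label:
                # Move the weights toward the correct answer on a mistake.
                step = 1.0 if label else -1.0
                for f in feats:
                    weights[f] += step
                bias += step
    return weights, bias


def predict(model, text):
    """Return True for a "yes" answer and False for a "no" answer."""
    weights, bias = model
    return bias + sum(weights.get(f, 0.0) for f in featurize(text)) > 0


# Hypothetical training data: does the segment express an acquisition relation?
TRAINING_DATA = [
    ("acme acquired beta", True),
    ("gamma acquired delta", True),
    ("acme visited beta", False),
    ("gamma visited delta", False),
]

model = train_perceptron(TRAINING_DATA)
print(predict(model, "omega acquired sigma"))
print(predict(model, "omega visited sigma"))
```

The quality of such a classifier depends directly on the labeled examples it is given, which is why the collection of relational training data matters.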