The present invention relates to semantic graphs, and more specifically, to semantic graph augmentation for domain adaptation.
Graphs of semantic concepts and relationships derived from structured knowledge are an extremely valuable resource for high-precision natural language processing (NLP) systems. However, they cannot possibly encode all human knowledge in a given domain, and are therefore susceptible to gaps in available semantic structure that can reduce accuracy in unseen contexts.
Existing techniques to resolve this problem can involve, for example, a) flexible weighting strategies for graph activation, b) selective curation of knowledge that is included in the graph, or c) the use of additional techniques and resources to assist in specific problematic scenarios, such as part-of-speech (POS) filtering.
Flexible weighting strategies for graph activation allow a graph to be used in domains and contexts for which it was not originally intended, by changing how the activation signal moves through the graph according to the weighted relevance assigned to particular categories of semantic data. However, this approach relies heavily on the existing graph structure: reweighting can only redistribute signal among concepts and relationships that are already present. There will therefore always be contexts for which supplementary knowledge is the only possible solution, and adding supplementary knowledge usually requires intervention from domain experts, which can be extremely expensive.
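The weighting idea can be illustrated with a minimal spreading-activation sketch. Everything in it is hypothetical and chosen only for illustration: the toy graph, the relation categories (treats, has_subtype, is_a), the decay factor, and the two weight profiles. It assumes that the signal passed along an edge is multiplied by a weight for that edge's category, so a different weight profile steers activation toward different parts of the same graph.

```python
from collections import defaultdict

# Hypothetical toy graph: concept -> list of (neighbour, relation category).
GRAPH = {
    "aspirin": [("pain", "treats"), ("salicylate", "is_a")],
    "pain": [("headache", "has_subtype")],
    "salicylate": [("chemical", "is_a")],
    "headache": [],
    "chemical": [],
}


def spread_activation(graph, seeds, category_weights, decay=0.5, steps=2):
    """Propagate activation outward from the seed concepts for a fixed number of steps.

    The signal sent along an edge is the sender's current signal scaled by the
    decay factor and by the weight of the edge's relation category; categories
    missing from category_weights get weight 0 and block the signal.
    """
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, signal in frontier.items():
            for neighbour, category in graph.get(node, []):
                passed = signal * decay * category_weights.get(category, 0.0)
                if passed > 0:
                    next_frontier[neighbour] += passed
        # Accumulate this step's signal into the overall activation.
        for node, signal in next_frontier.items():
            activation[node] += signal
        frontier = next_frontier
    return dict(activation)


# Two hypothetical weight profiles applied to the same graph.
CLINICAL = {"treats": 1.0, "has_subtype": 1.0, "is_a": 0.1}
CHEMISTRY = {"treats": 0.1, "has_subtype": 0.1, "is_a": 1.0}

clinical = spread_activation(GRAPH, ["aspirin"], CLINICAL)
chemistry = spread_activation(GRAPH, ["aspirin"], CHEMISTRY)
```

With the clinical profile, "headache" ends up far more active than "chemical"; with the chemistry profile the ranking flips. A concept that is absent from the graph, such as "ibuprofen" here, receives no activation under any profile, which is why reweighting alone cannot substitute for missing knowledge.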
Selective curation of knowledge that is included in the graph is an error-prone process that involves pruning apparently irrelevant content from the graph in order to facilitate better connectivity among the concepts that are present in the data set. The resulting graph is less reflective of the original semantics of the intended domain, so conclusions drawn from it are of questionable quality. In addition, although pruning may appear to fix certain performance or contextual issues, it greatly reduces the potential coverage of the graph for new scenarios and unseen text.
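The coverage cost of pruning can be seen in a small sketch. The toy graph, the concept names, and the coverage measure (the fraction of concept mentions in a text that exist in the graph) are hypothetical simplifications, not a standard metric. The sketch keeps only the concepts seen in a hypothetical data set and then measures how much of an unseen text the pruned graph can still match.

```python
# Hypothetical toy graph: concept -> set of neighbouring concepts.
GRAPH = {
    "aspirin": {"pain", "salicylate"},
    "pain": {"aspirin", "headache"},
    "headache": {"pain"},
    "salicylate": {"aspirin", "willow_bark"},
    "willow_bark": {"salicylate"},
}


def prune(graph, keep):
    """Return a copy of the graph restricted to the concepts in keep.

    Edges that point to removed concepts are dropped as well.
    """
    return {
        concept: {n for n in neighbours if n in keep}
        for concept, neighbours in graph.items()
        if concept in keep
    }


def coverage(graph, mentions):
    """Fraction of concept mentions that the graph contains."""
    return sum(mention in graph for mention in mentions) / len(mentions)


# Concepts seen in a hypothetical training data set.
seen_in_data = {"aspirin", "pain", "headache"}
pruned = prune(GRAPH, seen_in_data)

# Concept mentions from a hypothetical unseen text.
unseen_mentions = ["aspirin", "salicylate", "willow_bark"]
```

The full graph covers all three unseen mentions, whereas the pruned graph covers only "aspirin", even though the pruned graph looks tidier with respect to the original data set.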
The use of additional techniques and resources to assist in specific problematic scenarios, such as POS filtering, can often help to resolve issues of contextual ambiguity, which matters because activating the graph with irrelevant concepts can significantly skew the resulting output. However, in highly connected domains such as medical literature, it is difficult to know when a particular concept is relevant. For example, the word “was” is a Chemical Compound in the UMLS knowledge base. Applying a POS tagger to the text so that concepts whose mentions exhibit incompatible POS values (such as a past-tense verb instead of a noun) are filtered out can help considerably. However, the POS tagger must be of high quality, and ambiguity among concepts that share the same POS tag is not resolved by this approach.
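POS filtering of concept candidates can be sketched as follows. The tags are Penn Treebank tags (for example, NN for a singular noun and VBD for a past-tense verb), but the tagged sentence is supplied by hand rather than produced by a real tagger, and the candidate dictionary and the rule that each semantic type accepts only noun tags are hypothetical; only the “was” to Chemical Compound match comes from the example above.

```python
# Penn Treebank noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hypothetical rule: POS tags compatible with each semantic type.
COMPATIBLE_TAGS = {
    "Chemical Compound": NOUN_TAGS,
    "Disease": NOUN_TAGS,
    "Temperature": NOUN_TAGS,
}

# Hypothetical dictionary: surface form -> candidate (concept, semantic type) pairs.
CANDIDATES = {
    "was": [("was", "Chemical Compound")],
    "cold": [("common cold", "Disease"), ("cold temperature", "Temperature")],
}


def filter_by_pos(tagged_tokens):
    """For each (token, tag) pair, keep only candidates whose semantic type accepts the tag."""
    kept = []
    for token, tag in tagged_tokens:
        matches = [
            (concept, sem_type)
            for concept, sem_type in CANDIDATES.get(token.lower(), [])
            if tag in COMPATIBLE_TAGS.get(sem_type, set())
        ]
        if matches:
            kept.append((token, matches))
    return kept


# Hand-tagged sentence standing in for POS-tagger output.
tagged = [("The", "DT"), ("patient", "NN"), ("was", "VBD"),
          ("treated", "VBN"), ("for", "IN"), ("a", "DT"), ("cold", "NN")]
result = filter_by_pos(tagged)
```

Here “was” is discarded because VBD is not a noun tag, but both senses of “cold” survive because they share the NN tag, which is the residual ambiguity that POS filtering cannot resolve. Had the tagger mislabelled “was” as a noun, the spurious Chemical Compound match would also have survived, which is why the quality of the tagger matters.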