1. Technical Field
Present invention embodiments relate to knowledge discovery in structured data, and more specifically, to discovery of implicit relations in structured knowledge bases.
2. Discussion of the Related Art
Relational knowledge is important content for many tasks. Information Extraction, Question Answering and Knowledge Discovery are applications for which relational knowledge is essential. Examples of information needs for relational knowledge include:
“Who directed the movie Jaws?”
{(Jaws_(film) director ?d).}
“What conferences are in New York?”
{(?c a Conference).
(?c location New_York_City).}
“What drugs treat anemia?”
{(?d a Drug).
(Anemia may_be_treated_by ?d).}
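By way of illustration only, the information needs above may be read as triple patterns in which terms beginning with “?” are variables to be bound against a knowledge base. The following is a minimal sketch under stated assumptions: the toy knowledge base, its entity names (for example, ExampleConf and OtherConf), and the helper names is_var and match are hypothetical and are not part of any described embodiment.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy knowledge base (assumption: contents are illustrative only).
KB: Set[Triple] = {
    ("Jaws_(film)", "director", "Steven_Spielberg"),
    ("ExampleConf", "a", "Conference"),            # hypothetical entity
    ("ExampleConf", "location", "New_York_City"),
    ("OtherConf", "a", "Conference"),              # hypothetical entity
    ("OtherConf", "location", "Boston"),
    ("Epoetin_alfa", "a", "Drug"),
    ("Anemia", "may_be_treated_by", "Epoetin_alfa"),
}


def is_var(term: str) -> bool:
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")


def match(patterns: List[Triple], kb: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all patterns in order."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in kb:
        candidate = dict(binding)
        ok = True
        for pat_term, kb_term in zip(first, triple):
            if is_var(pat_term):
                # Bind the variable, or check agreement with an earlier binding.
                if candidate.setdefault(pat_term, kb_term) != kb_term:
                    ok = False
                    break
            elif pat_term != kb_term:
                ok = False
                break
        if ok:
            yield from match(rest, kb, candidate)


# "Who directed the movie Jaws?"
directors = list(match([("Jaws_(film)", "director", "?d")], KB))

# "What conferences are in New York?"
conferences = list(match([("?c", "a", "Conference"),
                          ("?c", "location", "New_York_City")], KB))

# "What drugs treat anemia?"
drugs = list(match([("?d", "a", "Drug"),
                    ("Anemia", "may_be_treated_by", "?d")], KB))
```

In this sketch a query succeeds only when every pattern is matched by a triple stored explicitly in the knowledge base, which is the “look-up” view of relational knowledge.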
To satisfy such information needs, intelligent systems typically rely on large knowledge repositories relevant to the particular domain of discourse. The knowledge can be acquired through Natural Language Processing (NLP) techniques over large corpora. It can also be curated by domain communities. Both forms of Knowledge Capture have limitations that affect the performance on target tasks.
Corpus-based techniques can produce noisy knowledge graphs whose semantic granularity matches the textual expression of relational knowledge, but may not match what a given task requires. For example, a text may mention a “test for splenic fever”, but a particular task (one that depends on a knowledge base, for example) may require a more fine-grained representation, such as:
(Test has_component b.anthracis)
(b.anthracis causative_agent_of splenic fever)
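The granularity mismatch can be sketched as follows, for illustration only. The two triples are taken from the example above; the helper names direct_relations and two_hop_paths are hypothetical, and the sketch is not asserted to be the method of any embodiment. It merely shows that no single stored relation links the test to the disease, while a two-step path does.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Fine-grained representation from the example above.
KB: Set[Triple] = {
    ("Test", "has_component", "b.anthracis"),
    ("b.anthracis", "causative_agent_of", "splenic fever"),
}


def direct_relations(kb: Set[Triple], subj: str, obj: str) -> List[str]:
    """Predicates that link subj to obj in a single stored triple."""
    return sorted(p for s, p, o in kb if s == subj and o == obj)


def two_hop_paths(kb: Set[Triple], subj: str, obj: str) -> List[Tuple[str, str, str]]:
    """(predicate, intermediate entity, predicate) paths from subj to obj."""
    return sorted(
        (p1, mid, p2)
        for s1, p1, mid in kb if s1 == subj
        for s2, p2, o2 in kb if s2 == mid and o2 == obj
    )


# The coarse textual relation "test for" has no single-triple counterpart ...
direct = direct_relations(KB, "Test", "splenic fever")

# ... but the two entities are connected through an intermediate entity.
paths = two_hop_paths(KB, "Test", "splenic fever")
```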
The most glaring weakness of curated knowledge bases, on the other hand, has traditionally been poor coverage.
Accordingly, those interested in exploiting curated knowledge have long believed that, if only enough structured domain knowledge were available, these repositories could be used directly to satisfy the information needs of intelligent systems. Relational knowledge, for example, could simply be “looked up” in the knowledge base.
In several domains, such as biomedicine, large, curated knowledge repositories are now available. Yet, such repositories are still inadequate for many of the intelligent tasks for which we would use them. In many cases, they seem just as noisy as automatically extracted sources, the grain of their representations is just as inappropriate, and the semantics are just as vague.