The present disclosure relates generally to language processing, and more specifically, to extraction of multiword lexical kernel units from a domain-specific lexical resource.
Parsers are a fundamental stepping stone to many different types of natural-language processing (NLP) applications and tasks. One type of system that relies upon a parser is a question-answering computer system. Question-answering computer systems typically employ NLP to return a highest-scoring answer to a question. For NLP techniques (also referred to as text analytics) to infer aspects of the meaning of terms and phrases, the use of a parser to analyze syntax, context, and usage is a crucial requirement.
Human language is so complex, variable (there are many different ways to express the same meaning), and polysemous (the same word or phrase may mean many things in different contexts) that NLP presents an enormous technical challenge. Decades of research have led to many specialized techniques, each operating on language at a different level and on a different isolated aspect of the language understanding task. Many of these techniques fall under the umbrella of text analytics for information purposes, and the associated natural language processing toolkits typically have a parser as a central component.
In most cases, a parser requires a lexicon, which tends to be partitioned into a base (domain-independent) part and a domain-dependent part. However, even lexicons that are characterized by a large inventory of domain-specific terms do not contain all possible domain terms. This is due to several factors, including the fact that commonly used terms and expressions in a domain are constantly evolving and changing, or that only a subset of a text corpus resource was utilized to generate the lexicon. When a parser encounters single words or multiword terms that were not considered when the lexicon was built, it will suffer from lexical look-up failure and/or misguided parse analysis.
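By way of illustration only, the following minimal sketch shows how a look-up failure of the kind described above may arise. The lexicon entries, the function names (`lookup`, `tag`), and the longest-match strategy are hypothetical assumptions chosen for the example. They are not taken from any particular parser.

```python
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, ...]
Lexicon = Dict[Term, str]

# Hypothetical base (domain-independent) part of the lexicon.
BASE_LEXICON: Lexicon = {
    ("the",): "DET",
    ("patient",): "NOUN",
    ("has",): "VERB",
    ("heart",): "NOUN",
    ("failure",): "NOUN",
    ("syndrome",): "NOUN",
}

# Hypothetical domain-dependent part, holding one multiword term.
DOMAIN_LEXICON: Lexicon = {
    ("heart", "failure"): "NOUN",
}


def lookup(tokens: List[str], start: int, lexicons: List[Lexicon],
           max_len: int = 4) -> Optional[Tuple[Term, str]]:
    """Return the longest lexicon entry starting at `start`, or None."""
    longest = min(max_len, len(tokens) - start)
    for length in range(longest, 0, -1):
        span = tuple(tokens[start:start + length])
        for lexicon in lexicons:
            if span in lexicon:
                return span, lexicon[span]
    return None


def tag(tokens: List[str],
        lexicons: List[Lexicon]) -> List[Tuple[Term, Optional[str]]]:
    """Segment tokens into lexicon terms; a category of None marks a look-up failure."""
    result: List[Tuple[Term, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        hit = lookup(tokens, i, lexicons)
        if hit is None:
            # Token is absent from both lexicon parts: look-up failure.
            result.append(((tokens[i],), None))
            i += 1
        else:
            result.append(hit)
            i += len(hit[0])
    return result


# Domain-dependent part is consulted before the base part.
LEXICONS = [DOMAIN_LEXICON, BASE_LEXICON]
known = tag("the patient has heart failure".split(), LEXICONS)
unknown = tag("the patient has cardiorenal syndrome".split(), LEXICONS)
```

In this sketch, "heart failure" is recognized as a single multiword term because it appears in the domain-dependent part. The term "cardiorenal syndrome" was not considered when the hypothetical lexicon was built, so "cardiorenal" fails look-up and "syndrome" is analyzed as an isolated noun, which is the kind of input that can lead to a misguided parse analysis.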