The present invention relates to latent ambiguity handling, and more specifically, to latent ambiguity handling in natural language processing.
Ambiguous language is a problem in natural language processing (NLP) systems. A word may appear unambiguous at face value, yet still carry different meanings, subtleties, or sub-groups that affect how it should be interpreted.
For example, if a document's content is known to refer to the topic of Education, the word “school” is more likely to carry the sense of an “educational institution” than that of a “group of fish”. This is where traditional word-sense disambiguation processes would stop, satisfied that the job is finished. However, considerable ambiguity still remains. Is it a secondary or primary school, or is it even a school in Ireland? The word “school” is inherently ambiguous because it does not provide enough information. Perhaps the school in question is actually for training dogs rather than people.
The term “latent natural language ambiguity” is used herein to describe this phenomenon, borrowing from the legal definition of “latent ambiguity”, where the wording of an instrument is, on the face of it, clear and intelligible but may, at the same time, apply equally to two different things or subject matters. Latent natural language ambiguity refers to instances where the sense of a word may appear to be clear and intelligible but may, at the same time, apply equally to any number of alternative senses.
When “school” is used in the sense of an educational institution, the precise characteristics of that institution are not defined by this general sense. Only through anaphora resolution in the text, if a more fine-grained semantic meaning exists there, can the true contextual meaning of the word “school” be made apparent. However, such co-reference resolution techniques focus on the immediate context of documents, paragraphs, and sentences. What if the true meaning of “school” depends on a larger context, such as in the query “Which school should I send my 10 year old daughter to?” The simple realization that a primary school is needed, or a school suitable for 10-year-old girls, is extremely valuable information for any complex NLP system, such as a question answering or search system.
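The gap described above can be illustrated with a minimal sketch. It is not the method of the invention; it only shows, under stated assumptions, how a contextual cue in a query might narrow the general sense of “school” to a more specific sub-sense. The sub-sense names, the age ranges, and the function name `refine_school_sense` are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical sub-senses of "school" (as an educational institution),
# each with an assumed, illustrative age range [low, high).
SUB_SENSES = {
    "primary school": (4, 12),
    "secondary school": (12, 18),
}


def refine_school_sense(query):
    """Return the sub-senses of "school" consistent with an age cue in the query."""
    # Look for a cue such as "10 year old" or "10-year-old".
    match = re.search(r"(\d+)[\s-]*year[\s-]*old", query)
    if match is None:
        # No cue found: every sub-sense still applies, so the ambiguity stays latent.
        return sorted(SUB_SENSES)
    age = int(match.group(1))
    return sorted(
        name for name, (low, high) in SUB_SENSES.items() if low <= age < high
    )


if __name__ == "__main__":
    print(refine_school_sense("Which school should I send my 10 year old daughter to?"))
    print(refine_school_sense("Which school is best?"))
```

With the example query, the age cue leaves only the hypothetical “primary school” sub-sense; without any cue, both sub-senses remain candidates, which is the latent ambiguity that word-sense disambiguation alone does not resolve.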