The invention relates to systems and methods for computer processing of natural languages, and in particular to systems and methods for automatic word sense disambiguation.
Interest in natural language processing (NLP) has been steadily increasing in recent years. Globalization and the widespread use of the Internet are driving the development of automated translation technology. The popularity of mobile and wearable computing devices, coupled with progress in artificial intelligence and software engineering, is fueling growth in the area of human-machine interfaces, such as speech and handwriting recognition, among others.
Automated language processing has long been considered difficult because of the diversity, inherent ambiguity, context-sensitivity, and redundancy of human language. One such task is word sense disambiguation (WSD), which comprises automatically determining a sense or meaning of a word in the context of a natural language communication. For instance, the English word "bank" may denote a financial institution or the edge of a river, depending on the surrounding text.
Common language-processing applications use computer-readable linguistic knowledge bases (LKBs) containing information on the lexicon and grammar of a natural language. Some LKBs also include semantic information, which may be used for WSD applications. Creation of such knowledge bases typically involves dictionary-based and corpus-based methods. Dictionary-based methods may comprise assembling a lexicon and manually or semi-automatically annotating lexicon entries with various linguistic and/or semantic information. Corpus-based methods often employ statistical data gathered from various corpora of natural language text to automatically determine linguistic and/or semantic relationships between lexicon entries.
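By way of illustration only, one kind of statistical data that a corpus-based method might gather is a count of how often two words appear in the same sentence. The following minimal sketch assumes a toy three-sentence corpus and a hypothetical helper named `cooccurrence_counts`, both invented here for illustration; it is not drawn from any particular knowledge base or prior system, and a practical system would use a proper tokenizer and a far larger corpus.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct words shares a sentence.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for sentence in sentences:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        words = sorted(set(sentence.lower().split()))
        # Sorting makes each pair appear in a canonical (alphabetical) order.
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Toy corpus, invented for illustration only.
corpus = [
    "the bank approved the loan",
    "the river bank was muddy",
    "she repaid the loan to the bank",
]

counts = cooccurrence_counts(corpus)
print(counts[("bank", "loan")])   # sentences containing both "bank" and "loan"
print(counts[("bank", "river")])  # sentences containing both "bank" and "river"
```

In such a sketch, a higher count for one pair than for another may be read as weak evidence of a semantic relationship between the corresponding lexicon entries, which is the kind of signal a WSD application could draw on.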