The invention relates to methods and systems for computer processing of natural languages.
Commercial interest in computer-based human language processing has been steadily increasing in recent years. Globalization and the widespread use of the Internet are driving the development of automated translation technology, while progress in robotics and software engineering is fueling growth in the areas of human-machine interfaces and automated document processing.
Language processing applications often use computer-readable linguistic knowledge bases (LKBs) containing information on the grammar, lexicon, and structure of natural languages, as well as on lexical correspondences and other relations between natural languages.
Automated translation has long been considered a difficult task, in part because of the diversity and redundancy of human language. For example, some phrases that express the same notion or message in different languages may not share equivalent words or syntactic structure. Moreover, natural language is often context-sensitive. The meaning of some phrases may be essentially different from that conveyed by their individual words.
Common approaches to natural language processing include dictionary-based, example-based, and corpus-based methods. Dictionary-based work typically involves the creation of lexical knowledge bases. Example-based methods aim to create large collections of example phrases and to match incoming text against the stored examples. Corpus-based work often employs statistical models of the relationships between words and other linguistic features.
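By way of illustration only, the example-based matching step mentioned above may be sketched as follows. The sketch assumes a small in-memory store of example phrases and uses word-level sequence similarity from the Python standard library as the matching criterion. The store contents, the name `best_example`, and the choice of similarity measure are hypothetical and are not taken from any particular system.

```python
from difflib import SequenceMatcher

# Hypothetical example store: source phrase -> stored translation.
EXAMPLES = {
    "good morning": "bonjour",
    "thank you very much": "merci beaucoup",
    "where is the station": "où est la gare",
}


def best_example(text, examples=EXAMPLES):
    """Return (stored_phrase, translation, similarity) for the closest stored example.

    Similarity is the difflib ratio computed over word sequences, an
    assumption made for this sketch rather than a prescribed measure.
    """
    query = text.lower().split()
    best = None
    for phrase, translation in examples.items():
        score = SequenceMatcher(None, query, phrase.split()).ratio()
        if best is None or score > best[2]:
            best = (phrase, translation, score)
    return best


if __name__ == "__main__":
    phrase, translation, score = best_example("Where is the train station")
    print(phrase, "|", translation, "|", round(score, 3))
```

A practical system would typically replace the linear scan with an index over the example collection and would adapt the retrieved translation to account for the words that did not match.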