The present invention generally relates to natural language processing. More particularly, the present invention relates to natural language processing including synonymous collocations. A collocation refers to a lexically restricted word pair with a certain syntactic relation, and can take the form: <head, relation-type, modifier>. For instance, <turn on, OBJ, light> is a collocation with a verb-object syntactic relation. Collocations are useful in helping to capture the meaning of a sentence or text, which can include providing alternative expressions for similar ideas or thoughts.
A synonymous collocation pair refers to a pair of collocations that are similar in meaning, but not identical in wording. For example, <turn on, OBJ, light> and <switch on, OBJ, light> are considered a synonymous collocation pair due to their similar meanings. Generally, synonymous collocations are an extension of synonymous expressions, which include synonymous words, phrases and sentence patterns.
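The triple form and the notion of a synonymous collocation pair can be illustrated with a minimal sketch. The names Collocation, TOY_SYNONYMS, words_match and is_candidate_pair are hypothetical and chosen only for illustration, and the tiny hand-written synonym table stands in for whatever lexical resource an actual system might use. The check identifies candidate pairs only; it does not establish that two collocations are in fact similar in meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocation:
    """A collocation in the form <head, relation-type, modifier>."""
    head: str
    relation: str
    modifier: str


# Hypothetical toy synonym table (an assumption for illustration only).
TOY_SYNONYMS = {
    "turn on": {"switch on"},
    "switch on": {"turn on"},
}


def words_match(a: str, b: str) -> bool:
    """True if the two words are identical or listed as synonyms in the toy table."""
    return a == b or b in TOY_SYNONYMS.get(a, set())


def is_candidate_pair(c1: Collocation, c2: Collocation) -> bool:
    """Candidate synonymous pair: same relation, matching parts, different wording."""
    if c1 == c2:
        return False  # identical wording is not a synonymous pair
    if c1.relation != c2.relation:
        return False  # the syntactic relation must agree
    return words_match(c1.head, c2.head) and words_match(c1.modifier, c2.modifier)


turn_on = Collocation("turn on", "OBJ", "light")
switch_on = Collocation("switch on", "OBJ", "light")
print(is_candidate_pair(turn_on, switch_on))
```

Representing a collocation as an immutable triple keeps the comparison of relation types explicit, which matters because two collocations with different syntactic relations are not treated as a pair even when their words coincide.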
In natural language processing, synonymous collocations are useful in applications such as information retrieval, language generation such as in computer-assisted authoring or writing assistance, and machine translation, to name just a few. For example, in information retrieval, the phrase “buy book” extracted from a user's query should also match “order book” indexed in the documents. In language generation, synonymous collocations are useful in providing alternate expressions with similar meanings. In the bilingual context, synonymous collocations can be useful in machine translation or machine-assisted translation by translating a collocation in one language to a synonymous collocation in a second language.
Therefore, information relating to synonymous expressions and collocations is considered important in the context of natural language processing. Attempts to extract synonymous words from monolingual corpora have relied on context words to identify synonyms of a particular word. However, these methods have produced errors because many of the word pairs they generate are similar but not synonymous. For example, such methods have generated word pairs such as “cat” and “dog”, which are similar but not synonymous.
Other work has addressed extraction of synonymous words and/or patterns from bilingual corpora. However, these methods are limited to extracting synonymous expressions actually found in bilingual corpora. Although these methods are relatively accurate, the coverage of the extracted expressions has been quite low due to the relative unavailability of bilingual corpora.
Accordingly, there is a need for improved techniques for extracting synonymous collocations, particularly with respect to improving coverage without sacrificing accuracy.