Paraphrases, which are phrases that are formally distinct but have substantially similar semantic meanings, are a major challenge for machine processing of natural language, in fields ranging from search and information retrieval to machine translation. In an example pertaining to search, a user may enter a query to search for particular documents. The search engine, however, may not retrieve a document desired by the user unless the document includes identical terms to those included in the query. Accordingly, it is particularly important that the search engine be able to identify paraphrases of queries, thereby enabling improved information retrieval.
In another example, paraphrases can be utilized in connection with interpreting voice commands. A verbal instruction of “turn the machine off” is a paraphrase of “power down the machine” and “switch off the power”. A computerized entity can desirably understand that such three phrases are paraphrases of one another, and are equivalent for the task of turning the power off of a certain machine.
Automatically learning which phrases are paraphrases of one another is a difficult problem, due at least in part to the complexities of human language. For example, entirely different words or phrases may mean the same thing in certain contexts, while substantially similar words or phrases may have entirely different meanings in certain contexts. Manually attempting to label all paraphrases in the English language, however, is such a monumental task that it is unrealistic.