The present embodiments relate to identification and processing of idioms in a natural language processing environment. More specifically, the embodiments relate to monitoring electronic communications and utilizing machine learning to identify idioms and explanations of the idioms in order to develop a continuously expanding phrase dictionary for language translation.
In the field of artificially intelligent computer systems, natural language systems (such as the IBM Watson™ artificially intelligent computer system and other natural language question answering systems) process natural language based on knowledge acquired by the system. When processing colloquial language, which frequently contains idioms, such a system often performs a literal translation. The result can be incorrect or inaccurate because of peculiarities of language constructs, cultural differences, or both.
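The failure mode of literal translation, and the benefit of consulting a phrase dictionary first, can be illustrated with a toy sketch. This is not the disclosed method. The word and phrase tables, the Spanish renderings, and the helper names `literal_translate`, `phrase_aware_translate`, and `learn_idiom` are all hypothetical. The sketch uses a greedy longest-match lookup against the phrase dictionary and falls back to word-by-word substitution. `learn_idiom` stands in for however new idioms might be added to a continuously expanding dictionary.

```python
# Hypothetical toy tables; a real system would acquire these entries, not hard-code them.
WORD_DICT = {"break": "romper", "a": "una", "leg": "pierna",
             "good": "buena", "luck": "suerte"}
PHRASE_DICT = {("break", "a", "leg"): "mucha suerte"}


def literal_translate(text: str) -> str:
    """Word-by-word substitution; unknown words pass through unchanged."""
    return " ".join(WORD_DICT.get(w, w) for w in text.lower().split())


def learn_idiom(phrase: str, translation: str) -> None:
    """Hypothetical hook: add a newly identified idiom to the phrase dictionary."""
    PHRASE_DICT[tuple(phrase.lower().split())] = translation


def phrase_aware_translate(text: str) -> str:
    """Greedy longest-match against PHRASE_DICT, then word-by-word fallback."""
    words = text.lower().split()
    max_len = max((len(k) for k in PHRASE_DICT), default=1)
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two-word phrases.
        for n in range(min(max_len, len(words) - i), 1, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_DICT:
                out.append(PHRASE_DICT[phrase])
                i += n
                break
        else:
            # No multi-word idiom starts here; translate a single word literally.
            out.append(WORD_DICT.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(literal_translate("break a leg"))       # romper una pierna
print(phrase_aware_translate("break a leg"))  # mucha suerte
```

The literal output is grammatical but loses the idiomatic meaning ("good luck"), which is the kind of error the background describes.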
Current language translation algorithms fail to properly translate language idioms, such as expressions peculiar to a given language, regional speech or dialect, specialized vocabulary understood by a specific group of people, or jargon. Language idioms are particularly pervasive in social media because of the informal nature of such communication. In particular, text messages sent through social media often contain short-hand phrases, jargon, and expressions peculiar to a particular language or even a particular geographic area. To complicate matters further, text messages are free-form and typing mistakes are common, which leads to inaccurate translations.