Conventional machine translation typically uses linguistic rules to translate source text from one language to another. Machine translation also typically uses corpus-based and/or statistical techniques to recognize idioms, phrases, and sentences, and to find their closest counterparts in the target language. A machine translation engine will often allow for customization by domain or industry (for example, IT industry, US government, or criminal documents) to improve the output by limiting the scope of allowable substitutions.
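As a rough illustration of how domain customization can limit the scope of allowable substitutions, consider the minimal sketch below. It is a toy example under stated assumptions, not a description of any particular engine: the lexicon contents, the domain key `"it"`, and the function `candidate_translations` are all hypothetical names introduced here for illustration.

```python
# Toy sketch only: the lexicons, the domain key, and the function name are
# hypothetical and do not reflect any real machine translation engine.

# General English-to-French candidates for an ambiguous source term.
GENERAL_LEXICON = {
    "bug": ["insecte", "bogue"],
}

# Per-domain allow-lists that restrict which candidates may be used.
DOMAIN_LEXICON = {
    "it": {"bug": ["bogue"]},
}


def candidate_translations(term, domain=None):
    """Return the allowable target-language substitutions for a source term."""
    candidates = GENERAL_LEXICON.get(term, [])
    if domain is not None:
        allowed = DOMAIN_LEXICON.get(domain, {}).get(term)
        if allowed:
            # Keep only the candidates the domain permits, in general order.
            return [c for c in candidates if c in allowed]
    # No domain, or no domain-specific entry: fall back to all candidates.
    return list(candidates)


print(candidate_translations("bug"))
print(candidate_translations("bug", domain="it"))
```

Without a domain, the ambiguous term keeps every candidate; with the hypothetical `"it"` domain selected, only the software sense remains.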
In the social networking field, social content (such as a message post in a blog or forum, a comment, a tag, or a short description) typically has the following characteristics: (1) it is usually not in a formal written format and may be in a spoken style; (2) it is usually short and may not be a complete sentence or paragraph; (3) it may use incorrect grammar; and (4) its semantics are not always consistent; for instance, the subject or topic may change suddenly.
Even in view of these characteristics, the meaning of a given social content can often be readily determined by a person in a social network because the given social content usually comes with additional contextual information, such as message threads, subject, topic, description, tags, user comments/replies, or even the common background of two users.
However, when a given social content is sent to a machine translation engine for translation, most of the descriptive context from the social network is typically lost. As a result, such social content may be ambiguous during translation, which can lead to poor translation quality.