The Internet has made it possible for people to connect and share information globally in ways previously undreamt of. Social media platforms, for example, enable people on opposite sides of the world to collaborate on ideas, discuss current events, or simply share what they had for lunch. The amount of content generated through social media technologies is staggering. It is common for social media providers to operate databases holding petabytes of media items, while leading providers are already looking toward technology to handle exabytes of data. Media items that at least partially contain natural language (“language snippets”) are subject to human error. While authors of language snippets sometimes correct these errors as they enter the snippets, often the errors are identified only by an automated system or remain uncorrected.
Such errors have been a particularly prevalent problem for machine translation of language snippets. Machine translation engines enable a user to select or provide a source content item (e.g., a message from an acquaintance) in one natural language (e.g., Spanish) and quickly receive a translation of the content item into a different natural language (e.g., English). Machine translation engines can be created using training data that includes identical or similar content in two or more languages. However, the effectiveness of these machine translation engines can be significantly reduced when the source content item contains errors.
The techniques introduced here may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements.