Field of the Invention
The present invention generally relates to a method and system for prioritizing content for crowdsourcing. In particular, the invention relates to prioritizing content that is to be translated by crowdsourcing.
Description of the Related Art
The translation of content (web pages, formatted documents, text files, etc.) typically includes three steps: preprocessing, in which text is extracted and segmented to produce a collection of text segments in the source language; translation, in which the extracted segments are passed to either human translators or a Machine Translation (MT) server; and aggregation, in which the translated segments are combined to create the final translated content.
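By way of illustration only, the three steps described above may be sketched as follows. The function names, the naive sentence-splitting rule, and the placeholder translator are assumptions made for this example and are not taken from any particular translation system.

```python
import re
from typing import Callable, List


def preprocess(content: str) -> List[str]:
    """Split already-extracted text into sentence-like segments.

    Assumption: a segment ends at '.', '!' or '?' followed by whitespace.
    Real segmenters use language-specific rules.
    """
    parts = re.split(r"(?<=[.!?])\s+", content.strip())
    return [p for p in parts if p]


def translate(segments: List[str], translator: Callable[[str], str]) -> List[str]:
    """Pass each segment to a translator (a human workflow or an MT server)."""
    return [translator(segment) for segment in segments]


def aggregate(translated: List[str]) -> str:
    """Combine the translated segments into the final content."""
    return " ".join(translated)


def placeholder_translator(segment: str) -> str:
    """Hypothetical stand-in for a translator; it only tags the segment."""
    return f"[fr] {segment}"


segments = preprocess("Hello world. How are you?")
final_content = aggregate(translate(segments, placeholder_translator))
```

In this sketch the translator is passed in as a callable, so the same flow applies whether a segment is routed to a human translator or to an MT server.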
A Translation Memory (TM), a collection of text segments in the source language paired with corresponding human translations in the target language, is frequently used to reduce the cost of translation. The segments passed to a human translator may be accompanied by an MT result to assist in the translation task (presumably, correcting an existing translation is less costly than creating a new one from scratch). Due to the high cost of human translation, crowdsourcing has been used to replace professional translation services.
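As a minimal sketch of how a TM and an MT result may be used together, the following example reuses a stored human translation when one exists and otherwise attaches an MT suggestion for a human translator to correct. The exact-match lookup, the dictionary layout, and the names `build_task` and `placeholder_mt` are assumptions made for illustration; practical TM systems often also use approximate matching.

```python
from typing import Callable, Dict


def build_task(segment: str,
               tm: Dict[str, str],
               mt: Callable[[str], str]) -> Dict[str, object]:
    """Decide how a single source segment is handled.

    Assumption: the TM is a plain mapping from a source segment to its
    stored human translation, and only exact matches are reused.
    """
    if segment in tm:
        # A stored human translation exists, so no new human work is needed.
        return {"source": segment, "translation": tm[segment], "needs_human": False}
    # No TM hit: attach an MT result as a draft for the human translator.
    return {"source": segment, "suggestion": mt(segment), "needs_human": True}


def placeholder_mt(segment: str) -> str:
    """Hypothetical stand-in for a call to an MT server."""
    return "MT:" + segment


tm = {"Hello": "Bonjour"}
hit = build_task("Hello", tm, placeholder_mt)
miss = build_task("Goodbye", tm, placeholder_mt)
```

Only segments marked `needs_human` would incur human translation cost in this sketch, which is the cost reduction the TM is intended to provide.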