The present disclosure relates to language translation systems and more particularly to a smart terminology marker system of the language translation system.
Companies typically develop written material, such as web pages, user interfaces, marketing materials, and the like, in a native language and subsequently employ a language translation service to translate that material (e.g., the company's web pages) into different languages. Language translation services may utilize a translation supply chain (TSC) that may include an integration of linguistic assets/corpuses, automated translation systems, computer-aided translation editors, professional linguists, and operational management systems.
The TSC may include three stages. The first stage may be a linguistic asset optimization stage that may parse source language content into source segments, and search a repository of historical linguistic assets for the best suggested translations per language and per domain within the language. Linguistic assets may be historical translation memories (i.e., bilingual segment databases), dictionaries, and/or language-specific metadata used to optimize downstream stages. The second stage of the TSC may be a machine translation stage that customizes a translation model using domain-specific linguistic assets of a given language, and provides machine-generated suggested translations of original content based upon the customized translation model. The third stage may be a post-editing stage in which a professional linguist (i.e., a human) may use a computer-aided translation (CAT) editor to review the suggested translations (i.e., matches) to produce a final translation. The professional linguist may accept one of the suggested matching translations, may modify one of the suggested matching translations, or may generate a completely new translation, and may then deliver final, human-fluent translated content to the company.
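As a non-limiting illustration, the first stage described above may be sketched as follows. The sketch assumes a translation memory held as a simple in-memory mapping of source segments to target segments, a naive punctuation-based segmenter, and a character-level similarity score from the Python standard library; the names `split_segments`, `best_matches`, `threshold`, and `limit` are hypothetical and are not taken from any particular TSC implementation.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Match:
    """One suggested translation retrieved from the translation memory."""
    source: str   # historical source segment
    target: str   # its stored translation
    score: float  # similarity to the new segment, from 0.0 to 1.0


def split_segments(text):
    """Parse source content into segments (assumed: split after ., !, or ?)."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def best_matches(segment, memory, threshold=0.7, limit=3):
    """Return up to `limit` memory entries scoring at least `threshold`.

    `memory` is assumed to map source segments to target segments for a
    single language pair and domain.
    """
    scored = [
        Match(src, tgt,
              SequenceMatcher(None, segment.lower(), src.lower()).ratio())
        for src, tgt in memory.items()
    ]
    hits = [m for m in scored if m.score >= threshold]
    return sorted(hits, key=lambda m: m.score, reverse=True)[:limit]


if __name__ == "__main__":
    memory = {
        "Save the file.": "Enregistrez le fichier.",
        "Open the door.": "Ouvrez la porte.",
    }
    for seg in split_segments("Save the file. Close the window."):
        print(seg, best_matches(seg, memory))
```

In such a sketch, the suggestions returned for each segment would correspond to the matches that are later reviewed in the post-editing stage; a production repository would likely index segments rather than scan every entry.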
Machine translation systems typically implement phrase-based translations that have limited sensitivity to morphological, syntactical, and/or semantic differences between the source and target languages. It is common to customize (i.e., train) a phrase-based statistical machine translation system by using bilingual corpuses to prioritize the statistical hits of correct translations within the phrase-based statistical machine translation. Rule-based machine translation is customized by managing a lexicon of terms aligned to a subject area. Terminology assets refer to the set of dictionaries/databases per language that may have the following properties: highly structured information; morphological, syntactical, and semantic information; and enterprise international business metadata. Improvements in the overall quality of the translations on a consistent basis are desirable.
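By way of example only, a terminology asset entry having the properties listed above may be represented as a structured record. The following is a minimal sketch under stated assumptions: the `TermEntry` record, its field names, and the `find_terms` helper are hypothetical and serve only to show how morphological, syntactical, semantic, and business metadata might be carried together for a single term.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """Hypothetical record for one entry of a per-language terminology asset."""
    term: str                   # canonical (dictionary) form
    language: str               # e.g. "en"
    part_of_speech: str         # syntactical information
    inflections: tuple = ()     # morphological variants of the term
    definition: str = ""        # semantic information
    metadata: dict = field(default_factory=dict)  # business metadata


def find_terms(segment, entries):
    """Return entries whose term or an inflection occurs as a whole word."""
    found = []
    for entry in entries:
        forms = (entry.term,) + tuple(entry.inflections)
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b"
        if re.search(pattern, segment, flags=re.IGNORECASE):
            found.append(entry)
    return found


if __name__ == "__main__":
    server = TermEntry(
        term="server",
        language="en",
        part_of_speech="noun",
        inflections=("servers",),
        definition="A computer that provides services to other computers.",
        metadata={"domain": "IT"},  # assumed metadata key
    )
    print(find_terms("Restart the servers now.", [server]))
```

Matching on inflected forms as well as the canonical form is one simple way such a record could expose morphological information that a purely phrase-based system does not consider.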