Machine translation uses computer programs to translate text automatically from one language to another. Conventional machine translation programs rely on a large number of translation rules that are defined by humans and then encoded in computer programs. Rule-based machine translation can achieve high translation quality, but it is costly to develop, its rules cover only a limited range of inputs, and its translation results are ambiguous in some cases. As computer processing power has continued to increase, it has become practical to train translation sub-models (including translation rule tables, language models, and reordering models) on a large-scale bilingual corpus and to determine the ideal translation text based on the scores of these sub-models. This approach is called statistical machine translation. In general, statistical machine translation performs better than other types of machine translation in unrestricted domains. Its basic idea is to perform statistical analysis on a large parallel corpus, build statistical translation models, and perform translation using those models. Statistical machine translation models include word-based, phrase-based, and syntax-based translation models. Among them, the most widely used is the phrase-based (or hierarchical phrase-based) translation model.
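As an illustration of how sub-model scores might be combined to pick a translation, the following minimal sketch assumes a weighted sum of log-scores (a log-linear combination), which is one common choice but is not stated in the text above. The feature names, weights, and candidate scores are hypothetical values chosen only for the example.

```python
# Hypothetical weights for each sub-model; in practice these would be tuned.
WEIGHTS = {"translation": 1.0, "language": 0.5, "reordering": 0.3}


def combined_score(features, weights=WEIGHTS):
    """Weighted sum of sub-model log-scores (an assumed log-linear combination)."""
    return sum(weights[name] * features[name] for name in weights)


def best_candidate(candidates, weights=WEIGHTS):
    """Return the text of the candidate with the highest combined score."""
    best = max(candidates, key=lambda c: combined_score(c["features"], weights))
    return best["text"]


# Illustrative candidates with made-up log-scores from each sub-model.
candidates = [
    {
        "text": "the house is small",
        "features": {"translation": -1.2, "language": -2.0, "reordering": -0.1},
    },
    {
        "text": "small is the house",
        "features": {"translation": -1.0, "language": -4.5, "reordering": -1.5},
    },
]

print(best_candidate(candidates))
```

In this sketch the second candidate has a slightly better translation-model score, but its poor language-model and reordering scores lower its combined score, so the first candidate is selected.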
All existing statistical machine translation processes are designed on the premise of a specified source language and a specified target language. In other words, each statistical machine translation engine can only handle translation for a single language pair, such as French to English or Chinese to English.
Many scenarios with English as the target language involve translation of a large number of different language pairs. For example, a website may include multiple language sub-sites and country sub-sites. When a user inputs a query, a computing device may first identify the language of the query, then translate the query into English, and then perform a search using the English query. This involves translation tasks from dozens of languages to English. Accordingly, the service provider has to develop dozens of translation engines, which results in high development costs and consumes more computing power.
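The conventional flow described above (identify the query language, select the engine for that language pair, translate into English) can be sketched as follows. This is a toy illustration under stated assumptions: the registry `ENGINES`, the `register` helper, the keyword-based `identify_language`, and the dictionary-lookup engines are all hypothetical stand-ins, not real translation or language identification components.

```python
from typing import Callable, Dict, Tuple

Engine = Callable[[str], str]

# Hypothetical registry: one engine per (source, target) language pair.
ENGINES: Dict[Tuple[str, str], Engine] = {}


def register(source: str, target: str):
    """Decorator that registers an engine for a single language pair."""
    def decorator(fn: Engine) -> Engine:
        ENGINES[(source, target)] = fn
        return fn
    return decorator


@register("fr", "en")
def fr_to_en(text: str) -> str:
    # Toy lookup table standing in for a trained French-to-English engine.
    return {"chaussures rouges": "red shoes"}.get(text, text)


@register("es", "en")
def es_to_en(text: str) -> str:
    # Toy lookup table standing in for a trained Spanish-to-English engine.
    return {"zapatos rojos": "red shoes"}.get(text, text)


# Toy vocabularies standing in for a real language identifier.
VOCAB = {"fr": {"chaussures", "rouges"}, "es": {"zapatos", "rojos"}}


def identify_language(query: str) -> str:
    """Guess the language by keyword overlap; default to English."""
    words = set(query.lower().split())
    for lang, vocab in VOCAB.items():
        if words & vocab:
            return lang
    return "en"


def query_to_english(query: str) -> str:
    """Identify the language, then route to the engine for that pair."""
    lang = identify_language(query)
    if lang == "en":
        return query
    engine = ENGINES.get((lang, "en"))
    if engine is None:
        raise LookupError(f"no engine for language pair ({lang}, en)")
    return engine(query)


print(query_to_english("zapatos rojos"))
```

Each additional source language requires another registered engine, and the engine choice depends entirely on the result of `identify_language`, which mirrors the cost and dependency described above.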
In addition, language identification accuracy has a direct impact on whether a user's search intent is truly reflected in the search results. With current translation engines, language identification is an essential step, so errors made by conventional language identification techniques are easily introduced into the whole querying process.