The present invention relates to the use of statistical data for machine translation in computer networks when translation between different languages is needed.
Translations are becoming increasingly important as the Internet and other computer networks cross international borders and provide access to a wide variety of documents written in different languages. Commercial, scientific, engineering, political, artistic, and other types of human interaction often require translation. Human translators cannot keep up with this demand, and machine translation is becoming prevalent. Machine translation (MT) is produced by a computer as it executes computer instructions. Naively speaking, an MT system should imitate the work of a human translator who understands the source language and expresses the same understanding in the target language. However, human understanding and human expression cannot be captured by computers: while both humans and computers can consult dictionaries and grammatical rules, humans can translate even without knowing grammar, and the human ability to understand one another even without speaking enhances their translation abilities in ways unmatched by computers, which do not understand anything but merely follow instructions. On the other hand, computer speeds are unmatched by humans, as is the computer's ability to store vast amounts of data that can be recalled systematically without a hint or cue. Consequently, machine translation has developed as a field with its own lexical and statistical techniques designed to meet the usual engineering concerns such as minimizing the use of computer resources (memory, processing power, network bandwidth, etc.) while providing adequate speed and low cost.
FIG. 1 illustrates a computer network with a server 110 that performs machine translations in response to requests received over a network 130 (e.g. the Internet) from computers 120. Server 110 can be a single computer or a distributed system including multiple computers interconnected by a variety of networks possibly including the network 130. A request from a computer 120 may be an explicit request to translate a document, or may be a request to perform some other task requiring translation, for example, to perform a search of documents in different languages. Thus, a user of computer 120 may submit a search query in one language, but the query must be matched against documents written in another language, so the search query has to be translated before the search can proceed.
Server 110 has access to computer databases 140 storing the documents to be searched. Machine translation engine 160 translates the queries if needed. Search engine 150 accepts the translated queries, searches the pertinent databases 140, and produces the search results, e.g. a list of database documents with a link (URL) for each document, possibly with a brief description of each document or a sample of the document's contents.
MT engine 160 uses its databases 170 to perform translations. Databases 170 contain language model information 170R, which includes computer dictionaries and computer representations of grammar, and also contain statistical information 170S derived from known translations.
Importantly, the server can store information on search requests to help improve future translations. Such information is shown as click-through logs 180. For example, suppose that many users submit an English language query which we will denote as “qEn”, and after obtaining the search results the users frequently select from the search results a given URL (Uniform Resource Locator), e.g. www.fedex.com, which is an English-language home page of a U.S. company. Suppose also that many other users, possibly Chinese-speakers, submit a Chinese language query qCn, obtain search results, and select the URL www.fedex.com/cn, which is the Chinese-language home page of the same company. Server 110 may conclude that the English language query qEn is an English translation of the Chinese language query qCn. See e.g. U.S. pre-grant patent publication no. 2010/0161642 (Chen et al.) published Jun. 24, 2010 for other uses of the click-through data to improve machine translations.
The click-through data 180 are processed by the server's data mining engine 190 to update the MT databases 170 with pertinent information. More particularly, data mining engine 190 finds correlations within the click-through data (e.g. between URLs such as www.fedex.com and www.fedex.com/cn, and between the queries, such as qEn and qCn, that led to those URLs) and updates the MT databases 170 with information needed to improve future translations, e.g. with an indication that qEn and qCn are translations of each other.
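By way of illustration only, the correlation step performed by data mining engine 190 may be sketched in Python as follows. The function name `mine_translation_pairs`, the log record layout (query, language, clicked URL), the `parallel_urls` table that groups language variants of the same site, and the `min_clicks` threshold are all assumptions introduced for this sketch; the text above does not specify how the engine identifies corresponding URLs or how many selections it requires.

```python
from collections import Counter, defaultdict


def mine_translation_pairs(click_log, parallel_urls, min_clicks=2):
    """Propose candidate query translations from click-through records.

    click_log     -- iterable of (query, language, clicked_url) tuples (assumed layout)
    parallel_urls -- dict mapping a URL to a site key shared by its language
                     variants, e.g. www.fedex.com and www.fedex.com/cn (assumed input)
    min_clicks    -- minimum number of selections before a query is trusted (assumed)
    """
    # counts[site][language] counts the queries whose users selected a URL of that site.
    counts = defaultdict(lambda: defaultdict(Counter))
    for query, language, url in click_log:
        site = parallel_urls.get(url)
        if site is not None:
            counts[site][language][query] += 1

    pairs = set()
    for by_language in counts.values():
        # Keep the most frequently used query per language, if it is frequent enough.
        top = {}
        for language, counter in by_language.items():
            query, n = counter.most_common(1)[0]
            if n >= min_clicks:
                top[language] = query
        # Pair the top queries of different languages as candidate translations.
        languages = sorted(top)
        for i, a in enumerate(languages):
            for b in languages[i + 1:]:
                pairs.add(((a, top[a]), (b, top[b])))
    return pairs
```

Under these assumptions, a log in which qEn frequently leads to www.fedex.com and qCn frequently leads to www.fedex.com/cn yields the candidate pair (qEn, qCn), which could then be recorded in the MT databases 170.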
FIG. 2A is a flowchart of a process performed by server 110. At step 210, the server receives information on a user request from a computer 120. The request can be a search query or a request for a document from previously submitted search results. At step 220, the server logs the request in click-through database 180. At step 230, the server performs other processing as needed for the request.
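The flow of FIG. 2A may be sketched, for illustration only, as the following Python fragment. The `Request` and `Server` classes, their field names, and the `process` callback are hypothetical; they stand in for the server's actual request representation and for the "other processing" of step 230, neither of which is detailed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Request:
    kind: str     # "search" for a query, "click" for a document request (assumed labels)
    payload: str  # the query text or the requested URL


@dataclass
class Server:
    # Stands in for click-through database 180 (an in-memory list in this sketch).
    click_through_log: List[Request] = field(default_factory=list)

    def handle(self, request: Request, process: Callable[[Request], Any]) -> Any:
        # Step 210: the request has been received from a computer 120.
        # Step 220: log the request.
        self.click_through_log.append(request)
        # Step 230: perform whatever other processing the request needs.
        return process(request)
```

In this sketch both search queries and document selections are logged before any further processing, so the log holds the data that later mining would need.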
FIG. 2B shows an example processing operation 230 performed when the user request is a search query. At step 234, the server checks whether the query needs translation. In particular, the server determines the language of the query by looking up the words of the query in dictionaries stored in databases 170. The server then determines in what languages the search should be performed. If the search should be performed in a language other than the query language, the query is provided to MT engine 160, which translates the query at step 238.
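The dictionary lookup used at step 234 to identify the query language may be sketched as follows. This is a simplified illustration under stated assumptions: the function name `detect_language` is hypothetical, the dictionaries are modeled as plain sets of words, and the query is split on whitespace, which would not suffice for languages such as Chinese that are written without spaces between words.

```python
def detect_language(query, dictionaries):
    """Return the language whose dictionary contains the most query words.

    dictionaries -- dict mapping a language code to a set of known words
                    (an assumed stand-in for the dictionaries in databases 170)
    Returns None if no dictionary contains any word of the query.
    """
    words = query.lower().split()
    # Score each language by the number of query words found in its dictionary.
    scores = {
        language: sum(word in vocabulary for word in words)
        for language, vocabulary in dictionaries.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, with an English dictionary containing "track" and "package" and a Spanish dictionary containing "rastrear" and "paquete", the query "track package" would be identified as English.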
The translated query, or the original query if no translation is needed, is provided to the search engine 150. The search engine performs the search at step 242, and provides the search results. At step 246, the server sends the search results back to the computer 120.
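Steps 234 through 246 may be summarized, for illustration only, by the following Python sketch. The function name `process_search_query` and its parameters are hypothetical; `translate` and `search` are caller-supplied callables standing in for MT engine 160 and search engine 150, and a single search language is assumed for simplicity.

```python
def process_search_query(query, query_language, search_language, translate, search):
    """Translate the query if needed, then run the search and return the results.

    translate -- callable (text, source_language, target_language) -> text
    search    -- callable (text) -> list of results
    """
    # Step 234: translation is needed only if the search language differs.
    if search_language != query_language:
        # Step 238: the MT engine translates the query.
        query = translate(query, query_language, search_language)
    # Step 242: the search engine searches with the (possibly translated) query.
    results = search(query)
    # Step 246: the results are returned for sending back to the computer 120.
    return results
```

When the query language and the search language coincide, `translate` is never called and the original query is passed to `search` unchanged.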