As networks have expanded, network information resources have become increasingly abundant, and user demand for them has grown steadily. However, because these resources exist in many different languages, the language barrier is an obstacle to their widespread sharing among users. Research on multilingual information retrieval (MLIR) has begun to address this obstacle.
In a conventional example using Spanish and English, the full texts of the English documents are first translated into Spanish. Indices corresponding to Spanish are then established from the translated documents together with the original Spanish documents. Likewise, the full texts of the Spanish documents are translated into English, and indices corresponding to English are established from the translated documents together with the original English documents. When a query contains English query words, those words are searched in the indices corresponding to English, and the search results are obtained and returned. When a query contains Spanish query words, those words are searched in the indices corresponding to Spanish, and the search results are obtained and returned.
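The conventional flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `translate` function and its `TOY_DICTIONARY` are hypothetical word-for-word stand-ins for a real full-text translation system, the index is a simple in-memory inverted index, and the two sample documents are invented for the example.

```python
from collections import defaultdict

# Hypothetical toy dictionaries standing in for a full-text translation system.
TOY_DICTIONARY = {
    ("en", "es"): {"red": "rojo", "house": "casa", "cat": "gato"},
    ("es", "en"): {"rojo": "red", "casa": "house", "gato": "cat"},
}


def translate(text, source, target):
    """Toy stand-in for full-text translation (word-for-word lookup)."""
    table = TOY_DICTIONARY[(source, target)]
    return " ".join(table.get(word, word) for word in text.lower().split())


def build_indices(documents, languages):
    """Build one inverted index per language from original and translated documents.

    documents: list of (doc_id, language, text) tuples.
    """
    indices = {lang: defaultdict(set) for lang in languages}
    for doc_id, doc_lang, text in documents:
        for lang in languages:
            # Each document is indexed once per language, translated when needed.
            if lang == doc_lang:
                content = text.lower()
            else:
                content = translate(text, doc_lang, lang)
            for word in content.split():
                indices[lang][word].add(doc_id)
    return indices


def search(indices, query, query_lang):
    """Search the query words only in the index of the query's own language."""
    index = indices[query_lang]
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Invented sample data: one English document and one Spanish document.
documents = [
    (1, "en", "red house"),
    (2, "es", "gato rojo"),
]
indices = build_indices(documents, ["en", "es"])
english_hits = search(indices, "red", "en")
spanish_hits = search(indices, "casa", "es")
```

In this sketch every document is indexed once for each supported language, so the number of indexed copies grows with the number of documents multiplied by the number of languages.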
This conventional approach has two drawbacks. First, for every supported language A, the documents in all other languages must be translated into language A and indexed together with the original documents in language A. The result is a bloated system architecture and bloated hardware equipment, which are difficult to maintain or to expand. Second, for every language A, query words in language A are searched against documents that were produced by full-text translation from the other languages. Because the rules of the various languages differ greatly, semantic information from the original documents is lost during translation, and the more documents that are translated, the greater the potential loss of semantic information. Thus, searches conducted on full-text translations inevitably lack sufficient precision.