1. Field of the Invention
The present invention relates to a computer program, apparatus, and method for searching a translation memory database for entries that match a given source text and displaying the corresponding target-language texts that were previously translated. More particularly, the present invention relates to such a computer program, apparatus, and method that compare texts not only sentence by sentence but also in smaller segments.
2. Description of the Related Art
Translators working in industrial fields are required to produce translations at high throughput while maintaining their quality. Translation memory systems help human translators build a database of previously translated texts and reuse them as a reference in new translation projects. When an input text is entered for translation, the translation memory system searches its database for entries whose source text is similar to the input text and then displays the target-language translations contained in the retrieved entries.
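The retrieval step described above can be pictured with a minimal sketch. It assumes an in-memory list of (source, target) pairs and uses character-level similarity from Python's `difflib.SequenceMatcher` as a stand-in measure; the similarity metric, the `threshold` value, and the function name `search_translation_memory` are illustrative assumptions, not details taken from the systems discussed here.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A parallel text: (source-language text, target-language text).
ParallelText = Tuple[str, str]


def search_translation_memory(
    memory: List[ParallelText],
    input_text: str,
    threshold: float = 0.6,  # hypothetical cut-off, chosen for illustration only
) -> List[Tuple[float, ParallelText]]:
    """Return entries whose source text resembles input_text, best match first."""
    hits = []
    for source, target in memory:
        # Character-level similarity in [0, 1]; 1.0 means the strings are identical.
        score = SequenceMatcher(None, input_text, source).ratio()
        if score >= threshold:
            hits.append((score, (source, target)))
    # Rank the retrieved entries so the closest source text is shown first.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    memory = [
        ("The printer is out of paper.", "Der Drucker hat kein Papier mehr."),
        ("Turn off the power switch.", "Schalten Sie den Netzschalter aus."),
    ]
    for score, (src, tgt) in search_translation_memory(memory, "The printer is out of toner."):
        print(f"{score:.2f}  {src}  ->  {tgt}")
```

A production system would typically use an index rather than a linear scan, but the linear version keeps the match-then-display flow easy to follow.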
When the input text is entirely new, the translation memory system is unlikely to find an exact match for it. Some existing systems therefore divide the input text into segments when searching the database, in the hope of finding a good partial match at the phrase level (see, for example, Japanese Patent Application Publication No. 2006-134062).
A combination of a source-language text and its equivalent target-language text is called a “parallel text.” Suppose that a parallel text is retrieved from the translation memory database and displayed on a computer screen. The user must then examine the parallel text visually to determine which part of it can be reused in the translation work. This task is often burdensome, particularly when the displayed parallel text is long. To address this problem, some existing systems help the user find the relevant part of the displayed translations. One such system estimates source-target word pairs by analyzing the retrieved parallel texts together and presents the results with the identified word pairs emphasized.
More specifically, the system searches a database of parallel texts and displays the entries matching an input text with the matched portions emphasized, thus reducing the user's burden of determining which part to reuse. The word pairs may also be highlighted across multiple parallel texts retrieved from the database (see, for example, Japanese Patent Application Publication No. 2003-330924).
In the field of monolingual text search, an index system called “Key Word In Context” (KWIC) has been used to search a document for a specific keyword. The KWIC index enables a match to be extracted together with its surrounding context. The user can thus view not only the exact portion of that match, but also its surrounding text, on a search result screen.
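The keyword-in-context idea can be illustrated with a small sketch. Rather than building a persistent KWIC index, this version computes the context view on the fly for a single keyword; the function name `kwic`, the fixed character `width` of the context window, and the case-insensitive matching are illustrative assumptions.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, width: int = 20) -> List[Tuple[str, str, str]]:
    """Return (left context, match, right context) for each keyword occurrence."""
    results = []
    # re.escape makes the keyword match literally; IGNORECASE is an illustrative choice.
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]  # up to `width` chars before
        right = text[m.end():m.end() + width]             # up to `width` chars after
        results.append((left, m.group(0), right))
    return results


if __name__ == "__main__":
    doc = "Translation memory stores past translations. A memory search returns similar entries."
    for left, match, right in kwic(doc, "memory", width=12):
        # Right-align the left context so every match lines up in one column.
        print(f"{left:>12} [{match}] {right}")
```

Aligning each match in a fixed column, as the usage example does, is what lets a reader scan many hits and their surroundings at a glance.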
While the user may be able to recognize matched words more easily, the above-mentioned translation memory system still displays long sentences in their entirety. Since long sentences occupy a large area of the monitor screen, the conventional system is unable to display many candidate texts on a single screen. Even if the system has found a particularly good match, the user may have difficulty locating that text on the search result screen among many other matches that may not be useful. In such a case, the conventional translation memory system does little to improve the efficiency of translation work.
Some other systems store parallel texts in the form of smaller bilingual text segments, so that a translation memory search focuses on the necessary portion. Such search results are compact from the outset and require little screen space, but the user is then unable to see the full context of the retrieved translation.