1. Field of the Invention
The present invention relates to a machine translation method for translating or converting original text in a first language (foreign language; e.g., English) to translated text in a second language (native language; e.g., Japanese), and in particular to a machine translation method whereby a computer system performs a translation process using an electronically stored dictionary. More specifically, the present invention pertains to a method for compressing entry word index data in a dictionary; to entry word indexes for a dictionary that have been compressed; and to a method for searching for a word based on an entry word index that has been compressed.
2. Background Art
Over the years, much time and effort has been expended in the study of so-called “machine translation” (or “automatic translation”), a technique involving the use of the hardware resources of a computer system to translate text in one language to text in a second language.
Not long after the end of the Second World War, for example, following the development in 1946 of ENIAC, the first general purpose computer, many researchers became greatly interested in the possibility of using computers for “machine translation.” And over the next ten years universities and research institutes invested an enormous amount of time, energy and money in its study, but with generally unsatisfactory results.
Thereafter, interest in machine translation waned somewhat. But today, impelled by recent developments related to the use of the Internet, the focus is once again on machine translation; once again great interest is being shown in developing and producing software for this purpose. This has come about because many users of the Internet, outside the English speaking community of nations, either cannot read English or read it imperfectly, and since most text on Web pages is written in English, these users cannot fully utilize the new global information system, the WWW (the World Wide Web). As a result, translation software that once, when first developed, was priced in the tens of millions of yen can now be purchased for tens of thousands of yen, and since such software is therefore more easily acquired by users, it is now widely used on personal computers. Of the machine translation software products that are presently available, some are specifically intended for the translation of text on the Internet, i.e., the translation of Web pages. One example of such a product is the “King of Translation,” which is sold by IBM Japan Co., Ltd.
In short, machine translation is a technique by which the processing capability of a computer system is applied for the translation of text written in a foreign language, such as English, into text in a native language, such as Japanese (or vice versa). For machine translation, a database is constructed by employing, as a model, the enormous amount of language knowledge that a human possesses (or is assumed to possess), and a translation engine, a type of data processor, is employed to refer to this database and to perform the actual translation.
An example database for a machine translation system is a dictionary. Recent machine translation systems prepare a dedicated dictionary for each genre, such as an art dictionary and a sports dictionary, in addition to a system dictionary that serves as a basic dictionary. Machine translation systems use dictionaries in accordance with the genre to which an object to be translated belongs, and thus the accuracy of a translation can be improved (see the specification in Japanese Unexamined Patent Application Hei 8-272755 and corresponding U.S. Pat. No. 6,119,078, issued on Sep. 12, 2000). Generally, a single machine translation dictionary is constituted by an entry word index portion and a main portion that describes translation data for entry words (including morpheme analysis data). The translation engine searches through the entry word indexes to acquire corresponding translation data.
For distribution, a machine translation system, i.e., the machine translation software, is generally recorded on a storage medium, such as a CD (compact disk) or a FD (floppy disk). To activate the machine translation software, an end user inserts into a drive unit of his or her computer system a CD or FD he or she purchased and installs a program recorded thereon in the computer.
An entry word index portion of a machine translation system is generally not stored in text form; usually it is compressed or encoded before being stored. This is done because an index portion that has an easily readable form may be employed or examined by a third party, especially by a competitor, and because the size of compressed entry word index data is reduced and can thus be held resident in memory. This last is important because the entry word index data must be accessed each time a word search is conducted, and when the entry word index data is resident in memory, the speed of a search is greatly increased. In particular, the sizes of entry word indexes for a machine translation system that provides several dictionaries must be reduced, so that all of them can be held resident in memory. Conventionally, a common compression algorithm, such as “LHA” for the general-purpose personal computer (PC) or the compression command “compress” for UNIX, is employed to compress entry word index data, or only an encoding process is performed for the index data without compressing it. However, these conventional techniques have the following shortcomings.
First, time is required for compression and recovery processing. In particular, once entry word index data are compressed, the data cannot be searched directly, and thus two steps are required: the decompression of the entry word index data and a search of the resultant data. As a result, the search efficiency deteriorates.
In addition, since the individual entry words are short character strings (20 to 30 bytes at most), the compression rate is not good.
Further, simply encoding the data does not reduce the data size.
It is, therefore, one object of the present invention to provide a method for compressing entry word index data for a dictionary to be used for machine translation, compressed entry word indexes for a dictionary, and a method for searching for a word using the compressed entry word index data.
It is another object of the present invention to provide a compression method that enables a search for compressed data to be performed without a decompression process being required, entry word indexes for a dictionary to be generated by such a compression method, and a method for searching for a word using the compressed entry word index.
To achieve the above objects, according to a first aspect of the present invention, a compression method comprises the steps of: (a) extracting character strings, constituted by n (n is an integer greater than 1) or more characters that frequently appear in an object to be compressed, which consists of many words; (b) calculating compression contribution values for the individual extracted character strings; (c) assigning highly ranked character strings having a high compression contribution value to empty columns in a character translation code table; and (d) substituting for a corresponding character translation code the character strings that are registered in the character translation code table.
According to the compression method in the first aspect of the present invention, the object to be compressed may be the entry word index data in a dictionary used for machine translation.
At step (b), for calculating the compression contribution value, the compression contribution value may be represented by (n−k)×count, which is a product of (n−k), a compression value obtained by replacing a character string S having n characters with a character string having k characters (n greater than k), and count, the frequency at which the character string S of the object to be compressed appears.
The character translation code table may be an ASCII (American Standard Code for Information Interchange) code table that conforms to the specifications prescribed by ANSI (American National Standards Institute).
According to a second aspect of the present invention, a method for compressing entry word index data for a dictionary used in a machine translation system, comprises the steps of: (a) extracting character strings constituted by n (n is an integer greater than 1) or more characters that frequently appear in the entry word index data; (b) calculating compression contribution values for the individual extracted character strings; (c) assigning highly ranked character strings having a high compression contribution value to empty columns in a character translation code table; and (d) substituting for a corresponding character translation code the character strings, in the entry word index data, that are registered in the character translation code table.
According to the compression method in the second aspect, at step (b), for calculating the compression contribution value, the compression contribution value may be represented by (n−k)×count, which is a product of (n−k), a compression value obtained by replacing a character string S having n characters with a character string having k characters (n greater than k), and count, the frequency at which the character string S in the entry word index data appears.
The character translation code table may be an ASCII (American Standard Code for Information Interchange) code table that conforms to the specification prescribed by ANSI (American National Standards Institute).
According to a third aspect of the present invention, a machine translation system for employing the processing capabilities of a computer system to translate text in a first language into text in a second language, comprises: a dictionary, including entry word index data compressed using the compression method according to the second aspect, and a main body in which are described translation data concerning entry words; and a translation engine for referring to the dictionary when translating text in the first language into text in the second language.
In the machine translation system according to the third aspect of the present invention, when the translation engine searches through the entry word index for a word included in text in the first language, the translation engine may, first, replace a character string included in a word registered in a character translation code table with a corresponding character translation code, and then perform search of the entry word index.
According to a fourth aspect of the present invention, provided is a computer-readable storage medium for physically storing a machine translation program that is operated by a computer system, which includes a processor for performing a software program, a memory for temporarily storing program code and data being processed, an external storage device, input devices used by a user to enter data and a display for displaying processed data, the machine translation program comprising: (a) an entry word index data module compressed using the compression method according to the second aspect; (b) a dictionary main body module in which are described translation data concerning individual entry words; and (c) a translation engine module for referring to the dictionary constituted by the modules (a) and (b) to translate text in a first language into text in a second language.
In the computer-readable storage medium according to the fourth aspect of the present invention, when the translation engine module searches the entry word index for a word included in the text in the first language, the translation engine module may, first, replace a character string in the word, which is registered in a character translation code table, with a corresponding character translation code, and then perform search of the entry word index.
According to a fifth aspect of the present invention, a method for compressing entry word index data, for a dictionary used in a machine translation system, comprises the steps of: (a) translating original entry word index data into first entry word index data in which individual entry word character strings are represented by a difference from an entry word character string immediately above; (b) selecting, at step (a), an entry word character string that has a large difference from an entry word character string immediately above, as a reference entry word character string to be described, unchanged, into the first entry word index data; (c) extracting character strings constituted by n (n is an integer greater than 1) or more characters that frequently appear in the first entry word index data; (d) calculating compression contribution values for the individual extracted character strings; (e) assigning highly ranked character strings having a high compression contribution value to empty columns in a character translation code table; and (f) replacing, with corresponding character translation codes, character strings in the first entry word index data that are registered in the character translation code table and generating second entry word index data.
According to the compression method in the fifth aspect, at step (d), for calculating the compression contribution value, the compression contribution value may be represented by (n−k)×count, which is a product of (n−k), a compression value obtained by replacing a character string S having n characters with a character string having k characters (n greater than k), and count, a frequency at which the character string S in the entry word index data appears.
The character translation code table may be an ASCII (American Standard Code for Information Interchange) code table that conforms to the specifications prescribed by ANSI (American National Standards Institute).
According to a sixth aspect of the present invention, a machine translation system, for employing the processing capability of a computer system to translate text in a first language into text in a second language, comprises: a dictionary including the second entry word index data compressed by the compression method according to the fifth aspect and a main body in which translation data concerning entry words are described; and a translation engine for referring to the dictionary to translate the text in the first language into the text in the second language.
In the machine translation system according to the sixth aspect of the present invention, when the translation engine conducts search of the entry word index for a word included in the text in the first language, the translation engine may, first, recover the original entry word character strings from character strings in the second entry word index data in accordance with the character translation code table, and compare the word with the recovered entry word character string.
According to a seventh aspect of the present invention, provided is a computer-readable storage medium for physically storing a machine translation program that is operated by a computer system, which includes a processor for performing a software program, a memory for temporarily storing program code and data being processed, an external storage device, entry means used by a user to enter data and a display for displaying processed data, the machine translation program comprising: (a) a second entry word index data module compressed using the compression method according to the fifth aspect; (b) a dictionary main body module in which translation data concerning individual entry words are described; and (c) a translation engine module for referring to the dictionary constituted by the modules (a) and (b) to translate text in a first language into text in a second language.
In the computer-readable storage medium according to the seventh aspect of the present invention, when the translation engine module performs search of the entry word index for a word included in the text in the first language, the translation engine module may, first, recover the original entry word character strings from character strings in the second entry word index data in accordance with the character translation code table, and compare the word with the recovered entry word character string.
In the natural language processing field, the statistical characteristics of languages have been pointed out as the basic characteristics, and have been studied and researched. One of the statistical characteristics of language that has been focused on most is the frequency at which a character appears. Especially since a number of Indo-European languages have alphabets of only 26 characters, the use frequencies of the individual letters in the alphabets have been examined in detail.
To represent the features of an English character string, not only the frequency at which a single character appears has been studied, but also the frequencies at which combinations of two or three characters appear have been examined. These combinations are called 2-grams or 3-grams, or more generally “n-gram strings.” The order of the frequencies is affected by the type of text used to derive the statistics. In 2-gram statistics, character strings th, he, in, an, er, re and on frequently appear; in 3-gram statistics, character strings that seem to be part of a spelling of a word are extracted; and in n-gram statistics, character strings that appear frequently and conform to the English characteristics are extracted.
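The n-gram counting described above can be sketched as follows. This is a minimal illustration, not the procedure of the invention; the function name `ngram_counts` and the sample word list are assumptions made for the example.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every contiguous n-character substring across the given words.

    Hypothetical helper: the source describes n-gram statistics only in
    general terms and does not prescribe this function.
    """
    counts = Counter()
    for word in words:
        # A word shorter than n contributes nothing (empty range).
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


# Illustrative sample only; real statistics would be taken over a large text.
sample = ["the", "then", "other", "there"]
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(3))
```

Even on this tiny sample the strings th and he dominate, which is consistent with the 2-gram list given in the text.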
The compression method of the present invention employs the statistical characteristics of language. More specifically, an n-gram statistical analysis is employed to acquire frequently appearing character strings of n characters or more, and the individual character strings having n characters or more are replaced by character strings having fewer than n characters (e.g., character translation codes of 1 byte each). The correlation between the original character strings and the character translation codes is registered in a correlation table, i.e., a character translation code table.
Assume that a character string of three characters, i.e., a character string of three bytes, “sta,” is registered as 1-byte code “e5” and that a character string of four characters, i.e., a character string of four bytes, “tion,” is registered as 1-byte code “f1.” Then, the word “station,” which consists of a character string of seven characters, i.e., seven bytes, is represented by the 2-byte code “e5 f1,” so that this contributes to a compression of five bytes. When a character string “e5 f1” is found in compressed text data, columns for “e5” and “f1” in the code table prepared in advance are referred to, and the character string can be easily translated to the original character string “station.” That is, the original word can be searched for without decompressing the compressed text.
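The “station” example can be reproduced with a short sketch. The code table below holds only the two entries named in the text; the longest-match substitution order, the function names and the assumption that the input is plain 7-bit ASCII are choices made for this illustration and are not stated in the source.

```python
# Code table from the example: column e5 holds "sta", column f1 holds "tion".
CODE_TABLE = {0xE5: "sta", 0xF1: "tion"}


def encode(word, table):
    """Replace registered strings with their 1-byte codes (longest match first).

    Assumes the word is 7-bit ASCII, so the codes 0xE5 and 0xF1 cannot
    collide with ordinary characters.
    """
    entries = sorted(table.items(), key=lambda item: -len(item[1]))
    out = bytearray()
    i = 0
    while i < len(word):
        for code, text in entries:
            if word.startswith(text, i):
                out.append(code)
                i += len(text)
                break
        else:
            # No registered string starts here: keep the character as is.
            out.append(ord(word[i]))
            i += 1
    return bytes(out)


def decode(data, table):
    """Translate each byte back through the table; other bytes are literals."""
    return "".join(table.get(byte, chr(byte)) for byte in data)


packed = encode("station", CODE_TABLE)
print(packed.hex(" "))                      # e5 f1
print(decode(packed, CODE_TABLE))           # station
print(len("station") - len(packed))         # 5 bytes saved
```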
According to the first aspect of the present invention, character strings constituted by n (n is an integer greater than 1) or more characters are extracted that frequently appear in an object to be compressed that consists of many words, and a compression contribution value is calculated for the individual extracted character strings. The compression contribution value is represented by (n−k)×count, which is a product of (n−k), the compression value obtained by replacing a character string S having n bytes with a character string having k bytes, and count, a frequency at which the character string S of the object to be compressed appears.
Then, highly ranked character strings having a higher compression contribution value are assigned to empty columns in a predetermined character translation code table. Assuming that as a result of the n-gram statistics, the compression contribution values of character strings “sta” and “tion” are high and that the columns “e5” and “f1” in the table are unused, the character strings “sta” and “tion” are registered in the respective columns.
The character strings to be compressed that are registered in the character translation code table are replaced by the corresponding character translation codes. For example, a character string “station” of seven characters is compressed to a character code of “e5 f1” in accordance with the character translation code table.
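The extraction, ranking and assignment steps can be sketched as below. This is a simplified reading under stated assumptions: the bounds `min_n` and `max_n`, the tie-breaking rule, the list of free codes and the sample words are all hypothetical, and the naive count treats overlapping candidates independently, which the source does not discuss.

```python
from collections import Counter


def build_code_table(words, free_codes, min_n=2, max_n=4, k=1):
    """Assign the highest-contribution strings to the unused code columns.

    Contribution follows the formula in the text: (n - k) * count, where n
    is the string length and k the length of its replacement code.
    """
    counts = Counter()
    for word in words:
        for n in range(min_n, max_n + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1

    def contribution(s):
        return (len(s) - k) * counts[s]

    # Ties are broken by the string itself so the result is deterministic
    # (an arbitrary choice made for this sketch).
    ranked = sorted(counts, key=lambda s: (contribution(s), s), reverse=True)
    return dict(zip(free_codes, ranked))


words = ["station", "nation", "ration", "stable", "stand"]
table = build_code_table(words, free_codes=[0xE5, 0xF1])
print({hex(code): text for code, text in table.items()})
```

On this sample the two top-ranked strings overlap ("tion" and "atio"), which shows why a practical ranking would probably need to re-count after each assignment rather than rank all candidates at once.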
The compression method according to the second aspect of the present invention is the one where the compression method of the first aspect is applied for the compression of entry word index data in a dictionary used for machine translation. According to the second aspect, first, character strings constituted by n (n is an integer greater than 1) or more characters that frequently appear are extracted from the entry word index data, and a compression contribution value is calculated for the individual extracted character strings. The compression contribution value is represented by (n−k)×count, which is a product of (n−k), a compression value obtained by replacing a character string S having n bytes with a character string having k bytes, and count, the frequency at which the character string S of the object to be compressed appears.
Then, highly ranked character strings having a higher compression contribution value are assigned to empty columns in a predetermined character translation code table. The character translation code table may be an ASCII (American Standard Code for Information Interchange) code table that conforms to the specifications prescribed by ANSI (American National Standards Institute). An ASCII code table is well known in this field as a table where alphanumeric characters are assigned for code. Assuming that as a result of the n-gram statistics, the compression contribution values of character strings “sta” and “tion” are high, the character strings “sta” and “tion” are assigned to the respective empty columns “e5” and “f1” in the ASCII code table.
The character strings in the entry word index data that are registered in the character translation code table are replaced with corresponding character translation codes. For example, an entry word “station” in the entry word index data is compressed to a character code of “e5 f1” in accordance with a modified ASCII code table that is newly generated. In this case, a word “station,” which consists of a character string of seven characters, i.e., seven bytes, is represented by the 2-byte code “e5 f1,” so that this contributes to a compression of five bytes. This compression process is performed for all the entry word index data. It should be noted that as a result, a great amount of entry word index data can be compressed. Thus the compressed entry word index data can remain resident in a main memory having a limited storage capacity without being withdrawn (swapped out).
The third aspect of the present invention is a machine translation system that employs entry word index data compressed in the second aspect. The machine translation system, for employing the processing capability of a computer system to translate text in a first language into text in a second language, comprises: a dictionary including entry word index data compressed by the compression method according to the second aspect and a main body in which translation data concerning entry words are described; and a translation engine for referring to the dictionary to translate the text in the first language into the text in the second language.
In the machine translation system according to the third aspect of the present invention, when the translation engine conducts a search of the entry word index data for a word included in the text in the first language, the translation engine first replaces the character strings in the word that are registered in a character translation code table (the modified ASCII code table generated by the compression method in the second aspect) with the corresponding character translation codes, and then conducts a search of the entry word index. When, for example, a word “station” is found in an English document, which is the text in the first language, the word is translated into character codes “e5 f1” in accordance with the ASCII code table (assuming that the character codes “e5” and “f1” are assigned to “sta” and “tion”). Then, a search is conducted of the entry word index data for the character code “e5 f1,” and translation data corresponding to the original character string “station” are acquired.
In the compressed entry word index data, the character string “station” of seven characters, i.e., seven bytes, is compressed into the 2-byte code “e5 f1.” To search for the word “station” in the entry word index data, the word need only be translated into the corresponding character code “e5 f1,” and the entry word index data do not have to be decompressed. That is, since the compressed entry word index data need not be decompressed in order to be examined, a reduction in the search speed does not occur.
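A search of this kind, in which the query word is encoded and compared directly against compressed keys, might look like the sketch below. The sorted key list, the binary search and the placeholder translation data are assumptions for the illustration; the source does not specify how the index is organized. The sketch also assumes the same deterministic encoder is used for the index and for the query, so that equal words always yield equal codes.

```python
import bisect

# Registered strings and their codes, taken from the "station" example.
TABLE = {"sta": 0xE5, "tion": 0xF1}


def encode(word):
    """Encode a 7-bit ASCII word, preferring the longest registered string."""
    by_length = sorted(TABLE, key=len, reverse=True)
    out = bytearray()
    i = 0
    while i < len(word):
        for text in by_length:
            if word.startswith(text, i):
                out.append(TABLE[text])
                i += len(text)
                break
        else:
            out.append(ord(word[i]))
            i += 1
    return bytes(out)


# Hypothetical compressed index: encoded key plus placeholder translation data.
entries = sorted(
    (encode(w), f"translation data for {w}")
    for w in ["nation", "star", "station", "stop"]
)
keys = [key for key, _ in entries]


def lookup(word):
    """Find a word by comparing encoded bytes; the index is never decoded."""
    key = encode(word)
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return entries[pos][1]
    return None


print(lookup("station"))
print(lookup("stat"))
```

Sorting by encoded bytes changes the alphabetical order of the entries, but a binary search only needs the order to be consistent, so the lookup still works on the compressed form.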
The compression method of the fifth aspect is an example where the compression method of the first aspect, as well as the second aspect, is employed for the compression of entry word index data in a dictionary used for machine translation. The compression method in the fifth aspect differs from the method in the second aspect in that, before entry word index data are compressed according to the n-gram statistics, the differences between closely related entry word character strings are obtained to further increase the compression rate.
According to the compression method of the fifth aspect, first, original entry word index data are translated into first entry word index data in which individual entry word character strings are represented by a difference from an entry word character string immediately above. A character string for which a large difference exists with an immediately preceding entry word character string is maintained, unchanged, as the reference entry word character string in the first entry word index. When “abatable,” “abate” and “abatement” are arranged in ascending order in the original entry word index, the entry word “abate” is replaced by the character count 4, which is the number of leading characters that match the immediately preceding entry word “abatable,” and “e,” which is the difference from the word “abatable.” The entry word “abatement” is replaced by the character count 5, which is the number of leading characters that match the immediately preceding entry word “abate,” and “ment,” which is the difference from the word “abate.” These replacements are written into the first entry word index. Further, when the matching character count of the entry word “abatable” is extremely low relative to the immediately preceding entry word, that entry word is defined as the reference character string, so that the original entry word character string remains unchanged in the first entry word index and the matching character count is reset to 0.
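The difference step for “abatable,” “abate” and “abatement” can be sketched as follows. The threshold `min_match` that decides when an entry is kept whole as a reference string is an assumption; the text only says the matching count is “extremely low.”

```python
def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def difference_encode(sorted_words, min_match=2):
    """Build a first entry word index of (matching count, difference) pairs.

    An entry whose match with the preceding word is below min_match
    (hypothetical threshold) is kept unchanged with a count of 0.
    """
    index = []
    previous = ""
    for word in sorted_words:
        match = common_prefix_len(previous, word)
        if match < min_match:
            index.append((0, word))          # reference entry word
        else:
            index.append((match, word[match:]))
        previous = word
    return index


first_index = difference_encode(["abatable", "abate", "abatement"])
print(first_index)   # [(0, 'abatable'), (4, 'e'), (5, 'ment')]
```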
Following this, an n-gram statistical analysis is conducted for the character string differences in the first entry word index. The character strings constituted by n (n is an integer greater than 1) or more characters that frequently appear are extracted, and a compression contribution value is calculated for the individual extracted character strings. The compression contribution value is represented by (n−k)×count, which is a product of (n−k), a compression value obtained by replacing a character string S having n bytes with a character string having k bytes, and count, a frequency at which the character string S in the entry word index data appears.
Then, highly ranked character strings having a higher compression contribution value are assigned to empty columns in a predetermined character translation code table. The character translation code table may be an ASCII (American Standard Code for Information Interchange) code table that conforms to the specifications prescribed by ANSI (American National Standards Institute). An ASCII code table is well known in this field as a table in which alphanumeric characters are assigned to code. Assuming that as a result of the n-gram statistical analysis the compression contribution values of character strings “able” and “lity” are high, the character strings “able” and “lity” are assigned to the respective empty columns “03” and “ad” in the ASCII code table.
The character strings in the first entry word index data that are registered in the character translation code table are replaced by corresponding character translation codes. For example, an entry word in the first entry word index, “06 (matching character count) ion (character string difference)” (original entry word is “abjection”), is compressed to a character code of “06 99” in accordance with the newly generated ASCII code table. In this case, a word “abjection,” which consists of a character string of nine characters, i.e., nine bytes, is represented by the 2-byte code “06 99,” so that this contributes to a compression of seven bytes. This compression process is performed for all the entry word index data. Thus, the entry word index that has been substituted into the corresponding character translation code is the second entry word index, which is used for searching for a word in the dictionary during the machine translation processing.
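The “abjection” example can be checked with a small sketch. The assignment of “ion” to column 99 is inferred from the “06 99” example rather than stated outright, and the one-byte layout for the matching count, the function name and the longest-match order are assumptions. How entries are delimited within the stored index is not covered here.

```python
# Inferred from the example "06 ion" -> "06 99"; treat as an assumption.
DIFF_CODES = {"ion": 0x99}


def pack_entry(match_count, difference, codes=DIFF_CODES):
    """Pack one first-index entry into second-index bytes.

    Layout assumed for this sketch: one byte for the matching character
    count, followed by the coded difference (7-bit ASCII for literals).
    """
    by_length = sorted(codes, key=len, reverse=True)
    packed = bytearray([match_count])
    i = 0
    while i < len(difference):
        for text in by_length:
            if difference.startswith(text, i):
                packed.append(codes[text])
                i += len(text)
                break
        else:
            packed.append(ord(difference[i]))
            i += 1
    return bytes(packed)


entry = pack_entry(6, "ion")
saved = len("abjection") - len(entry)
print(entry.hex(" "), saved)   # 06 99 7
```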
According to the fifth aspect of the present invention, as is described above, an n-gram statistical analysis is conducted for a difference between the entry word character strings, and their compression contribution values are compared. As a result of the acquisition of the difference, the character string at the end portion of each entry word can be effectively extracted. For example, suffixes, such as “ion,” “ness” and “ly,” which are inherent to the English language and appear frequently, are extracted as character string differences. Therefore, compared with the compression method of the second aspect whereby an n-gram statistical analysis is conducted only for entry words, a long character string may be set in the high compression contribution ranks, and the compression rate can be further increased. The thus compressed entry word index data can be held resident in a main memory having a limited storage capacity without being withdrawn (swapped out). Especially for a machine translation software program that provides several dictionaries, compression of data and reduction in data size are effective means for holding the entry word index data resident in memory.
The sixth aspect of the present invention is a machine translation system that employs the entry word index data compressed in the fifth aspect. The machine translation system, for employing the processing capability of a computer system to translate text in a first language into text in a second language, comprises: a dictionary that includes the second entry word index data compressed by the compression method according to the fifth aspect, and a main body in which translation data concerning entry words are described; and a translation engine for referring to the dictionary to translate the text in the first language into the text in the second language.
In the machine translation system according to the sixth aspect of the present invention, when the translation engine searches the entry word index for a word included in the text in the first language, the translation engine first recovers the original entry word character strings from the character strings in the second entry word index data in accordance with the character translation code table, and then compares the word with each recovered entry word character string.
In the second entry word index, the reference entry word character string is maintained as the original entry word character string. Therefore, the reference character string that is most similar to the word is first searched for in the second entry word index. When, for example, the word “abjection” is found in an English document, which is the text in the first language, the reference character string “abidance” in the second entry word index is extracted as a candidate character string. If the word being searched for completely matches the candidate character string, the search of the dictionary is terminated. If the word does not match the candidate character string, the entry word that immediately succeeds the candidate character string is examined. If the immediately succeeding entry word is compressed, the original entry word character string must be recovered. If “04 (matching character count) 65 (character string difference code)” is the entry word that succeeds the reference entry word “abidance,” which was first extracted as a candidate character string, the first four characters “abid” are extracted from the immediately preceding entry word character string “abidance,” and the character “e,” which is assigned to column “65” in the ASCII code table, is obtained. The character strings “abid” and “e” are coupled to recover the original character string “abide.” When the recovered candidate character string matches the word being searched for, the dictionary search is terminated. If they do not match, the recovery and comparison process is repeated for the succeeding entry word in the index. As a result of this repetition, the word “abjection” is obtained from the entry word index, and the corresponding translation data can be acquired.
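The recovery and comparison procedure described above can be sketched in Python as follows. This is a sketch under stated assumptions, not the implementation of the invention: the index layout (a list in which a reference entry is a `str` and a compressed entry is a `bytes` value), the names `INDEX`, `DECODE`, `decode_entry` and `search`, the interpretation of the “most similar” reference string as the last reference entry that does not sort after the word, and the early exit once a recovered entry sorts after the word are all hypothetical.

```python
# Hypothetical decode table: code -> registered string (inverse of the
# character translation code table); other codes are plain ASCII.
DECODE = {0x99: "ion"}

# Hypothetical index fragment: one reference entry followed by compressed entries.
INDEX = [
    "abidance",               # reference entry, kept as the original string
    bytes([0x04, 0x65]),      # "abid" + "e"    -> "abide"
    bytes([0x02]) + b"ject",  # "ab" + "ject"   -> "abject"
    bytes([0x06, 0x99]),      # "abject" + "ion" -> "abjection"
]


def decode_entry(entry, previous, decode_table):
    """Recover an entry word from its compressed form and the preceding word."""
    match_count, codes = entry[0], entry[1:]
    diff = "".join(decode_table.get(c, chr(c)) for c in codes)
    return previous[:match_count] + diff


def search(word, index, decode_table):
    """Return the index position of word, or None if it is not present."""
    start = None
    for pos, entry in enumerate(index):
        if isinstance(entry, str):
            if entry <= word:
                start = pos  # last reference entry not after the word
            else:
                break
    if start is None:
        return None
    pos, current = start, index[start]
    while True:
        if current == word:
            return pos
        if current > word:  # entries are sorted, so the word is absent
            return None
        pos += 1
        if pos >= len(index) or isinstance(index[pos], str):
            return None
        # Only the next entry is recovered, never the whole index.
        current = decode_entry(index[pos], current, decode_table)


print(search("abjection", INDEX, DECODE))  # 3
```

Each step recovers a single entry from its predecessor, so only the entries between the reference string and the target word are ever expanded.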
In the second entry word index data, the character string “abjection” of nine characters, i.e., nine bytes, is held in compressed form as the 2-byte code “06 99.” The entire entry word index data do not have to be decompressed in order to search the entry word index data for the word “abjection.” That is, since decompression of the entire compressed entry word index data is not required in order to examine the index data, the search speed is not reduced.
According to a computer-readable storage medium of the fourth or the seventh aspect of the present invention, the structural or functional cooperative relationship between a computer program and a storage medium is defined in order to implement the function of a computer program in a computer system. That is, when the computer storage medium is loaded into a computer system (or a computer program is installed in a computer system), the cooperative operation can be demonstrated by the computer system. As a result, the same operating effect as in the machine translation system according to the third or the sixth aspect of the present invention can be obtained.