1. Field of the Invention
The present invention relates to a language identifying apparatus and a language identifying method for judging the language of a character string represented by a character code string, together with the type of its character code (its character code system). It also relates to various apparatuses that identify the language of a text (sentences), of words, or of a single word represented by input encoded text data or an input encoded keyword, in order to switch between various types of processing, and to a storage medium storing a computer program for controlling such apparatuses or for carrying out the method.
2. Description of the Background Art
Character codes for kanji (or hangeul) currently used in Japan, China (the People's Republic of China), South Korea, and Taiwan (the Republic of China) represent each character with two bytes. These character codes (character code systems) are defined independently for each language (Japanese, Chinese, Korean, etc.). Even within the same language, a character is represented by different codes when a different encoding method (character code system, code type, or encoding rule) is used. Moreover, character code data generally carries no information indicating its language. Therefore, when a series of character codes is input, it cannot easily be determined which language was encoded to produce them.
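Both points above (the same characters receive different codes under different code systems, and a byte string carries no language tag) can be illustrated with a short sketch. It assumes Python's built-in CJK codecs (`shift_jis`, `euc_jp`, `euc_kr`, `gb2312`, `big5`) as stand-ins for the character code systems; the helper names `encode_all` and `decodable_under` are hypothetical and are not part of the invention.

```python
# Assumption: Python's built-in CJK codecs stand in for the character
# code systems discussed above; the codec list is illustrative only.
CANDIDATE_CODECS = ["shift_jis", "euc_jp", "euc_kr", "gb2312", "big5"]

def encode_all(text, codecs=("shift_jis", "euc_jp")):
    """Encode the same text under several character code systems."""
    return {c: text.encode(c) for c in codecs}

def decodable_under(data, codecs=CANDIDATE_CODECS):
    """Return every codec under which `data` decodes without error."""
    found = []
    for c in codecs:
        try:
            data.decode(c)  # strict mode: raises on an invalid byte sequence
        except UnicodeDecodeError:
            continue
        found.append(c)
    return found

word = "日本"  # two kanji
encoded = encode_all(word)
for name, raw in encoded.items():
    # Same characters, different byte values; two bytes per character.
    print(name, raw.hex(), len(raw))

# The EUC-JP bytes carry no language tag, so other code systems may
# also accept them and yield entirely different characters.
candidates = decodable_under(encoded["euc_jp"])
for c in candidates:
    print(c, encoded["euc_jp"].decode(c))
```

Because several code systems share the same two-byte ranges, successful decoding alone does not reveal which language produced the bytes.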
A language information processing system, such as a database search system, a translation system, or a speech synthesis system, is constructed on the basis of a particular language and its character code system. Consider a language information processing system intended to handle a plurality of languages. Because language information processing differs from one language to another, the system must determine the language represented by an input keyword or input text data. If the language of the input keyword or text data and its character code system are unknown, appropriate processing cannot be performed.
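The need to know both the language and the character code system before processing can be sketched as a simple dispatch table. This is a hypothetical illustration, not the claimed apparatus: the processor functions, the `ROUTES` table, and `dispatch` are invented names, and the language and code system are assumed to be supplied by some separate identification step.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical per-language processors; a real system might run a
# language-specific tokenizer, translator, or speech synthesizer here.
def process_japanese(text: str) -> str:
    return "ja:" + text

def process_korean(text: str) -> str:
    return "ko:" + text

def process_chinese(text: str) -> str:
    return "zh:" + text

# Hypothetical routing table keyed by (language, character code system).
# The code system names double as Python codec names for decoding.
ROUTES: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("ja", "shift_jis"): process_japanese,
    ("ja", "euc_jp"): process_japanese,
    ("ko", "euc_kr"): process_korean,
    ("zh", "gb2312"): process_chinese,
}

def dispatch(data: bytes, language: Optional[str], code_system: Optional[str]) -> str:
    """Decode `data` and hand it to the processor for its language.

    Both the language and the character code system must be known;
    otherwise no suitable processing can be selected.
    """
    if language is None or code_system is None:
        raise ValueError("language or character code system unknown")
    handler = ROUTES.get((language, code_system))
    if handler is None:
        raise ValueError(f"no processing defined for {language}/{code_system}")
    return handler(data.decode(code_system))

keyword = "日本".encode("euc_jp")
print(dispatch(keyword, "ja", "euc_jp"))  # ja:日本
```

Keying the table on the pair rather than on the language alone reflects that the same language may arrive in more than one code system, each requiring a different decoding step.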