1. Field of the Invention
The present invention pertains to computer-implemented methods and systems for entering Chinese and Japanese characters, both for documents and for searching the Web and other data objects, images, and symbolic objects or concepts.
2. Description of the Related Art
The basic problem associated with entering Japanese and Chinese characters into a computer is simply that keyboards cannot be made large enough to accommodate the several thousand characters one would find in a Japanese newspaper, let alone the approximately 40,000 characters needed for formal Chinese technical or governmental documents. The same issues also arise in other languages. In addition, the growth in the number of character-based (Chinese, Japanese, and some Korean) websites makes it extremely difficult to search them without adequate methods of entering such characters. Furthermore, even obtaining a character and searching for it does not mean that the specific instance of the character for which the search is being performed has been isolated. Other languages, such as Arabic or Hebrew, have characters, syntax, and writing styles that are poorly adapted to manual entry by current methods. Suboptimal solutions exist, but none functions well. Conventional search systems may return desired results, but they invariably return many times more undesired results than desired ones.
The Japanese favor speech entry, but problems remain that relate to speech defects or impediments, accents, pronunciations, errors (one word or character substituted for another), dialects, and speakers using a second or subsequent language. Most current speech recognition systems require a lengthy training period to enable the machine to accurately transcribe the user's speech. Moreover, it is often necessary to train the user to exercise proper diction to enable the machine to operate at an acceptable recognition level.
Turning first to the written form of the Japanese language, the oldest common method involves the user keying in a phonetic representation of the desired character (or phrase, meaning a cluster of characters), using either Romaji (a phonetic transliteration of Japanese using the Roman alphabet) or kana (the Japanese phonetic characters). All similarly pronounced characters are then shown on the screen, and the user is asked to pick the desired one from among them. This process is slow, tedious, and does not always yield the desired characters.
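By way of illustration only, the conventional phonetic entry method described above may be sketched as follows. The dictionary contents and the names HOMOPHONES, candidates, and select are hypothetical and chosen for this example; an actual input method would draw on a far larger dictionary and would typically also rank the candidates.

```python
# Hypothetical toy dictionary: Romaji reading -> similarly pronounced characters.
# A real dictionary would hold many thousands of entries.
HOMOPHONES = {
    "hashi": ["橋", "箸", "端"],  # bridge, chopsticks, edge
    "kami": ["紙", "神", "髪"],   # paper, god, hair
}


def candidates(reading):
    """Return every character sharing the keyed-in reading (empty list if none)."""
    return HOMOPHONES.get(reading.strip().lower(), [])


def select(reading, choice):
    """Simulate the user's pick; `choice` is a 1-based position in the candidate list."""
    options = candidates(reading)
    if not 1 <= choice <= len(options):
        raise ValueError(f"no candidate {choice} for reading {reading!r}")
    return options[choice - 1]


if __name__ == "__main__":
    # The user keys in a reading, reviews the displayed candidates, then picks one.
    print(candidates("hashi"))
    print(select("hashi", 2))
```

Even in this small sketch, a single reading yields several unrelated characters, so the user must stop and choose among them for each entry, which is the source of the slowness noted above.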
Recent solutions require the user to dictate spoken Japanese or Chinese, with a software/machine translation of the spoken word into characters. The user then must edit the result or repeat the speech or confirm the result. Current methods using speech, however, are believed to lack adequate speed, precision, and suitability for all potential users. Likewise, conventional methods that rely solely on keyboard methods to enter Romaji pronunciations are believed to be slow, cumbersome and counter-intuitive. Finally, these conventional methods often interrupt the user's train of thought by requiring selection of one among many candidate characters or by requiring the user to repeat his or her speech. Often, the user may not recognize the candidate characters and is, therefore, unable to select the proper character from among the candidate characters. Moreover, these methods often fail to enable the user to reliably select the desired characters in a timely manner.
These shortcomings also manifest themselves when attempting to enter non-Roman-alphabet characters into a Web search engine for the purpose of searching Web sites containing such characters. What are also needed, therefore, are methods and systems that enable users to easily enter non-Roman-alphabet characters into a search engine and to search on the entered characters.