Field of the Invention
The present invention relates generally to information search and retrieval. More specifically, systems and methods are disclosed for performing searches using queries that are written in a character set or language that is different from the character set or language of at least some of the documents that are to be searched.
Description of Related Art
Most search engines operate under the assumption that the end user enters search queries using a conventional keyboard or a similar device, on which entering alphanumeric strings is not difficult. As small devices become more common, however, this assumption is not always valid. For example, users may query search engines using a wireless telephone that supports the WAP (Wireless Application Protocol) standard. Devices such as wireless telephones typically have a data input interface wherein a particular action by the user (e.g., pressing a key) may correspond to more than one alphanumeric character. A detailed description of WAP architecture is available at http://www1.wapforum.org/tech/documents/SPEC-WAPArch-19980439.pdf (“WAP 100 Wireless Application Protocol Architecture Specification”).
In the usual case, the WAP user navigates to the search query page and is presented with a form into which they input their search query. With conventional methods, the user may be required to press a key multiple times to select a particular letter. On a standard telephone keypad, for example, the user would select the letter “b” by pressing the “2” key twice, or would select the letter “s” by pressing the “7” key four times. Accordingly, to enter a query for “ben smith”, the user would ordinarily need to enter the following string of key presses: 223366077776444844, which maps to characters as follows:
22→b
33→e
66→n
0→space
7777→s
6→m
444→i
8→t
44→h
After the user has entered their search request, the search engine receives the word or words from the user, and proceeds in much the same manner as if it had received the request from a desktop browser wherein the user employed a conventional keyboard.
As can be seen from the foregoing example, this form of data entry is inefficient in that it requires eighteen keystrokes to enter the nine characters (eight letters and one space) corresponding to “ben smith”.
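The multi-press entry scheme described above can be sketched in a few lines of Python. This is only an illustration of the keypad mapping used in the “ben smith” example, not part of any disclosed system; the names KEYPAD, PRESSES, and multitap_encode are hypothetical, and the use of “0” for a space simply follows the example in the text.

```python
# Letter groups on a standard telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the keypad: each letter maps to the repeated key presses that
# select it (e.g. "b" is the second letter on "2", so it maps to "22").
PRESSES = {
    letter: key * (index + 1)
    for key, letters in KEYPAD.items()
    for index, letter in enumerate(letters)
}
PRESSES[" "] = "0"  # assumption taken from the example: "0" enters a space


def multitap_encode(text):
    """Return one key-press group per character of `text`."""
    return [PRESSES[ch] for ch in text.lower()]


groups = multitap_encode("ben smith")
keystrokes = "".join(groups)
print(keystrokes)       # 223366077776444844
print(len(keystrokes))  # 18
```

The sketch ignores the pause or confirmation a real handset may need between two consecutive letters that share a key; the example query does not contain such a pair.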
Similar difficulties may arise when typing queries using non-target-language keyboards. For example, Japanese text can be expressed using a variety of different character sets, including hiragana, katakana, and kanji, none of which is easily entered using a typical ASCII keyboard based on the Roman alphabet. In such a situation, the user will often make use of a word processor such as Ichitaro, produced by JustSystem Corp. of Tokushima City, Japan, that is able to convert text written in romaji (a phonetic, Roman-alphabet representation of Japanese) to katakana, hiragana, and kanji. Using the word processor, the user can type a query in romaji, and then cut-and-paste the converted text from the word processor's screen into a search box on the browser. A drawback of this approach is that it can be relatively slow and tedious, and that it requires the user to have access to a copy of the word processor, which may not be feasible due to cost and/or memory constraints.
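By way of illustration only, the romaji-to-kana conversion step mentioned above can be sketched as a greedy longest-match lookup. The sketch below makes no claim about how Ichitaro or any other word processor operates; the table ROMAJI_TO_HIRAGANA is a small hypothetical subset of the syllabary, the function name is an assumption, and the dictionary-based selection of kanji is omitted entirely.

```python
# Small illustrative subset of a romaji-to-hiragana table (hypothetical;
# a usable converter would need the full syllabary and kanji selection).
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
}
MAX_LEN = max(len(key) for key in ROMAJI_TO_HIRAGANA)


def romaji_to_hiragana(romaji):
    """Convert romaji to hiragana by always taking the longest table match."""
    out = []
    pos = 0
    while pos < len(romaji):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(MAX_LEN, len(romaji) - pos), 0, -1):
            chunk = romaji[pos:pos + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                pos += size
                break
        else:
            raise ValueError(f"no kana for {romaji[pos:]!r}")
    return "".join(out)


print(romaji_to_hiragana("sushi"))  # すし
print(romaji_to_hiragana("kasa"))   # かさ
```

Even this simplified step sits outside the browser in the workflow described above, which is why the user must copy the converted text into the search box by hand.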
There remains, therefore, a need for methods and apparatus for providing relevant search results in response to an ambiguous search query.