The importance of search engine technology has grown significantly over the last decade or so, mirroring the expansion and use of the Internet. When a user clicks a search button, a search engine hunts through tens of millions of terms to find those terms, and their corresponding documents, that satisfy the query. This superficial simplicity, however, obscures the complexity of the underlying search technology, because good search engines generally do not stop at a simple matching of query terms.
To appreciate the complexity, consider that search engines generally fall into one of two categories: monolingual or multilingual. Monolingual search engines receive queries or search requests in one language and retrieve documents in the same language. For example, Spanish language queries yield Spanish language documents. Monolingual search engines typically process a query by breaking, or parsing, it into individual terms, and then reducing, or “stemming,” each individual term to its root or base form. The stemmed terms, sometimes in combination with equivalent terms, are then used to find relevant documents. Thus, for example, a search for documents containing the word “cat” also retrieves documents that include the terms “cats,” “cat's,” “cats',” or even “feline.”
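The parse, stem, and expand sequence described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular search engine: the function names, the toy suffix-stripping rules, and the one-entry `EQUIVALENTS` table are all hypothetical, and a practical system would use a full stemming algorithm and a thesaurus.

```python
import re

# Hypothetical table of equivalent terms, keyed by stem (assumption for illustration).
EQUIVALENTS = {"cat": {"feline"}}

def parse(text):
    """Break text into lowercase terms, keeping a possessive apostrophe if present."""
    return re.findall(r"[a-z]+(?:'[a-z]*)?", text.lower())

def stem(term):
    """Toy stemmer: strip possessive endings, then a simple plural 's'."""
    if term.endswith("'s"):
        term = term[:-2]
    elif term.endswith("'"):
        term = term[:-1]
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        term = term[:-1]
    return term

def expand_query(query):
    """Return the stemmed query terms together with any equivalent terms."""
    terms = set()
    for raw in parse(query):
        root = stem(raw)
        terms.add(root)
        terms |= EQUIVALENTS.get(root, set())
    return terms

def matches(query, document):
    """A document matches if it shares at least one stem with the expanded query."""
    document_stems = {stem(raw) for raw in parse(document)}
    return bool(expand_query(query) & document_stems)

print(matches("cat", "Two cats sat"))      # True: plural form
print(matches("cat", "The cat's toy"))     # True: possessive form
print(matches("cat", "A feline friend"))   # True: equivalent term
print(matches("cat", "A dog"))             # False
```

Matching on stems rather than surface forms is what lets a single query term reach the plural, possessive, and equivalent-term variants without the user listing them.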
Multilingual search engines, on the other hand, receive search requests in one language, such as German, and retrieve relevant information in another language, such as French or English. In such cases, the challenge of effective searching is more complex, because in many languages other than English, nouns can be masculine, feminine, or neuter; verbs change form to show number (singular or plural), tense (present, past, future, and so forth), and person: first (“I”), second (“you”), and third (“he/she/it”); adjectives change form based on the nouns they modify; and accents or other diacritical marks significantly affect meaning. While stemming resolves these complexities in a monolingual search, stemming alone cannot address the added complexities of linguistic conflicts across languages and, in some cases, may even interfere. For example, gender in most languages can be normalized to a single stem without significant loss of meaning; however, there are some languages, such as Portuguese, that require gender to be retained in order to maintain meaning. As a result, multilingual search engines typically rely on some method of translating queries, and possibly documents, into a common language.
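To illustrate how normalization can interfere with meaning, the following Python sketch strips diacritical marks from two Portuguese words, “avô” (grandfather) and “avó” (grandmother). It is offered only as an illustration: the helper name `strip_diacritics` is hypothetical, and the approach shown (Unicode decomposition followed by removal of combining marks) is one common normalization technique, not one attributed to any particular search engine.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks that remain."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

grandfather = "avô"
grandmother = "avó"

# Both words collapse to the same string, so the gender distinction is lost.
print(strip_diacritics(grandfather))  # avo
print(strip_diacritics(grandmother))  # avo
print(strip_diacritics(grandfather) == strip_diacritics(grandmother))  # True
```

Once both words reduce to the same form, an index built on the normalized terms can no longer tell the two apart, which is the kind of loss the Portuguese example above describes.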
Although there is continuing research in this area, the present inventors have recognized a need for alternative methods, systems, and interfaces for facilitating multilingual searches.