Information retrieval is often based, at least in part, on words in a language. To provide access to documents responsive to a search query, for example, search engines often identify documents containing some or all of the words of the search query. Effective information retrieval, however, often requires more than simply matching words from search queries to content in documents. If a user, for example, includes the commonly misspelled term “highschool” in a search query, simply matching words of the query to words in documents may cause documents lacking “highschool” but containing “high school” to be overlooked, even though the documents may be relevant to the intent of the user. Similarly, processing a query containing “high school” may overlook documents lacking “high school” but containing “highschool,” even if the documents with the misspelling may be relevant to the query. Indeed, many words and combinations of words are commonly misspelled, have multiple legitimate spellings, or otherwise may introduce complexity into systems that retrieve information based at least in part on the words.
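The mismatch described above can be reproduced with a minimal sketch of exact word matching. The tokenizer, the helper names `tokens` and `matches_all_terms`, and the two sample documents are hypothetical and chosen only for illustration; they are not taken from any particular search engine.

```python
def tokens(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def matches_all_terms(query, document):
    """Return True if every query token appears as a token of the document."""
    doc_tokens = set(tokens(document))
    return all(term in doc_tokens for term in tokens(query))


# Hypothetical documents: one uses the two-word spelling, one the joined form.
docs = [
    "Yearbook for the local high school",
    "Highschool reunion photos",
]

# Each query finds only the document that uses exactly the same spelling.
hits_joined = [d for d in docs if matches_all_terms("highschool", d)]
hits_split = [d for d in docs if matches_all_terms("high school", d)]
```

Under these assumptions, each query retrieves only the document that shares its spelling, even though both documents are plausibly relevant to either query.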
Generally, the intricacies of various languages can make effective information retrieval difficult to achieve. Many languages, such as German, contain many compound words. A user searching for “Damenschuhe” (women's shoes), a combination of “Dame” (lady) and “Schuh” (shoe), may intend to locate women's shoes to purchase from an electronic marketplace. A user searching for “Schuhe,” however, may expect to find women's shoes in search results. Other languages may not use spaces or other delimiting characters to separate words. The Japanese word for portable phone, for instance, is 携帯電話. Users searching for 携帯 (portable, common usage for a cell phone), however, may expect to locate items labeled as 携帯電話. One conventional way of handling these and other issues is to consider components of words (or, generally, of character strings) that are themselves words. Simply breaking up character strings into word components, however, may be ineffective in many cases, sometimes causing search results to be returned that users consider irrelevant or even erroneous. For instance, even though the word “wing” appears in the term “homebrewing,” the word “wing” may not be relevant to information sought by a user who has entered “homebrewing” in a search query.
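The weakness of simply breaking strings into word components can be illustrated with a short sketch. It assumes a toy vocabulary and a hypothetical helper, `naive_components`, that reports every substring of a term that is also a vocabulary word; real systems would use far larger dictionaries and may work differently.

```python
# Hypothetical toy vocabulary, for illustration only.
VOCABULARY = {"home", "brew", "brewing", "wing", "dame", "schuh", "schuhe"}


def naive_components(term, vocabulary=VOCABULARY, min_len=3):
    """Return every substring of `term` (at least `min_len` characters long)
    that is itself a word in `vocabulary`."""
    term = term.lower()
    found = set()
    for start in range(len(term)):
        for end in range(start + min_len, len(term) + 1):
            candidate = term[start:end]
            if candidate in vocabulary:
                found.add(candidate)
    return found


# Useful components are found for the German compound ...
german = naive_components("Damenschuhe")

# ... but "homebrewing" also yields "wing", which has nothing to do with brewing.
english = naive_components("homebrewing")
```

In this sketch the same substring test that helpfully relates “Damenschuhe” to “Schuhe” also relates “homebrewing” to “wing,” which is the kind of irrelevant match the passage describes.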