Huge quantities of information are available via the World Wide Web. For example, electronic commerce Web sites can offer hundreds of products for sale. Educational Web sites can offer access to the equivalent of thousands of printed volumes of information. To help users find what they need within these huge quantities of information, many Web sites provide search engines.
When accessing certain types of information, for example, pharmaceuticals on an electronic commerce Web site, the name or title of the specific information or product sought might be difficult to spell correctly. Misspelling of the name or title can result in poor search results. Returning to the pharmaceutical example, “acetaminophen” can be misspelled as “acitaminofen,” “aceitamenofen,” or “ecytamenophin,” among other possible misspellings.
To improve search results based on misspelled search requests, phonetic searching based on canonical representations of the search request has been used. A common phonetic algorithm is called Soundex, which assumes that all spelling confusion is caused by vowels. Accordingly, Soundex takes any given word and removes all vowels to produce a single canonical form. For example, the search engine would convert “Neutrogena” to “NTRGN,” so that if a user typed “Neutrogina,” the search engine would search the index for “NTRGN” and thus would find “Neutrogena.”
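The vowel-removal lookup described above can be illustrated with a minimal Python sketch. It follows the simplified description given here rather than the full Soundex coding, which additionally keeps the first letter and maps the remaining consonants to digit classes. The names canonical_form, index, and search, as well as the two sample product names, are hypothetical and chosen only for illustration.

```python
VOWELS = set("AEIOU")

def canonical_form(word: str) -> str:
    """Uppercase the word and drop every vowel, keeping consonants in order."""
    return "".join(ch for ch in word.upper() if ch.isalpha() and ch not in VOWELS)

# Hypothetical index: canonical key -> product names that reduce to that key.
index = {}
for name in ["Neutrogena", "Acetaminophen"]:
    index.setdefault(canonical_form(name), []).append(name)

def search(query: str) -> list:
    """Look up the query by its canonical key rather than its literal spelling."""
    return index.get(canonical_form(query), [])

print(canonical_form("Neutrogena"))  # NTRGN
print(search("Neutrogina"))          # ['Neutrogena']
print(search("acitaminofen"))        # []
```

The last query returns nothing because “acitaminofen” reduces to “CTMNFN” while “Acetaminophen” reduces to “CTMNPHN”: removing vowels alone does not reconcile spellings that differ in their consonants.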
Another phonetic algorithm is Metaphone, which improves on Soundex by taking into account the phonetic impact of combinations of letters when phoneticizing words. Like Soundex, Metaphone takes a given word and ignores vowels to produce a single canonical representation. However, before ignoring vowels, Metaphone extracts their phonetic meanings. Another feature of Metaphone is its treatment of related groups of letters known as diphthongs. Specifically, Metaphone generates a canonical spelling of a word by encoding any diphthongs, thereby replacing them with their phonetic representatives.
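As a rough illustration of encoding letter groups before ignoring vowels, the following Python sketch applies a handful of letter-group substitutions and then drops vowels. It is not an implementation of Metaphone, whose rules are far more numerous and context-sensitive; the rule table LETTER_GROUPS and the function phonetic_key are hypothetical names, and the four substitutions shown are only a small illustrative subset.

```python
VOWELS = set("AEIOU")

# Illustrative subset of letter-group rules; real Metaphone has many more,
# and applies them depending on the surrounding letters.
LETTER_GROUPS = [("PH", "F"), ("CK", "K"), ("SH", "X"), ("TH", "0")]

def phonetic_key(word: str) -> str:
    """Encode known letter groups first, then drop vowels from the result."""
    key = word.upper()
    for group, code in LETTER_GROUPS:
        key = key.replace(group, code)
    return "".join(ch for ch in key if ch not in VOWELS)

print(phonetic_key("acetaminophen"))  # CTMNFN
print(phonetic_key("acitaminofen"))   # CTMNFN
```

Because “ph” is rewritten as “F” before the vowels are dropped, both spellings reduce to the same key, which vowel removal alone would not achieve.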
Though Metaphone improves on Soundex, it still relies on there being a single canonical representation of a word. This approach works well only for simple, well-known words with widely accepted pronunciations, where the user knows the pronunciation or is reasonably sure of the spelling. What is needed is an improved search engine for less familiar words, such as those used in the pharmaceutical industry.