1. Field of the Invention
This invention relates to information retrieval over a network, and, more specifically, to spell checking of network addresses used to retrieve information.
2. Description of Related Art
In order to access specific networked information such as World-Wide-Web (WWW) pages, users must often enter a network address such as a Uniform Resource Locator (URL), which identifies the location of the page on a remote server. However, as network browsers have evolved, the focus of the user interface has been to allow users to access remote pages by selecting hypertext links, thus often removing the need to manually enter URLs. Scant attention has been paid to the problems inherent in manual URL entry. Yet the explosive growth of networked information systems such as the WWW has made it inconvenient to follow a long series of hypertext links to retrieve the page desired by the user. In fact, companies, organizations and individuals often provide their URLs in television advertisements, on printed materials, and verbally. This has led to a growing number of instances in which the user needs to manually enter the URL into a browser for retrieval.
A major problem with the manual entry of URLs is the introduction of spelling errors, which are particularly common because of the characteristics of URL syntax and structure. The URL is often long and includes terms, such as "http", "com", "org", "gif", and "jpeg", that are not commonly known by users. URLs may also be in a foreign language, especially for users in non-English speaking countries. Additionally, the URL may include special characters such as ~ and @ that are difficult to type and hard to remember. The fact that URLs distinguish between upper- and lower-case letters is yet another source of user input error. The user is often relying on a quickly made note, or just on memory of a URL that appeared briefly or was spoken in an advertisement. Additionally, the URL may be misspelled in advertising, email, or even hypertext links, and may inadvertently point to other WWW pages. All of these factors taken together provide a rich basis for the introduction of spelling errors. A user who tries to follow a misspelled network address will not get the intended information even if the misspelling is a minor one. Because misspellings on the Web frequently occur in peaks, efficient URL spell checking is desirable.
In order to assist the user with URL spelling errors, a spelling checker is needed. Spell checking in general is well established in the art, with numerous implementation schemes. The central idea of a spell checker is to take the word in question and compare it to a dictionary of spellings known to be correct, to find one or more words that are spelled roughly the same way, and then to let the user choose the correct word from a list presented by the spell checking program.
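The dictionary comparison described above can be illustrated with a minimal sketch. It assumes Levenshtein edit distance as the measure of "spelled roughly the same way"; that choice, the function names `edit_distance` and `suggest`, and the `max_distance` threshold are illustrative assumptions and are not taken from any particular prior-art spell checker.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one table row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Return dictionary entries within max_distance of word, closest first.

    max_distance is a hypothetical tuning parameter, not a value from the text.
    """
    scored = sorted((edit_distance(word, entry), entry) for entry in dictionary)
    return [entry for distance, entry in scored if distance <= max_distance]


if __name__ == "__main__":
    # The user would pick the intended word from the returned list.
    print(suggest("pgae", ["page", "pages", "image"]))
```

Because every dictionary entry is scored, this approach is only practical for a dictionary of modest size.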
Caching of items that are frequently referenced is known. Various algorithms, including hashing, allow for the quick retrieval of an item stored in a cache.
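As a rough illustration of a hash-based cache of frequently referenced items, the sketch below keeps a bounded number of entries and discards the least recently used one when full. The class name `FrequentItemCache`, the capacity value, and the least-recently-used eviction policy are assumptions made for this example; the text does not prescribe any of them.

```python
from collections import OrderedDict


class FrequentItemCache:
    """Bounded cache backed by a hash table that remembers access order."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity      # hypothetical size limit
        self._items = OrderedDict()   # hashed lookup plus recency ordering

    def get(self, key):
        """Return the cached value, or None if the key is not cached."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


if __name__ == "__main__":
    cache = FrequentItemCache(capacity=2)
    cache.put("http://www.example.com/", "page A")
    cache.put("http://www.example.org/", "page B")
    print(cache.get("http://www.example.com/"))
```

Lookups and insertions take roughly constant time on average, which is the property that makes hashing attractive for this purpose.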
However, spell checkers of the prior art are unsuitable for use in a network environment such as the WWW, for several reasons. First, the dynamic nature of the WWW, where new URLs are constantly being created, precludes the use of a static dictionary. Second, the sheer number of URLs precludes the use of a dynamic dictionary: as of April 1996 there were more than 30 million URLs on the WWW. Third, since the WWW operates in a client-server environment, only the server knows which URLs are valid for accessing WWW pages residing on that server. Finally, the prior art provides no mechanism for utilizing knowledge obtained from other users' behavior.
As an example of the state of the prior art, Netscape's Navigator WWW browser performs a simplistic spelling check on manually entered URLs. Specifically, the program tries to identify and correct problems with the protocol and the server's domain name. It tries adding "http://" to the URL if no protocol is specified; it also adds "www." before and ".com" after the domain name if they are not present in the manually entered URL. These spelling check variations are helpful but not sufficiently robust or extensive to solve the general problem of spelling errors in manually entered URLs.
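The kind of completion described above might be sketched as follows. This is not Navigator's actual code: the function name `complete_url` is hypothetical, and the rule that "www." and ".com" are added only when the host contains no dot at all is a simplifying assumption chosen for the example.

```python
def complete_url(entered: str) -> str:
    """Apply simple protocol and domain-name completions to a typed URL."""
    entered = entered.strip()

    # Add a default protocol if the user did not type one.
    if "://" in entered:
        scheme, rest = entered.split("://", 1)
    else:
        scheme, rest = "http", entered

    # Separate the host from any path that follows it.
    host, slash, path = rest.partition("/")

    # Assumption: a bare name with no dot gets both "www." and ".com".
    if "." not in host:
        host = "www." + host + ".com"

    return scheme + "://" + host + slash + path


if __name__ == "__main__":
    print(complete_url("example"))
    print(complete_url("www.example.com/index.html"))
```

Completions of this kind never consult the server, so they cannot correct an error in the path portion of the URL or a misspelled domain name.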