The evolution of computers and networking technologies from high-cost, low-performance data processing systems to low-cost, high-performance communication, problem-solving and entertainment systems has provided a cost-effective and time-saving means to lessen the burden of performing everyday tasks such as correspondence, bill paying, shopping, budgeting and information gathering. For example, a computer (e.g., a desktop, a laptop, a hand-held or a cell phone) interfaced to the Internet, via wired or wireless technology, can provide the user with a channel for nearly instantaneous data exchange (e.g., via email, newsgroups and ftp) and merchandise consumption, and with access, at the user's fingertips, to a wealth of information from a repository of web sites and servers located around the world.
Typically, the information available via the web sites and servers is accessed via a web browser executing on a web client (e.g., a computer). For example, a web user can launch a web browser and access a web site by entering the web site's Uniform Resource Locator (URL) (e.g., a web address or an Internet address) into an address bar of the web browser and pressing the enter key on a keyboard or clicking a “go” button with a mouse. The URL typically includes four pieces of information that facilitate access: a protocol (a language for computers to communicate with each other) that indicates a set of rules and standards for the exchange of information, a location of the web site, a name of an organization that maintains the web site, and a suffix (e.g., com, org, net, gov or edu) that identifies the type of organization. As an example, for the fictitious address http://www.foo.com, http specifies that the web server uses Hypertext Transfer Protocol (HTTP), www specifies that the web site is on the World Wide Web, foo specifies that the web server is located at Foo Corporation, and com specifies that Foo Corporation is a commercial institution.
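By way of illustration only, the four-part breakdown described above can be sketched in Python using the standard urllib.parse module. The function name describe_url and the dictionary keys are hypothetical, and the sketch assumes a host name with exactly three dot-separated labels, as in the fictitious addresses used here; real URLs can also carry ports, paths and other components that this sketch ignores.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a simple URL into the four pieces named in the text.

    Assumption: the host has exactly three labels (location, organization,
    suffix), e.g. www.foo.com. Other host shapes are rejected.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("URL has no host name: %r" % url)
    labels = parts.hostname.split(".")
    if len(labels) != 3:
        raise ValueError("expected a host of the form location.name.suffix")
    location, organization, suffix = labels
    return {
        "protocol": parts.scheme,      # e.g. http or ftp
        "location": location,          # e.g. www
        "organization": organization,  # e.g. foo
        "suffix": suffix,              # e.g. com
    }


if __name__ == "__main__":
    print(describe_url("http://www.foo.com"))
```

The same function applied to a fictitious FTP address such as ftp://ftp.foo.com would report ftp as both the protocol and the location.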
Likewise, a user can launch a web browser to access a server. For example, the URL for an FTP (File Transfer Protocol) server is similar to the URL associated with a typical web site. In general, the FTP server URL includes a protocol, a location, a name of the server maintainer, and an appropriate suffix. For example, the fictitious address ftp://ftp.foo.com can be employed, and signifies an FTP server for the Foo commercial institution, wherein communication is achieved via FTP. FTP is typically employed to make files and folders publicly available for transfer over the Internet. Typically, a password is employed to log on and gain access to the files and folders on the server, or computer. However, FTP often can be utilized to gain access to certain networks or servers without having an account or being an official password holder with that server or computer. For example, an “anonymous” FTP server can be set up, and can contain a broad range of data that can be publicly available through FTP, wherein a generic password such as “guest” or “anonymous,” or a null password, can be employed.
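As a hedged sketch of the anonymous log-on described above, the following Python fragment uses the standard ftplib module to connect, log in with the generic user “anonymous” and password “guest,” and list the files in the server's top-level directory. The function name list_public_files and the ftp_factory parameter are hypothetical; the latter exists only so the sketch can be exercised without a network connection. The host ftp.foo.com is fictitious and is not expected to resolve.

```python
from ftplib import FTP


def list_public_files(host, user="anonymous", password="guest", ftp_factory=FTP):
    """Log on to an anonymous FTP server and return its top-level file names.

    ftp_factory is a hypothetical hook (defaulting to ftplib.FTP) that lets a
    caller substitute a stand-in connection object for testing.
    """
    # ftplib.FTP(host) opens the control connection; leaving the "with"
    # block closes it again.
    with ftp_factory(host) as ftp:
        ftp.login(user=user, passwd=password)
        return ftp.nlst()
```

A call such as list_public_files("ftp.foo.com") would attempt a real connection, so it is shown here only as an indication of intended use.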
In many instances, the user knows, a priori, the name of the site or server, and/or the URL to the site or server that the user desires to access. In such situations, the user can access the site, as described above, by entering the URL in the address bar and connecting to the site. However, in other instances, the user does not know the URL or the site name. Instead, the user employs a search engine to facilitate locating a site(s) based on a keyword(s) provided by the user. In general, the search engine is an executable application(s) or program(s) that searches the content of web sites and servers for a keyword(s), and returns a list of links to web sites and servers where the keyword(s) was found. Basically, the search engine transmits a spider to retrieve as many documents as possible associated with the keyword(s). Then, an indexer reads the documents, and creates a prioritized index based on the words contained in each document. Respective search engines generally employ a proprietary algorithm to create indices such that meaningful results are returned for a query.
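The indexing and keyword lookup described above can be illustrated with a minimal Python sketch. It is not any particular search engine's algorithm, which the text notes is generally proprietary: the spider step is replaced by a small in-memory dictionary of hypothetical pages, and the "prioritized" ordering is assumed, for illustration, to be a simple count of keyword occurrences. The names tokenize, build_index, search and PAGES are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer: map each word to a count of its occurrences per document."""
    index = defaultdict(Counter)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Return links ordered by total keyword occurrences (an assumed ranking)."""
    scores = Counter()
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]


# Hypothetical pages standing in for documents retrieved by a spider.
PAGES = {
    "http://www.foo.com/pets": "Dog food and dog toys for every dog",
    "http://www.foo.com/cats": "Cat food and a cat tree",
    "http://www.foo.com/news": "Company news and press releases",
}

if __name__ == "__main__":
    print(search(build_index(PAGES), "dog food"))
```

Because ranking here depends only on word counts, a page that repeats many unrelated terms would score well for many queries, which is the kind of weakness the following paragraph describes.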
The large volume of information available via the Internet and the trend to associate a plethora of terms with a site and/or server (e.g., in order to increase the probability of being selected for inclusion in the list of links) commonly results in hundreds or thousands of links returned to the user, wherein many of the links provide access to sites and servers that are not useful to the user. Even a much smaller list of twenty links, for example, can lead a user down a time-consuming path comprising non-significant links. In addition, several searches are often performed utilizing various keywords in anticipation of retrieving a link(s) with a greater correlation to the information the user desires.
One technique employed by search engines to increase search success and reduce superfluous results is a means to search for a word, several words, or a phrase, and to employ Boolean operators. However, as noted above, a site can be associated with many words and phrases to provide a broad and generic facade in order to increase the chance of being selected in an attempt to increase market exposure. Another technique employs a string-based distributional analysis. Linguists have utilized string-based distributional analysis since the early 1900s to learn information about a lost language solely from a collection of text in that language. By way of example, assume the lost language is English. With absolutely no knowledge of the language, one could discover that the strings “dog” and “cat” are related because they are distributionally similar in text. In other words, it can be observed that the words that precede and follow, or appear in the vicinity of, the string “dog” are very similar to the words that appear in the vicinity of the string “cat.” String-based distributional analysis provides a useful mechanism in the running text domain to facilitate determining similar text and improve searches; however, string-based distributional analysis is limited in that it merely employs words that precede and follow the term(s) of interest.
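The “dog” and “cat” observation above can be made concrete with a small Python sketch of string-based distributional analysis. The approach assumed here, which is one common formulation rather than a method stated in the text, counts the words appearing within a fixed window around each string and compares the resulting count vectors by cosine similarity. The names context_vectors, cosine and SENTENCES, the window size of one word, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """For each word, count the words that appear within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            stop = min(len(words), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus; no meaning of any word is supplied to the code.
SENTENCES = [
    "the dog chased the ball",
    "the cat chased the ball",
    "the dog ate the food",
    "the cat ate the food",
]

if __name__ == "__main__":
    vectors = context_vectors(SENTENCES)
    print("dog vs cat :", cosine(vectors["dog"], vectors["cat"]))
    print("dog vs ball:", cosine(vectors["dog"], vectors["ball"]))
```

In this toy corpus “dog” and “cat” occur in identical surroundings and so score higher against each other than “dog” does against “ball.” The sketch also exhibits the limitation noted above, in that it relies solely on neighboring words.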