The Internet contains a vast amount of information, distributed over a multitude of computers connected by “The Net”, and thus provides users with large amounts of information on any topic imaginable. Although large amounts of information are available, finding the desired information is not always easy or fast.
Search engines have been developed to address the problem of finding desired information on the Internet. Typically, a user who has an idea of the type of information desired enters a search term or search terms, and a search engine returns a list of web pages that contain the term or terms. Alternatively, a user may want to browse through data, as for example, when the user is not sure what information is wanted. Some search engines, such as YAHOO or LOOKSMART, provide categories of information and categories within those categories for selection by a user, who can thus drill down to an area of interest from a more general category.
The term “search engine” is frequently used to describe both crawler-based search engines and engines based on human-edited directories. Crawler-based search engines generally work by indexing web pages automatically and usually contain a spider, an index and search-engine software. A search engine “spider” crawls through the web, following links to other pages within the site, and returns its results to an index or catalog. The index will contain a copy of every web page visited by the spider. Search engine software analyzes each page in the index to find matches to a search and ranks the pages in order of relevance.
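By way of illustration only, the three parts of a crawler-based search engine described above (spider, index and search software) can be sketched as follows. The in-memory `WEB` dictionary, the `a.example` URLs and the function names `crawl` and `search` are hypothetical and stand in for real network access; the sketch omits relevance ranking and is not the implementation of any particular search engine.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("horses of andalusia", ["a.example/breeds", "a.example/contact"]),
    "a.example/breeds": ("andalusian horses and breeds", ["a.example/home"]),
    "a.example/contact": ("contact us", []),
    "a.example/orphan": ("unlinked page about horses", []),
}


def crawl(start, web):
    """Spider: follow links breadth-first and keep a copy of each page visited."""
    index = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in index or url not in web:
            continue  # already visited, or the link leads nowhere
        text, links = web[url]
        index[url] = text      # the index stores a copy of the page
        queue.extend(links)    # schedule the linked pages for a visit
    return index


def search(index, query):
    """Search software: return URLs of indexed pages containing every query term."""
    terms = query.lower().split()
    return sorted(
        url for url, text in index.items()
        if all(term in text.lower().split() for term in terms)
    )


index = crawl("a.example/home", WEB)
results = search(index, "horses")
```

In this sketch the page `a.example/orphan` mentions horses but is never indexed, because no crawled page links to it; a page can only be found if the spider has reached it.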
Each search engine builds its index and ranks the web pages in the index in its own way, which explains why a user is likely to receive different search results for the same search conducted on different search engines. Typically, search engines rank “hits” based on a search-engine-specific algorithm involving the location and frequency of keywords on a web page. For example, pages with the search term(s) appearing in the HTML (hypertext markup language) title tag are often assumed to be more relevant to the topic than others. A search engine also may check to see if the search keywords appear near the top of a web page. These search engines operate on the premise that any page relevant to the topic will contain the keywords within the title or within the first few paragraphs of the web page.
Frequency is another consideration in the determination of relevance. A search engine may analyze how often a keyword appears in relation to other words in a web page. Pages containing keywords appearing with a higher frequency are often deemed more relevant than other web pages.
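The location and frequency heuristics described above can be illustrated with a minimal sketch. The weights `title_weight` and `top_weight`, the `top_words` cutoff, the sample pages and the function names are all assumptions chosen for illustration; actual search engines use their own engine-specific algorithms.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query, title_weight=2.0, top_weight=1.0, top_words=50):
    """Score one page for a query using keyword location and frequency."""
    terms = set(tokenize(query))
    title = tokenize(page["title"])
    body = tokenize(page["body"])
    # Location: query terms found in the title tag.
    title_hits = sum(1 for term in terms if term in title)
    # Location: query terms found near the top of the page.
    top_hits = sum(1 for term in terms if term in body[:top_words])
    # Frequency: share of body words that are query terms.
    frequency = sum(1 for word in body if word in terms) / len(body) if body else 0.0
    return title_weight * title_hits + top_weight * top_hits + frequency


def rank(pages, query):
    """Return pages ordered from most to least relevant under this heuristic."""
    return sorted(pages, key=lambda page: score(page, query), reverse=True)


# Hypothetical sample pages.
PAGES = [
    {"url": "c.example", "title": "Gardening Tips",
     "body": "Plant tomatoes in spring"},
    {"url": "b.example", "title": "Vacations in the Andalusian Mountains",
     "body": "Visit the Andalusian mountains in Spain"},
    {"url": "a.example", "title": "Andalusian Horses for Sale",
     "body": "Andalusian horses are bred in Spain"},
]

ranked = [page["url"] for page in rank(PAGES, "Andalusian Horses")]
```

Under these assumptions the vacation page still receives a nonzero score for the query “Andalusian Horses”, because one keyword appears in its title and near the top of its body, even though the page is not about horses.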
The search engine returns a list of pages ordered by relevance, as determined by that search engine. Unfortunately, this kind of search engine often returns irrelevant results because web pages frequently contain words that do not actually relate to the subject of the query. For example, if a user enters the search query “Andalusian Horses”, the search may return a web page concerning vacations in the Andalusian Mountains in Spain.
The problem is compounded when a very general query term is entered or when the query term is capable of multiple meanings, such as “Java”. Does the user want “Java” the programming language? “Java” as in “Where's my morning Java?” Or “Java” the island? A user looking for web sites concerning “Java” the programming language may have to page through a number of site listings about coffee and about Java the island before finding listings about Java the programming language.
A second kind of search engine (e.g., YAHOO or LOOKSMART) matches terms in a query to a human-built directory of categorized web sites. A webmaster submits a short description of the site, or editors write a description for a site. When a user enters a search query, the search engine matches the terms in the query to the terms in the description and displays to the user only those web sites where a match is found between the word(s) in the query and the words in the description. Alternatively, a human editor may review each site and manually assign the site to a particular category.
This type of search engine also has drawbacks. For example, in the case where a query is very broad, it may be hard to determine which category the query belongs to. Also, any web sites not editorially assigned to a category will be missed if the user picks that category. For example, imagine that a user enters the word “lincoln” as his query. Entering the word “lincoln” may result in the return of the following categories: “U.S. States>Nebraska”, “Recreation>Automotive>Makes and Models” and “Arts>Performing Arts>Centers” (“>” indicates that phrases to the right of the “>” are subcategories of phrases to the left of the “>”). But if a user picks the “U.S. States>Nebraska” category, only those sites the human editor has linked with that category will appear, even if there are other good sites that would appear if the user merely searched for the terms “Lincoln+Nebraska”. If the user misspells the query terms or uses a different word than is contained in the description, relevant web sites may not appear at all.
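The directory-based matching and its drawbacks can be illustrated with a short sketch. The `DIRECTORY` entries, the site names and the rule that every query term must appear in a description are assumptions made for illustration only; they do not reproduce the data or the matching rules of any actual directory.

```python
import re

# Hypothetical human-edited directory: each site has a category and a description.
DIRECTORY = [
    {"site": "lincoln-ne.example", "category": "U.S. States>Nebraska",
     "description": "City guide to Lincoln, the capital of Nebraska"},
    {"site": "lincoln-cars.example", "category": "Recreation>Automotive>Makes and Models",
     "description": "Lincoln luxury cars and models"},
    {"site": "lincoln-center.example", "category": "Arts>Performing Arts>Centers",
     "description": "Lincoln Center performing arts schedule"},
]


def words(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_categories(query, directory):
    """Return categories whose site descriptions contain every query term."""
    query_terms = words(query)
    return sorted({
        entry["category"] for entry in directory
        if query_terms <= words(entry["description"])
    })


def sites_in_category(category, directory):
    """Return only the sites an editor has assigned to the chosen category."""
    return [entry["site"] for entry in directory if entry["category"] == category]


categories = match_categories("lincoln", DIRECTORY)
misspelled = match_categories("lincon", DIRECTORY)
nebraska_sites = sites_in_category("U.S. States>Nebraska", DIRECTORY)
```

In this sketch the query “lincoln” yields all three categories, the misspelled query “lincon” yields none, and choosing “U.S. States>Nebraska” returns only the single site an editor assigned to that category.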
A third type of search engine (e.g., YAHOO or Google) is a hybrid search engine that presents both crawler-based results and human-powered directory-based listings. Typically, a hybrid search engine will favor one type of listing over another. For example, YAHOO is more likely to present human-powered listings.
Search engines typically are unable to provide a hierarchical relationship between data entities. For example, a search for “Ford” typically returns dozens of different FORD model links, which crowd out other interpretations of the query (e.g., Harrison Ford). There is no way in typical search engines to indicate that “FORD Trucks” is a superset of “FORD F-150”, “FORD Ranger”, etc.
Finally, for the same topic, the results returned from a browse and those returned from a search can differ significantly. In order to get the best search results, it is often necessary to have both a browse window and a search window open concurrently, which is inconvenient and requires a certain degree of sophistication and search-engine savvy on the part of the user. Unsophisticated users, unaware that a search engine may have multiple types of data sources, may become mystified and frustrated with the results of a search or browse. Hence, a need exists in the art for a method to process a search that enables each searcher to get great search results faster and more conveniently, regardless of how much the searcher knows about the eccentricities of the search engine used.