1. Field of the Invention
The present invention relates to a method and system for conducting searches. More particularly, the present invention relates to a method and system for conducting a search for information by selecting search terms from amongst all the terms in the search results themselves.
2. Background of the Related Art
A keyword search of the Internet or other electronic media, using well-designed all-purpose search engines such as Google (http://www.google.com), AOL (http://www.aol.com), or Yahoo (http://www.yahoo.com), often returns thousands or even millions of hits. In part, this is because users enter only general search terms when they are looking for information.
When users do not anticipate the breadth of the search terms they submit, the search results or “hits” may be unfocused. The hits often span more content than is wanted. Only some of the selected content will be on topic for a particular user at a particular time. For example, a search for “file folder” may return hits about how computers are organized and lists of office supply stores; a search for “bass” returns hits about fish mixed with hits about musical instruments and a particular chain of shoe stores. When users skim such lists of hits, they may become aware that their search was too broad. They know that their results cover several topics and that some of them are not of interest.
Several methods exist for reducing hit lists when hit lists are heterogeneous. The user can construct and enter a new search string that includes words that discriminate between what they want at that particular time and what they do not want. For example, “file folder” might be changed to “file folder computer” and “bass” might be changed to “fish bass.” Even though these phrases are not syntactical, both produce more focused search results.
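By way of illustration only, the narrowing effect of adding a discriminating word to a search string can be sketched as follows. The sketch assumes a simple matcher that returns only documents containing every word of the query; the corpus, the function name `search`, and the matching rule are hypothetical and are not the behavior of any particular search engine, which would typically also rank and weight its results.

```python
import re

# Hypothetical toy corpus standing in for indexed pages.
DOCUMENTS = [
    "Bass fish species found in freshwater lakes",
    "How to tune a bass guitar",
    "Bass shoes store locations",
    "Fish recipes for trout and salmon",
]


def search(query, documents):
    """Return the documents that contain every word of the query.

    Matching is case-insensitive and ignores word order, so the
    non-syntactical string "fish bass" matches "Bass fish ...".
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    results = []
    for doc in documents:
        doc_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if terms <= doc_words:  # every query term appears in the document
            results.append(doc)
    return results


# The general keyword returns a heterogeneous list of hits.
broad = search("bass", DOCUMENTS)

# Adding a discriminating word focuses the list on one topic.
narrow = search("fish bass", DOCUMENTS)
```

In this sketch the general keyword matches the fish, guitar and shoe documents, while the longer search string matches only the fish document, because each added word removes hits that lack it.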
Google puts a search-within-results option at the bottom of its pages. A user can reduce their hits by typing words that modify their original search. In addition, with most browsers, a user can highlight words that appear on a page of hits and drag them to the search box where they can be added to the original search string. For example, it might be efficient to add “muscular dystrophy” to the keyword, “heart,” by highlighting the phrase when it appears in a hit list and dragging it to the search box. This strategy is slow but it prevents typographical errors.
At this time, there are at least two technologies that help users reduce and focus hit lists without entering new information into the search box. Google provides a link to “Similar pages” at or near the end of the listings on a page of hits generated by the Google search engine. Clicking on the “Similar pages” hyperlink produces a list of 30-35 hits that have content that is similar to the one hit that has been identified as relevant by the user.
However, the user does not have a choice of what words are used to identify hits that are defined as “similar” when they click on Google's “Similar pages” hyperlink. They do not even know for certain what word or words are used by Google to define “similar” hits.
Another technology for helping users select content is to put a list of specific sub-topics, sometimes called clusters or facets, on the page of search results. For example, Yahoo often puts suggested sub-topics at the top of a hit list if the search words submitted by a user are very general. Also, Vivisimo, Inc. (http://www.vivisiomo.com), Endeca, Inc. (www.endeca.com), Siderean, Inc. (www.siderean.com) and many other search engines display lists of specific sub-topics as a basic design feature of their search technology. Their sub-topics are developed to meet business and design goals in a variety of situations. For example, LexisNexis Academic (www.lexisnexis.com) recently announced a new user interface, available in the summer of 2007, that will cluster news, legal and business information by subject, industry and company (www.econtentmag.com).
Sub-topics are defined by the algorithms of a particular search engine and their display is under the control of the search engine. The user can select a specific topic that is consistent with the search they intended to make if one of those displayed topics expresses what they are searching for. Doing so will reduce the hit count and focus their search.
Lists of sub-topics (clusters), as they are shown by sites such as Yahoo, Vivisimo, Endeca and Siderean, also have disadvantages. Users may not find a choice that is helpful. There is a practical limit to the number of sub-topics that can reasonably be displayed on a first page. Designers often truncate lists or direct users to other pages by putting a “more” hyperlink at the bottom of a short list of popular sub-topics in order to accommodate as many sub-topics as possible. The user then must take the time to page back and forth to see all their choices. Even when additional pages are used, all possible sub-topics cannot be listed if the hit count is large or if the search results are heterogeneous. One of the ways search engines control the number of sub-topics they display is to display only the most popular ones.
U.S. Pat. No. 5,278,980 to Pedersen et al. describes a “phrase oriented” search technique to help reduce and focus the hits returned from a keyword search. The technology identifies one search term in each hit (called a non-stop word) that is immediately adjacent to the keyword used to produce the list of hits. For any of the hits, the user can either select the adjacent search term and reduce the hit list or execute a particular function key to add the next most adjacent search term in that particular hit to the display. Pedersen et al. also include a variety of rules that account for situations where a keyword search starts with multiple words. The process of either selecting or “extending” a display in order to refine a search can be repeated multiple times.
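By way of illustration only, the general idea of offering the non-stop word nearest the keyword, and then the next nearest when the user “extends” the display, can be sketched as follows. This is not the implementation described by Pedersen et al.; the stop-word list, the function names, the use of only the first occurrence of the keyword, and the rule that the left-hand neighbor wins a tie are all assumptions made for the sketch.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption of this sketch).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "is", "with"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def adjacent_terms(hit_text, keyword):
    """List the non-stop words of a hit, nearest to the keyword first.

    Only the first occurrence of the keyword is considered. The first
    entry models the term shown initially; each later entry models one
    further "extend" request by the user.
    """
    words = tokenize(hit_text)
    keyword = keyword.lower()
    if keyword not in words:
        return []
    pos = words.index(keyword)
    candidates = [
        (abs(i - pos), i, w)  # distance first; earlier position breaks ties
        for i, w in enumerate(words)
        if w != keyword and w not in STOP_WORDS
    ]
    candidates.sort()
    return [w for _, _, w in candidates]


def refine(hits, keyword, term):
    """Keep only the hits that contain both the keyword and the selected term."""
    kept = []
    for hit in hits:
        words = set(tokenize(hit))
        if keyword.lower() in words and term.lower() in words:
            kept.append(hit)
    return kept


hits = [
    "Largemouth bass fishing tips for the lake",
    "Learn to play the bass guitar",
    "Bass shoes on sale",
]
offered = [adjacent_terms(h, "bass") for h in hits]
reduced = refine(hits, "bass", "guitar")
```

In this sketch, selecting the term offered for the second hit reduces the three hits to the single hit about the musical instrument, and the remaining entries of each list correspond to successive extensions of that hit's display.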
Pedersen et al. seek to “disambiguate” the meaning of the keywords and to avoid distracting the reader by cluttering the display. However, the fragmentation of the display and the interactions required of the user are not as easy to understand or to use as other techniques for reducing hit lists.