This invention relates generally to the field of search techniques used on an information management system or on the global information network ("the World Wide Web"). More specifically, the present invention is a method and system for refining and improving search queries and for organizing the results of a search query by different and overlapping criteria.
The blossoming of the World Wide Web in the 1990s has given computer users access to vast quantities of information: an estimated 100-300 million Web pages, amounting to many terabytes of data. The user provides the Uniform Resource Locator ("URL") of a page to the browser, and the browser retrieves the page from the Internet and displays it to the user. When the user knows the URL of the page, the procedure is simple. However, to find information on the Web, the user must access a search engine. The user submits a query and the search engine returns a list of URLs of pages that satisfy the query, together with a summary of each page. The continuing exponential growth of the Web makes the task of finding the relevant information exceedingly difficult. This effort is further aggravated by the unorganized and extremely dynamic nature of the Web.
There are two paths to searching for information on the Web. One path is consulting a manually compiled Web catalog, such as Yahoo. Any manual catalog of the Web suffers from two drawbacks: the nature of the information on the Web makes any cataloging effort necessarily limited and incomplete, and the catalog offers no help to a user interested in a subject that happens not to be covered by the catalogers.
The other path to searching for information on the Web is using a Web search engine. The major ones as of January 1998 are AltaVista, Excite, HotBot, InfoSeek, Lycos, Northern Light, and WebCrawler, plus a number of branded versions of these. These engines send out programs called robots, or crawlers, which automatically peruse the Web and gather the Web pages they discover. The collected pages are automatically indexed and stored in a database. In this process, known as indexing, Internet URLs are associated with relevant words from the pages they identify. Many search engines store page summaries along with URLs. Page summarization varies from one search engine to another. Some search engines store the first fifty words of a document. Other engines try to understand the content of the pages: they attempt to define relevant "ideas" based on associations of words within documents, and they summarize the Web pages by storing these "ideas". Users can query the indices for pages meeting certain criteria. For example, a user can request all the Web pages found by the search engine that have the phrase "cryptography software" somewhere in the text. There are two major problems with using search engines: 1) incomplete coverage and 2) difficulty of effective use. No single engine contains a complete index of the Web; coverage ranges from 2 million pages for WebCrawler to 100 million pages for AltaVista. Given the explosive growth of the Web and the limitations of time and space faced by search engines, it is unlikely that full coverage of the Web is forthcoming.
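By way of illustration only, the indexing and phrase query described above might be sketched as follows. This is a minimal sketch and not the implementation of any particular search engine: the function names `tokenize`, `build_index`, and `phrase_search`, the positional index layout, and the example URLs are all assumptions introduced for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Associate each word with the URLs (and word positions) where it occurs.

    Also stores a summary per URL made of the first fifty words of the page.
    """
    index = defaultdict(dict)  # word -> {url: [positions]}
    summaries = {}
    for url, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(pos)
        summaries[url] = " ".join(text.split()[:50])
    return index, summaries

def phrase_search(index, phrase):
    """Return the sorted URLs whose text contains the phrase words consecutively."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    hits = []
    for url, starts in index[words[0]].items():
        for start in starts:
            # Every following word must appear at the next position in the same page.
            if all(start + i in index[w].get(url, ()) for i, w in enumerate(words)):
                hits.append(url)
                break
    return sorted(hits)

# Hypothetical pages standing in for documents gathered by a crawler.
pages = {
    "http://example.com/a": "Free cryptography software for students.",
    "http://example.com/b": "Software patents and cryptography law.",
}
index, summaries = build_index(pages)
print(phrase_search(index, "cryptography software"))
```

In this sketch only the first page matches, because the second page contains both words but not as a consecutive phrase.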
Most users feel the incompleteness of the indices only indirectly, since they cannot miss a Web page if they do not know it exists. The more pressing problem is that using the search engines can be a frustrating, time-consuming, and often unsuccessful process for the user. In most search sessions, the user's needs are well enough formulated in her head that only a small number of Web pages would exactly meet her need. The problem, then, is getting the search engine to understand the user's needs. Unfortunately, the state of the art in human-machine interaction is far from meeting such a goal. Many user queries produce unsatisfactory results, yielding thousands of matching documents. The search engine indices support many basic information retrieval queries, but users are offered little guidance in determining which keywords, and in which combination, would yield the desired content. Typically, the user ends up alternating between specifying too few keywords, which yields too many matching documents, and supplying too many keywords, which yields no matches. Many search engines are also inefficient at eliminating duplicate URLs from their indices. As a consequence, redundant information is sometimes returned to users, which can create considerable frustration.
While a number of tools have been developed to help the user search more intelligently by allowing selection of additional search criteria, none of them offers a useful analysis of the query results that could guide the user in reformulating a more appropriate query. Some search engines group and display results based on the popularity of the site, while others attempt some other type of organization. One such search engine, Northern Light, organizes all the query results into at most 10 folders based on subject, type, source and language. While this is a step in the right direction, the user is not given any information on how the categories are derived or on how many results are in each folder.
The present invention is embodied in a simple and effective method for improving the searching of an information management system using a search engine and for refining and organizing the search results.
The present invention provides for a query tuner, allowing a user to effectively reformulate a query in order to find a reasonable number of matching documents from the search engine by automatically and selectively modifying individual query terms in the user's query to be weaker or stronger.
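By way of illustration only, one possible reading of such a query tuner is sketched below. The summary above does not specify how a term is made weaker or stronger, so the sketch assumes that each term is either optional or required, that strengthening means making an optional term required, and that weakening means the reverse. The names `count_matches` and `tune`, the matching rule, and the sample documents are hypothetical.

```python
def count_matches(docs, query):
    """Count documents that contain every required term and at least one query term.

    `query` is a list of (term, required) pairs; this matching rule is an assumption.
    """
    total = 0
    for text in docs:
        words = set(text.lower().split())
        has_required = all(term in words for term, required in query if required)
        has_any = any(term in words for term, required in query)
        if has_required and has_any:
            total += 1
    return total

def tune(docs, query, low, high):
    """Modify one term at a time until the match count falls within [low, high]."""
    query = list(query)
    for _step in range(len(query)):
        n = count_matches(docs, query)
        if low <= n <= high:
            break
        too_many = n > high
        # Too many matches: strengthen an optional term.
        # Too few matches: weaken a required term.
        candidates = [i for i, (term, required) in enumerate(query)
                      if required != too_many]
        if not candidates:
            break
        i = candidates[0]
        query[i] = (query[i][0], too_many)
    return query, count_matches(docs, query)

# Hypothetical document collection standing in for a search engine index.
docs = [
    "cryptography software download",
    "cryptography history",
    "software engineering",
    "cooking recipes",
]
tuned, n = tune(docs, [("cryptography", False), ("software", False)], low=1, high=1)
print(tuned, n)
```

Starting from two optional terms, which match three of the sample documents, the sketch strengthens both terms in turn and ends with a single matching document. A query that starts with no matches is handled symmetrically by weakening required terms.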
One aspect of the present invention provides for a dynamic filter, using a dynamic set of record tokens to restrict the results of a search query to include only records which correspond to the record tokens.
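By way of illustration only, a dynamic filter of this kind might be sketched as follows. The sketch assumes that each search result carries a record token (here, a URL string) and that the token set can be changed between queries. The class name `DynamicFilter`, its methods, and the result layout are hypothetical.

```python
class DynamicFilter:
    """Holds a changeable set of record tokens and filters results against it."""

    def __init__(self, tokens=()):
        self.tokens = set(tokens)

    def add(self, token):
        """Admit records with this token in later calls to apply()."""
        self.tokens.add(token)

    def remove(self, token):
        """Stop admitting records with this token; a no-op if it is absent."""
        self.tokens.discard(token)

    def apply(self, results):
        """Keep only results whose token is currently in the set, preserving order."""
        return [r for r in results if r["token"] in self.tokens]

# Hypothetical search results, each identified by a record token.
results = [
    {"token": "http://example.com/a", "title": "Page A"},
    {"token": "http://example.com/b", "title": "Page B"},
    {"token": "http://example.com/c", "title": "Page C"},
]
flt = DynamicFilter(["http://example.com/a", "http://example.com/c"])
print([r["title"] for r in flt.apply(results)])

# The token set is dynamic: changing it changes what the same results yield.
flt.remove("http://example.com/c")
flt.add("http://example.com/b")
print([r["title"] for r in flt.apply(results)])
```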
Another aspect of the present invention provides for a results organizer, to aid the user in organizing and understanding a large number of matching documents returned in response to a search query by clustering like items returned from the search.
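By way of illustration only, clustering like items might be sketched as grouping the returned URLs by host name and reporting the size of each group. Grouping by host is merely one assumed criterion chosen for the example; the function name `organize_by_host` and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def organize_by_host(urls):
    """Group result URLs by their host name, preserving the order within each group."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[urlparse(url).netloc].append(url)
    return dict(clusters)

# Hypothetical result list returned for a query.
urls = [
    "http://example.com/crypto/index.html",
    "http://example.org/software.html",
    "http://example.com/crypto/tools.html",
]
clusters = organize_by_host(urls)
for host, members in clusters.items():
    # Showing the count tells the user how many results each cluster holds.
    print(host, len(members))
```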
Another aspect of the present invention provides for a search history, to allow the user to save, organize and search the queries and the documents that best satisfy the query.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive, of the invention.