1. Field of the Invention
The invention relates to searching large amounts of information and analyzing the results of such a search. In one broad application, the invention relates to web page searching on either the Internet or intranets. Furthermore, in the web context, the invention relates to improving the efficiency with which search results are analyzed, and to using the data gathered from that analysis to refine and improve the search process.
2. Description of the Related Art
Generally, the usefulness of any type of information depends on the ability to find and adapt contextually relevant information in a timely manner. For example, if a cook is looking for a recipe, the existence of that recipe in an unidentified book of unknown whereabouts is not at all useful. Furthermore, even knowing the book's identity and location would not help if the book were not readily accessible. Moreover, even if the cook had the correct book, finding and using the recipe without an index or table of contents would be inefficient. Lastly, even an index and a table of contents do not allow a cook to scan a large collection of recipes as efficiently as other techniques, such as an index of pictures of the prepared dishes.
This illustration shows the importance of the methods and systems, and of the several dimensions of information analysis, required for efficient information location and retrieval. In fact, almost everyone has learned to use simple systems such as those found in libraries, dictionaries, maps and books. Few people, however, understand the methods and systems for finding information that is digitized or managed by machines such as computers. In the world of machine-managed information, many proposals and techniques have been put forward to solve these information location problems.
Most commonly, the process of finding relevant information begins by reorganizing the entire universe of accessible information. For example, the phone company typically organizes phone numbers alphabetically by the names of the phone owners rather than by number or by address. This allows users to find a number in the book knowing only a person's name. The same principle applies to storing machine-managed information in databases. For example, a computer user may create a contact database, perhaps using a program such as Microsoft Access. After creating the database (the information repository), the user must populate it with data, namely the actual list of contacts. Each contact (generically called a record in database terminology) might include a name field, an address field, a phone number field and any number of other fields pertaining to personal contact information. Once the database is populated, a user can typically retrieve information based on the values of one or more of its fields. In summary, reorganizing (or pre-organizing) the data makes relevant information easier to retrieve.
As databases and the records within them grow, the reorganizing task grows as well and can impede the ability to find relevant results quickly and easily. The problem becomes much worse when the exact form or nature of the records is inconsistent and not fully predictable. An example is a document database in which the records (documents plus attributes) come in varied formats (text, RTF, Microsoft Word, JPEG, TIFF, etc.). In this type of database, a business manager might be looking for a particular report but recall only two vague attributes of it: perhaps the month in which it was created and the names of several people who might have created it. In this situation, the database will likely return a long list containing every document created by any of the listed people during the specified month. The manager would then have very few options for examining the list further. She could open and read each document, or she could review the entire attribute list for each document. These options are unwieldy and time consuming, and they may not even lead to the desired report.
A particularly large manifestation of this problem arises in searching the World Wide Web or any web-like information collection (such as an intranet). Common search tools use various techniques to relate search terms or queries to web pages or web sites. The object is to find the web pages most relevant to the search terms or query. However, given (i) the size and nature of the Internet and most intranets, and (ii) the skill level of most users, there is only a small likelihood of returning a single, perfect match for the search terms or query. Therefore, to increase the likelihood of returning a perfect match, common search tools return an extremely long list of possible matches, presented to the user in order of machine-determined relevance. This is very similar to the manager's document search problem discussed above. In the web context, the user must click through to successive documents on the list to determine their actual relevance to the search terms. This is far from ideal.
To reduce this inefficiency, some products and services return an enhanced list in which each listing contains more information about the underlying record or document. Examples of such information include (i) additional, presumptively relevant text (ask.com, altavista.com, and yahoo.com); (ii) address information; (iii) revision information; or (iv) a small thumbnail image of the web page or document that the listing represents (capitalsearch.ca).