As the forms of media have multiplied, users have had to contend with information overload. An Internet search can often generate thousands of hits, each of which may be a multi-page document or presentation. Other media, such as television or presentations shown on a display, can likewise overload one's senses with more information than can be processed at a given time. Users may even be forced to take in useless information that would be better left unprocessed.
Nowhere is information gathering and processing more evident than in the common use of a search engine. A search engine is a program that searches documents for specified keywords and returns a list of the documents in which the keywords were found. Although a search engine is really a general class of programs, the term is often used specifically to describe systems that enable users to search for documents on the World Wide Web and in other information sources such as newsgroups. As desktop computing platforms have become more sophisticated, search capabilities similar to those provided by the typical Web search engine have migrated onto the desktop platform as well. Thus, local databases associated with the desktop can be searched for information in much the same way that larger search engines comb the Internet. Typically, a search engine operates by sending out a crawler to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.
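The indexer and keyword lookup described above can be illustrated with a minimal sketch. It assumes a simple inverted index (each word mapped to the set of documents containing it) and a query that must match every keyword. The function names `tokenize`, `build_index`, and `search` and the sample documents are hypothetical; actual search engines use proprietary and far more elaborate algorithms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical documents standing in for pages fetched by a crawler.
documents = {
    "doc1": "Search engines index documents by keyword.",
    "doc2": "A crawler fetches documents from the Web.",
    "doc3": "Desktop search indexes local files.",
}

index = build_index(documents)
print(search(index, "documents"))          # ['doc1', 'doc2']
print(search(index, "crawler documents"))  # ['doc2']
```

The sketch returns only document identifiers; it does not rank results or decide what text to show for each hit.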
Search engines are considered to be the key to finding specific information on the vast expanse of the World Wide Web and other information sources. Without sophisticated search engines, it would be virtually impossible to locate data on the Web without knowing a specific uniform resource locator (URL). When people use the term search engine in relation to the Web, they are usually referring to the actual search forms that search through databases of HTML documents, initially gathered by a robot. There are basically three types of search engines: those that are powered by robots (called crawlers, ants, or spiders), those that are powered by human submissions, and those that are a hybrid of the two.
Crawler-based search engines use automated software agents (called crawlers) that visit a Web site, read the information on the site itself, read the site's meta tags, and follow the links that the site connects to, indexing all linked Web sites as well. The crawler returns all of that information to a central depository, where the data is indexed. The crawler periodically returns to the sites to check for any information that has changed. The frequency with which this happens is determined by the administrators of the search engine. Human-powered search engines rely on humans to submit information that is subsequently indexed and catalogued. Thus, only information that is submitted is put into the index.
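The crawler behavior described above can be sketched as a breadth-first traversal. To stay self-contained, the sketch assumes a small in-memory dictionary, `FAKE_WEB`, in place of real network requests; the URLs, the page layout (`text`, `meta`, `links`), and the `crawl` function are all hypothetical and serve only to illustrate visiting a site, reading its content and meta tags, following its links, and returning everything to a central store.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> page content, meta tags, outgoing links.
FAKE_WEB = {
    "http://a.example/": {
        "text": "home page",
        "meta": {"keywords": "home"},
        "links": ["http://b.example/", "http://c.example/"],
    },
    "http://b.example/": {
        "text": "about",
        "meta": {},
        "links": ["http://a.example/"],
    },
    "http://c.example/": {
        "text": "news",
        "meta": {"description": "daily news"},
        "links": ["http://d.example/"],  # unreachable page, skipped by the crawler
    },
}


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first and collect their text and meta tags."""
    store = {}                 # central depository of crawled pages
    queue = deque([start_url])
    seen = {start_url}         # avoid visiting the same URL twice
    while queue and len(store) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:       # page could not be fetched
            continue
        store[url] = {"text": page["text"], "meta": page["meta"]}
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


store = crawl("http://a.example/", FAKE_WEB.get)
print(sorted(store))
# ['http://a.example/', 'http://b.example/', 'http://c.example/']
```

The periodic revisiting of sites is omitted here; under the same assumptions it would amount to calling `crawl` again on a schedule chosen by the search engine's administrators and comparing the new store with the old one.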
One deficiency of present data gathering techniques relates to how data is collected, returned, and subsequently presented to the user by the respective searching and data gathering resources. Most search results include the first few words of a document or the title of the document itself. Often, however, the first few words of a document or file are ambiguous, incomplete, or misleading as to the actual contents of the file. Moreover, users are often forced to select a document, scan through its contents, and only then determine the usefulness of the data contained therein. As can be appreciated, determining whether a returned document has value therefore takes additional time and often causes users to process information that is superfluous to the task at hand. Even in common desktop arrangements, users are often forced to scan through many files, observe the data contained in the files, and judge the usefulness of those files before moving on to other candidates that may contain what they are looking for.