A vast amount of information is available by computer today. For example, on the World Wide Web alone there are millions of browsers and millions of web pages. In addition to the Internet, companies have set up local "intranets" for storing and accessing data for running their organizations. However, the sheer amount of available information poses increasingly difficult challenges to conventional approaches.
A major difficulty is that information relevant to a user's purpose is often dispersed across many sites on the network, and visiting all of these sites is time-consuming. One conventional approach is a search engine. A search engine is actually a set of programs accessible at a network site within a network, for example a local area network (LAN) at a company or the Internet and World Wide Web. One program, called a "robot" or "spider," pre-traverses the network in search of documents and builds large index files of keywords found in the documents.
A user of the search engine formulates a query comprising one or more keywords and submits the query to another program of the search engine. In response, the search engine inspects its own index files and displays a list of documents that match the search query, typically as hyperlinks. When a user activates one of the hyperlinks to see the information contained in the document, the user exits the site of the search engine and terminates the search process.
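The index-and-lookup mechanism described above can be illustrated with a minimal sketch. It is not the implementation of any particular search engine: the function names (`tokenize`, `build_index`, `search`), the example URLs, and the document texts are all hypothetical, and the "spider" step is stood in for by a small in-memory dictionary rather than an actual network traversal.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Build a keyword index: keyword -> set of document URLs.

    `documents` maps a URL to the text already fetched for it. In a real
    system this text would come from the spider's traversal (assumed here).
    """
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of documents containing every query keyword."""
    words = tokenize(query)
    if not words:
        return []
    result = None
    for word in words:
        postings = index.get(word, set())
        result = postings if result is None else result & postings
    return sorted(result)


# Hypothetical documents standing in for pages found by the spider.
documents = {
    "http://example.com/a": "Health plan options for employees",
    "http://example.com/b": "Employee computer inventory",
    "http://example.com/c": "Health insurance and patient billing",
}
index = build_index(documents)

for url in search(index, "health"):
    print(url)
```

Note that the lookup consults only the prebuilt index, never the documents themselves, so the results reflect the state of the documents at the time the index was built.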
Search engines, however, have their drawbacks. For example, a search engine is oriented to discovering textual information only. In particular, it is not well suited to indexing information contained in structured databases, e.g., relational databases. Moreover, mixing data from incompatible data sources is difficult in conventional search engines.
Often a user may wish to collect different kinds of information together. For example, a hospital administrative staff worker may need to search one database to find out what kind of health insurance a patient has, another database to find out which doctor is treating the patient, and a third database to find out which services have been performed. Often, the hospital administrative staff worker will be making the same kinds of time-consuming queries daily, but for different patients.
Another disadvantage of conventional search engines is that irrelevant information is aggregated with relevant information. For example, it is not uncommon for a search engine on the World Wide Web to locate hundreds of thousands of documents in response to a single query. Many of those documents are found only because they coincidentally include a keyword that appears in the search query. Sifting through thousands of search results is a daunting task.
As another example, a personnel administrator might be interested in an employee's choice of health plan, but an MIS administrator would be more interested in which computer the employee is using. Therefore, the user has to sort out which documents and databases are relevant and which are irrelevant for a particular goal.
Because it pre-traverses a network to index documents, a conventional search engine suffers from obsolescence of data in its search indexes. Documents are constantly being updated, but it may take months for the new information to filter down to search engines.
When a user activates a hyperlink on a page of search results, the user leaves the search site and terminates the search. Users who are browsing for more information must return to the search site. Another effect of leaving the search site is that sponsors of the search site, e.g., paid advertisers, have minimal interaction with users of the search site.