The present invention is related generally to the field of database searching, and more specifically to simultaneous searching for data across a wide area network such as the Internet, the network including a plurality of clients and servers and a plurality of databases.
A wide area computer network, or WAN, comprises a geographically dispersed, interconnected plurality of computers capable of sharing data and/or processing capacity. The Internet is the world's largest WAN, growing at an annual rate some estimate to be above one thousand percent. In March of 1998, there were an estimated 320 million pages of information posted on the World Wide Web (the graphics-capable portion of the Internet), with uncounted millions of gigabytes of additional information stored in non-Web based, though Web accessible, databases. For the purpose of describing the present invention, information obtained through the Web, for example presented in Hypertext Markup Language (HTML) and available at a consistent Uniform Resource Locator (URL), is within the “visible” web and is termed “directly accessible.” Conversely, information accessible only via access to a distinct portal or other electronic doorway (even if such a portal or doorway is found on the Web) is within the hidden or “invisible” web and is termed “indirectly accessible.” While there are numerous search engines and “web crawlers” that may be used to search for directly accessible data on the visible web, there is presently no single source for accessing the indirectly accessible information on the hidden web. The present invention addresses the need for an efficient method of finding data on a large-scale WAN such as the Internet, including the visible and hidden portions of the World Wide Web, and the need to efficiently update found information as content evolves and grows.
A number of challenges face the computer user accessing the Web and attempting to locate information about any particular subject matter. First, the immensity of the visible Web makes sorting through data found through currently available search engines difficult and time-consuming. Second, found data may include a substantial quantity of material not related to the sought-after material, but discovered anyway through simple Boolean word association or other search mechanisms known to those skilled in the art to which the present invention pertains. Third, there is no available mechanism for conducting a single search that accesses a plurality of hidden web databases. The user must instead browse to the proper database access page and provide a Boolean or other description of the desired information, in a manner which is redundant when performed in addition to a similar exercise required for searching the visible web. Finally, it would be helpful to provide the user with the means to update a search of both the visible and hidden webs as they grow, without requiring the user to repeat already executed search steps. Moreover, the user would be well-served by a mechanism for differentiating between newly found data and data previously discovered and analyzed by the user.
To address the shortcomings of the available art, the present invention provides an intelligent WAN searching apparatus which resides on a user's computer. A single-subject database (e.g., a healthcare database), or a plurality of single-subject databases, is stored on the client and accessed by the application. The majority of the entries in a single-subject database comprise a hierarchical listing of hidden web databases, all entries being organized by subject matter and each including a description of database content and a search term entry interface customized for the particular database access page format. A user may establish a single query that the application then broadcasts to each desired hidden database to obtain indirectly accessible information. The results of the query are cached on the user's computer and displayed, preferably in HTML format. There are also listings in the database which provide an interface to search engines hosted at a dedicated search server. Each of these search engines includes a subject matter-limited listing of visible web sites that are particularly relevant to the database's subject. Thus, the user's query can be broadcast through the dedicated search server to obtain directly accessible information from the visible web. The search results of the visible web sites can then be displayed in HTML format similar to the results of hidden web searches. Each database is preferably updated at a regular interval, such as monthly or weekly, via remote download from a server on the WAN, or by other data transport means.
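By way of illustration only, the broadcasting of a single query to several hidden database entries may be sketched as follows. The sketch assumes that each entry records the address of its database access page and the name of the form field that page expects; the class name, field names, and example addresses are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class HiddenDatabaseEntry:
    """One entry of a single-subject database (all fields are hypothetical)."""
    subject_path: tuple   # position of the entry in the subject hierarchy
    description: str      # description of the database's content
    access_url: str       # address of the database access page
    query_field: str      # form field name that the access page expects


def build_requests(query, entries):
    """Translate one user query into one request URL per selected entry."""
    return [f"{e.access_url}?{urlencode({e.query_field: query})}" for e in entries]


# Two illustrative entries under a healthcare subject (placeholder addresses).
entries = [
    HiddenDatabaseEntry(("healthcare", "journals"), "Example journal index",
                        "http://journals.example/search", "q"),
    HiddenDatabaseEntry(("healthcare", "trials"), "Example trial registry",
                        "http://trials.example/find", "terms"),
]

# The user types the query once; each entry supplies its own interface details.
urls = build_requests("heart disease", entries)
```

In this sketch the customization for each access page format is reduced to a single form field name; an actual search term entry interface may require additional fields or a different submission method.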
A plurality of simultaneous hidden database searches may be performed by the client application to the extent connection bandwidth is available for linking the client to the appropriate database access pages and forwarding the user's desired search information. Preferably, search results from both hidden and visible web searches are cached on the user's computer for comparison to newly found search results, allowing for easy sorting of new and old data and differentiated display to the user. Desired keywords are preferably cached and shared among database search interfaces.
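The simultaneous searching and the comparison of new results against cached results may be sketched, under stated assumptions, as follows. The number of parallel searches stands in for available connection bandwidth, the search functions are stand-ins that return fixed identifiers rather than contacting any database, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, searchers, max_parallel=4):
    """Run every searcher on the same query, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the searchers list.
        result_lists = list(pool.map(lambda s: s(query), searchers))
    return [item for results in result_lists for item in results]


def split_new_and_old(results, cached):
    """Partition results into (new, previously seen) using the cached set."""
    new = [r for r in results if r not in cached]
    old = [r for r in results if r in cached]
    return new, old


# Stand-ins for real database searches; each returns result identifiers.
def journal_search(query):
    return [f"journal:{query}:1", f"journal:{query}:2"]


def trial_search(query):
    return [f"trial:{query}:1"]


cache = {"journal:asthma:1"}   # results cached from an earlier search
found = search_all("asthma", [journal_search, trial_search], max_parallel=2)
new, old = split_new_and_old(found, cache)
cache.update(found)            # retain all results for the next comparison
```

Keeping the cache as a set of result identifiers allows new and old results to be separated with a simple membership test, so that the two groups may be displayed differently to the user.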