The Internet 104 is a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. At the heart of the Internet 104 is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational, and other computer systems that route data and messages. Because no single computer or network controls the Internet 104, one or more of its nodes can go offline without endangering the Internet 104 as a whole or causing communications on it to stop. The genesis of the Internet 104 was a decentralized network called ARPANET, created by the U.S. Department of Defense in 1969 to facilitate communications in the event of a nuclear attack. Currently, the Internet 104 offers a range of services to users, such as e-mail and the World Wide Web.
Vast sets of interlinked hypertext documents 106 reside on HTTP servers all around the world. These documents, called Web pages, make up the World Wide Web. They are written in HTML (hypertext markup language), identified by URLs (uniform resource locators) that specify the particular machine and path name by which a file can be accessed, and transmitted from server to end user via HTTP (hypertext transfer protocol). Web pages can be searched by a search engine 102, which gathers lists of available Web pages and stores these lists in databases that users can search by keyword. Older examples of search engines include Lycos and Excite. More recent examples include Google and A9.
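As a rough illustration of the URL structure and keyword lookup described above, the following Python sketch splits a URL into its machine and path name and builds a small keyword index over page text. The page contents, the example.com URLs, and the names `split_url`, `build_index`, and `search` are hypothetical, introduced only for illustration. It is a minimal sketch of one common approach (an inverted index), not a description of how the search engine 102 or any named search engine is implemented.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical page texts keyed by URL (illustrative data only).
PAGES = {
    "http://www.example.com/docs/intro.html": "Welcome to the intranet search overview",
    "http://www.example.com/docs/faq.html": "Search questions and answers",
}


def split_url(url):
    """Return the (machine, path name) pair that a URL specifies."""
    parts = urlparse(url)
    return parts.netloc, parts.path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Under these assumptions, `search(build_index(PAGES), "intranet search")` returns only the URL of the hypothetical intro page, since that is the only page containing both keywords.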
Web pages are easy to search on the Internet 104 because many of them are written in a common language, HTML, addressed by an agreed-upon designation, the URL, and communicated via a common protocol, HTTP. Searching on an intranet 108 poses problems typically not seen on the Internet 104. The intranet 108 is a private network based on Internet protocols, such as TCP/IP, but designed for information management within a company or organization. Its uses include document distribution, software distribution, access to databases, and training. The intranet 108 is so called because it looks like a World Wide Web site and is based on similar technologies, yet it is strictly internal to the organization and is not connected to the Internet proper. Web pages made available within the intranet 108 can be searched by a conventional search engine 102. Typically, however, many documents connected to the intranet 108, such as documents 112, are not written in a common language, such as hypertext, but in application-specific formats, such as Microsoft Word, Microsoft Excel, and so on. Conventional search engines, such as the search engine 102, are unable to search for pieces of information within documents 112 that are not written in a common language, such as hypertext.
A similar problem occurs when searching databases, such as a database 110. In a database, data is not associated with a document, such as a Web page, yet the database 110 stores desired pieces of information that need to be exposed to users of the intranet 108. Unfortunately, the database 110 lacks any well-organized structure for the search engine 102 to search. Another problem is that many different protocols can be used to search the database 110. For example, the search engine 102 can use a query language to access the data. Typically, however, databases such as the database 110 are not accessed directly; instead, the search engine 102 has to go through various sets of application programming interfaces. On the Internet 104, the search engine 102 has to know only one protocol, HTTP, to retrieve Web pages and extract data. Searching the intranet 108, by contrast, may require the search engine 102 to know multiple protocols, which may or may not be appropriate for extracting pieces of information from databases, such as the database 110, or from application documents 112.
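The contrast between a single Web access path and source-specific access paths can be sketched in Python as follows. This is a hedged illustration of the problem only, not of any solution: the `employees` table, its rows, and the helper names `text_from_html`, `text_from_database`, and `make_demo_database` are hypothetical, and an in-memory SQLite database stands in for the database 110, which in practice may be reachable only through other application programming interfaces.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-blank runs of text between tags.
        if data.strip():
            self.chunks.append(data.strip())


def text_from_html(html):
    """Access path 1: any Web page can be read with the same HTML parser."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def text_from_database(conn):
    """Access path 2: the same data in a database needs a source-specific
    query (here SQL against a hypothetical 'employees' table)."""
    rows = conn.execute(
        "SELECT name, title FROM employees ORDER BY name"
    ).fetchall()
    return [" ".join(row) for row in rows]


def make_demo_database():
    """Build a small in-memory database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ada", "Engineer"), ("Grace", "Admiral")],
    )
    return conn
```

The HTML routine works for any page without knowing anything about its content, whereas the database routine must know the table name, the column names, and the query language in advance. Each additional kind of source would need its own routine of the second kind.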
While these and other problems are discussed above in the context of intranet searches, other database searches and document searches have similar, if not identical, problems in the heterogeneous environments that are often associated with an intranet. Without a resolution to the problem of responding efficiently to users' intranet queries, users may eventually no longer trust a search engine 102 to provide a desired computing experience that can retrieve stored pieces of information, and demand for search engines will diminish in the marketplace. Thus, there is a need for a system, method, and tangible computer-readable medium for responding to intranet queries while avoiding or reducing the foregoing and other problems associated with existing systems.