1. Technical Field
The present invention relates to an improved method and system for accessing a network database, and in particular to a method and system for efficiently searching a distributed, hierarchical network database, such as the World Wide Web (WWW). More particularly, the present invention relates to improving network search efficiency by distributing search engine functionality via links among various public or private data networks.
2. Description of the Related Art
The development of computerized information resources, such as the Internet, allows users of data-processing systems to link with other servers and networks, and thus retrieve vast amounts of electronic information heretofore unavailable in an electronic medium. The term “Internet” is an abbreviation for “Internetwork,” and refers commonly to the collection of networks and gateways that utilize the TCP/IP suite of protocols, which are well-known in the art of computer networking. TCP/IP is an acronym for “Transmission Control Protocol/Internet Protocol,” and is a software protocol developed by the Department of Defense for communication between computers. The Internet can be described as a system of geographically distributed computer networks interconnected by computers executing networking protocols that allow users to interact and share information over the networks. Because of such wide-spread information sharing, the Internet has thus far generally evolved into an “open” system for which developers can design software applications for performing specialized operations or services, essentially without restriction.
Electronic information transferred between data-processing networks is usually presented in hypertext, a metaphor for presenting information in a manner in which text, images, sounds, and actions become linked together in a complex non-sequential Web of associations that permit the user to “browse” or “navigate” through related topics, regardless of the presented order of the topics. These links are often established by both the author of a hypertext document and by the user, depending on the intent of the hypertext document. For example, traveling among links to the word “iron” in an article displayed within a graphical user interface in a data-processing system might lead the user to the periodic table of the chemical elements (i.e., linked by the word “iron”), or to a reference to the use of iron in weapons in Europe in the Dark Ages. The term “hypertext” was coined in the 1960s to describe documents, as presented by a computer, that express the nonlinear structure of ideas, in contrast to the linear format of books, film, and speech.
The term “hypermedia,” on the other hand, more recently introduced, is nearly synonymous with “hypertext” but focuses on the nontextual components of hypertext, such as animation, recorded sound, and video. Hypermedia is the integration of graphics, sound, video, or any combination thereof into a primarily associative system of information storage and retrieval. Hypermedia, as well as hypertext, especially in an interactive format where choices are controlled by the user, is structured around the idea of offering a working and learning environment that parallels human thinking, that is, an environment that allows the user to make associations between topics rather than move sequentially from one to the next, as in an alphabetic list. Hypermedia, as well as hypertext topics, are thus linked in a manner that allows the user to jump from one subject to other related subjects during a search for information. Hyper-link information is contained within hypermedia and hypertext documents, which allow a user to move back to “original” or referring network sites by the mere “click” (i.e., with a mouse or other pointing device) of the hyper-linked topic.
A typical networked system that utilizes hypertext and hypermedia conventions follows a client/server architecture. The “client” is a member of a class or group that uses the services of another class or group to which it is not related. Thus, in computing, a client is a process (i.e., roughly a program or task) that requests a service provided by another program. The client process utilizes the requested service without having to “know” any working details about the other program or the service itself. In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer system (i.e., a server or Internet Service Provider (ISP)).
A request by a user for news or other information can be sent by a client application program to a server. A server is typically a remote computer system accessible over the Internet or other telecommunications medium. The server scans and searches raw (i.e., unprocessed) information sources (e.g., newswire feeds or newsgroups). Based upon such requests by the user, the server presents filtered electronic information as server responses to the client process. The client process may be active in a first computer system communicating with the server process, which is active in a second computer system, over a telecommunications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server.
Client and server communicate with one another utilizing the functionality provided by Hypertext Transfer Protocol (HTTP). The World Wide Web (WWW) or, simply, the “Web,” includes those servers adhering to this standard (i.e., HTTP) which are accessible to clients via a computer or data-processing system network address such as a Uniform Resource Locator (URL). The network address can be referred to as a Uniform Resource Locator address. The client and server may be coupled to one another via Serial Line Internet Protocol (SLIP) or TCP/IP connections for high-capacity communication. Active within the client is a first process, known as a “browser,” which establishes the connection with the server and presents information to the user. The server itself executes corresponding server software which presents information to the client in the form of HTTP responses. The HTTP responses correspond to “Web pages” constructed from a Hypertext Markup Language (HTML), or other server-generated data. Each Web page can also be referred to simply as a “page.”
The evolution of personal computers over the last decade has accelerated the Web and Internet toward useful everyday applications. The graphical portion of the World Wide Web itself is usually stocked with more than twenty-two million “pages” of content, with over one million new pages added every month. Readily accessible computer software applications such as Internet “search engines” provide a means for Internet users to track down sites at which information on a topic of interest can be found. A person may type in a subject or key word which the search engine utilizes to locate a list of pertinent network sites (i.e., Web sites) and Web pages. Thus, with “home pages” published by thousands of companies, universities, government agencies, museums, and municipalities, the Internet can be an invaluable information retrieval resource. The market for Internet access and related applications is expanding at an explosive pace.
All search engine applications available today are equipped with a search-and-find facility that is accessed when a user types in a requested search item and “clicks” on the application's ‘Search’ button. The data sought may potentially be stored at as many as tens of thousands of Web pages within thousands of network sites. Each of these Web pages may include hypertext links which point to other sites and/or pages at which related information may be found. The process of searching or browsing the Web is therefore an extremely time-consuming and computation-intensive, multiply recursive process, possibly covering many thousands of Web sites and pages.
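The link-following process described above can be illustrated with a minimal sketch. It assumes a small in-memory link graph; the page names, the `LINKS` mapping, and the `reachable_pages` function are hypothetical and are not taken from this disclosure. An iterative breadth-first traversal stands in here for the recursive process, and a set of visited pages keeps links that point back to earlier pages from causing an endless loop.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "site-a/index": ["site-a/iron", "site-b/index"],
    "site-a/iron": ["site-b/periodic-table"],
    "site-b/index": ["site-b/periodic-table", "site-a/index"],
    "site-b/periodic-table": [],
}


def reachable_pages(start, links):
    """Return every page reachable from `start`, in breadth-first order."""
    seen = {start}          # pages already queued, so cycles terminate
    order = []              # pages in the order they were visited
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Even in this toy graph every page must be visited once before the traversal ends, which suggests why the cost grows quickly when thousands of sites are involved.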
Conventional search engines maintain internal indices in which the network addresses of Web sites and pages are associated with particular “keywords.” When a user types in one or more keywords during a Web search, the search engine examines its internal keyword index to determine first whether the keyword is present within the index, and if so, the addresses of the pages at which the keyword(s) is/are located. Given the explosive growth of the Internet as an information repository, storing and updating such an index is proving burdensome both in terms of information storage capacity and computation bandwidth.
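A keyword index of this kind might be sketched as follows. This is an illustration under stated assumptions, not a description of any particular search engine: the function names `build_keyword_index` and `lookup` are hypothetical, keywords are taken to be whitespace-separated lowercase words, and a multi-keyword query is assumed to require every keyword to appear on a page.

```python
from collections import defaultdict


def build_keyword_index(pages):
    """Map each keyword to the set of page addresses containing it.

    `pages` maps a page address to its text (both hypothetical inputs).
    """
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index


def lookup(index, keywords):
    """Return addresses of pages containing all keywords.

    An empty set is returned as soon as any keyword is absent
    from the index (assumed AND semantics).
    """
    result = None
    for keyword in keywords:
        addresses = index.get(keyword.lower(), set())
        result = set(addresses) if result is None else result & addresses
    return result or set()
```

Because the index holds one entry per distinct keyword and one address per occurrence, both its size and the work needed to keep it current grow with the number of pages indexed.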
From the foregoing, it can be appreciated that a need exists for a method and system for strategically distributing the search engine functionality across rapidly growing electronic data networks such as the Internet. If implemented, such a method and system would improve both efficiency and comprehensiveness of distributed data network searches.
It is therefore an object of the invention to provide an improved information-retrieval method and system.
It is another object of the invention to provide an improved method and system for efficiently searching a distributed, hierarchical network database, such as the World Wide Web (WWW).
It is a further object of the invention to improve network search efficiency by distributing search engine functionality via links among various public or private data networks.
The above and other objects are achieved as is now described. A method and system are disclosed for facilitating a keyword search request initiated at a client station within a multilevel data network, wherein the multilevel data network includes multiple local sites each containing multiple data pages. Multiple keywords from each of the data pages within the local sites of the multilevel data network are stored locally and indexed such that each of the keywords points to one or more of the data pages in which the keywords are contained. The keywords and their index associations are locally updated. A central database is utilized to compile and index the locally indexed keywords from each of the local sites, such that each of the keywords in the central database points to one or more local sites from which those keywords came in response to a keyword search initiated at the client station.
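The two-level arrangement summarized above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the class names `LocalSite` and `CentralDatabase`, the `search` function, and the whitespace-based keyword extraction are all hypothetical. Each local site indexes keywords to its own pages and updates that index locally, while the central database maps each keyword only to the sites that hold it.

```python
from collections import defaultdict


class LocalSite:
    """A local site that indexes keywords to its own data pages."""

    def __init__(self, name, pages):
        self.name = name
        self.index = defaultdict(set)   # keyword -> pages at this site
        for page, text in pages.items():
            self.update_page(page, text)

    def update_page(self, page, text):
        """Re-index one page locally, dropping its stale keyword entries."""
        for pages in self.index.values():
            pages.discard(page)
        for word in text.lower().split():
            self.index[word].add(page)

    def keywords(self):
        """Keywords that currently point to at least one page."""
        return {kw for kw, pages in self.index.items() if pages}

    def find(self, keyword):
        return set(self.index.get(keyword.lower(), set()))


class CentralDatabase:
    """Compiles the local keywords; each keyword points to sites, not pages."""

    def __init__(self):
        self.index = defaultdict(set)   # keyword -> site names

    def compile_from(self, sites):
        self.index.clear()
        for site in sites:
            for keyword in site.keywords():
                self.index[keyword].add(site.name)

    def sites_for(self, keyword):
        return set(self.index.get(keyword.lower(), set()))


def search(keyword, central, sites_by_name):
    """Ask the central database for sites, then each site for its pages."""
    return {
        name: sites_by_name[name].find(keyword)
        for name in sorted(central.sites_for(keyword))
    }
```

In this sketch the central index stays small because it never records individual pages, and a page change requires work only at the site that owns the page until the central database is next compiled.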