1. Field of the Invention
The present invention relates to a technique for automatically acquiring desired information via a network and, in particular, relates to a technique for searching through web content offered on the Internet by crawling through links so as to acquire desired information.
2. Description of the Related Art
Recently, computer network environments, as represented by the Internet, have become widespread. Search engines are generally used to retrieve and acquire desired information from the enormous amounts of information offered on such networks. Many kinds of search engines are available. With a static search engine, information is acquired and stored in advance, and the stored information is extracted in response to a user's search request. However, inasmuch as an enormous number of information sources (web pages, etc.) must be covered by the search, it is difficult to acquire the latest information using static search engines. Further, since it is assumed that the server hosting the search engine basically performs all of the processing, the load on the server is large.
Therefore, a technique has been proposed wherein a set of keyword search results collected by a static search engine is used as an initial set, and relevant sites are dynamically searched using it as the starting point. One known conventional search technique of this kind is a search technique called “Shark-Search”. Discussions of this technique can be found in: Michael Herscovici, Michal Jacovi, Yoelle S. Maarek, Dan Pelleg, Menachem Shtalhaim, Sigalit Ur. “The Shark-Search Algorithm: An Application: Tailored Web Site Mapping” In the Proceedings of WWW7, the 7th International World Wide Web Conference, Brisbane, April 1998. This article also appeared in the Journal of Computer Networks and ISDN 30 (1998), pp 317-326. http://www7.scu.edu.au/programme/fullpapers/1849/com1849.htm
The technique disclosed in this literature dynamically searches, based on a specified URL (Uniform Resource Locator) and keywords, for web sites that are relevant to the specified keywords (web sites with high degrees of significance), starting from the web site of the specified URL on the Internet. This system aims to improve accuracy by using two types of keywords, i.e., keywords (Domain Query) for deriving an initial set and keywords (Focused Query) that are used in calculating the degrees of significance of web sites while dynamically crawling them.
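The general idea of crawling from an initial set while ordering the pages to be acquired by a degree of significance can be illustrated with a minimal sketch. It is not the Shark-Search algorithm of the cited article. It is a simplified best-first crawl over a hypothetical in-memory link graph, in which the significance measure (the fraction of Focused Query keywords found in a page), the rule that a link inherits the score of the page containing it, and all names (PAGES, significance, focused_crawl) are assumptions made purely for illustration.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "http://a.example/": ("shark search crawler overview",
                          ["http://a.example/1", "http://a.example/2"]),
    "http://a.example/1": ("focused crawler using shark search",
                           ["http://a.example/3"]),
    "http://a.example/2": ("cooking recipes", ["http://a.example/4"]),
    "http://a.example/3": ("shark search scoring details", []),
    "http://a.example/4": ("more recipes", []),
}


def significance(text, focused_query):
    """Assumed measure: fraction of focused-query keywords found in the text."""
    words = set(text.lower().split())
    keywords = [k.lower() for k in focused_query]
    if not keywords:
        return 0.0
    return sum(k in words for k in keywords) / len(keywords)


def focused_crawl(pages, initial_set, focused_query, max_pages=10):
    """Visit pages in order of the score inherited from the linking page."""
    # heapq is a min-heap, so priorities are stored negated.
    # Pages of the initial set start with the highest priority (1.0).
    frontier = [(-1.0, url) for url in initial_set]
    heapq.heapify(frontier)
    seen = set(initial_set)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # unknown URL: nothing to acquire
        text, links = pages[url]
        score = significance(text, focused_query)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the score of its parent page.
                heapq.heappush(frontier, (-score, link))
    return visited
```

In this sketch, the initial set is assumed to have been obtained beforehand (for example, from a static search engine using the Domain Query). With the sample graph and the keywords "shark" and "search", the link found on the relevant page is acquired before the link found on the irrelevant page, and max_pages limits the acquiring range.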
As described above, in order to efficiently search through enormous amounts of the latest information on the network, it is necessary to search dynamically when a search request is made.
However, the foregoing conventional dynamic search engine basically performs a search based on a single judgment criterion (relevance()), namely, how close information is to a topic (keyword, etc.) specified by a user. Therefore, it has been unable to search flexibly with a variety of strategies depending on the purpose for which the information is to be used.
Further, in order to search for information efficiently, it is necessary to judge the degrees of significance of the information to be acquired (web pages, etc.) and to determine the order and range of acquisition based thereon. However, because the conventional technique crawls web sites on the Internet based on URLs and a topic, it has been difficult to judge such degrees of significance effectively. Specifically, since only limited information, e.g., specified keywords and text located near anchors in a web page, is used for judging the degree of significance of information, it has been difficult to retrieve desired information efficiently. For example, the conventional technique disclosed in the foregoing Herscovici article states that text near an anchor (anchor_text_context) is taken into account in judging the degree of significance of that anchor, but gives no definite description of how to obtain that anchor_text_context.