The present invention concerns a meta-information presentation apparatus equipped with an information management function which allows an information collection apparatus to efficiently collect information.
In recent years, a decentralized hypertext system known as the World Wide Web (“WWW”) has become popular and has proliferated rapidly. The main, public portion of the WWW is carried by the increasingly popular Internet, while smaller private subsets, called intranets, may be formed within LANs. The growth of the public WWW has been exponential, such that an enormous amount of information is now presented on the WWW. The WWW comprises a plurality of WWW servers, which provide information, and clients (termed “browsers”), which are used to access the information. A single WWW server typically manages a plurality of “web pages” joined together by links. A user accesses (or “surfs”) information on the WWW with a browser by following the links from one web page to another.
A “search engine” is often used to search for information in the web pages on the WWW. A search engine implements its search function by using an information collection apparatus, termed a “web robot”, which collects information provided by WWW servers and then prepares an index of the collected information.
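By way of illustration only, the indexing step described above may be sketched as a simple inverted index. The function names `build_index` and `search`, the example URLs, and the page texts are assumptions introduced for this sketch; they do not describe any particular search engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Return the URLs containing the word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical pages, standing in for information collected by a web robot.
pages = {
    "http://example.com/a": "Web robots collect pages",
    "http://example.com/b": "Search engines index pages",
}
index = build_index(pages)
```

In this sketch, a query is answered from the prepared index alone, so the results can only be as recent as the last collection pass.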
Typically, a web robot collects web page information by accessing all of the web pages on the WWW (managed by numerous WWW servers) one page at a time, following the links in each page. Because WWW server information is updated daily, a web robot must periodically access each WWW server to gather the information required for a search. Heretofore, information has been collected by accessing all web pages regardless of whether the content of each web page has been updated. In other words, the web robot retrieves each and every page each time it is run, regardless of whether it has retrieved the page before.
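The conventional collection described above may be sketched as follows, under stated assumptions. The "web" here is a hypothetical in-memory dictionary mapping each URL to its content and outgoing links, and a lookup in that dictionary stands in for a network request; the names `crawl`, `web`, and `fetch_log` are introduced for illustration only.

```python
from collections import deque


def crawl(web, start_url, fetch_log):
    """Visit every page reachable from start_url by following links.

    Every reachable page is fetched on every run; nothing is remembered
    between runs, so unchanged pages are retrieved again.
    """
    visited = set()
    queue = deque([start_url])
    collected = {}
    while queue:
        url = queue.popleft()
        if url in visited or url not in web:
            continue
        visited.add(url)
        content, links = web[url]  # simulated fetch of one page
        fetch_log.append(url)      # record the access made to the server
        collected[url] = content
        queue.extend(links)
    return collected


# Hypothetical three-page site whose content does not change between runs.
web = {
    "/index": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b"]),
    "/b": ("page b", ["/index"]),
}
fetch_log = []
first = crawl(web, "/index", fetch_log)
second = crawl(web, "/index", fetch_log)
```

Although no page changed between the two runs in this sketch, the fetch log records six accesses for three pages, which is the redundancy the preceding paragraph describes.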
When a web robot sequentially accesses all the web pages that a WWW server manages, it places a large burden on the WWW server by continuously connecting to the server and accessing its web pages. At the same time, the web robot collects a great quantity of information and therefore increases network traffic. Additionally, when the server stores a great number of web pages, an enormous amount of time is required to cycle through all of them, which delays the updating of the data used by the search engine. Thus, depending on the search engine, the most recent information may not be searchable.