Systems are now widely used in which a plurality of server computers and a plurality of client computers are interconnected by a network, multimedia data having a hypertext structure is stored on each server computer, and such hypertext can be read on each client computer with software called a browser. An example of such systems is the WWW (World Wide Web) on the Internet.
Documents containing multimedia data (hereinafter referred to as the “hypertext”) are described in, for example, a description language called HTML (Hypertext Markup Language), and can include text documents, still images, moving images, music data, and application programs such as Java® applets. A unique address called a URL (Uniform Resource Locator) is assigned to each such hypertext. A user can access a desired hypertext by specifying its URL through a browser. Moreover, a hyperlink (hereinafter referred to as the “link”) for accessing other pages and multimedia data is embedded in hypertext. By pointing to this link in the browser, the user can move to the linked hypertext.
HTML defines many tags that indicate the character and attributes of data. One example is the format <a href=“URL”>anchor character string</a>, which indicates a link. This format starts with <a> and closes with </a>, and specifies the address of the link destination by a URL. By pointing, in the browser, to an anchor character string to which a link is attached, the user can read the hypertext at that URL.
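The anchor format described above can be illustrated with a minimal sketch that collects each link address together with its anchor character string. It uses Python's standard html.parser module; the class name AnchorCollector, the function extract_links and the sample markup are assumptions made for illustration only and do not appear in any of the cited software.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (URL, anchor character string) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # completed (href, anchor text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return the list of (URL, anchor character string) pairs found in html."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, modeled on the page names used in this document.
SAMPLE = (
    '<p><a href="http://www.news/1.html">New Publication</a> '
    '<a href="http://www.news/3.html">New Products of Company A</a></p>'
)
links = extract_links(SAMPLE)
```

Anchors without an href attribute are ignored in this sketch, since they do not specify a link destination.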
As HTML display software (browsers) for client computers, products such as Netscape Communicator® available from Netscape Communications Corp. and Internet Explorer® available from Microsoft have been widely used, and run on computers on which various operating systems are installed. These browsers can store the URL of each accessed hypertext, the date and time of access, and the title indicating the contents at the URL on the hard disk of the client computer. This information is presented to the user as the “History” for reuse.
The number of pages of hypertext has been increasing exponentially worldwide, and the user usually registers the URLs of important hypertext among those once read in a so-called Bookmarks file in the browser by using the above-mentioned History. Specifically, the bookmarks selectively display the proposed access points. When the user wants to access again hypertext that was accessed in the past, he/she can do so easily by referring to the bookmarks.
However, as the number of URLs recorded in the Bookmarks file increases, the Bookmarks file becomes flooded with URLs in which the user is no longer interested, unnecessary URLs, and URLs which no longer exist with the passage of time. The user therefore has to spend an increasing amount of time on keeping only the URLs of interest in the Bookmarks file and on visiting the remaining URLs to confirm visually whether each hypertext contains new information.
To address this, for instance, the browser “Netscape Navigator®” of Netscape Communications Corp. automatically inspects whether the URLs recorded in the Bookmarks file have been changed, by comparison with the date of the previous access, and presents a URL whose information has been updated to the user with a check mark. Thus, the user can readily know which URLs contain a change. However, the contents and location of the change at the URL are not displayed. Hence, when a large number of URLs in the Bookmarks file have been changed, it is necessary to access all of these URLs and visually confirm the state of the changes.
To improve this situation, there is new link detection agent software which automatically monitors hypertext to detect and display the state of changes. As a known example, agent software capable of detecting new links in hypertext is disclosed in “Internet Agent” (Far-Chun Cheong, translated by Hiroyuki Ohno, sold by Impress Corporation, ISBN-8443-4921-X), Chapter 7 “WebWalker: Your Web Maintenance Robot”. In addition, there is other commercially available software, such as WebWhatsNew® of AI Soft K.K. in Japan. Furthermore, Japanese laid-open patent publication (Tokukaihei) No. 10-222415 (published date: Aug. 21, 1998) discloses a similar technique.
Such new link detection agent software holds, for each URL, a database of the links collected from that URL. When it accesses the URL, it compares the current links with the links found at the previous access, detects a new link or a change in the anchor character string to which a link is attached, and presents the result to the user. In this case, the URLs to be monitored are those in the user's Bookmarks file or those manually specified by the user.
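The comparison performed by such agent software can be sketched as follows. This is only an illustration of the general idea, assuming that the stored links of one monitored URL are held as a mapping from link URL to anchor character string; the function name diff_links and the sample data are hypothetical and are not taken from the cited products.

```python
def diff_links(previous, current):
    """Compare the links of one monitored page between two accesses.

    previous, current: dict mapping link URL -> anchor character string.
    Returns (new_links, changed_anchors), where changed_anchors maps a
    link URL to the pair (old anchor string, new anchor string).
    """
    new_links = {url: text for url, text in current.items()
                 if url not in previous}
    changed_anchors = {url: (previous[url], text)
                       for url, text in current.items()
                       if url in previous and previous[url] != text}
    return new_links, changed_anchors


# Hypothetical link sets for one page at the previous and current access.
PREVIOUS = {
    "http://www.news/1.html": "New Publication",
    "http://www.news/3.html": "New Products of Company A",
}
CURRENT = {
    "http://www.news/1.html": "New Publication (updated)",
    "http://www.news/3.html": "New Products of Company A",
    "http://www.news/6.html": "Special Issue",
}
new_links, changed_anchors = diff_links(PREVIOUS, CURRENT)
```

Links that have disappeared since the previous access are not reported here, because the text above mentions only new links and changed anchor character strings.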
Regarding the above-mentioned tags, it is also possible to use self-defined extension tags, as in XML (Extensible Markup Language). For instance, in the file object “http://www.sharp.co.jp/mebius.html”, if <price> tags are self-defined, as in <price>200,000yen</price>, and the character string (200,000yen) enclosed by the <price> tags is interpreted as the price of the product, it is possible to create agent software which automatically confirms, from a change in this character string, whether the price has been lowered. This is disclosed, for example, in the article “Application of XML to business has started” in the magazine “Nikkei Internet Technology” (the May 1999 issue, p 82–89).
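A price check of the kind described above might look like the following sketch. It assumes that the price always appears in the form <price>N yen</price> with optional comma separators, which is a simplification; the function names and the lowered price of 180,000 yen are hypothetical values chosen for illustration.

```python
import re

# Matches a self-defined <price> element such as <price>200,000yen</price>.
PRICE_RE = re.compile(r"<price>\s*([\d,]+)\s*yen\s*</price>")


def extract_price(document):
    """Return the price in yen as an int, or None if no <price> tag is found."""
    match = PRICE_RE.search(document)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def price_lowered(old_document, new_document):
    """Return True only when both documents carry a price and it has dropped."""
    old_price = extract_price(old_document)
    new_price = extract_price(new_document)
    if old_price is None or new_price is None:
        return False
    return new_price < old_price


OLD_DOC = "<product><price>200,000yen</price></product>"
NEW_DOC = "<product><price>180,000yen</price></product>"  # hypothetical update
lowered = price_lowered(OLD_DOC, NEW_DOC)
```

A production agent would more likely use a real XML parser; the regular expression is used here only to keep the sketch short.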
However, in the method in which the above-mentioned new link detection agent software notifies the user of the appearance of a new link or of an update of hypertext involving a change in the anchor character string, the URLs subjected to monitoring are fixed, for example, to those in the Bookmarks file, and the user needs to specify the URLs explicitly. Therefore, when the Bookmarks file is used as the source of the URLs subjected to monitoring and many URLs are recorded in it, too many URLs may be monitored, including URLs which were helpful to the user in the past but are no longer necessary. As a result, it takes a long time to monitor the updates of hypertext, and the results of monitoring contain a lot of unnecessary information.
To reduce the number of URLs subjected to monitoring, some methods have been implemented or proposed as described below. For instance, in Netscape Communicator, the URLs of hypertext accessed in the past can be sorted according to the frequency of use or the date and time, and then displayed. Moreover, for example, Japanese laid-open publication (Tokukaihei) No. 10-143519 discloses a method and device which sort the URLs accessed in the past according to the frequency of use or the time of reading and listening, and display the results.
Besides, for example, Japanese laid-open publication (Tokukaihei) No. 9-204347 (published date: Aug. 5, 1997) and No. 10-21134 (published date: Jan. 23, 1998) disclose methods of updating a cache. In these methods, when a relay cache is incorporated into a gateway computer which relays URLs between the server computers and the client computers, the frequency with which each URL was relayed in the past is calculated, a list is created, and the gateway computer updates the cache on its own initiative in order of frequency.
A common feature of these systems is that they focus on how frequently the URLs of hypertext were accessed in the past, calculate the frequency by statistical processing, and judge that hypertext which was accessed frequently is highly important to the user.
Here, suppose a system formed by a plurality of server computers providing information, a gateway computer and client computers. The gateway computer is a computer for interconnecting different networks/systems.
For instance, as shown in FIG. 21, suppose an HTML page of the URL “http://www.news/” exists on a server computer and an HTML page of the URL “http://www.hello.nara/” exists on another server computer.
The HTML page of the URL “http://www.news/” is a page providing news information (Daily Newspaper), and is provided with links to five HTML pages: “New Publication (“http://www.news/1.html”)”, “Weather Report (“http://www.news/2.html”)”, “New Products of Company A (“http://www.news/3.html”)”, “New Products of Company B (“http://www.news/4.html”)”, and “New Products of Company C (“http://www.news/5.html”)”.
Moreover, the HTML page of the URL “http://www.hello.nara/” is a page providing local information on Nara prefecture, and is provided with links to two items of moving image data, “Stock Information (“http://www/a.mov”)” and “Traffic Information (“http://www/b.mov”)”, and to the HTML page of “Notice (“http://www/index.html”)”.
Here, assume that the user first accesses the HTML page of the URL “http://www.news/” through the client computer, reads the five HTML pages (text data) linked to this page, and then accesses the HTML page of the URL “http://www.hello.nara/” and reads the two items of moving image data and one HTML page (text data) linked to this page.
At this time, the access log shown in Table 1 below is recorded in the gateway computer.
TABLE 1

URL                      Response code  Content-type  Title information
http://www.news/         200            text/html     Daily Newspaper
http://www.news/1.html   200            text/html     New Publication
http://www.news/2.html   200            text/html     Weather Report
http://www.news/3.html   200            text/html     New Products of A Company
http://www.news/4.html   200            text/html     New Products of B Company
http://www.news/5.html   200            text/html     New Products of C Company
http://www.hello.nara/   200            text/html     Nara Prefecture Local Information
http://www.a.mov         200            movie         Stock Information
http://www.b.mov         200            movie         Traffic Information
In Table 1, the HTML page of the URL “http://www.news/” is counted only once for the following reason. For example, assume that the user moves from the HTML page of the URL “http://www.news/” to the linked page of New Publication (“http://www.news/1.html”) to read this page, presses the Back button in the browser to return to the page of “http://www.news/”, and moves to the other four linked pages in a similar manner. The browser usually keeps a certain amount of cache (a memory for temporarily storing information for high-speed processing), and the data of the page “http://www.news/” is retrieved from this cache when the user returns to the page of “http://www.news/” from the page of “New Publication (“http://www.news/1.html”)”. In this case, since a request for obtaining the URL is not transmitted to the gateway computer, the HTML page of the URL “http://www.news/” appears only once in the access log of the gateway computer.
According to the counts shown in Table 1, all of the URLs appear once. The above-mentioned method, which judges the degree of the user's demand from the frequency of appearance, therefore judges that the degree of the user's demand is the same for each of these URLs. In reality, however, if the user later accesses the HTML page of the URL “http://www.news/” or the URL “http://www.hello.nara/” again and a new link has been created there, it is highly likely that the user will access the new link. By contrast, there is almost no possibility that the user will access the already accessed URLs “http://www.news/1.html” and “http://www.news/2.html” again.
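The frequency-based judgment can be reproduced on the URLs of Table 1 with a short sketch. It simply counts how often each URL appears in the access log; the variable and function names are assumptions, and the counting is a plain illustration of the frequency approach rather than the implementation of any cited system.

```python
from collections import Counter

# URLs in the order they appear in the access log of Table 1.
ACCESS_LOG = [
    "http://www.news/",
    "http://www.news/1.html",
    "http://www.news/2.html",
    "http://www.news/3.html",
    "http://www.news/4.html",
    "http://www.news/5.html",
    "http://www.hello.nara/",
    "http://www.a.mov",
    "http://www.b.mov",
]


def access_frequency(log):
    """Count how many times each URL appears in the access log."""
    return Counter(log)


frequency = access_frequency(ACCESS_LOG)
# Every URL has the same count, so frequency alone cannot rank them.
all_equal = len(set(frequency.values())) == 1
```

Because every count is 1, a ranking by frequency gives the top pages and the already read linked pages the same degree of demand.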
In other words, suppose that three file objects, whose URLs are one hierarchical level below the URL representing the referring address of a certain hypertext, are each read once through links. If the degree of importance of each of the three URLs is 1, this is the same as the degree of importance of a single lower-level file object read once through a link from the referring address of another hypertext. However, since the file objects linked from the former hypertext have been browsed a total of 3 times, the former hypertext is often more important than the latter hypertext.
Hence, even when an update of hypertext is detected from a selected URL as described above by the new link detection agent software, URLs which are important to the user are sometimes not detected. Moreover, if new links or changed anchors in file objects at a lower hierarchical level are detected, too many pieces of information are provided. It is therefore difficult to present the results effectively to the user and to display the results within a limited information display space.
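The arithmetic in this comparison can be made concrete with a small sketch. It assumes that each read is recorded together with the referring address from which the link was followed; the function name browse_totals and the example.test URLs are hypothetical and serve only to illustrate the totals of 3 and 1 mentioned above.

```python
from collections import Counter, defaultdict


def browse_totals(reads):
    """Sum the number of reads per referring address.

    reads: iterable of (referring URL, linked URL) pairs, one per read.
    """
    totals = defaultdict(int)
    for referrer, _linked in reads:
        totals[referrer] += 1
    return dict(totals)


FORMER = "http://example.test/former/"   # hypothetical referring address
LATTER = "http://example.test/latter/"   # hypothetical referring address
READS = [
    (FORMER, FORMER + "1.html"),
    (FORMER, FORMER + "2.html"),
    (FORMER, FORMER + "3.html"),
    (LATTER, LATTER + "1.html"),
]

per_url = Counter(linked for _referrer, linked in READS)  # each URL: 1
per_referrer = browse_totals(READS)                       # former: 3, latter: 1
```

Counted per URL, all four file objects have the same frequency of 1, while the totals per referring address differ.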
Thus, when hypertext is arranged in a multi-stage tree structure, the degree of importance of a URL is determined by calculating the frequency of access to the individual URLs of file objects at a lower hierarchical level. Hence, the degree of importance of the URLs does not reflect the true importance to the user.
In addition, hypertext may include an anchor having a link only to an image file. This is called a banner advertisement, and an example of the format in HTML and a display example by an HTML browser are shown in FIGS. 22(a) and 22(b), respectively. In these examples, an advertisement link is attached to the advertisement image file “http://ad.banner/banner.gif”. Such a banner advertisement has a different anchor URL on each access, and is sometimes detected as a new link by the above-mentioned method of detecting updates of hypertext. Consequently, hypertext in which only the banner advertisement has changed is also detected as updated hypertext, and the unnecessary information provided to the user increases.
To identify a banner advertisement so as to exclude it from the objects of detection, there is, for example, a method introduced in the Internet magazine (published by Impress Corporation, June 1999 issue, p. 249), in which Web server access software called WebBooster Ninja (trade name) of i4 Corporation searches for an image file whose domain differs from that of the server computer to which the hypertext embedding the advertisement image belongs. With this method, however, it is impossible to identify the image file of a banner advertisement that belongs to the same server computer as the hypertext in which the advertisement image is embedded.
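The domain comparison described above, and its limitation, can be sketched as follows. This is not the implementation of the cited software; it is a minimal illustration that compares host names using Python's standard urllib.parse module. The function name is_external_image and the same-server image URL are assumptions.

```python
from urllib.parse import urlparse


def is_external_image(page_url, image_url):
    """Return True when the image is served from a different host than the page."""
    return urlparse(page_url).hostname != urlparse(image_url).hostname


PAGE = "http://www.news/"
EXTERNAL_BANNER = "http://ad.banner/banner.gif"     # from the example above
SAME_SERVER_BANNER = "http://www.news/banner.gif"   # hypothetical

flagged_external = is_external_image(PAGE, EXTERNAL_BANNER)
flagged_same_server = is_external_image(PAGE, SAME_SERVER_BANNER)
```

The banner on the advertisement server is flagged, but a banner image stored on the same server as the page is not, which is the limitation noted in the text.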