1. Field of the Invention
The present invention relates to an apparatus and method for assisting the administration of location information such as a URL, etc.
2. Description of the Related Art
Conventionally, most Internet users use a function that registers the URL (Uniform Resource Locator) of a favorite site, or of a particular Web page that they frequently access, so that the site or page can be accessed easily the next time. Such a function is called a “bookmark”, “favorites”, etc. Hereinafter, the “bookmark” is taken as an example.
Incidentally, URLs change very frequently and soon become obsolete. A typical example is that the Web page indicated by a URL disappears or moves (a so-called broken link). Additionally, the contents of a Web page indicated by a URL become outdated and unhelpful in some cases.
Furthermore, a helpful Web page other than the Web pages registered by a user may exist, but it is not registered because the user does not know of its existence. This can also be regarded as obsolescence of a bookmark.
Conventionally, to overcome such a problem, for example, “blink (http://www.blink.co.jp/)” provides the following service using collaborative filtering. Collaborative filtering is a known technique that records information on the preferences of each user, and estimates the preferences of a user based on the preference information of a different user whose preferences are similar to those of the user. It is known that the more pieces of preference information are recorded for each user, or the larger the number of users, the more accurate the estimation becomes.
With the above described “blink”, each user registers his or her bookmark to a server. The server periodically compares each registered bookmark with the bookmarks registered by other users, finds pairs of bookmarks that contain the same URL, and extracts the difference for each pair (a URL that is registered to one bookmark but not to the other), thereby providing a service that can keep the bookmark of an individual up to date.
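The pairwise comparison described above can be illustrated by the following minimal sketch. It is not the actual implementation of the “blink” service; the function name `bookmark_differences`, the dictionary layout, and the sample URLs are assumptions introduced only for illustration.

```python
from itertools import combinations


def bookmark_differences(bookmarks):
    """Suggest URLs to each user from other users' bookmarks.

    bookmarks: dict mapping a user name to the set of URLs that user
    registered (a hypothetical data layout, not taken from the source).
    Returns a dict mapping each user name to a set of candidate URLs.
    """
    suggestions = {user: set() for user in bookmarks}
    # Compare every pair of registered bookmarks exactly once.
    for user_a, user_b in combinations(bookmarks, 2):
        urls_a, urls_b = bookmarks[user_a], bookmarks[user_b]
        # Only pairs that share at least one URL are considered similar.
        if urls_a & urls_b:
            # The difference: registered to one bookmark but not the other.
            suggestions[user_a] |= urls_b - urls_a
            suggestions[user_b] |= urls_a - urls_b
    return suggestions


# Hypothetical sample data.
sample_bookmarks = {
    "alice": {"http://example.com/a", "http://example.com/b"},
    "bob": {"http://example.com/b", "http://example.com/c"},
    "carol": {"http://example.com/z"},
}
sample_suggestions = bookmark_differences(sample_bookmarks)
```

In this sketch a user whose bookmark shares no URL with any other bookmark receives no suggestion, which is consistent with the observation that such a service depends on a large number of registered users.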
The above described blink service is effective when bookmarks of a sufficiently large number of users (more than several tens of thousands of users) can be registered and used. However, if the number of users is small (on the order of several tens to several hundreds of users), the blink service does not function well even when it is applied. This has been shown experimentally and empirically.
Additionally, the probability that a URL extracted as a difference from the bookmark of a different user indicates helpful contents is not always high.
Furthermore, services that provide helpful information located on the Internet or an intranet by classifying and organizing the information into hierarchical categories (hereinafter referred to as a document directory), such as Yahoo!, goo, ODP (Open Directory Project), etc., are conventionally implemented.
With such a document directory service provider, a URL to be registered to a document directory is manually found on the Internet or an intranet and registered to a suitable category. However, the number of URLs is very large at present, and continues to increase. At a certain provider, as many as several hundred persons perform the above described registration operation, and even so their workload is heavy.
Furthermore, in recent years many web servers have come to be operated on an intranet, etc. within an organization such as an enterprise, etc., and the demand for building a document directory for an organization, similar to Yahoo!, etc. on the Internet, has been increasing. Because maintenance/administration of such a document directory for an organization is performed by a small number of persons in many cases, their workload is also heavy even if the scale is relatively small.
Therefore, there is a demand to reduce the workload, and thereby the maintenance/administration cost, by automating at least part of the above described document directory maintenance/administration operations.
As a technique for automatically structuring a document directory, a method using clustering is conventionally known. This method creates sets so that documents having similar contents are included in the same set. A representative keyword of each set is defined to be its category name. To create a hierarchical structure of sets, a method that defines “a set of sets” as a higher-order hierarchy according to the similarity among sets, and a method that uses the relationship among keywords included in the sets, have been proposed. However, these methods are inferior to manual structuring in quality. For a commercial directory, these techniques are used only as an auxiliary means.
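As an illustration of the clustering approach described above, the following minimal sketch groups documents by keyword overlap and names each set after its most frequent keyword. It shows only a flat, single-level grouping and not the hierarchical methods. The greedy single-pass procedure, the Jaccard similarity measure, the threshold value, and all names are assumptions chosen for brevity; the source does not specify any of them.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified procedure).

    docs: dict mapping a document id to a list of its keywords.
    Returns a list of (category_name, [document ids]) tuples.
    """
    clusters = []  # each cluster: {"ids": [...], "terms": Counter}
    for doc_id, words in docs.items():
        word_set = set(words)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(word_set, set(cluster["terms"]))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["ids"].append(doc_id)
            best["terms"].update(word_set)  # count documents per keyword
        else:
            clusters.append({"ids": [doc_id], "terms": Counter(word_set)})

    named = []
    for cluster in clusters:
        terms = cluster["terms"]
        # Representative keyword: most frequent, ties broken alphabetically.
        name = min(terms, key=lambda t: (-terms[t], t)) if terms else None
        named.append((name, cluster["ids"]))
    return named


# Hypothetical sample documents.
sample_docs = {
    "d1": ["car", "engine", "tire"],
    "d2": ["car", "engine", "brake"],
    "d3": ["aircraft", "wing", "engine"],
    "d4": ["aircraft", "wing", "pilot"],
}
sample_clusters = cluster_documents(sample_docs)
```

With the sample documents, the category names are chosen purely by keyword frequency, which hints at why automatically generated category names tend to be of lower quality than manually assigned ones.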
The present inventor has already proposed “Automatic Collection for Document Directory Administration, Automated Classification Using/URL Automatic Recommending System”, Takanori UKAI, Yoshinori KATAYAMA, Hiroshi TSUDA; 5C-01 National Convention of Japanese Society for Artificial Intelligence (2001).
This proposed technique is mainly composed of a process for automatically collecting a good URL to be stored in a directory from the Internet or an intranet, and a process for allocating a document (home page) indicated by the good URL to a suitable category. The process for automatically collecting a good URL can take an approach based on contents, link, or log. However, the approach based on contents or link has problems in that calculation cost is required, and a URL determined to be good is apt to be a URL which indicates particular contents, etc. Therefore, the approach based on log is adopted. With this approach, a technique is proposed that extracts a keyword used with high frequency from a search log, performs a search by using this keyword, and selects a URL associated with this keyword.
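The log-based approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited proposal: the search log is assumed to be a list of query strings, and `search` is a hypothetical callable standing in for whatever search engine is actually used.

```python
from collections import Counter


def frequent_keywords(search_log, top_n=2):
    """Return the top_n most frequent queries in the log.

    Queries are normalized by trimming whitespace and lowercasing
    (an assumption). Ties are broken alphabetically.
    """
    counts = Counter(q.strip().lower() for q in search_log if q.strip())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


def recommend_urls(search_log, search, top_n=2, per_keyword=3):
    """Search with each frequent keyword and keep the first few URLs.

    search: hypothetical callable mapping a keyword to a list of URLs.
    """
    return {
        keyword: search(keyword)[:per_keyword]
        for keyword in frequent_keywords(search_log, top_n)
    }


# Hypothetical sample log and a stand-in search function.
sample_log = ["car", "Car ", "aircraft", "car", "train", "aircraft"]
sample_index = {
    "car": ["http://example.com/car1", "http://example.com/car2"],
    "aircraft": ["http://example.com/air1"],
}
sample_recommendations = recommend_urls(
    sample_log, lambda keyword: sample_index.get(keyword, [])
)
```

Because only the log and a keyword search are needed, this approach avoids analyzing the contents or link structure of each candidate page.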
With this technique, however, a satisfactory correct answer rate has not been obtained yet, and it is desired that a higher correct answer rate be obtained. Here, a correct answer means that a URL is registered to a suitable category. Conversely, taking a certain category as an example, an incorrect answer indicates the case where an unsuitable URL is recommended to the category (for instance, a URL whose contents are associated with “aircraft” is recommended to a category “car”), the case where a URL that should originally be recommended to the category is not recommended, or the like.
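The two kinds of incorrect answer described above correspond to what are commonly called false positives and false negatives. The following sketch counts both for a single category. The text does not give a formula for the correct answer rate, so reporting precision and recall here is an assumption, and the function name and sample URLs are hypothetical.

```python
def evaluate_category(recommended, expected):
    """Compare the URLs recommended to a category with the expected URLs.

    Returns the correct answers, the two kinds of incorrect answer,
    and precision/recall (assumed measures, not defined in the source).
    """
    recommended, expected = set(recommended), set(expected)
    correct = recommended & expected
    # Unsuitable URL recommended to this category.
    wrongly_recommended = recommended - expected
    # URL that should have been recommended but was not.
    missed = expected - recommended
    precision = len(correct) / len(recommended) if recommended else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return {
        "correct": sorted(correct),
        "wrongly_recommended": sorted(wrongly_recommended),
        "missed": sorted(missed),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical sample for a category "car".
sample_report = evaluate_category(
    recommended=["car-a", "car-b", "aircraft-x", "aircraft-y"],
    expected=["car-a", "car-b", "car-c", "car-d"],
)
```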