The present invention relates to a device and method for filtering information, in which information that a user has requested or is interested in is selected from a large number of documents, such as articles and literature, and is supplied to the user at regular intervals.
The present invention relates to a device and method for monitoring updated document information, in which an update is detected in at least one document specified in advance by a user and the updated document is supplied to that user.
The present invention relates to a storage medium used in the information filtering device and the updated document information monitoring unit.
Recently, the computerization of documents has been accelerating, together with the popularization of word processors and computers and the spread of electronic mail and electronic news over computer networks such as the Internet.
As the term "electronic publishing" suggests, it is conceivable that information contained in newspapers, magazines and books will commonly be provided by electronic means. Once such electronic means come into practical use, it is also foreseeable that the text information available to an individual user in real time will grow to a tremendous volume.
Under such circumstances, demand is increasing for an information filtering system or information filtering service in which information that a user has requested or is interested in is selected and supplied to the user.
Because these issues are regarded as critical, development has recently begun on information filtering devices that provide each user with only the information that meets retrieval conditions set in advance for that user.
In these information filtering devices, however, there has conventionally been no need to consider revisions to a document once it has been produced, since the objects of processing have been periodical documents such as newspaper and magazine articles.
For example, since a newspaper is issued daily, it is only necessary to process articles on events that occurred on that day. Where a collection of articles is published on CD-ROM, regularly or irregularly, the information filtering device has only been required to process the information included in the CD-ROM.
The same applies to filtering documents in which the date of creation is explicitly stated. In this case, only documents whose creation dates fall within a predetermined date interval are selected for processing by reference to the date information, and such filtering can be performed with ease. Similar processing is possible where creation dates and revision dates are stored as auxiliary information.
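The date-interval selection described above can be illustrated by the following minimal Python sketch. The `Document` record, its field names and the sample data are hypothetical and are introduced only for illustration; a document without date information is represented by `None` and is simply skipped.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical document record used only for this illustration."""
    title: str
    created: Optional[date]  # None when no date information is available


def select_by_date(docs: List[Document], start: date, end: date) -> List[Document]:
    """Keep only documents whose creation date lies in [start, end]."""
    return [
        d for d in docs
        if d.created is not None and start <= d.created <= end
    ]


docs = [
    Document("Morning edition", date(1996, 1, 10)),
    Document("Back issue", date(1995, 12, 1)),
    Document("Undated Web page", None),
]

selected = select_by_date(docs, date(1996, 1, 1), date(1996, 1, 31))
selected_titles = [d.title for d in selected]
print(selected_titles)
```

In this sketch the undated document can never be selected, which is the limitation that date-based filtering faces when dates are not recorded.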
On the other hand, there are kinds of documents that state no date of creation or updating, whose files store no such auxiliary information, and for whose creation no rules have been established. For example, documents published on the WWW (World Wide Web), called Web pages, are created by individuals under no central control.
Such Web pages are created whenever an individual wishes, and no rule requires that dates of creation or updating be stated in a document. For this reason, it is extremely difficult to constantly obtain highly reliable date information on when each document was created or updated.
In other words, a conventional information filtering device has had the serious problem that it is difficult to select and provide, from among the created or updated information that is the object of filtering, the information in which an individual is interested.
Thus, a conventional information filtering device has difficulty in constantly acquiring date information on when each document was created or updated, and therefore has had the problem that it cannot distinguish between a newly created document and an updated one.
In recent years, the Internet has spread conspicuously, and information stored in computers scattered all over the world can be accessed with ease from anywhere, provided that a connection to the Internet is available.
In the WWW (World Wide Web), an arrangement is provided whereby a user can easily access information all over the world with a GUI (Graphical User Interface) based browser using HTTP (Hypertext Transfer Protocol).
Software called httpd runs on a computer in the WWW. This software transfers a hypertext file, which is written in HTML (HyperText Markup Language) and stored in a database of the computer, in response to a request from another computer.
A computer connected to the Internet can read a specified file by addressing the hypertext file at the httpd where the requested hypertext file resides.
Since the address is written as link information in the HTML description, a browser conforming to HTTP can display a hypertext file served by the httpd.
In addition, if the browser can present various data such as audio, still images and moving images, hypertext including multimedia data can also be displayed with the browser.
With such an arrangement, users can more easily access information on the Internet, and many individuals and enterprises have come to publish hypertext files called Web pages.
In the WWW, however, there is no manager of the database, and each individual creates and updates Web pages in his or her own way. Because the volume of accumulated Web pages is so large (the total number of Web pages published worldwide is estimated to have been 40 million at the beginning of 1996), an individual user already has difficulty finding out where a desired Web page is located (that is, which URL should be specified in order to reach it). Moreover, the frequency and timing of updates are not regulated and are left to each author's discretion.
Under such circumstances, systems that retrieve accessible Web pages on the basis of their content have been developed, and a new service has very recently appeared in which retrieval is conducted by a specialist on behalf of the user.
Specifically, Web retrieval servers such as Yahoo, Lycos and Altavista are available. With a Web retrieval server, a Web page containing a keyword can be obtained by specifying that keyword. A user obtains a desired Web page by using a Web retrieval server.
Thus, while necessary information can easily be obtained online by using a Web retrieval server, it can be obtained only when the user actively specifies it for retrieval. If the user does not issue a query when information of interest is newly created, the user has no chance to access that information, even if it is important to the user.
Accordingly, a system is needed that notifies a user, at the time information is created, that the information has been created.
In a conventional database, such a function is called SDI (Selective Dissemination of Information). When SDI is used, the user has to enter, into the system directory library as an individual profile, keywords or the like that are used to select information the user is concerned with or interested in.
The system compares the keywords and the like (the profile) with data when the data are newly entered into the system, and if the data match the keywords, the system informs the user that new information the user wants has been created.
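The profile comparison described above can be sketched as follows. This is only an illustration under stated assumptions: a profile is taken to be a list of keywords per user, and a match is a simple case-insensitive substring test. The user names, keywords and function names are hypothetical and do not represent any particular SDI implementation.

```python
from typing import Dict, List


def matches_profile(text: str, keywords: List[str]) -> bool:
    """Return True if any profile keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def users_to_notify(new_text: str, profiles: Dict[str, List[str]]) -> List[str]:
    """Compare newly entered data with every profile and list the matching users."""
    return sorted(
        user for user, keywords in profiles.items()
        if matches_profile(new_text, keywords)
    )


# Hypothetical individual profiles registered in advance.
profiles = {
    "user_a": ["filtering", "hypertext"],
    "user_b": ["baseball"],
}

notified = users_to_notify("A new device for information filtering", profiles)
print(notified)
```

The comparison is triggered when data are newly entered, so the user is informed without having to issue a query.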
However, since anyone is free to enter any information into the WWW, a plurality of units of information may be entered in a single Web page.
When such a Web page, in which a plurality of units of information coexist, is handled as one processing unit and compared with a profile, proper filtering is not necessarily guaranteed.
Therefore, there has been the problem that even a Web page that partly contains important information of concern and interest to the user may fail to be selected for the user, since the Web page is judged as a whole as to whether it should be selected or discarded.
In a conventional database, since individual data reside in a local environment or are managed by a specified manager, it has been easy to distinguish newly created information from existing information. In the WWW, however, since an individual can enter a Web page into the system independently and no manager manages the whole WWW, it is very difficult to discern new information from existing information.
Furthermore, a Web page has a hypertext structure, and a plurality of mutually related Web pages may together express one complete piece of information. Consequently, there has been the problem that, in monitoring pages, merely detecting the creation of new information is not sufficient.
Still furthermore, there has been the problem that it is hard for a single system to monitor newly created information over a scope as wide as the Web pages on the WWW.
As seen from the above description, when conventional information filtering is applied to Web pages on the WWW, the following problems arise:
(1) There are two cases: in one, a Web page consists of a single piece of information; in the other, a Web page includes a plurality of information pieces. In the latter case, the page has to be divided into its individual pieces, and each divided piece has to be compared with a profile in order to select the necessary information accurately.
(2) A system that is not large in scale cannot check all the pages in the world by itself. On the other hand, even a small-scale system can offer convenience to a user if it adopts a monitoring means that detects updates made to a specified page.
However, a Web page is hypertext, and a complete piece of information may be formed by combining information pieces from a plurality of Web pages. Therefore, if the monitoring means can specify only one Web page at a time, updates to pages such as the children and grandchildren reached by links from a parent page are not detected.
(3) It is impossible to monitor new information sufficiently with a single information filtering device.
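Problem (1) above can be illustrated with a minimal sketch in which a page is divided into pieces and each piece is compared with a profile separately. How a page should be divided is not specified in the description above; the sketch assumes, purely for illustration, that pieces are separated by blank lines and that a profile is a list of keywords. All names and sample text are hypothetical.

```python
import re
from typing import List


def split_into_pieces(page_text: str) -> List[str]:
    """Divide a page into pieces, assuming blank lines separate the pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]


def select_pieces(page_text: str, keywords: List[str]) -> List[str]:
    """Compare each divided piece with the profile and keep only matching pieces."""
    lowered_keywords = [k.lower() for k in keywords]
    return [
        piece for piece in split_into_pieces(page_text)
        if any(k in piece.lower() for k in lowered_keywords)
    ]


# A hypothetical page holding three unrelated information pieces.
page = (
    "Club news: the baseball team won.\n\n"
    "Notice: a seminar on information filtering will be held.\n\n"
    "For sale: used bicycle."
)

hits = select_pieces(page, ["filtering"])
print(hits)
```

Only the matching piece is returned, rather than the whole page being accepted or rejected as a single unit.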
Since a system that can inform a user of the creation of information the user wants is needed, as described above, an information monitoring device has been proposed in which the user selects a desired set of pages in advance from among a tremendous number of Web pages, information within the scope of the selected Web pages is monitored, and new information is detected and notified to the user.
Even with such an information monitoring device, however, monitoring is conducted only page by page. Web pages are characterized by links between documents that form a hypertext. That is, each Web page by itself carries little information value, and the pages become a complete, meaningful piece of information only as a group of documents. Therefore, since Web pages have been monitored independently, effective update detection has not been possible.
In a Web page, as shown in FIG. 61, the content is described in HTML, and by convention a reference to another document is made by setting the address at which that document is stored, as shown in (a) of the figure.
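As a rough illustration of how such references can be read out of an HTML description, the following sketch collects the addresses set in anchor tags using the `html.parser` module of the Python standard library. The sample page and its addresses are hypothetical and are not taken from FIG. 61.

```python
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collect the addresses given in the href attribute of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the referenced addresses in the order they appear."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page referring to two other documents.
sample = (
    '<html><body>'
    '<A HREF="http://www.example.com/child.html">Child</A> '
    '<a href="grandchild.html">Next</a>'
    '</body></html>'
)

links = extract_links(sample)
print(links)
```

The second address in the sample is relative; resolving it against the address of the referring page is left out of this sketch.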
Conventional update monitoring of Web pages having such a structure has the following problems:
(1) It has been impossible to specify, as a group, the documents to be monitored for updates, and also impossible to notify a user of updated documents as a group.
(2) Web pages are updated irregularly. Therefore, if checks are performed at regular intervals, an update can be detected only with the granularity of the checking period.
(3) At present, monitoring has to cover Web pages that are not being updated, which is meaningless.
(4) At present, monitoring has to cover Web pages that have already been deleted, which is also meaningless.
(5) The storage location of a Web page may change, and when it does, the monitoring location has to be changed to the new storage location.
(6) Web pages refer to one another, and although such reference information is important, there has been no technical means of notifying a user of it.
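Problem (1) above, monitoring documents as a group, can be pictured with the following sketch. It is an illustration under several assumptions that are not part of the description above: pages are held in an in-memory dictionary instead of being fetched over a network, the group is formed by following links from a parent page to a fixed depth, and an update is detected by comparing content digests between two checks. All names and data are hypothetical.

```python
import hashlib
from typing import Dict, List


def collect_group(pages: Dict[str, dict], root: str, max_depth: int) -> List[str]:
    """Return the root page plus pages reachable within max_depth links."""
    group, frontier = [root], [root]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for child in pages.get(url, {}).get("links", []):
                if child not in group:
                    group.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return group


def snapshot(pages: Dict[str, dict], group: List[str]) -> Dict[str, str]:
    """Record a digest of each page body in the group (assumed update test)."""
    return {
        url: hashlib.sha256(pages[url]["body"].encode("utf-8")).hexdigest()
        for url in group if url in pages
    }


def changed_pages(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List pages whose digest differs from, or is absent in, the old snapshot."""
    return sorted(url for url in new if old.get(url) != new[url])


# Hypothetical parent, child and grandchild pages.
pages = {
    "parent": {"body": "index", "links": ["child"]},
    "child": {"body": "chapter 1", "links": ["grandchild"]},
    "grandchild": {"body": "details", "links": []},
}

group = collect_group(pages, "parent", max_depth=2)
before = snapshot(pages, group)
pages["grandchild"]["body"] = "details, revised"
after = snapshot(pages, group)
updated = changed_pages(before, after)
print(group, updated)
```

In this sketch the update to the grandchild page is reported even though only the parent page was specified. Deleted pages and changed storage locations, as in problems (4) and (5), are not handled.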