The present invention generally relates to news clipping systems that use query expressions previously registered by users to search texts acquired from a plurality of news sources, such as news agencies and newspaper publishing companies, through electronic mail or an information collection robot, and that distribute each text to the users whose query expressions it satisfies. The present invention particularly relates to a news clipping system having a fast, instant text-retrieval-and-distribution function capable of finding, in a single scan, all the texts that the users need even as the number of users increases.
Recently, large quantities of electronic documents (hereinafter called texts) have been distributed to users at every moment by means of electronic mail, electronic news, and other media. In addition, the number of news sources supplying information through the Internet has increased, so an information collection robot or a similar tool is required to collect the enormous amount of text these sources produce. There is therefore a pressing need for a news clipping system that instantly distributes relevant texts to each user.
The core of such a news clipping system is document retrieval, a technique for which is described in "Efficient String Matching: An Aid to Bibliographic Search", A. V. Aho, et al., Communications of the ACM, June 1975, Vol. 18, No. 6, pp. 333-340.
This paper describes a kind of finite automaton, called a pattern matching machine, constructed from the strings of keywords to be searched for (hereinafter called query terms). The matching machine can locate all occurrences of any of a finite number of query terms in an arbitrary text string in a single scan. However, the following problems arise when texts are searched according to the query expressions requested by a large number of users.
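The pattern matching machine of Aho et al. can be sketched roughly as follows. This is a minimal Python illustration written for this description, not code from the cited paper; the function names build_machine and search are hypothetical. It builds the goto, failure, and output functions that the paper defines and then reports every query term occurrence during one left-to-right pass over the text.

```python
from collections import deque


def build_machine(keywords):
    """Build the goto, failure, and output functions for the given query terms."""
    goto = [{}]        # goto[state][char] -> next state (state 0 is the root)
    output = [set()]   # output[state] -> query terms recognized on reaching state
    # Phase 1: build a trie of the query terms (the goto function).
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)
    # Phase 2: compute failure links breadth-first; depth-1 states fail to root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f != 0 and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            # Inherit terms that end at the failure state (e.g. "he" inside "she").
            output[s] |= output[fail[s]]
    return goto, fail, output


def search(text, goto, fail, output):
    """Scan text once, yielding (end_index, query_term) for every occurrence."""
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a goto transition on ch exists (or root).
        while state != 0 and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            yield i, word
```

For the query terms "he", "she", "his", and "hers" and the text "ushers", the single scan reports "she" and "he" ending at index 3 and "hers" ending at index 5. Note that the machine reports only which query terms occurred, not which user registered them.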
(1) User Identification Problem
If a single finite automaton is constructed from all the query terms included in the query expressions provided by a large number of users, all the query terms can be found in a single scan of the texts. However, since it cannot be determined which user's query expression contains the query terms that match strings in a text, the satisfied users' query expressions cannot be distinguished from the others.
(2) Process Time Problem
If a separate finite automaton is constructed from the query terms included in each user's query expression, the satisfied query expressions can be distinguished from the others. However, since the texts must be scanned as many times as there are finite automata (namely, as many times as there are users), retrieval takes longer as the number of users increases.
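The cost described in problem (2) can be illustrated with a short sketch in which each user has a separate matcher, so the text is scanned once per user. The function name match_per_user and the sample users are hypothetical, and a naive position-by-position substring check stands in for each user's finite automaton; the point of the example is only the number of passes over the text.

```python
def match_per_user(text, user_terms):
    """Approach (2): one matcher per user, so the text is scanned once per user.

    ASSUMPTION: a naive substring check at each text position stands in for
    each user's finite automaton, to keep the sketch short.
    Returns (found_by_user, passes), where passes counts full scans of the text.
    """
    found_by_user = {}
    passes = 0
    for user, terms in user_terms.items():
        passes += 1  # one complete pass over the text for this user
        found = set()
        for i in range(len(text)):
            for term in terms:
                if text.startswith(term, i):
                    found.add(term)
        # Results are per user, so each user's matches are identifiable.
        found_by_user[user] = found
    return found_by_user, passes
```

With three users, the sample text "yen falls as stock prices rise" is scanned three times; with 1,000 users it would be scanned 1,000 times, so retrieval time grows linearly with the number of users even though each user's matches are identified correctly.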