(1) Field of the Invention
The present invention relates to a document processing system for storing input documents after subjecting the documents to a predetermined process and for retrieving or clipping documents matching a given query from the stored documents, and to a recording medium recording a program for causing a computer to perform such processes.
(2) Description of the Related Art
With the recent spread of the Internet and the growing number of full-text databases, the amount of information available to individuals is expanding drastically.
To find desired information within such a vast amount of information, a retrieval process, clipping process or the like is generally performed using, as a key, search terms (a query) describing features of the data to be obtained.
With conventional large-scale commercial on-line databases or full-text retrieval systems, however, if the condition of search terms is loosened, noise (unneeded data) included in the search results increases; conversely, if the search condition is narrowed, search omission may result, giving rise to a problem that it is difficult for the user to acquire desired data.
Specifically, in a document culling or narrowing process or a document retrieval process adopted in conventional document filtering, ranking retrieval based on the degree of coincidence or relevancy between the query and document contents is conducted at best, and accordingly, it is difficult to carry out document culling that fully reflects the importance of information included in documents or the user's purpose of performing search.
Consequently, even in the case where the user desires to search for an organization named “Hashimoto”, for example, documents including “Hashimoto” as a name of place are very often retrieved.
Also, when new products priced in the 200000 to 299999 yen range are to be searched for, it is necessary to use a query which is created taking account of every possibility like “two hundred thousand yen”, “200,000 yen”, “two hundred ten thousand yen” and “two hundred fifty thousand yen”.
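The price example above can be illustrated with a minimal sketch. It assumes that each price expression is first normalized to a single integer, so that one range test replaces a query enumerating every spelling. The names `words_to_int`, `normalize_price` and `in_price_range` are hypothetical and do not appear in the source, and the number-word table is deliberately incomplete.

```python
import re

# Number words covered by this sketch (deliberately incomplete; an assumption).
SMALL = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(words):
    """Convert a list of English number words to an integer."""
    total, current = 0, 0
    for word in words:
        if word in SMALL:
            current += SMALL[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            # Close the current group, e.g. "two hundred ten" x 1000.
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unsupported number word: {word!r}")
    return total + current


def normalize_price(expression):
    """Map a yen expression such as '200,000 yen' to an integer amount."""
    body = re.sub(r"\s*yen\s*$", "", expression.strip().lower())
    if re.fullmatch(r"[\d,]+", body):
        return int(body.replace(",", ""))
    return words_to_int(body.replace("-", " ").split())


def in_price_range(expression, low=200_000, high=299_999):
    """Test whether the normalized price lies in the inclusive range."""
    return low <= normalize_price(expression) <= high
```

With such a normalization, all four spellings listed above satisfy the same range condition, whereas a purely string-based query must list each of them.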
Further, although it is possible to search for documents by specifying a document creation date, date information included in documents cannot be utilized for search.
In the following sentences, for example, “the 1st” means different days, though the words used are the same.
(a) On the 1st, Corporation A will release Product B.
(b) On the 1st, Corporation A released Product B.
If the sentences were created on Feb. 15, 1997, “the 1st” means Mar. 1, 1997 in the case of (a), and means Feb. 1, 1997 in the case of (b).
The conventional method thus has a problem in that it is difficult to recognize the attributes of date information in documents and to utilize such information for search.
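The date example in sentences (a) and (b) can be worked through with a short sketch. It assumes that the tense of the sentence ("future" for "will release", "past" for "released") has already been determined by some other step. The function name `resolve_day_of_month` and its parameters are hypothetical and are not taken from the source.

```python
from datetime import date


def resolve_day_of_month(day, created, tense):
    """Resolve a bare day of the month relative to the creation date.

    tense == "future": nearest such day on or after `created`.
    tense == "past":   nearest such day on or before `created`.
    """
    if tense not in ("future", "past"):
        raise ValueError("tense must be 'future' or 'past'")
    year, month = created.year, created.month
    step = 1 if tense == "future" else -1
    for _ in range(24):  # search at most two years in either direction
        try:
            candidate = date(year, month, day)
        except ValueError:
            candidate = None  # e.g. the 31st in a 30-day month
        if candidate is not None:
            if tense == "future" and candidate >= created:
                return candidate
            if tense == "past" and candidate <= created:
                return candidate
        month += step
        if month == 13:
            year, month = year + 1, 1
        elif month == 0:
            year, month = year - 1, 12
    raise ValueError("no matching date found")


created = date(1997, 2, 15)
release_a = resolve_day_of_month(1, created, "future")  # sentence (a)
release_b = resolve_day_of_month(1, created, "past")    # sentence (b)
```

Here `release_a` is Mar. 1, 1997 and `release_b` is Feb. 1, 1997, matching the interpretation given above.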
The present invention was created in view of the above circumstances, and an object thereof is to provide a document processing system capable of performing document retrieval or document culling that fully reflects the user's purpose of performing search.
It is another object of the present invention to provide a recording medium recording a document processing program for performing a document retrieval process or clipping process that fully reflects the user's purpose of performing search.
FIG. 1 illustrates the principles of the present invention for achieving the above objects. The present invention provides a document processing system for storing input documents after subjecting the documents to a predetermined process and for retrieving or clipping documents matching a given query from the stored documents, the system comprising knowledge information storing means 3, event specifying means 4, attribute value extracting means 5, correlating means 10, document storing means 11, and document extracting means 12.
The knowledge information storing means 3 stores knowledge information necessary for processing an input document. The event specifying means 4 specifies the type of an event described in the input document by looking up the knowledge information stored in the knowledge information storing means 3. The attribute value extracting means 5 extracts, from the input document, attribute values of attributes relating to the event specified by the event specifying means 4 by looking up the knowledge information stored in the knowledge information storing means 3. The correlating means 10 correlates the attribute values extracted by the attribute value extracting means 5 with entities in the real world by looking up the knowledge information stored in the knowledge information storing means 3. The document storing means 11 stores the attribute values correlated by the correlating means 10 and the input document or information specifying a storage location thereof in a manner associated with each other. The document extracting means 12 looks up the attribute values and a query to retrieve or clip target documents.
The knowledge information storing means 3 stores events, attributes relating thereto, and information for extracting attribute values constituting the attributes, in a manner associated with one another. The event specifying means 4 collates an input document with the knowledge information stored in the knowledge information storing means 3, to thereby specify an event described in the document. The attribute value extracting means 5 refers to the knowledge information storing means 3 and extracts attribute values of attributes relating to the specified event from the document. The correlating means 10 correlates the extracted attribute values with entities in the real world into one-to-one correspondence by looking up the knowledge information stored in the knowledge information storing means 3. The document storing means 11 stores the thus-correlated attribute values and the document or information specifying a storage location thereof in a manner associated with each other. The document extracting means 12 collates information included in an input query with the attribute values stored in the document storing means 11, to extract desired documents.
Thus, the contents of each document are grasped in terms of events, the attribute values of the attributes constituting each grasped event are extracted and correlated with entities in the real world, and the resulting information is looked up to retrieve or clip documents, whereby the retrieval or clipping accuracy can be improved.
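The flow through the means described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the knowledge information is reduced to regular expressions, the real-world entities to a small lookup table, and every name and identifier (`KNOWLEDGE`, `ENTITIES`, `org:0001`, the function names) is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical knowledge information: per event type, a trigger pattern
# and patterns for extracting the attribute values of that event.
KNOWLEDGE = {
    "product_release": {
        "trigger": re.compile(r"\brelease[sd]?\b"),
        "attributes": {
            "organization": re.compile(r"\b(Corporation \w+)"),
            "product": re.compile(r"\b(Product \w+)"),
        },
    },
}

# Hypothetical table relating surface strings to real-world entities.
ENTITIES = {"Corporation A": "org:0001", "Product B": "prod:0042"}


@dataclass
class Record:
    doc_id: str
    event: str
    attributes: dict


def specify_event(text):
    """Event specifying step: return the first event type whose trigger matches."""
    for event, info in KNOWLEDGE.items():
        if info["trigger"].search(text):
            return event
    return None


def extract_attribute_values(text, event):
    """Attribute value extracting step for the specified event."""
    values = {}
    for name, pattern in KNOWLEDGE[event]["attributes"].items():
        match = pattern.search(text)
        if match:
            values[name] = match.group(1)
    return values


def correlate(values):
    """Correlating step: map each value to an entity ID when one is known."""
    return {name: ENTITIES.get(value, value) for name, value in values.items()}


def store_document(store, doc_id, text):
    """Document storing step: keep correlated values with the document ID."""
    event = specify_event(text)
    if event is not None:
        values = correlate(extract_attribute_values(text, event))
        store.append(Record(doc_id, event, values))


def extract_documents(store, event, **conditions):
    """Document extracting step: match a query against stored attribute values."""
    return [
        record.doc_id
        for record in store
        if record.event == event
        and all(record.attributes.get(k) == v for k, v in conditions.items())
    ]


store = []
store_document(store, "doc1", "On the 1st, Corporation A released Product B.")
store_document(store, "doc2", "Corporation A is located in Hashimoto.")
hits = extract_documents(store, "product_release", organization="org:0001")
```

In this sketch `hits` contains only `doc1`: the second document mentions the same organization but describes no release event, so it is never stored under that event type. Matching on correlated entity identifiers rather than on raw strings is what allows a query to distinguish, for example, an organization from a place with the same name.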