1. Field of the Invention
The present invention relates to a fact data unifying method and apparatus that extracts descriptions of facts from a document, stores the extracted descriptions in a database as a consistent set of data, and detects or corrects the corresponding errors in the original text based on inconsistencies found in the fact data.
2. Description of the Related Art
Various methods have conventionally been proposed for extracting information from text. For example, for data that follows a predetermined framework, such as new product information or organism information, a correspondence table between expression formats and the data to be extracted is stored. The text is then scanned, and when a passage matches one of the stipulated expression formats, the corresponding data is extracted.
Assume that the correspondence table shown in FIG. 1A is stored, and that fact data consisting of a target object, an attribute name, and an attribute value, as shown in FIGS. 1B and 1C, is to be extracted. In this example, "a new president of a company C" and "a person D is assigned" match entries *1 and *2 of the correspondence table, respectively. Therefore, "company C" is extracted as the target object, "representative" as the attribute name, and "person D" as the attribute value.
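The table-driven extraction described above can be sketched in Python. This is a minimal illustration under stated assumptions: the contents of FIG. 1A are not reproduced here, so the single table entry, its regular expressions, the naive sentence splitter, and the names `Entry`, `Fact`, and `extract_facts` are all hypothetical. Each entry pairs expression format *1 (which yields the target object) with expression format *2 (which yields the attribute value) and records the attribute name to emit when both formats match within one sentence.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    target: str     # target object, e.g. "company C"
    attribute: str  # attribute name, e.g. "representative"
    value: str      # attribute value, e.g. "person D"


class Entry(NamedTuple):
    target_format: re.Pattern  # expression format *1: captures the target object
    value_format: re.Pattern   # expression format *2: captures the attribute value
    attribute: str             # attribute name emitted when both formats match


# Hypothetical correspondence table with one entry; not the actual FIG. 1A table.
TABLE = [
    Entry(
        target_format=re.compile(r"new president of (?:a )?(?P<target>company \w+)"),
        value_format=re.compile(r"(?P<value>person \w+) is assigned"),
        attribute="representative",
    ),
]


def extract_facts(text: str, table: List[Entry]) -> List[Fact]:
    """Scan each sentence and emit a fact when both formats of an entry match it."""
    facts = []
    # Naive sentence splitting on terminal punctuation (an assumption for brevity).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for entry in table:
            t = entry.target_format.search(sentence)
            v = entry.value_format.search(sentence)
            if t and v:
                facts.append(Fact(t.group("target"), entry.attribute, v.group("value")))
    return facts


text = "As a new president of a company C, a person D is assigned."
facts = extract_facts(text, TABLE)
```

For this sample sentence, `facts` contains the single triple ("company C", "representative", "person D"), mirroring the example of FIGS. 1B and 1C.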
If the target is limited to representation-level errors in a text, various error correction techniques already exist. Known examples include a method that registers the expressions appearing in a text and flags unregistered words, and a method that flags representation fluctuations (different notations of the same expression).
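The two representation-level checks mentioned above can be illustrated with a toy sketch. The lexicon, the regular-expression tokenizer, and the rule that treats forms differing only by hyphens as representation fluctuations are hypothetical simplifications chosen for brevity; a practical system would use a morphological analyzer and much richer normalization. The function names `unregistered_words` and `fluctuations` are likewise illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: runs of word characters, allowing internal hyphens.
    return re.findall(r"\w+(?:-\w+)*", text)


def unregistered_words(text: str, lexicon: Set[str]) -> List[str]:
    """Return tokens not in the registered lexicon, in order of first appearance."""
    found: List[str] = []
    for tok in tokenize(text):
        if tok.lower() not in lexicon and tok not in found:
            found.append(tok)
    return found


def fluctuations(text: str) -> Dict[str, Set[str]]:
    """Group surface forms that differ only in hyphenation (hypothetical rule)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for tok in tokenize(text):
        groups[tok.replace("-", "")].add(tok)
    # Only keys with more than one distinct surface form are fluctuations.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


# Hypothetical lexicon of registered expressions (stored in lower case).
LEXICON = {"the", "database", "stores", "e-mail", "addresses", "and", "logs"}
text = "The data-base stores e-mail addresses and the database stores emial logs."
```

Here `unregistered_words(text, LEXICON)` flags `data-base` and the typo `emial`, while `fluctuations(text)` groups `data-base` with `database`. Both checks operate on surface forms only, which is why they cannot catch factual errors of the kind discussed next.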
As described above, extracting fact data from text is a widely used technique. However, the information a user wants to view cannot always be obtained from a single location in a text. Therefore, data extracted from the whole text must normally be unified.
In general, however, the extracted data includes a considerable number of errors (or data inconsistencies), such as errors in the text itself and errors introduced during the extraction process. Because these errors must be checked manually and then removed or rewritten, the data cannot simply be aggregated.
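Why extracted facts cannot simply be aggregated can be seen in a naive merge of (target object, attribute name, attribute value) triples. The sketch below is an illustration only, not the unifying method of the invention: the facts, including the conflicting value `person E` and the `headquarters` attribute, are hypothetical, and the function `aggregate` merely separates keys that have one consistent value from keys whose values conflict.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (target object, attribute name, attribute value)
Key = Tuple[str, str]        # (target object, attribute name)


def aggregate(facts: List[Fact]) -> Tuple[Dict[Key, str], Dict[Key, Set[str]]]:
    """Merge facts per (target, attribute); keys with several values become conflicts."""
    values: Dict[Key, Set[str]] = defaultdict(set)
    for target, attribute, value in facts:
        values[(target, attribute)].add(value)
    # Keys with exactly one distinct value can be merged without review.
    unified = {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}
    # Keys with several distinct values cannot be merged automatically.
    conflicts = {key: vals for key, vals in values.items() if len(vals) > 1}
    return unified, conflicts


# Hypothetical facts extracted from different places in one document.
facts = [
    ("company C", "representative", "person D"),
    ("company C", "representative", "person D"),
    ("company C", "representative", "person E"),  # hypothetical conflicting fact
    ("company C", "headquarters", "city X"),       # hypothetical attribute
]
unified, conflicts = aggregate(facts)
```

Here `unified` holds only the headquarters fact, while the representative of company C is reported as a conflict between `person D` and `person E`. A single erroneous fact is enough to block the merge, and deciding which value, if either, is correct still requires checking the source text.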