1. Field of the Invention
The invention generally relates to a method and system for processing data and, in particular, to a method and system for processing text by using object coreference technology.
2. Description of Related Art
Data mining is a step in knowledge discovery in databases. It generally refers to a process of automatically searching a large amount of data for hidden information that exhibits particular relationships. Data mining and data analysis are important research subjects in the field of information technology, and each encompasses many sub-subjects. Information extraction research in natural language processing technology has provided a more powerful information retrieval tool for coping with the severe challenge brought by the information explosion. Information extraction technology does not attempt to comprehensively understand a whole document; it analyzes only the parts of the document that contain relevant information. Object coreference technology is one application of information extraction research, and it can identify coreference relationships in text to some degree.
In current natural language processing technology, object coreference is mainly used to analyze character coreference, that is, to determine which expressions refer to the same person. For example, for a segment of text “mayor zhang visited the newly-built museum today . . . the mayor talked with staff of the museum with interest . . . he carefully asked relevant situation . . . ”, traditional natural language processing technology can determine that “mayor zhang”, “mayor” and “he” refer to the same person. U.S. Pat. No. 6,438,543 B1 discloses a method of retrieving the same subject appearing under different names in multiple articles. The specification of that patent describes in detail how to determine that occurrences of “Clinton” at different positions refer to the same person.
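By way of illustration only, the grouping of “mayor zhang”, “mayor” and “he” in the example above can be sketched in Python as follows. The sketch is a toy heuristic and is not the method of the cited patent or of any particular natural language processing system. The function name `group_person_mentions`, the pronoun list, and both matching rules (word overlap between mentions, and a pronoun referring to the most recently mentioned person) are assumptions introduced here for illustration.

```python
# Hypothetical pronoun list, introduced only for this sketch.
PRONOUNS = {"he", "she", "him", "her"}


def group_person_mentions(mentions):
    """Group person mentions, given in text order, into coreference clusters.

    Assumed rules (illustrative, not taken from any cited method):
      1. A pronoun joins the cluster that received the latest mention.
      2. A non-pronoun mention joins the first cluster containing a
         non-pronoun mention whose words are a subset or superset of its own.
      3. Otherwise the mention starts a new cluster.
    """
    clusters = []  # each cluster is a list of mention strings
    last = None    # index of the cluster that received the latest mention
    for mention in mentions:
        words = set(mention.lower().split())
        target = None
        if mention.lower() in PRONOUNS:
            target = last
        else:
            for i, cluster in enumerate(clusters):
                for earlier in cluster:
                    if earlier.lower() in PRONOUNS:
                        continue
                    earlier_words = set(earlier.lower().split())
                    if words <= earlier_words or earlier_words <= words:
                        target = i
                        break
                if target is not None:
                    break
        if target is None:
            clusters.append([mention])
            target = len(clusters) - 1
        else:
            clusters[target].append(mention)
        last = target
    return clusters


print(group_person_mentions(["mayor zhang", "mayor", "he"]))
```

Under these assumptions the three mentions fall into a single cluster. A practical system would additionally rely on syntactic and semantic cues, such as gender and number agreement, that this sketch omits.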