The present invention relates generally to data processing systems and, more particularly, to an improved fact recognition system.
Conventional fact recognition systems recognize facts contained in input data and populate a data store, such as a database, with the recognized facts. As used herein, the term "fact" refers to a relationship between entities, such as people, places, or things. For example, upon receiving the input data "John Smith is the president of XYZ Corp.," a fact recognition system identifies the fact that the president of XYZ Corp. is John Smith and stores this fact in a database. Thus, fact recognition systems automatically extract facts from input data so that a user does not have to read the input data.
To recognize facts, conventional systems utilize rules. An example of one such rule follows:
    <person-name> a|the <job-name> of <company-name>
This rule is used to extract the fact that a person holds a particular job at a particular company. Such rules are created by knowledge engineers, who are experts in the field of fact recognition. The knowledge engineers generate a large number of rules, and the system then applies these rules to a stream of input data to recognize the facts contained therein. If any part of the input stream matches a rule, the system extracts the fact and stores it in the database. Although conventional systems provide beneficial functionality by storing facts retrieved from input data, they suffer from a number of drawbacks: (1) few knowledge engineers are available to create the rules, (2) development of such systems takes a long time because rule creation is a tedious and time-consuming task, and (3) the systems often recognize facts inaccurately. It is therefore desirable to improve fact recognition systems.
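For illustration only, the matching step of such a rule-based system can be sketched in Python. The sketch assumes that small hypothetical lexicons (`PERSON_NAMES`, `JOB_NAMES`, `COMPANY_NAMES`) stand in for the name categories of the rule and that the rule is compiled into a regular expression; the function names are likewise hypothetical and are not drawn from any particular conventional system.

```python
import re

# Hypothetical lexicons standing in for the name categories of the rule.
PERSON_NAMES = ["John Smith", "Jane Doe"]
JOB_NAMES = ["president", "treasurer"]
COMPANY_NAMES = ["XYZ Corp.", "ABC Inc."]


def alternation(names):
    """Build a regex alternation of literal names, trying longer names first."""
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))


# Regex form of the rule: <person-name> a|the <job-name> of <company-name>
RULE = re.compile(
    rf"(?P<person>{alternation(PERSON_NAMES)}) (?:a|the) "
    rf"(?P<job>{alternation(JOB_NAMES)}) of "
    rf"(?P<company>{alternation(COMPANY_NAMES)})"
)


def extract_facts(text):
    """Return a (person, job, company) tuple for every span matching the rule."""
    return [m.group("person", "job", "company") for m in RULE.finditer(text)]


print(extract_facts("Yesterday Jane Doe the treasurer of ABC Inc. resigned."))
# [('Jane Doe', 'treasurer', 'ABC Inc.')]
print(extract_facts("John Smith is the president of XYZ Corp."))
# []
```

The second call finds no fact because the rule has no slot for the word "is", which shows how a hand-written rule can miss a fact that a reader recognizes at once.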
In accordance with methods and systems consistent with the present invention, an improved fact recognition system is provided that automatically learns from syntactic language examples and semantic language examples, thus facilitating development of the system. The language examples are relatively simple and can be provided by a lay person with little training, thus obviating the need for knowledge engineers. Furthermore, the learning performed by the improved fact recognition system produces a collection of probabilities that the system uses to recognize facts, typically more accurately than conventional systems.
In accordance with methods consistent with the present invention, a method is provided in a data processing system. This method receives syntactic language examples and receives semantic language examples. Furthermore, this method creates a model from both the syntactic language examples and the semantic language examples and uses the model to determine the meaning of a sequence of words.
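One simple way to picture creating a model from both kinds of examples is the Python sketch below. It assumes, purely for illustration, that syntactic examples pair words with part-of-speech tags, that semantic examples pair words with entity labels, and that the two annotations are independent given the word, so that the probability of a (tag, label) pair is the product of two relative frequencies. The example data and the names `relative_frequencies` and `create_model` are hypothetical; a practical model would assign probabilities to whole syntactic structures rather than to single words.

```python
from collections import Counter, defaultdict

# Hypothetical syntactic examples: words annotated with part-of-speech tags.
SYNTACTIC_EXAMPLES = [
    [("the", "DT"), ("president", "NN"), ("resigned", "VBD")],
    [("Smith", "NNP"), ("leads", "VBZ"), ("the", "DT"), ("board", "NN")],
    [("President", "NNP"), ("Smith", "NNP"), ("spoke", "VBD")],
]

# Hypothetical semantic examples: words annotated with their role in a fact.
SEMANTIC_EXAMPLES = [
    [("John", "PERSON"), ("Smith", "PERSON"), ("is", "OTHER"), ("the", "OTHER"),
     ("president", "JOB"), ("of", "OTHER"), ("XYZ", "COMPANY"), ("Corp.", "COMPANY")],
]


def relative_frequencies(examples):
    """Estimate P(annotation | word) by counting annotated word occurrences."""
    counts = defaultdict(Counter)
    for sentence in examples:
        for word, annotation in sentence:
            counts[word.lower()][annotation] += 1
    model = {}
    for word, c in counts.items():
        total = sum(c.values())
        model[word] = {a: n / total for a, n in c.items()}
    return model


def create_model(syntactic_examples, semantic_examples):
    """Combine both sources into P((tag, label) | word), assuming independence."""
    p_syn = relative_frequencies(syntactic_examples)
    p_sem = relative_frequencies(semantic_examples)
    model = {}
    for word in p_syn.keys() | p_sem.keys():
        # Placeholders for words seen in only one kind of example.
        syn = p_syn.get(word, {"UNK": 1.0})
        sem = p_sem.get(word, {"OTHER": 1.0})
        model[word] = {(t, l): ps * pl for t, ps in syn.items() for l, pl in sem.items()}
    return model


model = create_model(SYNTACTIC_EXAMPLES, SEMANTIC_EXAMPLES)
print(model["president"])  # {('NN', 'JOB'): 0.5, ('NNP', 'JOB'): 0.5}
```

Because the syntactic examples tag "president" both as a common noun and as a title, the model splits its probability evenly between the two analyses, while the semantic examples contribute the JOB label to both.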
In accordance with methods consistent with the present invention, a method is provided in a data processing system. This method receives a collection of probabilities that facilitate fact recognition, receives an input sequence of words reflecting a fact, and identifies the fact reflected by the input sequence of words using the collection of probabilities.
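As an illustration only, the Python sketch below assumes that the collection of probabilities is a table of hypothetical values P(label | word) and identifies a fact by choosing the most probable label for each word independently. The table, the labels, and the function names are assumptions, and the per-word choice is a deliberate simplification rather than the search procedure of the invention.

```python
from itertools import groupby

# Hypothetical collection of probabilities P(label | word).
PROBABILITIES = {
    "john": {"PERSON": 0.9, "OTHER": 0.1},
    "smith": {"PERSON": 0.8, "OTHER": 0.2},
    "president": {"JOB": 0.95, "OTHER": 0.05},
    "xyz": {"COMPANY": 0.7, "OTHER": 0.3},
    "corp.": {"COMPANY": 0.9, "OTHER": 0.1},
}
UNKNOWN = {"OTHER": 1.0}  # used for words absent from the collection


def most_probable_label(word):
    scores = PROBABILITIES.get(word.lower(), UNKNOWN)
    return max(scores, key=scores.get)


def identify_fact(sentence):
    """Return (person, job, company) if the sentence reflects such a fact, else None."""
    labeled = [(w, most_probable_label(w)) for w in sentence.split()]
    spans = {}
    # Merge consecutive words sharing a label into one entity span.
    for label, group in groupby(labeled, key=lambda pair: pair[1]):
        if label != "OTHER" and label not in spans:
            spans[label] = " ".join(word for word, _ in group)
    if all(role in spans for role in ("PERSON", "JOB", "COMPANY")):
        return spans["PERSON"], spans["JOB"], spans["COMPANY"]
    return None


print(identify_fact("John Smith is the president of XYZ Corp."))
# ('John Smith', 'president', 'XYZ Corp.')
```

Unlike the hand-written rule in the background, this sketch tolerates the intervening word "is", because words missing from the collection simply default to the OTHER label.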
In accordance with systems consistent with the present invention, a computer-readable memory device encoded with a data structure is provided. This data structure contains a collection of probabilities for use in recognizing facts in input data.
In accordance with systems consistent with the present invention, a data processing system is provided that comprises a memory and a processor. The memory includes a statistical model with probabilities reflecting likely syntactic structure for sequences of one or more words and likely semantic information for the sequences. The memory also includes a training program for generating the statistical model and a search program for receiving a sentence reflecting a fact and for using the statistical model to recognize the fact. The processor runs the training program and the search program.