1. Field
The present invention relates to a method and system for capturing semantic knowledge in a knowledge domain, and more specifically, to updating a semantic knowledge base derived in part from documents created by multiple users distributed over a computer network.
2. Background
Semantic knowledge deals not only with the meaning of words, but also with how these meanings combine in sentences to form concepts that can be described with simple propositions. Creating a semantic knowledge base can be valuable for a variety of computer-based applications. Such applications include billing, decision support, data mining, and speech recognition. In medicine, for example, a semantic knowledge base could provide a means to code reports stored in electronic medical records. Once the medical reports are coded, researchers could determine the prevalence of pulmonary infiltrate in patients with chest x-rays or the frequency of patients needing feeding tubes for serious burn injuries.
While the value of a semantic knowledge base is well recognized, constructing a good knowledge base that adequately represents the knowledge in a specific knowledge domain is a daunting task. In Ser. No. 10/844,912, titled “Process for constructing a semantic knowledge base using a document corpus,” methods were described to build a semantic knowledge base over a knowledge domain using a document corpus. A corpus is a large number of related documents, typically over 100,000. Propositions can be created that represent the semantic knowledge contained in sentences in these documents, as described in Ser. No. 10/844,912. Propositions are distinct from the sentences that convey them, although they are related. For example, the sentences “The chest x-ray is normal” and “The chest x-ray is within normal limits” map to the same proposition or meaning. The advantage of analyzing a corpus using the tools described in Ser. No. 10/844,912 is that important semantic concepts will not be overlooked. Also, different linguistic expressions containing the same underlying meaning will be represented in a consistent manner.
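The many-to-one relationship between sentences and propositions described above can be illustrated with a minimal sketch. The table contents, the proposition identifier `CHEST_XRAY_NORMAL`, and the function names are hypothetical and chosen only for illustration; they are not taken from Ser. No. 10/844,912, which describes the actual methods.

```python
import re

# Hypothetical lookup table: several sentences map to one proposition.
# The identifier CHEST_XRAY_NORMAL is an illustrative assumption.
SENTENCE_TO_PROPOSITION = {
    "the chest x-ray is normal": "CHEST_XRAY_NORMAL",
    "the chest x-ray is within normal limits": "CHEST_XRAY_NORMAL",
}


def normalize(sentence: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", sentence.strip().lower())
    return text.rstrip(".!?")


def proposition_for(sentence: str):
    """Return the proposition for a sentence, or None if uncharacterized."""
    return SENTENCE_TO_PROPOSITION.get(normalize(sentence))
```

Under these assumptions, both example sentences resolve to the same identifier, while a sentence absent from the table returns `None`.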
Unfortunately, new knowledge is constantly being created that will not be reflected in a backward-looking document corpus. Although in theory one could collect additional documents for semantic analysis, this process can be time-consuming, expensive, and unresponsive to rapidly changing knowledge. What is needed is a method to continuously collect new sentences in a specific knowledge domain, characterize the semantic knowledge, and update both the knowledge source and its clients.
In Stier (U.S. Pat. No. 6,560,589), a method for maintaining a knowledge base system was described where authors created knowledge objects, and analysts entered knowledge into the knowledge base. The role of the analysts was to provide authors with feedback and to review content for technical accuracy and adherence to conventions and guidelines. The method recognizes the contribution of multiple sources of knowledge, but gives authors and not analysts the primary role in knowledge creation. Stier's method does not specifically focus on the semantic knowledge contained within documents.
Chang (U.S. Pat. No. 6,456,975) described a method to update a speech recognition system for out-of-vocabulary words or pronunciations. The computer system transmits data not recognized by a speech recognition program from a particular client to a provider. While this method is useful for this particular application, it does not describe how to update a semantic knowledge base in a particular knowledge domain. The method does not address, for example, how a sentence that is correctly recognized by the speech recognition program, but is not part of the semantic knowledge base for a particular knowledge domain, could be used to increase the size and quality of the knowledge base. Sentences with uncharacterized semantics need to be sent to a knowledge engineer for analysis and possible inclusion in the knowledge base. Since new knowledge could be created by any user, there is a need for methods to quickly capture, transmit, analyze, and distribute this knowledge in a convenient fashion.
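The need stated above (capture a sentence with uncharacterized semantics, send it to a knowledge engineer, and distribute the result) can be sketched as follows. This is only an illustrative sketch and not the method of any cited reference; the class `KnowledgeBase`, its fields, and the `version` counter used to signal updates to clients are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for a semantic knowledge base."""
    propositions: dict = field(default_factory=dict)  # sentence -> proposition id
    pending: list = field(default_factory=list)       # awaiting engineer review
    version: int = 0                                   # bumped on each update

    def submit(self, sentence: str):
        """Capture: return the known proposition, or queue the sentence."""
        key = sentence.strip().lower()
        if key in self.propositions:
            return self.propositions[key]
        if key not in self.pending:
            self.pending.append(key)  # transmit to the knowledge engineer
        return None

    def review(self, sentence: str, proposition_id: str) -> None:
        """Analyze: the engineer assigns a proposition to a sentence."""
        key = sentence.strip().lower()
        if key in self.pending:
            self.pending.remove(key)
        self.propositions[key] = proposition_id
        self.version += 1  # distribute: clients can compare version numbers
```

In this sketch, a client holding an older `version` value would know to fetch the updated propositions; how updates are actually transmitted over a network is left open.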