Computers and computer technology have contributed greatly to our personal and professional lives. Computers now help perform many tasks that, only a few years ago, were performed by humans. While computer systems and computer technology have made significant inroads into our lives, some tasks still require significant human intervention.
A recurring problem in the computer field is the difficulty of developing computer systems that can perform tasks that have complex, unpredictable or undefined input data. To date, these tasks typically require human intervention, and often, intervention by scarce human workers who have particular knowledge or expertise. An essential problem is the translation of the knowledge and skill of a human expert to a computer system in such a manner that the computer system, when provided with the same fact pattern, reaches the same conclusion or decision as the expert.
The first implementations of such systems used conventional, sequential computers that perform a sequence of operations on a very limited number of data elements, such as an add or compare operation of two data elements. A sequential system that works with large numbers of data elements often requires prohibitively long computation times, even for very fast computer systems.
To reduce the number of data items to be dealt with, “expert” systems have been developed. One form of “expert” system attempts to implement human “expertise” in a number of rules. In a rule based expert system, knowledge engineers attempt to elicit from experts a set of rules that implement the reasoning of the experts when given a set of facts. The rules typically attempt to codify, for example, the knowledge, methodology and reasoning process used by experts to solve a particular problem. The rules are programmed as a sequence of decision steps and, given a fact pattern, the system executes the programmed sequence of rule decisions in an attempt to reach the same conclusion as the expert.
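The rule based approach described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the fact names (`power_on`, `temperature_c`), the threshold, and the conclusions are assumptions, not rules taken from any actual expert system. The sketch shows rules programmed as an ordered sequence of decision steps that is executed against a fact pattern until one rule applies.

```python
from typing import Any, Callable

Facts = dict[str, Any]
# A rule is (name, condition over the facts, conclusion if the condition holds).
Rule = tuple[str, Callable[[Facts], bool], str]

# Hypothetical rules, as a knowledge engineer might elicit them from an expert.
RULES: list[Rule] = [
    ("no_power", lambda f: not f.get("power_on", False), "check power supply"),
    ("overheat", lambda f: f.get("temperature_c", 0) > 90, "inspect cooling fan"),
    ("default", lambda f: True, "no fault found"),
]


def decide(facts: Facts) -> str:
    """Execute the programmed sequence of rule decisions on one fact pattern."""
    for _name, condition, conclusion in RULES:
        if condition(facts):
            return conclusion
    return "undetermined"


print(decide({"power_on": True, "temperature_c": 95}))
```

Because the first matching rule wins, the order of the rules is itself part of the encoded reasoning, which is one reason a change to one rule can force changes to others.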
A limitation of such rule based “expert” systems is that much of the expertise expressed in the rules is distilled from a large number of individual fact patterns, so the rules are, of necessity and by design, more general than the individual cases. This generalization results in a loss of a significant amount of information. In addition, it is often difficult to determine whether the correct set of rules has been implemented, particularly since many experts do not consciously know and understand their own methodology and reasoning processes, and may unconsciously create “rules” that do not in fact reflect their methodology.
Another limitation of many rule based “expert” systems is that a substantial investment of knowledge engineer and expert time is required to determine and implement the appropriate set of rules. Furthermore, if the rules do not produce a desired result, or the input data changes in a material way, rewriting or updating the rules is often an extremely difficult and time consuming process. The rules often interact with one another, and a change in one rule may require corresponding changes in other related rules.
One application where such difficulties have arisen is in generating knowledge repositories (or data bases) from legacy documents. Many companies are currently processing legacy documents for use in automated reasoning systems. In one example, a knowledge repository may be used as a diagnostic fault model for an airplane. In another example, a knowledge repository may describe company business practices.
In order to create a knowledge repository, the legacy documents are typically parsed and the relevant information is identified, often by hand. Some of the information can be easily identified from the document context or by pattern-matching techniques. However, much of the information can be more efficiently and accurately identified by having some level of understanding of the textual meaning of the information in the document. To illustrate this, some legacy maintenance manuals may include certain fault descriptions. To identify and extract the fault descriptions from the legacy document, some level of understanding of the meaning of the text within the document would be extremely helpful. For example, the sentence “REPLACE THE GO-AROUND SWITCH, S2 ON THE RIGHT THRUST LEVER” may describe a fault, while “DO THE IRU BITE PROCEDURE (MM 34-21-00, FIG. 107)” may describe a procedure.
It is known that English sentences are extremely complex, and are subject to stylistic variation. Using tediously hand-generated rules that rely on regular expressions to identify the fault statements would be extremely difficult. In general, the more complex the pattern, the more difficult it is to write a regular expression to recognize the pattern. Thus, not only would it be difficult to generate rules that rely on regular expressions to identify fault statements within a legacy document, but such a system would likely be dependent on the writing style of the author, making it difficult to transfer the hand-generated rules and/or regular expressions to different legacy documents or even to different chapters within the same document. Thus, such systems may be highly brittle and error-prone. Because of the foregoing, there is often a large cost associated with creating knowledge repositories from legacy documents.
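The brittleness of hand-generated regular expressions can be seen in a small sketch. The pattern below is a hypothetical hand-written rule, assumed only for illustration, that treats any sentence beginning with “REPLACE THE” as a fault statement. The first two test sentences are the examples quoted above; the third is an invented rewording of the first, added to show the effect of stylistic variation.

```python
import re

# Hypothetical hand-written rule: a fault statement begins with "REPLACE THE".
FAULT_PATTERN = re.compile(r"^REPLACE THE\b")


def is_fault_statement(sentence: str) -> bool:
    """Return True if the sentence matches the hand-written fault pattern."""
    return FAULT_PATTERN.search(sentence.strip()) is not None


sentences = [
    # The two example sentences quoted in the text above.
    "REPLACE THE GO-AROUND SWITCH, S2 ON THE RIGHT THRUST LEVER",
    "DO THE IRU BITE PROCEDURE (MM 34-21-00, FIG. 107)",
    # Invented rewording of the first sentence (an assumption, not from a manual).
    "THE GO-AROUND SWITCH, S2, SHOULD BE REPLACED",
]

results = [is_fault_statement(s) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(f"{matched!s:5}  {sentence}")
```

The pattern accepts the first sentence and rejects the second, but it also rejects the reworded third sentence even though the meaning is unchanged. Covering each such variation requires another hand-written pattern, which is the maintenance burden described above.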
The field of search engines is another application where it is often desirable to identify and categorize certain information within documents. Search engines typically accept a user specified search expression, and compare the search expression to text in selected documents, databases or web pages. Using rules or regular expressions to identify and categorize text within documents can be difficult, time consuming, and tedious.