1. Field of the Invention
The present invention relates to a system for knowledge management and automation. More particularly, the invention relates to a system for knowledge management using natural language sentences, and even more particularly to a computer software system for knowledge management using natural language sentences that state the facts and imperatives that define how an entity behaves or operates and what the entity needs to know in order to so behave or operate.
2. Description of the Prior Art
Natural Language Systems. Natural language interfaces to SQL databases have existed for some time. The ability to query a data model using natural language is particularly beneficial to non-technical business personnel who manage operations that produce and/or are affected by the data so queried. Consumers also benefit from the ability to query data models concerning retail products and services, financial data, and other tabular or personal information. The prior art emphasizes the acquisition of a vocabulary of nouns and verbs and how they map to entities and relations in a relational data model. Research efforts and commercial products of the prior art have attempted a variety of approaches to augmenting data models with lexical and syntactic information in an effort to support natural language queries. The prior art does not disclose the translation of knowledge expressed in natural language into operational logic within programs. Moreover, the systems of the prior art are strictly syntactic with very little ability to understand the semantics of natural language. That is, the prior art treats natural language syntactically, not grammatically.
The limits of the prior art in natural language interfaces are demonstrated in products such as Linguistic Technology's English Wizard and Microsoft's English Query. These products have the distinct disadvantage that they do not address the issue of managing knowledge about an underlying object or data model. That is, the systems of the prior art demonstrate only a vocabulary that refers to parts of a model. The systems of the prior art have the distinct disadvantage that they do not have the ability to represent knowledge about a model (e.g., sentences constructed from such vocabularies) or to define knowledge prior to the implementation of a model.
The systems of the prior art have the additional disadvantage that they aim only to provide end users with access to data rather than to manage the definition of a model throughout its life cycle within and across an enterprise. Consequently, if the vocabularies and mappings of these systems are shared, untrained users may incorrectly define or alter mappings from the vocabulary into the model. Any such change, whether or not correct and appropriate, becomes permanent and destructive, without substantial support for version control, user privileges or object permissions.
The systems of the prior art have the additional disadvantage that they parse input from users according to syntactic rules and limited vocabularies. Because users cannot reasonably be fully aware of the restrictions imposed on their grammar by such rules, they are often frustrated when their input is rejected as non-grammatical. In addition, because the syntactic rules and word senses of the vocabulary are typically ambiguous, users are often frustrated when their grammatical input is misinterpreted.
Knowledge Management Systems. The practice of knowledge management involves capturing the information that business personnel need to know and use in the course of doing business. Existing knowledge management systems maintain such information as unstructured text and are primarily concerned with storing and providing access to documents comprised of at least paragraphs, but most typically many pages of content per document. Conventional knowledge management systems are almost never applied to managing documents comprised of at most one sentence.
The sentences within a knowledge management system include statements of fact as well as conditional imperatives. Collectively, such statements of fact and descriptions of behavior define the knowledge that people or computers must know and use in order to perform or support a business function or process. That is, the text stored in the database of a knowledge management system (i.e., a “knowledge base”) documents the policies and practices of a business. Such a knowledge base is often administered and shared among employees, may be repeatedly referenced by personnel, and can be used by business analysts to produce systems requirements and functional specifications, which are subsequently implemented by programmers.
Conventional knowledge management systems have the significant disadvantage that they make no effort to formally acquire, analyze, and understand the lexical, syntactic, and grammatical structure of sentences within the text they manage. Consequently, these knowledge management systems are incapable of reliably translating such sentences between natural languages or into computer software expressed in any programming language. The limits of the prior art in knowledge management are demonstrated in products from Verity, Fulcrum, and Documentum. These products have the distinct disadvantage that they do not parse, acquire, or validate a document at the level of sentences. Consequently, the prior art is incapable of ensuring that each sentence within the knowledge base is semantically consistent and unambiguous. Therefore, the knowledge documented in the prior art is suitable only for use by people, not for direct translation into computer programs. Moreover, without a semantically consistent and unambiguous understanding of every sentence in a knowledge base, automatic translation between natural languages (e.g., English or Spanish to or from French or German) is unreliable. Consequently, knowledge managed using the prior art requires manual translation in order to be effective within multilingual (e.g., multinational) organizations.
Software Design Methodology. An application of knowledge management involves the collection of business policies and practices, sometimes referred to as requirements and/or specifications for software that is to be developed in support of, for example, business operations. Established software design methodologies have the distinct disadvantage that they distinguish between the requirements and specifications of the business and the software implementation of such requirements and specifications. Systems of the prior art, such as the knowledge management systems listed above, and software modeling tools that support software design methodologies (e.g., UML, the Unified Modeling Language), such as Rational Software's Rose and Microsoft's Visio, are distinct: knowledge management systems manage documents, whereas software modeling tools manage implementation details. That is, the prior art provides no automatic integration between the business requirements and specifications managed within a knowledge management system and the implementation details managed within a software design tool. Most specifically, the statements made by (i.e., the sentences authored by) the business are not isomorphic to the statements made within most programming languages. Consequently, a mapping from business requirements and specifications to source code or vice versa cannot be maintained.
Knowledge management systems of the prior art have no capability to generate software, as discussed above. In addition, software design tools of the prior art have limited software generation capabilities in that they do not incorporate the business policies that are to be reflected within the generated software. Such design tools are limited to generating models into which programmers manually implement code reflecting separately documented business policies. In addition, programmers must manually modify and maintain generated source code when business requirements or specifications change. This manual intervention and implementation results in inordinate delays and poor reliability. Consequently, established software design methodologies suffer from the disadvantages that they emphasize comprehensive yet detailed design before each version of the resulting software and impose long cycle times between versions.
Business Process Automation. The natural language query systems of the prior art attempt to perform actions on the state of a database as it exists when the query is specified. Business processes, on the other hand, are defined by policies or practices that are applied whenever they are relevant. Such business policies and practices are typically known as business rules. Established software development methodologies involve the gathering of business rules from operational, managerial, and executive business personnel by so-called business systems analysts. These analysts are the authors of the requirements and specifications documents discussed above. Programmers use the resulting documents to craft software that reflects the business rules documented by the analysts.
To the extent that the work product of programmers is distinct from the work product of analysts, business process automation in the systems of the prior art has the disadvantage of communications overhead and its attendant costs and risks of confusion or ambiguity. This disadvantage also applies to the extent that the work product of the analysts is distinct from the statements or perspective of the operational, managerial, or executive business personnel from whom analysts gather business rules.
Software Development Process. As described above concerning software design methodology, the statements of traditional (i.e., procedural, including object-oriented) programming languages have the distinct disadvantage that they are not isomorphic to the requirements or specifications stated by operational, managerial, or executive business personnel. As discussed above, this disadvantage in design also manifests itself during the development process in that changes in the requirements or specifications cannot be incrementally reflected in source code. That is, changing or introducing a business policy or practice may affect or introduce many programming statements.
The difficulty of producing and maintaining programming statements that remain consistent with business statements concerning policies and practices can be avoided if the business statements are expressed and implemented as business rules. Expressing business policies and practices as independent statements in a rule-based language can maintain the isomorphism between business statements and programming statements. However, in order for the isomorphism to remain between business and programming statements, the business statements must be specified carefully enough that they become directly executable or so that the code which implements those business statements can be automatically generated and, thereafter, executed. In either case, the prior art continues to suffer from the distinct disadvantage that business personnel cannot directly specify statements with the formality required by rule-based programming languages and their engines or code generators.
Production Rule Systems. As described above, in order for business personnel to state business policy, practice, or process specifications such that they remain isomorphic with their implementation expressed as programming statements, it is necessary that those business statements be formally encoded in an unambiguous grammar which is either directly executable or from which executable programming statements can be automatically generated. In addition, such business statements must be expressed without regard to any sequence or procedure. That is, business personnel specify how business operates not by being programmers themselves but by dictating how a business is to handle or respond to situations whenever and as they arise. Such specifications may be, for example, regulations that affect business processes or policies that state them. Each such statement is a rule. If such statements are independent, then the collection of such rules is known within artificial intelligence as a production system, where each such rule is more precisely a production.
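By way of illustration only, the notion of a production system as a collection of independent rules can be sketched as follows. The sketch is a minimal forward-chaining loop written under stated assumptions: the names `Production`, `run`, and `RULES`, the tuple representation of facts, and the two sample rules are all hypothetical and do not appear in the prior art discussed above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set, Tuple

Fact = Tuple[str, ...]  # hypothetical fact representation, e.g. ("order", "large")


@dataclass(frozen=True)
class Production:
    """One independent rule: when the condition holds, the conclusion is asserted."""
    name: str
    condition: Callable[[Set[Fact]], bool]
    conclusion: Fact


def run(productions: Iterable[Production], facts: Iterable[Fact]) -> Set[Fact]:
    """Apply productions repeatedly until no rule adds a new fact."""
    rules = list(productions)
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.condition(known):
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical business rules, stated without regard to sequence.
RULES = [
    Production("flag-large-order",
               lambda f: ("order", "large") in f,
               ("review", "required")),
    Production("notify-manager",
               lambda f: ("review", "required") in f,
               ("manager", "notified")),
]

# The rules are supplied in reverse order; the outcome is the same.
result = run(reversed(RULES), {("order", "large")})
```

Because each rule in this sketch only adds facts and none retracts them, the final set of facts does not depend on the order in which the rules are listed, which is the sense in which such statements are independent of sequence or procedure.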
The prior art implements business rules as production rules using either a rule engine or triggers. Triggers may be implemented within object-oriented programming languages such as C++ or Java or using SQL or scripting languages provided by databases such as IBM's DB2, Oracle, or Microsoft's SQL Server. The disadvantage of the prior art concerning triggers is that the resulting implementation is less efficient and scalable than using a rule engine. In addition, the programming code necessary to codify the checking and application of rules and their triggers must be specified by programmers rather than being automatically derived from business requirements and specifications expressed as sentences within a knowledge management system.
The Rete Algorithm. The Rete Algorithm is recognized as the most efficient algorithm for the implementation of the aforementioned production systems. One alternative, the Treat Algorithm, offers competitive performance in limited cases. However, Rete's performance becomes increasingly dominant as the number of rules increases. One of the significant advantages of the Rete Algorithm is that it is the only published algorithm that checks the conditions of a set of rules within an expected period of time that is asymptotically independent of the number of rules. Thus, only the Rete Algorithm scales to thousands of rules. The principal reference for the Rete Algorithm is “Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem”, Artificial Intelligence, 19, pp 17-37, 1982, hereby incorporated by reference.
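One idea commonly associated with the Rete Algorithm is that a condition appearing in several rules is tested once and the result is shared among those rules. The following is a simplified sketch of that sharing only, not an implementation of the Rete Algorithm: it omits the storage of partial matches and the joining of conditions across multiple facts. The class `SharedConditionIndex`, its methods, and the sample rules and fact are hypothetical.

```python
from collections import defaultdict


class SharedConditionIndex:
    """Tests each distinct condition once per fact, however many rules use it."""

    def __init__(self):
        self.tests = {}                         # condition name -> predicate
        self.rules_by_test = defaultdict(set)   # condition name -> rule names
        self.needed = {}                        # rule name -> required condition names
        self.evaluations = 0                    # number of predicate calls made so far

    def add_rule(self, rule, conditions):
        """Register a rule given a dict mapping condition names to predicates."""
        self.needed[rule] = set(conditions)
        for name, predicate in conditions.items():
            self.tests.setdefault(name, predicate)
            self.rules_by_test[name].add(rule)

    def match(self, fact):
        """Return the sorted names of rules whose conditions all hold for the fact."""
        satisfied = defaultdict(set)
        for name, predicate in self.tests.items():
            self.evaluations += 1
            if predicate(fact):
                for rule in self.rules_by_test[name]:
                    satisfied[rule].add(name)
        return sorted(r for r, names in satisfied.items() if names == self.needed[r])


# Hypothetical conditions shared by three hypothetical rules.
is_order = lambda f: f.get("type") == "order"
is_large = lambda f: f.get("amount", 0) > 1000
is_foreign = lambda f: f.get("country") != "US"

index = SharedConditionIndex()
index.add_rule("review-large", {"is_order": is_order, "is_large": is_large})
index.add_rule("customs", {"is_order": is_order, "is_foreign": is_foreign})
index.add_rule("log-order", {"is_order": is_order})

matched = index.match({"type": "order", "amount": 5000, "country": "US"})
```

In this sketch the three rules state five conditions in total, but only three distinct tests are evaluated for the fact, because `is_order` is shared. Checking each rule separately would evaluate it three times.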
The prior art does not relate the rules implemented using the Rete Algorithm (nor rules implemented as triggers, as discussed above) to the sentences managed within a knowledge management system. Moreover, the Rete Algorithm has no intrinsic support for organizing the application of rules within a decision making process nor for coping with logical inconsistencies between statements authored by one or more users of a knowledge management system. Consequently, the prior art is incapable of supporting the resolution of inconsistencies or inadequacies in the collective sentences of a knowledge management system or of auditing the applicability of individual sentences in a knowledge management system. Thus, the prior art does not facilitate the testing, monitoring, or improvement of the knowledge managed.
Source Code/Version Control. The software requirements and specifications documents for software systems that are typically produced by business systems analysts can be subject to version control within a knowledge management system in much the same way that the resulting source code may be managed using a source code version control system. Computer files of any type, including documents as in a knowledge management system, but most typically files of software source code expressed in computer programming languages, are commonly managed by version control systems. Products such as Merant's PVCS or Microsoft's Visual SourceSafe are typical of the prior art. The prior art typically manages versions of content at the level of document files (including source code files). However, the granularity of version control in the prior art is too coarse for a knowledge management system that manages a vocabulary and sentences expressed using that vocabulary. Consequently, the prior art is unable to manage knowledge that is accumulated incrementally by acquiring and maintaining dictionary definitions of words and sentences that use previously acquired vocabulary with subsequently modifiable dictionary definitions.
The set of statements in a knowledge management system that documents business processes evolves over time. Such statements are formulated and come into effect incrementally and may evolve through multiple versions before expiring. Statements are formulated by an author and may be refined in subsequent versions by various authors who are permitted to affect such statements or who have the privileges needed to grant themselves such permissions. The prior art has no effective ability to manage versions of statements at such a level of granularity, particularly where certain words in the vocabulary used within such statements may be restricted to certain authors or groups of authors and where words in the vocabulary are related to implementation details maintained as software model information within the knowledge management system where such software model information is itself subject to version control.
Because adding, removing, or changing a statement usually has some actual impact on a business, the ability to modify the repository of statements affecting business should be administered. Because business people have various responsibilities and capabilities, operations on the repository of statements should be controlled by the administration of privileges, which may be assigned to users or groups of users. Because individual statements may have varying degrees of maturity or certification, operations on statements should be controlled by administration of permissions that may be granted to users or groups of users for a statement or a set of related statements. However, all the foregoing is beyond the current state of the art in knowledge management and source code control systems.
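The distinction drawn above between privileges (assigned to users or groups for operations on the repository) and permissions (granted to users or groups for a particular statement) can be illustrated with a minimal sketch. It assumes, purely for illustration, that an operation is allowed only when both a privilege and a permission apply. The class `StatementRepository`, its fields, and the sample users, group, and statement identifier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class StatementRepository:
    # user -> groups the user belongs to
    groups: Dict[str, Set[str]] = field(default_factory=dict)
    # user or group -> operations allowed on the repository as a whole
    privileges: Dict[str, Set[str]] = field(default_factory=dict)
    # (statement id, user or group) -> operations allowed on that statement
    permissions: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def principals(self, user: str) -> Set[str]:
        """The user together with every group the user belongs to."""
        return {user} | self.groups.get(user, set())

    def may(self, user: str, operation: str, statement_id: str) -> bool:
        """Assumption: both a privilege and a per-statement permission are required."""
        who = self.principals(user)
        has_privilege = any(
            operation in self.privileges.get(p, set()) for p in who)
        has_permission = any(
            operation in self.permissions.get((statement_id, p), set()) for p in who)
        return has_privilege and has_permission


repo = StatementRepository(
    groups={"alice": {"underwriters"}, "bob": {"underwriters"}},
    privileges={"alice": {"modify"}},
    permissions={("S1", "underwriters"): {"modify"}},
)

alice_ok = repo.may("alice", "modify", "S1")  # holds the privilege; group holds the permission
bob_ok = repo.may("bob", "modify", "S1")      # group holds the permission; no privilege
```

Other policies are equally conceivable, for example letting a sufficiently strong privilege override a missing permission. The conjunction used here is only one possible reading.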
Speech Recognition. Speech recognition systems recognize either continuous speech, in which words are expressed naturally without intervening pauses, or isolated words. Isolated word recognition is awkward other than for very limited purposes and is becoming less relevant as the quality of continuous speech recognition systems increases. The prior art in continuous speech recognition provides accurate recognition by balancing restrictions on grammar with restrictions on vocabulary, one of which must be fairly constraining in order for speech recognition performance to be acceptable. Grammars in the prior art are either probabilistic word sequence models or context free syntactic specifications. Probabilistic word sequence models do not ensure syntactically correct recognition, however, and neither approach can ensure that what is recognized is semantically clear and unambiguous. Consequently, the prior art is incapable of the natural language processing required in order to capture grammatically correct and unambiguous knowledge.