The present invention pertains generally to the field of artificial intelligence, and more specifically to a method and system for adaptively and dynamically converting qualitative human judgments into quantitative mental models.
Epistemology is the branch of philosophy that studies the nature of knowledge, its presuppositions and foundations, and its extent and validity. Within the epistemology community, there is no generally accepted explanation for how tacit, experiential, subconscious and qualitative knowledge, judgments, skills, gut-feelings, and intuition actually work. There is, however, a general consensus regarding the existence of two basic types of knowledge: explicit and tacit.
While some epistemologists may define the terms “tacit knowledge” and “intangible knowledge” differently, the two terms are used interchangeably herein. Tacit knowledge can be defined as the indescribable and intractable, yet commonplace knowledge that people acquire and possess by virtue of their experience. It is knowledge that is tempered by social and organizational norms, as well as personal predilections. Therefore, it is dynamic and evolves and refines over time. For example, a master craftsman is a superior craftsman due to decades of experience and his modus operandi is tempered by the norms of the environments to which he has been exposed, reflecting his personal artistic tastes and biases. His tacit, amorphous, or intangible knowledge is visible in his work. Furthermore, he is able to make a wide variety of decisions without being able to exactly explain the reasons for his acts, much less provide precise rules.
However, an expert, such as a master craftsman, has no mechanism for analyzing and interacting with his own tacit knowledge. There is no method for engaging in a hermeneutic dialog with his personal mental-model or intellectual text. Experts cannot ask their personal mental-model to find solutions for constrained, weighted goals. They have no techniques for reflecting upon their knowledge-selves from various perspectives and combinations of relative weights and constraints applied to their mental-models. Unfortunately, creating a means for interacting with a tacit knowledge mental-model is particularly difficult from a practicality standpoint, and elusive from a theoretical standpoint. In order to effectively interact with a tacit knowledge mental-model, one might convert tacit knowledge into explicit knowledge. However, from a theoretical standpoint, such a conversion is problematic. If tacit knowledge is indeed tacit then how can it be made explicit? Furthermore, if tacit knowledge can be made explicit, is it tacit knowledge?
Different experts might perform identical tasks differently. Sometimes, the same expert might perform the same task differently at different times. Attempting to map tacit knowledge in itself is an ambitious and ill-structured intractable problem. For instance, even if one were to ignore the recondite nature of tacit knowledge and computational limitations, an attempt to encode and execute such voluminous knowledge is likely to cause a combinatorial explosion. A combinatorial explosion can be illustrated by a chessboard having sixty-four squares wherein a single grain of rice is placed on the first square, two grains on the second square, four grains of rice on the third square, and wherein the amount of rice placed on each subsequent square is double the amount placed on the previous square.
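The doubling on the chessboard described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function names are hypothetical and are not part of the original disclosure.

```python
def grains_on_square(k: int) -> int:
    """Grains on square k (1-indexed): 1, 2, 4, ... doubling each square."""
    return 2 ** (k - 1)


def total_grains(squares: int = 64) -> int:
    """Total grains over the first `squares` squares of the board."""
    return sum(grains_on_square(k) for k in range(1, squares + 1))


total = total_grains()
print(total)  # 18446744073709551615, which equals 2**64 - 1
```

The total grows as a power of two in the number of squares, which is the sense in which exhaustively encoding every combination of conditions becomes infeasible.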
Unfortunately, a decision-maker is not necessarily aware of his own tacit knowledge or of how he arrives at a particular decision. Humans have biases and differ in how they process information and make decisions. Intuitive decision-makers approach a problem with multiple methods, using trial and error to find a solution. It has been argued that choices are not made, but are continuously being modified to accommodate changing objectives, environments, value preferences and policy alternatives provided by the decision-maker. Knowledge is created by cycling through active experimentation and reflective observation.
There exists a need to capture an expert's tacit knowledge that exists as complex cause and effect relationships and past episodes. Most knowledge-based systems are concerned with explicable knowledge. Furthermore, existing tools for capturing knowledge mainly comprise simple maps, lists, narratives, and causal diagrams. Experts, in various walks of life, make decisions based on judgment, gut-feel, and intuition, because they can draw upon years of hands-on learning and past experiences. Such experts are precious intellectual assets of an organization, and when they leave, the organization is drained of decades of experience. To date, there have been no computer-based methods to quantify and retain this type of judgmental knowledge. Existing techniques, such as rule-based expert systems, only retain brittle perceptions of the experts, not their expertise. Also, they are very expensive and require months or even years to build.
Typical Knowledge Process Management
While there is no satisfactory method to elicit tacit or intangible knowledge, there are methods for extracting explicit knowledge. Typically a knowledge engineer or a moderator interacts with an expert. The knowledge engineer engages in a question and answer session with the expert to elicit explicit knowledge. In artificial intelligence (“AI”), the knowledge engineer might use techniques that are suitable for a particular expert system or programming environment. In a social science environment, the moderator might use techniques such as analytical hierarchy process (“AHP”), cause-mapping, drawing on a surface, interviews, and narrative, etc.
As a result of these knowledge acquisition techniques, the knowledge engineer assembles a body of knowledge, which he structures into a formal document in a form suitable for encoding into a rule-based expert system or systems dynamics modeling package. An analyst takes the documents from the knowledge engineer and prepares system design documents in the form of data flow diagrams (“DFD”), flowcharts, entity relationship diagrams (“ERD”), etc. The system design documents are then given to a programming team, which codes the received information into a software program. If the system is declared acceptable, then potential users are trained on the software. In some instances, however, the expert is not even involved in the use of the software or does not know how to use it. At this point, the knowledge acquisition from the expert typically ends. However, if the system is found to be unacceptable, the entire process must be repeated, starting from the question and answer session with the expert.
This method for knowledge process management is problematic for a number of reasons. First, a disconnect exists between the expert and the software program, or system. This disconnect creates a situation where the knowledge engineer inadvertently adds his own bias. In addition, the analyst and programming team add to or remove from the knowledge because of constraints such as system document design, requirements or limitations of the programming environment, or of the expert system shell. Also, because the system contains only explicable knowledge, the expert may not be able to explicate his know-how, and he may not even be aware of the skills he has acquired. In addition, different knowledge workers may approach the same problem differently based on their personal sets of biases, skills, and experiences. Additionally, because of the crisp rules-based programming, the system does not allow interpolation and is impossible to test comprehensively by firing all of the rules due to combinatorial explosion. Furthermore, the system is fairly static in that once built, implementing changes is problematic. In other words, the system is inadequate because it is costly, it does not permit direct interaction with the expert, it does not contain tacit knowledge, it is strictly rule-based, and it is not adaptable.
Existing Techniques
When viewed individually from the perspective of constructing an environment to map cognition onto computation for creating an adaptive interactive environment that captures intangible knowledge, existing techniques have many shortcomings. The primary reason is that these tools assume a rational decision-making homo economicus and go to extra lengths to remove human judgment from knowledge elicitation and model building processes. In doing so, they remove the very ingredient necessary for tacit knowledge.
Online Analytical Processing (“OLAP”) based decision-support systems are popular for storage, manipulation, slicing and dicing, and presentation of data. These systems include variations of OLAP, such as relational OLAP (“ROLAP”), multidimensional OLAP (“MOLAP”), hybrid OLAP (“HOLAP”), etc., and can be used for representing and manipulating axiomatic human knowledge in order to present the facts from multiple perspectives. OLAP systems are problematic for a variety of reasons. First, dimensions are pre-assigned by the architect, which implies inflexibility in scaling the views. Also, in order to provide a reasonable response time of less than 5 seconds, data has to be precomputed. OLAP also suffers from the curse of dimensionality. Published benchmarks of OLAP products show a data explosion factor of 240, requiring 2.4 GB of storage to manage 10 MB of input data. Finally, the heuristics or AI used in OLAP are based on mathematical models which may or may not correspond to actual usage patterns.
Probabilistic systems, both conventional and Bayesian, are impractical for dynamic environments, first because the sum of all probabilities must equal one, and second because a priori probabilities must be known in order for the system to work. In order to counter this problem, some systems use frequency-based probabilities. Such an approach negates the essence of probability theory and renders it difficult to ascribe any reasonable degree of certainty to the outcomes.
Bayesian Belief Networks are inspired by the work of Reverend Thomas Bayes and integrate a frequency-based approach, based on probabilistic statistical theory, with a tree-like structure. Bayesian Belief Networks share many limitations and problems with Bayesian probability based systems in a dynamic environment. The tree-like architecture and references to frequencies of occurrence of events make them impractical for use as a general-purpose solution.
Uncertainty theories are generally ad-hoc theories with ad-hoc solutions. One example is the well-respected Dempster-Shafer Theory of Uncertainty. This theory, like many others, basically relaxes the tenets of probability theory. In addition, it combines probability theory with set theory to incorporate and account for uncertainty. The result is an extremely complex system that requires a full-time mathematician or an AI expert for maintenance. These systems are not adaptive because of their reliance on up-front custom designs and become prohibitive because of their high design and maintenance costs.
Statistics is one of the most important sciences of modern times. Although we make many decisions based on statistics, it is problematic when used for knowledge management. First, statistics requires Design of Experiments (“DOE”), that is, a carefully controlled set of results obtained by running an experiment in a controlled environment at specific data points in a problem space. This approach is not possible in real-life dynamic situations. If the DOE is violated, then confidence in the statistical analysis cannot be maintained. Second, when modeling a process, statistical analyses hold true only under strict conditions. For example, it is generally assumed that relationships between objects are linear. This assumption is far from true in the highly nonlinear real world. Statistics can model moderately more complex, or nonlinear, relationships but at the cost of accuracy and certainty. Third, statistical models are not adaptive in that they do not learn while maintaining existing knowledge. Fourth, statistical solutions are static. That is, if there is a change in any variable or if a variable is added or removed, then all of the results must be recomputed. This makes statistical analyses inappropriate for dynamic environments.
Numerical vector analysis is based on a distance metric such as the Minkowski Metric. The basic concept is that of a metric to define distance or closeness that corresponds to dissimilarity or similarity. Euclidean distance is a special case of this metric and is commonly used. The first step is to define a vector space and then to populate it with data. Each data point is represented as a vector in multi-dimensional space. Vectors that lie close to each other are considered to be similar with a certain degree of certainty. This is basic categorization of data. These methods work well with numeric data but are of very little help when dealing with text. The reason is that it is almost impossible to define a distance metric between words. For example, what is the distance between “cold” and “epistemology”? There are various quasi-distance metrics for text that mostly use set theory.
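By way of illustration only, the Minkowski metric of order p between two numeric vectors can be sketched as follows. The function name and the sample points are assumptions introduced for this example; setting p = 2 gives the Euclidean special case mentioned above, and p = 1 gives the city-block distance.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be at least 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two hypothetical data points in a two-dimensional vector space.
a = (0.0, 0.0)
b = (3.0, 4.0)

euclidean = minkowski_distance(a, b, p=2)   # 5.0
city_block = minkowski_distance(a, b, p=1)  # 7.0
print(euclidean, city_block)
```

The sketch also makes the limitation plain: the function needs numeric coordinates, and nothing in it says how a word such as “cold” would be assigned coordinates in the first place.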
Rule-based expert systems gained popularity during the 1980s. These systems are composed of rules provided by an expert and a mechanism to invoke the rules using forward- or backward-chaining algorithms. Knowledge acquisition is a bottleneck for such systems. It is a time-intensive, iterative, and human-intensive activity requiring systems analysis, interviewing, and interpersonal skills. Experts may consciously or sub-consciously have ulterior motives not to be forthcoming with the entire or best information. Rule-based systems require domain experts, considerable time, and knowledge engineers who have familiarity with the domain. Experts may be too busy, difficult to deal with, or unwilling to part with knowledge. There is not enough understanding of how to represent commonsense knowledge. These systems are not adaptive: they do not learn from mistakes. Readjusting them can be a huge task. It is not easy to add meaningful certainty factors to many rules. There can be conflicting sources of expertise. These systems have low fault-tolerance and exhibit inelegant degradation because they deal with crisp rules.
Each of these factors considerably increases the cost of building and maintenance. Furthermore, as the size of a problem increases, the number of rules increases, resulting in high complexity, higher costs, and ultimately an unmanageable system. Even the addition of one more rule can have unexpected effects. Therefore, these systems are useful only when there are a handful of rules, when the problem domain is well defined in advance, when the system is static, when the knowledge engineers are unbiased, and when experts are willing to part with their knowledge. Even when experts are willing to part with their knowledge, they will only be able to describe their explicit knowledge, not their tacit or implicit knowledge, gut-feel, know-how, experiential knowledge, and intuition.
Artificial Neural Networks (“ANN”) are mathematical representations of massive parallel processing systems loosely modeled after the biological brain. During the 1990s these systems saw a resurgence after almost three decades of quiescence. ANNs can learn and discover relationships, or mathematical mappings, between causes and effects from datasets. They can also be used for mapping, optimization, and auto- and hetero-associative memories. Common problems with ANNs are their inability to explain their behavior (their black-box nature), overfitting of data, introduction of noise, introduction of high degrees of nonlinearity, overtraining, memory effect, and difficulties with generalization or regularization. Properly designing the architecture, preparing pristine data, training, and interpreting the results requires an understanding of mathematics, probability, statistics, nonlinear multiple criteria optimization, and the problem domain. The biggest strength of neural nets, their ability to learn any relationship, is perhaps also one of their biggest shortcomings, since they can learn noise and assume the existence of improper relationships.
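As a minimal sketch of the forward-chaining mechanism referred to above, the following repeatedly fires any rule whose premises are all known until no new conclusion can be added. The rule contents and fact names are hypothetical, chosen only to illustrate the behavior of crisp rules, and do not come from any particular expert system shell.

```python
def forward_chain(facts, rules):
    """Derive every conclusion reachable from `facts`.

    `rules` is a list of (premises, conclusion) pairs, where `premises`
    is a frozenset of facts that must all be known for the rule to fire.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules elicited from an expert.
rules = [
    (frozenset({"temperature_high", "pressure_rising"}), "risk_of_overheating"),
    (frozenset({"risk_of_overheating"}), "reduce_feed_rate"),
]

derived = forward_chain({"temperature_high", "pressure_rising"}, rules)
print("reduce_feed_rate" in derived)  # True

# A crisp rule either matches or it does not: with one premise missing,
# nothing fires and the system offers no partial answer.
partial = forward_chain({"temperature_high"}, rules)
print(partial)  # {'temperature_high'}
```

The second call shows the inelegant degradation described above: an input that falls just outside a rule's premises produces no conclusion at all rather than an interpolated one.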
The present invention overcomes these problems by carefully segregating pockets of dense data points from the sparse dataset, by holding back data for regularization, by using optimization algorithms for locating the best optimum in synaptic weight space, and by parsimonious use of neurons.
Fuzzy Logic is an extension of classical logic. It is concerned with approximate instead of exact reasoning. It allows use of fractions instead of two discrete values. Fuzzy logic is an inexact reasoning technique, which uses the concept of degrees of membership in a set. This is an improvement because it allows the use of such concepts as hot, warm, lukewarm, . . . , cold, etc., whereas classical logic could only allow for two concepts such as hot and cold. Fuzzy systems are particularly successful in control applications in industrial processes and consumer goods as embedded components. An example of its success is the braking system of Bullet Trains. Fuzzy logic has only been successfully applied in systems with a small number of variables. Fuzzy logic requires a great deal of knowledge of the problem domain. The number of rules can grow exponentially with the number of variables and number of possible choices, rendering even a moderate problem incapable of implementation.
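The notion of degrees of membership can be illustrated with a short sketch. The triangular membership shape, the temperature ranges, and the function names below are assumptions made for the example only. The last function counts the rules in a complete rule base in which every combination of terms across all variables has its own rule, which is one common way the exponential growth noted above arises.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over temperature in degrees Celsius.
FUZZY_SETS = {
    "cold": (-10.0, 5.0, 20.0),
    "warm": (10.0, 25.0, 40.0),
    "hot": (30.0, 45.0, 60.0),
}


def memberships(temperature):
    """Membership of one temperature reading in every fuzzy set."""
    return {name: triangular(temperature, *shape)
            for name, shape in FUZZY_SETS.items()}


def complete_rule_count(variables, terms_per_variable):
    """Rules needed when every combination of terms gets its own rule."""
    return terms_per_variable ** variables


m = memberships(15.0)
print(m)  # 15 degrees is partly "cold" and partly "warm", not "hot"
print(complete_rule_count(variables=10, terms_per_variable=5))  # 9765625
```

A reading of 15 degrees belongs to both “cold” and “warm” to the same fractional degree under these assumed shapes, which classical two-valued logic cannot express.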
Success can be achieved by building and extensively fine-tuning each particular fuzzy logic system specific to a well-defined task. Fuzzy logic can provide model-free information. However, fuzzy logic has not yielded tools to easily convert qualitative information into a robust quantitative model. Fuzzy systems are not adaptive. Constructing and validating a rule base is an iterative and difficult task. Experts and time are needed to design, construct, validate and fine-tune fuzzy systems. This includes manually tuning membership functions and fuzzy rules. Fuzzy systems become very expensive as size and complexity of the problem increases. It is not easy to prove that during operation a fuzzy system will remain stable and not become chaotic because it can be very sensitive outside its operating range. There are also mathematical criticisms of fuzzy logic that under certain conditions fuzzy logic collapses to two-value logic—the paradoxical success of fuzzy logic. Fuzzy systems also lack audit trails.
Another major difficulty is in defuzzification. There are no consensus methods for automatically applying defuzzification. This means that each system has to be carefully handcrafted for a particular problem and it is neither scalable nor adaptive. This ultimately increases both the computational and design costs. The task of defuzzification is not only quite complex but also controversial. Generally, iterative trial and error and approximations are required to defuzzify results into a meaningful and acceptable form. Defuzzification alone makes the idea of fuzzy logic based tools unsuitable for building robust generalizable models.
Fuzzy logic has achieved modest success in model building in concert with other model building methods such as expert systems. But even in hybrid systems many of the above mentioned difficulties remain. Some of the more successful methods have combined neural nets with fuzzy logic to address some of these problems, e.g., Adaptive Network Fuzzy Inference System (“ANFIS”).
Judgmental Bootstrapping (“bootstrapping”) is a method used in forecasting to create quantitative models from human judgments. There are two types of judgmental bootstrapping: direct and indirect. The primary difference between bootstrapping and other expert systems is that for bootstrapping, experts are not asked to furnish rules; instead rules are “inferred” from their behavior. A quantitative model of an expert's rules is constructed by observing his judgments while he makes forecasts. Generally, expensive protocol studies are required to gather this data. The causal variables used by the expert are treated as independent variables and his forecasts as dependent variables. From this observed data, ordinary least squares regression is used to formulate linear quantitative relationships between the forecasts and the causal variables. The resulting mathematical equations or models represent the expert's rules.
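The regression step described above can be sketched as follows, assuming a small set of hypothetical observations in which each row holds the causal variables an expert looked at and the forecast he gave. The data are invented for illustration and are deliberately exactly linear so that the recovered coefficients are easy to verify; real judgments would include noise. The solver is a plain normal-equations implementation written for this example, not a reference to any particular statistics package.

```python
def ols_fit(cue_rows, forecasts):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(row) for row in cue_rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, forecasts))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("causal variables are collinear")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefficients, cues):
    """Apply the inferred linear rule to a new set of causal variables."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], cues))


# Hypothetical observations: (cue_1, cue_2) -> expert's forecast,
# generated here as 10 + 2*cue_1 + 0.5*cue_2.
cues = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
forecasts = [13.0, 14.5, 18.0, 19.5, 22.5]

model = ols_fit(cues, forecasts)
print([round(b, 6) for b in model])
print(round(predict(model, (6.0, 2.0)), 6))
```

Adding a new causal variable changes the shape of every row, so the fit must be run again from the beginning, which is consistent with the lack of adaptivity discussed for this method.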
Bootstrapping has some limitations. It requires the delicate task of observing experts during the performance of their tasks and translating the observations into numeric form. This raises fundamental socio-behavioral issues such as the Hawthorne effect, observational observations and dual interpretation process of distinguishing between “act meaning” (meaning of act to the actor) and “action meaning” (meaning of act as observer's subject matter). Bootstrapping models are not adaptive. These models, which are often supposed to aid experts, are not easily modifiable, especially by the experts. Modifications may be necessary when either the expert's mental model or the external environment has changed. The model needs to be reformulated from scratch when many changes occur, such as the addition of new causal variables.
Extrapolation is also a concern. Reported studies have used cross-sectional data instead of time series data. The major problem, which is the lack of adaptation, remains even if experts are involved in validating and using their models. When modifying models, existing models usually have to be abandoned and new models formulated from scratch. Although much less data are required than, for example, conjoint analysis, at least five to ten experts are typically needed for useful bootstrapping models. In addition, since regression is used to build the model, all the shortcomings of statistical regression also affect bootstrapping.
Analytic hierarchy process (“AHP”) is a “multicriteria decision model that uses hierarchic or network structure to represent a decision problem and then develop priorities based on decision-maker's judgments throughout the system.” AHP has been applied with success in areas where judgment is to be taken into consideration, such as facility location analysis, work scheduling, planning, capital budgeting and technology transfer. One advantage of AHP is its ability to measure consistency. The most important limitation of AHP is exhibited when the ranks determined for alternatives through the AHP change as new alternatives are added. Dyer proposed that this problem could be addressed if weights were expressed using an interval scale as opposed to the ratio scale. This requires borrowing concepts from multiattribute utility theory. Some have opposed this method of correction, arguing that AHP is a standalone body of research, not an extension of multiattribute utility theory.
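The consistency measurement mentioned above is commonly computed from a pairwise comparison matrix, and a minimal sketch follows. The power-iteration routine, the function names, and both sample matrices are assumptions made for illustration. The random index values and the 0.1 acceptance threshold are the figures conventionally attributed to Saaty and are not taken from the present text.

```python
# Random index by matrix size, as commonly cited from Saaty (assumption).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_priorities(matrix, iterations=100):
    """Estimate the priority vector and principal eigenvalue by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [v / total for v in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """Consistency index (lambda_max - n) / (n - 1) divided by the random index."""
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical, perfectly consistent judgments: A is 2x B, B is 3x C, A is 6x C.
consistent = [[1.0, 2.0, 6.0],
              [1.0 / 2.0, 1.0, 3.0],
              [1.0 / 6.0, 1.0 / 3.0, 1.0]]

# Hypothetical circular judgments: A beats B, B beats C, yet C beats A.
circular = [[1.0, 3.0, 1.0 / 3.0],
            [1.0 / 3.0, 1.0, 3.0],
            [3.0, 1.0 / 3.0, 1.0]]

w_good, lam_good = ahp_priorities(consistent)
w_bad, lam_bad = ahp_priorities(circular)
cr_good = consistency_ratio(lam_good, 3)
cr_bad = consistency_ratio(lam_bad, 3)
print([round(w, 3) for w in w_good], round(cr_good, 3))
print([round(w, 3) for w in w_bad], round(cr_bad, 3))
```

Under the conventional threshold, a ratio above 0.1 suggests the judgments should be revisited; the circular matrix exceeds it by a wide margin while the consistent matrix yields a ratio of essentially zero.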
Some other technologies worth mentioning include Systems Dynamics Modeling, Simulated Annealing, and Markov and Hidden Markov Processes. None of these provide adequate means for mapping cognition onto computation and its inverse.