Case-Based Reasoning (CBR) is a methodology for problem solving by reusing problem solutions that worked in the past. At the core of a CBR system there is a collection of previously solved problems called cases. Each case typically consists of a problem description and a verified solution to that problem. Given a new problem to be solved, a CBR system uses the most similar previously solved cases to derive a solution that applies to that new problem (see e.g., Aamodt & Plaza, Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches, 1994; also see U.S. Pat. No. 5,581,664 regarding a particular type of a Case-Based Reasoning system).
One of the main advantages claimed for CBR is its ability to learn how to solve new problems based on how similar problems were solved in the past. CBR is also claimed to reduce the knowledge acquisition bottleneck compared to, for example, rule-based systems. Most of the knowledge acquisition effort for CBR consists of acquiring new cases. Because new cases can generally be acquired independently of one another, CBR can scale better than other problem-solving methodologies.
Case Representation in CBR Systems
Cases in a CBR system may be represented and structured in diverse ways. A variety of knowledge representation formalisms have been used to represent case content in a way that lends itself to reasoning purposes. The three most common types of case representation are feature vector cases, structured cases, and textual cases (see, e.g., Bergmann et al., Representation in case-based reasoning, 2005).
Feature vector approaches represent cases as vectors of attribute-value pairs. Vector attributes usually have value types assigned to them but there are no explicit relationships between the individual attributes (though there may be some implicit relationships present, e.g., hard-coded in the logic of a CBR system). Cases represented as feature vectors are usually easy to acquire. Generally, it is trivial to design a user interface (UI) for acquiring such cases as the UI may be no more complicated than a simple data-entry form. Moreover, acquisition of such cases can frequently be successfully automated (e.g., Yang et al., Automated Case Base Creation and Management, 2003), so that little human intervention is necessary.
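By way of illustration only, a feature vector case and a simple retrieval step might be sketched as follows. The attribute names, the "Check-Email-Settings" solution, and the exact-match similarity measure are assumptions made for this sketch; practical systems typically use weighted, type-specific similarity measures.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, bool]
FeatureVector = Dict[str, Value]  # attribute name -> value


def similarity(query: FeatureVector, case_problem: FeatureVector) -> float:
    """Fraction of the query's attributes whose values match the case exactly."""
    if not query:
        return 0.0
    matches = sum(
        1 for attr, value in query.items() if case_problem.get(attr) == value
    )
    return matches / len(query)


def retrieve(query: FeatureVector, case_library: List[dict], k: int = 1) -> List[dict]:
    """Return the k stored cases most similar to the query."""
    ranked = sorted(
        case_library,
        key=lambda case: similarity(query, case["problem"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical case library: each case pairs a problem description with a solution.
case_library = [
    {
        "problem": {"modem_lights": "off", "can_browse": False, "can_receive_email": False},
        "solution": "Reset-the-Modem",
    },
    {
        "problem": {"modem_lights": "on", "can_browse": True, "can_receive_email": False},
        "solution": "Check-Email-Settings",
    },
]

query = {"modem_lights": "on", "can_receive_email": False}
best = retrieve(query, case_library, k=1)[0]
print(best["solution"])
```

Because there are no relationships between attributes, acquiring a new case amounts to appending one more dictionary to the library, which is why a simple data-entry form suffices as a UI.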
Structured cases are capable of representing relationships. The typical relationships are the binary is-a, has-a, and part-of relationships, but available formalisms can generally express arbitrary n-ary relationships. The two most common types of formalisms for representing structured cases are frame-based formalisms, originating in Artificial Intelligence (AI), and object-oriented formalisms, originating in Software Engineering (e.g., Michael Manago et al., CASUEL: A Common Case Representation Language, 1994; U.S. Pat. No. 6,081,798). More recently, description-logic and RDF-derived formalisms have become more common. Case acquisition for structured cases can be quite expensive, because constructing such cases may require a lot of work, with the effort increasing disproportionately as the size of individual cases and the number of relationships grow.
Text-based formalisms represent each case as one or more text fields. The benefit of this approach is that case acquisition can be relatively inexpensive: cases can readily be constructed if existing descriptions of solved problems are in the form of documents, notes, reports, etc. The simplicity of the case representation is offset by the need for more complex case retrieval algorithms.
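As a minimal object-oriented sketch of a structured case, the following represents is-a relationships between concepts and part-of relationships between instances. The class names (Concept, Instance) and the example concepts are hypothetical and are not taken from any of the cited formalisms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A class in the model; `is_a` links it to its parent concept."""
    name: str
    is_a: Optional["Concept"] = None

    def is_kind_of(self, other: "Concept") -> bool:
        """Walk up the is-a chain looking for `other`."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.is_a
        return False


@dataclass
class Instance:
    """An instantiation of a concept; `parts` models part-of relationships."""
    concept: Concept
    slots: Dict[str, object] = field(default_factory=dict)
    parts: List["Instance"] = field(default_factory=list)


def parts_of_kind(instance: Instance, concept: Concept) -> List[Instance]:
    """Collect all (nested) parts whose concept is a kind of `concept`."""
    found = []
    for part in instance.parts:
        if part.concept.is_kind_of(concept):
            found.append(part)
        found.extend(parts_of_kind(part, concept))
    return found


# Hypothetical model fragment and one structured problem description.
device = Concept("Device")
modem = Concept("Modem", is_a=device)
router = Concept("Router", is_a=device)
network = Concept("HomeNetwork")

case_problem = Instance(
    network,
    parts=[
        Instance(modem, slots={"lights": "off"}),
        Instance(router, slots={"lights": "on"}),
    ],
)

names = [d.concept.name for d in parts_of_kind(case_problem, device)]
print(names)
```

Even in this small sketch, entering a case means creating and linking several instances correctly, which hints at why acquisition effort grows with case size and with the number of relationships.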
It is useful to think about details of a particular case representation as being formalized by some model. (A model may be used in a CBR system for more than just formalizing the case representation, e.g., it can be used in case retrieval or in case adaptation. A CBR system designer may also choose to represent in a model other, non-problem solving, aspects of a CBR system, e.g., user skills, dialog constraints, etc. In the following, we focus only on the case-related aspects of the model.)
For feature vector representations, the model formalizing case structure can be trivial: it may comprise no more than the value types of the features (attributes) and possibly their range restrictions.
Structured case representations, on the other hand, may potentially have very complex underlying models. An example of a complex model would be an ontology of domain concepts and their relationships with additional constraints specified using some logic formalism (e.g., Bergmann & Schaaf, Structural Case-Based Reasoning and Ontology-Based Knowledge Management: A Perfect Match?, 2003). Cases are then instantiations of concepts from such a model.
Text-based case representations generally have little if any need for modeling a case itself—a model might contain simply the names and descriptions of the few text fields in the case. The complexity of such a system will be in the case-retrieval process which frequently benefits from using a (domain) language model.
As we have seen, models can vary in complexity. One aspect of that complexity is how many relationships between entities in the model are made explicit. With regard to this, models may be distinguished into shallow models vs. deep models. This is a gradual distinction—the deeper the model, the more relationships between the entities in the model are made explicit. The shallow vs. deep model distinction applies to all types of knowledge-based systems, not only to CBR.
Accordingly, feature vector and text-based case representations will have a shallow underlying model, while structured case representations may have a very deep underlying model.
Incremental vs. Standard CBR
CBR systems can be categorized into incremental CBR systems and standard CBR systems. In a standard CBR system, all information about the problem to be solved is collected up-front and this full problem description is presented to the CBR system, which then searches for a suitable solution, generally, without requesting any further information beyond what was already provided.
An incremental CBR system (e.g., Cunningham & Smyth, A Comparison of Model-Based and Incremental Case-Based Approaches to Electronic Fault Diagnosis, 1994; U.S. Pat. No. 5,717,835), on the other hand, begins with only limited information about the problem. As it matches that information against the collection of solved cases, it identifies new pieces of information that might help solve the problem. That information is then requested from the user, the user's answer is stored in a temporary case (henceforth called a session case), and the next incremental CBR cycle commences. (Some incremental CBR systems, in addition to asking simple questions, may launch more complex information acquisition actions. These actions may combine several related questions, may contain a query or queries to an external system, etc.) Eventually, the incremental CBR system may obtain all information it needs to solve the problem.
The major benefit of an incremental CBR system is that it generally needs to obtain significantly fewer pieces of information to solve a problem. This benefit is of special importance in situations where information is obtained from a human user by asking questions, as opposed to getting the information more seamlessly, e.g., by reading sensor values; or where there is a high cost associated with acquiring each piece of information, e.g., if the user has to perform some lengthy test to determine the answer.
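The incremental cycle described above might be sketched as follows, purely for illustration. The question-selection heuristic (ask about the unanswered attribute that occurs in the most candidate cases), the attribute names, and the solutions other than "Reset-the-Modem" are assumptions of this sketch and do not reflect any particular cited system.

```python
from typing import Callable, Dict, List, Optional, Tuple

Session = Dict[str, object]  # the session case: answers collected so far


def matching_cases(session: Session, library: List[dict]) -> List[dict]:
    """Stored cases that do not contradict any answer in the session case."""
    return [
        case for case in library
        if all(case["problem"].get(attr, value) == value
               for attr, value in session.items())
    ]


def next_question(session: Session, candidates: List[dict]) -> Optional[str]:
    """Pick the unanswered attribute used by the most candidate cases."""
    counts: Dict[str, int] = {}
    for case in candidates:
        for attr in case["problem"]:
            if attr not in session:
                counts[attr] = counts.get(attr, 0) + 1
    if not counts:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(counts), key=counts.get)


def solve_incrementally(
    initial: Session, library: List[dict], ask: Callable[[str], object]
) -> Tuple[Optional[str], Session]:
    """Run incremental CBR cycles until at most one solution remains."""
    session = dict(initial)
    while True:
        candidates = matching_cases(session, library)
        solutions = {case["solution"] for case in candidates}
        if len(solutions) <= 1:
            return (solutions.pop() if solutions else None), session
        question = next_question(session, candidates)
        if question is None:
            return None, session  # nothing left to ask; cases are ambiguous
        session[question] = ask(question)


# Hypothetical case library and scripted user answers.
library = [
    {"problem": {"symptom": "no_email", "modem_lights": "off"},
     "solution": "Reset-the-Modem"},
    {"problem": {"symptom": "no_email", "modem_lights": "on", "website_viewable": False},
     "solution": "Call-Provider"},
    {"problem": {"symptom": "no_email", "modem_lights": "on", "website_viewable": True},
     "solution": "Check-Email-Settings"},
]
answers = {"modem_lights": "on", "website_viewable": True}
asked: List[str] = []


def ask(question: str) -> object:
    asked.append(question)
    return answers[question]


solution, session_case = solve_incrementally({"symptom": "no_email"}, library, ask)
print(solution, asked)
```

In this sketch only those attributes that still discriminate between the remaining candidate cases are ever requested, which is the source of the reduction in the number of questions asked.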
Incremental CBR works particularly well in domains where a problem and its context may potentially be described by a large number of variables, while in practice only a small subset of them may be relevant to solving a specific problem. An example of such a domain is troubleshooting.
Case Acquisition in CBR Systems
Case acquisition is the main mode of knowledge acquisition in CBR systems. There are also other ways in which the knowledge represented in a CBR system can be expanded, such as acquisition of similarity knowledge or expansion of the model. We do not address acquisition of these other types of knowledge and concentrate on case acquisition alone (assuming, for the most part, a fixed model).
As has already been mentioned, acquisition of new feature-vector or text-based cases is relatively easy. The same is generally true for structured cases with a very shallow model. Most of the effort is in collecting information about solved problems; once that is available, the new cases can be entered into a CBR system relatively easily. Sometimes, a specific case representation is chosen to facilitate case acquisition (e.g., U.S. Pat. No. 5,717,835).
The situation is more difficult for structured cases with a complex (deep) underlying model. One way to build such cases is by manually instantiating and asserting every piece of information that represents the case structure, using the same or similar tools as for building the model. A typical model editing tool will provide means for creating and editing concept instances in addition to means for modeling concepts (classes). This can be done via a graphical UI, as, e.g., in the ReCall CBR software by ISoft. This is, in general, a very laborious process that, moreover, requires a thorough understanding of the model and of how it is used at runtime; thus, the level of expertise required from whoever enters cases this way is quite high.
For incremental CBR systems, an alternative way to acquire new cases is to try to rely on the functionality of the problem-solving CBR runtime system. Instead of using the CBR runtime system to solve a new unknown case, it is used to replay a case for which both the solution, as well as the problem-solving steps, are already known. At each problem-solving step, choices (regarding both the information acquisition actions to be launched and the answers provided) are made that match the choices made while the real-world problem case was being solved.
This case-driven method of case acquisition presupposes two things:
(1) First, that the incremental problem-solver UI allows for selecting from several possible information acquisition actions at each CBR step. Indeed, many incremental CBR tools enable this by presenting a ranked list of questions from which the user can choose one to answer.
(2) Second, because the suggestions for information acquisition actions are based on the best-matching cases, this method presupposes that the available collection of cases (the case library) is sufficiently large and/or varied so that it can be used to generate useful suggestions.
Because of the latter, this method has obvious problems when one tries to acquire novel cases, i.e., cases which are not a close variation of the already stored cases. This makes this method especially unsuitable for bootstrapping the system, when every entered case is a novel case.
As an example that illustrates this deficiency of the purely case-driven acquisition, let us consider a CBR system for troubleshooting Internet connections. (The examples provided here depict troubleshooting scenarios that may be encountered when dealing with technical support issues related to Internet service or cable TV service. However, such scenarios are for illustration purposes only; the systems and methods described herein can be used in connection with other scenarios.)
Suppose that all known cases stored in the system that are relevant to the problem of not being able to receive e-mail contain only the “Reset-the-Modem” action. Assume now a real-world problem case where, given the same or very similar symptoms, an observation was first made as to whether a particular web page was viewable. If we try to create a representation of this new case using the method described above, we will never be given the option to choose the “Check-WebSite-Viewable” action if none of the library cases has it. Instead, we will be presented only with the choice of “Reset-the-Modem”, which is not the one we are looking for, even though the desired action might be defined in the underlying model.
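The deficiency in this example can be reproduced in a few lines. The following is only a sketch: the data layout, the symptom names, and the function case_driven_suggestions are hypothetical, while the two action names are those used in the example above.

```python
from typing import List, Set

# Actions defined in the underlying model (assumed for this sketch).
model_actions = {"Reset-the-Modem", "Check-WebSite-Viewable"}

# Case library: every case relevant to the e-mail problem contains
# only the "Reset-the-Modem" action.
case_library = [
    {"symptoms": {"cannot_receive_email"}, "actions": ["Reset-the-Modem"]},
    {"symptoms": {"cannot_receive_email", "modem_lights_off"},
     "actions": ["Reset-the-Modem"]},
]


def case_driven_suggestions(session_symptoms: Set[str], library: List[dict]) -> List[str]:
    """Suggest only actions that occur in cases sharing a symptom with the session."""
    suggestions: List[str] = []
    for case in library:
        if case["symptoms"] & session_symptoms:
            for action in case["actions"]:
                if action not in suggestions:
                    suggestions.append(action)
    return suggestions


suggested = case_driven_suggestions({"cannot_receive_email"}, case_library)
unreachable = model_actions - set(suggested)
print(suggested)    # actions the case author can choose from
print(unreachable)  # actions defined in the model but never offered
```

However the matching is refined, a purely case-driven suggestion list is bounded by the actions already present in the library, so "Check-WebSite-Viewable" cannot be offered until some case containing it has been entered by other means.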
Some incremental CBR tools try to combine the two ways of entering cases, namely case-driven acquisition (using the normal problem-solving mode) and manual case entry using a model editing tool. An example of this is the CBR tool used in the HOMER system (e.g., Goker et al., The development of HOMER: A case-based CAD/CAM help-desk support tool, 1998). It allows for case acquisition based on previous cases using its problem-solving mode, but expert users can override this by entering data directly on case instances. Thus, it avoids some limitations of the case-driven method by allowing an expert to directly enter data that might not be suggested by existing cases. It also avoids some limitations of direct case entry using a model editing tool, mainly by reducing how often such entry is needed. However, a significant limitation of the HOMER system remains: unless the new cases being entered resemble the existing cases, an expert user is required to enter the cases by directly editing their structure and content, which is expensive. This occurs frequently enough, e.g., when bootstrapping the system with cases or when expanding the system to cover new types of problems, to have a significant negative impact on the cost of using and maintaining such a CBR system.
Thus, with today's methods and systems, acquiring new cases in an incremental CBR system that has complex-structured cases requires significant effort and know-how, especially in situations where the cases being entered are novel. In entering these cases one cannot rely on the “help” of already entered cases, and entering them manually using a model editing tool requires high expertise and is time-consuming. This makes building and maintaining such systems both difficult and costly.
Given these rather significant limitations, better tools for acquiring new cases are needed:
(a) Tools that do not require entering cases at the model level, while still retaining the flexibility to enter novel cases, dissimilar to cases already stored in the system.
(b) Tools that allow for acquiring structured, deep-model cases by persons not necessarily having expert knowledge of the model used by a particular CBR system, without having to resort to manual manipulation of the case structure. The tool itself should be responsible for creating correct internal representations according to the model.
(c) Ideally, these tools should support easy case acquisition based on case descriptions that are in a form easily understood by persons authoring the cases.
(d) For many incremental CBR systems, these case descriptions might be in the form of a (transcript of a) dialog, i.e., a series of questions and answers, that illustrates how a particular problem was solved in practice. A case-entry tool should allow the user to enter the case in a manner most similar to the dialog.
(e) In order to avoid the bootstrapping problem, as well as to allow entering of significantly novel cases, the tool should not be exclusively case-driven. The tool should use a session case plus the model, rather than a session case plus only the already-known cases, to generate a list of applicable information acquisition actions from which the user could pick those that best match the real-world problem case.
(f) The tool, and the underlying method and system, should ideally impose no restrictions on the model used. (Some prior art achieves most of the goals stated above by imposing a particular type of model, see, e.g., U.S. Pat. No. 5,717,835.)
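To make requirement (e) more concrete, the following sketch generates the list of applicable information acquisition actions from a session case plus a model, without consulting any stored cases. The Action class, its applicability predicates, the attribute names, and the "Check-Cable-TV-Signal" action are hypothetical; representing the model as a flat list of actions is a simplification chosen for brevity and is not a restriction the requirements impose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Session = Dict[str, object]  # the session case: information gathered so far


@dataclass(frozen=True)
class Action:
    """An information acquisition action defined in the model."""
    name: str
    records: str                           # session attribute the action fills in
    applicable: Callable[[Session], bool]  # model-level applicability condition


# Hypothetical model fragment for Internet and cable TV troubleshooting.
MODEL = [
    Action("Check-WebSite-Viewable", "website_viewable",
           lambda s: s.get("symptom") in {"no_email", "no_web"}),
    Action("Reset-the-Modem", "modem_reset_helped",
           lambda s: "symptom" in s),
    Action("Check-Cable-TV-Signal", "tv_signal_ok",
           lambda s: s.get("symptom") == "no_tv"),
]


def applicable_actions(session: Session, model: List[Action]) -> List[str]:
    """List model actions that apply to the session and are not yet answered."""
    return [
        action.name for action in model
        if action.records not in session and action.applicable(session)
    ]


# No case library is involved, so this works even for the very first case.
session_case: Session = {"symptom": "no_email"}
first_choices = applicable_actions(session_case, MODEL)

# The case author follows the transcript and records the observed answer.
session_case["website_viewable"] = False
next_choices = applicable_actions(session_case, MODEL)

print(first_choices)
print(next_choices)
```

A case author following a call transcript would pick, at each step, the listed action that matches the next question in the transcript and record the answer given, so that the tool rather than the author builds the internal case structure.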
One particular application area for such a case-entry tool is constructing cases based on transcripts of phone calls to a customer-support center. An ideal tool would allow one to follow the call transcript and select, in the tool UI, questions and answers matching those occurring in the transcript.