The invention is generally related to computers and computer software. In particular, the invention is related to case-based reasoning and decision tree data structures for use therewith.
Case-based reasoning is but one of a number of types of computer analysis approaches for drawing conclusions from input data. Case-based reasoning typically uses a decision tree to xe2x80x9cprunexe2x80x9d a library of past cases, also referred to herein as a xe2x80x9csearch space.xe2x80x9d A decision tree is created by inductive reasoning,which draws generalizations from past data and applies those generalizations to new data to draw specific conclusions about the new data. Inductive reasoning is the complement of deductive reasoning, where responses to input data are developed from known general principles.
Case-based reasoning typically relies upon nearest-neighbor matching to attempt to predict a result for an unknown case based upon the results of past cases stored in a search space or library. As an example, case-based reasoning may be used by a bank to predict the likelihood that a particular customer would default on a loan, and thus whether a loan should be approved. Cases within a search space might include information such as the anticipated monthly payment, the length of time that a customer was employed at a certain job, the customer""s monthly income, etc. Also, for each case in the library, an indication of whether that customer eventually defaulted on his or her loan would also be provided for each case. Then, whenever a new customer was presented to the bank, information about that customer could be presented as an unknown case, with nearest-neighbor matching used to locate those cases in the library that most closely resembled the data associated with the new case. Then, based upon whether those nearest-neighbor cases resulted in defaults, a determination could be made as to whether a loan should be approved for the new customer.
One difficulty associated with nearest-neighbor matching in case-based reasoning is the fact that nearest-neighbor matching can be extremely computationally intensive, particularly when a large number of cases exist in a library and a large number of characteristics, or attributes, need to be analyzed for each case. For this reason, often a logical construct known as a decision tree is utilized to narrow the search space with which nearest-neighbor matching is performed during case-based analysis of an unknown case. A decision tree is typically stored in a decision tree data structure, and is essentially used to prune a search space into a smaller subset of cases most likely to be relevant to an unknown case.
A conventional decision tree typically includes a collection of decision nodes arranged into a tree data structure, thus defining a plurality of paths that each identify different subsets of the cases from a search space. At each decision node, a test question is provided that queries a particular attribute of an unknown case and selects one of a plurality of test answers based upon the result of the query. Associated with each test answer is either a reference to another xe2x80x9cchildxe2x80x9d decision node, from which another relevant query is performed, or a xe2x80x9cleafxe2x80x9d node, which identifies a subset of cases from the search space, and which represents the end, or termination point, for a particular path in the decision tree. As such, a unique path is defined in the decision tree for each unique combination of test answers to the test questions presented in the decision tree, such that a relevant subset of cases may be identified for each combination of test answers.
By xe2x80x9cpruningxe2x80x9d the search space in this manner with a decision tree, the most likely subset of cases in the search space are quickly identified, so nearest-neighbor matching can then be performed on a smaller number of cases. As a result, case-based analysis may be performed significantly more quickly and with generally comparable results to those generated without the use of a decision tree.
The accuracy of a case-based reasoning system that incorporates a decision tree, however, can be significantly impacted by the manner in which a decision tree partitions a search space. As a result, a significant amount of effort has been directed to the automated generation of decision trees and the arrangement of decision nodes and test queries therein to maximize the accuracy of a decision tree.
One problem associated with the use of decision trees, in particular, stems from the relatively dynamic nature of case-based reasoning analysis. In particular, a case-based reasoning system is only as good as the data provided to the system, and it is therefore desirable to update a case library relatively frequently to build a comprehensive and current library with which nearest-neighbor matching may be performed. However, given that conventional decision trees store specific identifiers to the cases that match each path in the decision tree, anytime a new case is added to a case library or search space, the decision tree used to access that library will typically need to be regenerated. Generating a decision tree is computationally expensive, however, and as such, whenever a case library is updated, case matching cannot proceed until the decision tree is modified in view of the cases in the updated case library. As a consequence, for frequently updated libraries, system availability may be adversely impacted by the need to frequently regenerate the decision trees associated with such libraries.
Therefore, a significant need exists in the art for a manner of increasing the availability of a case-based analysis system, and in particular, for a manner of reducing the need to update decision trees utilized in such systems.
The invention addresses these and other problems associated with the prior art by providing an apparatus, computer-readable medium and method for use in association with case-based reasoning and the like that utilize a novel decision tree data structure. The data structure incorporates a search criterion in association with each test answer to a test criterion defined within a decision node, for use in selecting cases from a search space that match the associated test answer to the test criterion. Rather than storing identifiers to the actual cases in a case library, or search space, within a decision tree data structure, search criteria are used to provide the mechanism by which those cases that represent the nearest-neighbors for each path of the decision tree data structure can by dynamically selected.
Among other benefits, associating search criteria with test answers within a decision tree data structure takes advantage of the fact that the partitioning of a search space on a relatively coarse level, as is done with a decision tree data structure, typically does not require complete synchronization and currency with respect to a search space. As such, the utilization of search criteria in lieu of actual case identifiers eliminates the need to regenerate a decision tree after each modification (e.g., the addition of a new case) to the search space. While it still may be desirable in some embodiments to regenerate a decision tree data structure from time to time, the need to do so is significantly reduced, thereby increasing the availability of a case-based reasoning system for analyzing unknown cases.
Consistent with one aspect of the invention, a method is provided for applying case-based reasoning on an unknown case. The method includes traversing a path among a plurality of paths defined in a decision tree data structure to identify a subset of cases from a search space suitable for performing nearest-neighbor matching on the unknown case. Each path includes a plurality of decision nodes, and each decision node includes a test criterion defining a plurality of test answers. Each test answer has associated therewith a search criterion that selects cases in the search space that match the associated test answer. In addition, traversing the path includes, at each decision node in the path, selecting a test answer among the plurality of test answers defined by the test criterion for such decision node based upon an attribute associated with the unknown case, and applying the search criterion associated with the selected test answer to the search space to select cases in the search space that match the selected test answer. The method also includes performing nearest-neighbor matching on the identified subset of cases.
Consistent with another aspect of the invention, a method is provided for accessing a search space that includes a plurality of cases. The method includes analyzing a test criterion resident in a decision tree data structure to select a test answer from a plurality of test answers associated with the test criterion, retrieving a search criterion associated with the selected test answer, and applying the retrieved search criterion to the search space to select cases from the search space that match the selected test answer.
Consistent with an additional aspect of the invention a method is provided for generating a decision tree data structure for use in accessing a plurality of cases in a search space. The method includes generating a plurality of decision nodes, each decision node including a test criterion that defines a plurality of test answers, and associating a search criterion with each test answer defined by each test criterion, wherein each search criterion is configured to select cases from the search space that match the associated test answer.
Consistent with a further aspect of the invention, a computer-readable medium is provided including a decision tree data structure for use in accessing a plurality of cases in a search space. The decision tree data structure includes a test criterion configured to test an attribute associated with the cases, the test criterion defining a plurality of test answers, and a plurality of search criteria, each associated with a test answer from the plurality of test answers, and each configured to select cases from the search space that match the associated test answer.
Consistent with an additional aspect of the invention, an apparatus is provided, including a memory and a decision tree data structure resident therein for use in accessing a plurality of cases in a search space. The decision tree data structure includes a test criterion configured to test an attribute associated with at least a portion of the plurality of cases, the test criterion defining a plurality of test answers, and a plurality of search criteria, each associated with a test answer from the plurality of test answers, and each configured to select cases from the search space that match the associated test answer.
Consistent with yet another aspect of the invention, an apparatus is provided, including a memory and a decision tree data structure resident therein for use in identifying a subset of cases from a search space suitable for performing nearest-neighbor matching on an unknown case. The decision tree data structure includes a plurality of decision nodes defining a plurality of paths in the decision tree data structure, each decision node including a test criterion defining a plurality of test answers, and each test answer having associated therewith a search criterion that selects cases in the search space that match the associated test answer.