The present invention relates to the formation and the application of a knowledge base in general and in the area of data mining and automated decision making in particular.
Automatic decision-making is based on the application of a set of rules to the score values of outcomes, which result from the application of a predictive quantitative model to new data.
The predictive quantitative model (sometimes referred to as an empirical model) is established by using a procedure called data mining.
Data mining describes a collection of techniques that aim to find useful but undiscovered patterns in collected data. The main goal of data mining is to create models for decision making that predict future behavior based on analysis of past activity.
Data mining extracts information from an existing data-base to reveal “hidden” patterns of relationship between objects in that data-base, which are neither known beforehand nor intuitively expected.
The term “data mining” expresses the idea that the raw material is the “mountain” of data and the data mining algorithm is the excavator, sifting through the vast quantities of raw data looking for the valuable nuggets of information.
However, unless the output of the data mining system can be understood qualitatively, it won't be of any use. That is, a user needs to view the output of the data mining in a context that is meaningful to his goals, and to be able to disregard irrelevant patterns among the relations which were disclosed.
It is in this perception stage that human reasoning, hereinafter referred to as “expert input”, is needed to assess the validity and evaluate the plausibility and relevancy of the correlations found in the automated data mining, and it is this indispensable expert input that prevents the accomplishment of a completely automated decision making system.
Several attempts have been made to eliminate this need for the expert input, mainly by automatically organizing, or restricting a priori, the vast repertoire of relationship patterns which are expected to be dug out by the data mining algorithm.
U.S. Pat. No. 5,325,466 to Kornacker describes the partition of a data-base of case records into a tree of conceptually meaningful clusters, wherein no prior domain-dependent knowledge is required.
U.S. Pat. No. 5,787,425 to Bigus describes an object oriented data mining framework mechanism which allows the separation of the specific processing sequence and requirements of a specific data mining operation from the common attributes of all data mining operations.
U.S. Pat. No. 5,875,285 to Chang describes an object oriented expert system which is an integration of an object oriented data mining system with an object oriented decision making system, and U.S. Pat. No. 6,073,138 to de l'Etraz, et al. discloses a computer program for providing relational patterns between entities.
Recently, dimension reduction was applied in order to reduce the vast quantity of relations identified by data mining.
Dimension reduction selects relevant attributes in the dataset prior to performing data mining. This is important for the accuracy of further analysis as well as for performance. Because the redundant and irrelevant attributes could mislead the analysis, including all of the attributes in the data mining procedures not only increases the complexity of the analysis, but also degrades the accuracy of the result.
Dimension reduction improves the performance of data mining techniques by reducing dimensions so that data mining procedures process data with a reduced number of attributes. With dimension reduction, improvement by orders of magnitude is possible.
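As an illustration only, attribute selection prior to data mining can be sketched with a simple correlation filter. This is a minimal sketch under stated assumptions and not the technique of any patent cited herein: the helper names pearson and select_attributes, the cutoff of 0.5, and the toy dataset are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(vx * vy)

def select_attributes(columns, outcome, min_abs_corr=0.5):
    """Keep attributes whose |correlation| with the outcome meets the cutoff."""
    return [name for name, values in columns.items()
            if abs(pearson(values, outcome)) >= min_abs_corr]

# Hypothetical toy dataset: attribute names and values are invented.
columns = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "operator_id": [3.0, 1.0, 3.0, 1.0, 3.0],
    "batch_code": [7.0, 7.0, 7.0, 7.0, 7.0],
}
outcome = [2.1, 3.9, 6.2, 8.0, 9.8]

# Only the retained attributes would be passed on to the data mining step.
selected = select_attributes(columns, outcome)
```

In this toy case only the attribute that tracks the outcome is retained, so a subsequent data mining procedure would process one attribute instead of three.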
The conventional dimension reduction techniques are not easily applied to data mining applications directly (i.e., in a manner that enables automatic reduction) because they often require a priori domain knowledge and/or arcane analysis methodologies that are not well understood by end users. Typically, it is necessary to incur the expense of a domain expert with knowledge of the data in a database who determines which attributes are important for data mining. Some statistical analysis techniques, such as correlation tests, have been applied for dimension reduction. However, these are ad hoc and assume a priori knowledge of the dataset, which cannot be assumed to always be available. Moreover, conventional dimension reduction techniques are not designed for processing the large datasets that data mining processes.
In order to overcome these drawbacks in conventional dimension reduction, U.S. Pat. No. 6,032,146 and U.S. Pat. No. 6,134,555, both to Chadra, et al., disclose an automatic dimension reduction technique applied to data mining in order to determine important and relevant attributes for data mining without the need for the input of a domain expert.
Being completely automatic, such a dimension reduced data mining procedure is a “black box” for most end users who rely implicitly and “blindly” on its findings.
It is our opinion that defining relevancy between objects and events is still a human act which cannot be replaced by a computer at the present time. Furthermore, most end users of an automatic decision making system would like to be involved in this decision making process at the conceptual level. That is, they would like to visualize the “state of affairs” between factors which affect the final decision. They would even like to contribute to the algorithm of data mining by suggesting influential attributes and “cause and effect” relationships according to their own understanding.
Thus, we consider expert input that routes and navigates the data mining according to human knowledge and perception schemes to be beneficial, provided it enables the processing of large datasets.
There is therefore a need in the art for an improved method and tool for data mining of large datasets which includes an a priori qualitative modeling of the system at hand and which will enable the automatic use, in automatic decision-making, of the quantitative relations disclosed by a dimension reduced data mining.
The present invention allows the automated coupling between the stages of data mining and score prediction in an automatic decision-making system.
The present invention discloses an innovative method, referred to herein as Knowledge-Tree (KT), of conceptualizing any sequence of relations among objects, where those relations are not detectable by current methods of knowledge engineering, and wherein such a conceptualization is used to reduce the dimension of data mining, which is a requisite stage in automatic decision-making.
The KT enables automatic creation of meaningful connections and relations between objects, when only general knowledge exists about the involved objects.
The KT is especially beneficial when a large base of data exists where other tools fail to depict the correct relations between the participating objects.
In accordance with the present invention there is provided a method for automated decision-making by a computer comprising the steps of: (a) modeling of relations between a plurality of objects, each object among the plurality of objects having at least one outcome and being subjected to at least one influential factor affecting the at least one outcome; (b) data mining in datasets associated with the modeled relations between the at least one outcome and the at least one influential factor of at least one object among the plurality of objects; (c) building a quantitative model to predict a score for the at least one outcome; and (d) making a decision according to the score of the at least one outcome of the at least one object.
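By way of illustration only, steps (a) through (d) can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the dictionary MODEL stands in for the modeled relations, the quantitative model is a deliberately simple stand-in that averages single-factor least-squares lines, and every name, record and threshold is hypothetical.

```python
# Step (a): a hypothetical qualitative model. Each outcome is mapped to the
# influential factors an expert assumes affect it. All names are invented.
MODEL = {"yield": ["temperature", "pressure"]}

# Hypothetical historical records; "humidity" is not modeled for "yield".
RECORDS = [
    {"temperature": 1.0, "pressure": 2.0, "humidity": 40.0, "yield": 3.0},
    {"temperature": 2.0, "pressure": 4.0, "humidity": 10.0, "yield": 5.0},
    {"temperature": 3.0, "pressure": 6.0, "humidity": 70.0, "yield": 7.0},
    {"temperature": 4.0, "pressure": 8.0, "humidity": 20.0, "yield": 9.0},
]

def mine(records, model, outcome):
    """Step (b): collect data only for the factors modeled for this outcome."""
    ys = [r[outcome] for r in records]
    return {f: ([r[f] for r in records], ys) for f in model[outcome]}

def fit_line(xs, ys):
    """Least-squares slope and intercept for one factor (xs must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx

def build_model(mined):
    """Step (c): fit one line per factor; the score averages their predictions."""
    lines = {f: fit_line(xs, ys) for f, (xs, ys) in mined.items()}

    def score(record):
        preds = [a * record[f] + b for f, (a, b) in lines.items()]
        return sum(preds) / len(preds)

    return score

def decide(score_value, threshold=10.0):
    """Step (d): a single illustrative rule applied to the predicted score."""
    return "accept" if score_value >= threshold else "reject"

score = build_model(mine(RECORDS, MODEL, "yield"))
new_score = score({"temperature": 5.0, "pressure": 10.0})
decision = decide(new_score)
```

Because the mining step consults only the factors listed in the model, the unmodeled attribute never reaches the quantitative model, which is the sense in which the qualitative model reduces the dimension of the data mining in this sketch.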
In accordance with the present invention there is provided a knowledge engineering tool for describing a relationship pattern between a plurality of objects, comprising a graphical symbolization of the objects and their assumed relations, the graphical symbolization including at least one interconnection cell which represents a component of a system whose relationship pattern is being described by the knowledge engineering tool.
In accordance with the present invention there is provided a computer usable medium having computer readable program code, wherein the program code uses a graphical representation of a Knowledge-Tree map to generate a knowledge base in a data storage region of a computer.
In accordance with the present invention there is provided an automatic decision-making system comprising: (a) a data mining tool to correlate between an outcome and a possible influential factor on the outcome; (b) a Knowledge-Tree based mechanism to reduce the dimension of the data mining; (c) an empirical modeler to predict a score of the outcome; and (d) a decision making tool operating in accordance with the score.