Data mining (DM), also termed knowledge discovery in databases (KDD), is an active research topic in the fields of artificial intelligence and databases. Data mining refers to the process of discovering implicit, previously unknown, and potentially valuable information from the large volumes of data held in a database. Generally, data mining is the process of automatically searching large volumes of data for hidden information that exhibits particular relationships (as in association rule learning). Data mining is usually associated with computer science and achieves the above objectives by means of techniques such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.
Various data mining platforms have been developed so far, with which predictive models can be created rapidly and applied in industry to help decision-makers make correct decisions. A predictive model may take the form of a rule set, a mathematical formula, a decision tree, etc., and is used to generate prediction results from a group of inputs or variables. After a predictive model is created, its performance (precision) needs to be evaluated using evaluation metrics, so as to ensure the precision of the generated prediction results.
There exist various metrics for evaluating predictive models, such as area under the receiver operating characteristic curve (AUC), accuracy, F-score, recall, precision, etc. However, the data flow platforms developed so far provide only a single evaluation metric, namely accuracy. In some cases, for example with unbalanced samples, the metric “accuracy” is not sufficient to reflect the performance of the created predictive model. Therefore, how to determine, from multiple evaluation metrics, one or more suitable evaluation metrics for evaluating a predictive model is a current research focus in the data mining field.
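By way of illustration only, the insufficiency of accuracy on unbalanced samples may be sketched as follows. The sample data (5 positive and 95 negative samples), the trivial model that always predicts the negative class, and the helper names `confusion_counts` and `evaluate` are all hypothetical and are not taken from any particular data mining platform. The F-score shown is the balanced F1 variant.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical unbalanced sample: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# Hypothetical trivial model: always predicts the majority (negative) class.
y_pred = [0] * 100

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Under these assumptions the trivial model reaches an accuracy of 0.95 while its recall and F1 are both 0.0, since it never identifies a positive sample. Accuracy alone would therefore rank it as a good model, whereas recall or F-score exposes the failure.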