Data mining is a data processing task which is based on a structured set of raw data. Typically the raw data includes a large set of records, each record having the same or a similar format. Each field in a record can take any of a number of logical, categorical, or numerical values.
U.S. Pat. No. 6,112,194 describes a method for data mining that includes a feedback mechanism for monitoring the performance of mining tasks. A user-selected mining technique type is received for the data mining operation. A quality measure type is identified for the user-selected mining technique type. The user-selected mining technique type for the data mining operation is processed and a quality indicator is measured using the quality measure type. The measured quality indicator is displayed while the user-selected mining technique type is being processed for the data mining operation.
A common disadvantage of current mining methods is the complexity of the operations the user has to perform. FIG. 1 shows a typical example of a prior art data mining method:
First, table 1 with training data is provided. Table 1 contains a number of records having data fields which are assigned to input field values such as “pain type”, “angina” . . . Column 2 of table 1 has been selected by a user. The column 2 is associated with the field value “diseased”.
Based on table 1 with column 2, a model 3 is formed by means of a data mining operation. In principle, any suitable current data mining method can be used such as linear regression, radial basis function and decision tree as well as neural network methods.
Model (or tree) 3 of FIG. 1 shows by way of example a decision tree model. The root of model 3 contains all the input records of table 1 and the leaves of model 3 represent the disjoint subsets which try to separate the records of table 1 according to the different field values occurring in column 2.
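The relationship between the root and the leaves of such a tree can be illustrated with a minimal sketch. The records, the field names, and the helper `split` below are hypothetical and chosen only to echo the example fields of table 1; a single split on one field is assumed, which is far simpler than a full decision tree method.

```python
from collections import defaultdict

# Hypothetical training records; field names echo the example of table 1.
records = [
    {"pain_type": "typical", "angina": True, "diseased": "yes"},
    {"pain_type": "atypical", "angina": True, "diseased": "yes"},
    {"pain_type": "none", "angina": False, "diseased": "no"},
    {"pain_type": "typical", "angina": False, "diseased": "no"},
]

def split(rows, field):
    """Partition rows into disjoint subsets keyed by the value of `field`."""
    leaves = defaultdict(list)
    for row in rows:
        leaves[row[field]].append(row)
    return dict(leaves)

# The root holds every record; each leaf holds one disjoint subset of them.
root = records
leaves = split(root, "angina")

for value, subset in leaves.items():
    targets = sorted({row["diseased"] for row in subset})
    print(f"angina={value}: {len(subset)} records, target values {targets}")
```

In this toy data the split on the assumed field "angina" happens to separate the target values completely; with real data the leaves would usually only approximate such a separation.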
Model 3 is verified by means of test data which is contained in column 2 of table 4. By inputting the test data of table 4 and model 3 into the data mining application, column 5 is outputted containing data values that are predicted based on the input attributes contained in table 4 by means of model 3.
The predicted data in column 5 can be compared with the real data in column 2 in order to determine the quality of model 3. When the quality of model 3 is considered sufficient, application data provided in a table 6 and model 3 are inputted into the data mining application in order to predict the corresponding data values within column 5.
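The sequence of training, testing, and applying a model described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `train`, `evaluate`, and `deploy`, the field names, and the data are hypothetical, and the "model" is a one-level majority-vote rule rather than any of the mining methods named above.

```python
from collections import Counter, defaultdict

def train(table, target, field):
    """Training: map each value of `field` to the majority value of `target`."""
    groups = defaultdict(Counter)
    for row in table:
        groups[row[field]][row[target]] += 1
    rules = {value: counts.most_common(1)[0][0] for value, counts in groups.items()}
    # Fallback for field values that never occurred in the training data.
    default = Counter(row[target] for row in table).most_common(1)[0][0]
    return {"field": field, "rules": rules, "default": default}

def predict(model, row):
    """Look up the predicted target value for a single record."""
    return model["rules"].get(row[model["field"]], model["default"])

def evaluate(model, table, target):
    """Test: append a predicted column and compare it with the real target."""
    out = [dict(row, predicted=predict(model, row)) for row in table]
    accuracy = sum(row["predicted"] == row[target] for row in out) / len(out)
    return out, accuracy

def deploy(model, table):
    """Deployment: append a predicted column to records that lack the target."""
    return [dict(row, predicted=predict(model, row)) for row in table]

# Hypothetical data standing in for the training, test, and application tables.
training = [
    {"angina": True, "diseased": "yes"},
    {"angina": True, "diseased": "yes"},
    {"angina": True, "diseased": "no"},
    {"angina": False, "diseased": "no"},
    {"angina": False, "diseased": "no"},
]
test_data = [
    {"angina": True, "diseased": "yes"},
    {"angina": False, "diseased": "no"},
    {"angina": False, "diseased": "yes"},
    {"angina": True, "diseased": "yes"},
]
application = [{"angina": True}, {"angina": False}]

model = train(training, target="diseased", field="angina")
tested, accuracy = evaluate(model, test_data, target="diseased")
predictions = deploy(model, application)
print(accuracy, [row["predicted"] for row in predictions])
```

Even in this reduced form the caller has to handle three kinds of tables and one model object, and has to pass each to the right function in the right order.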
To perform the data mining task, the end user has to understand the different modes “training,” “test,” and “deployment” and needs to work with different types of data objects in the correct sequence. Furthermore, the end user needs to specify these objects correctly as input or output parameters:
        the user needs to specify a table having n+1 columns as input for the training mode;
        a target column (column 2 of FIG. 1) needs to be specified for the purposes of prediction;
        a model is outputted from the training mode;
        test data needs to be inputted into the test mode (table 4 with column 2 of FIG. 1);
        the model needs to be specified as input into the test mode;
        the output of the test mode is an n+2 column table comprising the additional column with the predicted data values (column 5 of FIG. 1);
        the quality information which is outputted from the test mode needs to be evaluated;
        the model needs to be specified as input into the deployment mode;
        an n column table needs to be provided as an input for the deployment mode (application data); and
        the result of the prediction is outputted in another column (column 5 of FIG. 1).
The complexity of the resulting user interface limits applications of data mining. What is therefore needed is an improved method for data mining, and in particular an improved user interface for data mining that allows non-expert users to perform data mining tasks.