This specification relates to data processing techniques such as data mining.
Data mining is used, for example, to identify attributes of a dataset that are indicative of a particular result and to predict future results based on the identified attributes. As the number of records in a dataset increase, combinations of attributes and attribute values may be used to predict future results. Therefore, the combinations of attributes and attribute values that are indicative of future results can become more complex, such that machine learning techniques may be used to identify combinations of attributes and attribute values that facilitate computation of predicted results.
A decision tree is an example of a tool that is used to help identify attributes and attribute values that can be used to facilitate computation of predicted results. Decision trees group a dataset into subsets of records based on the attributes and corresponding attribute values of the dataset. The full dataset and each of the subsets of records are represented by nodes. The nodes representing the dataset and each of the subsets of records can be connected by links that are referred to as branches. The nodes are connected in a hierarchical manner such that a predicted result for input data is computed based on sets of rules (i.e., split-points) that define the sets of records that are represented by the nodes. In turn, a node that represents records that have attribute values similar to the input data can be identified from the decision tree and the result corresponding to the identified node can be defined as the predicted result for the input data.
Decision tree models are used for data mining and decision support systems to inform decisions based on expected results. The decision tree models are trained by mapping attribute values of a dataset to results corresponding to the dataset. This mapping can be used to identify expected values for an input set of attribute values. For example, businesses can use decision trees to compute an expected result (e.g., profit, sales, or customer satisfaction) for each option based on an input of attribute values corresponding to each option. In turn, the business can evaluate each available option in a set of options based on the expected result for each option and select the option that is expected to best achieve desired results. However, in some situations, businesses may require expected results having a higher probability of being correct than those being provided by the decision tree model as initially trained.