In tasks that classify instances with few known features, it is usually difficult to build useful classifiers without adequate information because the results are inaccurate or unreliable. To improve the accuracy, the classifier needs to gather more information before classification. As an example, in a technical support center, a customer calls the technical support center with a technical issue. At the beginning, the customer (the information source) may only provide limited information (feature values) to a representative at the center. To identify the issue (classification), the representative asks questions (unknown features), and the customer provides some answers (values of probed features) or maybe volunteers some additional information (values of non-probed features). After a few rounds, the representative may identify (classify) the problem correctly and provide a suitable solution. In such tasks, additional to the accuracy of identifying the problem, the efficiency or the number of probing is also an importance criterion to evaluate the performance.
A pre-built decision tree may be considered as a straightforward approach to such a task. At each non-leaf node, the tree classifier probes the feature values. With the supplied feature value, the classifier follows the branch. Repeating the process, the classifier reaches a leaf node and makes the prediction with adequate information. However, ignoring the given feature values at the beginning and the volunteered feature values during the process makes this approach inefficient. Moreover, the data source may not be able to provide all feature values for some instances, which requires the tree to have an “unknown” branch for each split. Instead of using static pre-built decision trees, the system dynamically creates a split based on the given information.
To dynamically probe feature values, the system needs to estimate which features are the most informative to make the decision. To estimate the information given by an unknown feature, the system needs to classify the instance given the known feature values and the unknown feature under estimation. On one hand, building classifiers for all possible feature subsets is impractical, because the number of possible combinations of features is exponentially large when the number of features is large. On the other hand, building classifiers on-the-fly is also impractical because of the cost of building classifiers.