In many mature learning applications, training algorithms are advanced and well-tuned, leaving the design of new, informative features (e.g., data attributes) as a driver of error reduction. Thus, the predictive accuracy of a machine learning system can be improved by adding a potential feature that is informative for the prediction task. Pathways for designing new features vary widely, ranging from constructing functions that combine existing features to adding features obtained from a previously unused data source. Conventionally, a potential feature is evaluated by augmenting the presently used data representation with the feature and re-running the training and validation procedures to observe the resulting difference in predictive accuracy. However, this complete retraining is often impractical, especially in large-scale learning scenarios.
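As a rough illustration of the conventional evaluation described above, the following sketch retrains a model with and without a candidate feature and compares validation error. Everything in it is an assumption made for illustration: the synthetic data, the least-squares learner, the 200/100 train/validation split, and the names `fit_least_squares`, `mean_squared_error`, and `retrain_and_validate` are hypothetical and are not drawn from the text.

```python
import random


def fit_least_squares(rows, targets):
    """Fit linear weights (bias first) by solving the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    dim = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(dim)]
        + [sum(x[i] * t for x, t in zip(design, targets))]
        for i in range(dim)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(dim):
        pivot = max(range(col, dim), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(dim):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][dim] / aug[i][i] for i in range(dim)]


def mean_squared_error(weights, rows, targets):
    """Average squared difference between predictions and targets."""
    preds = [
        weights[0] + sum(w * v for w, v in zip(weights[1:], row))
        for row in rows
    ]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Synthetic data (assumed): the label depends on both the existing feature
# and the candidate feature, plus a small amount of Gaussian noise.
rng = random.Random(0)
existing = [rng.uniform(-1, 1) for _ in range(300)]
candidate = [rng.uniform(-1, 1) for _ in range(300)]
labels = [
    2 * a + 3 * b + rng.gauss(0, 0.1) for a, b in zip(existing, candidate)
]
split = 200  # first 200 examples for training, the rest for validation


def retrain_and_validate(features):
    """Run one full train/validate cycle for a given data representation."""
    weights = fit_least_squares(features[:split], labels[:split])
    return mean_squared_error(weights, features[split:], labels[split:])


# Conventional evaluation: one full retrain per representation.
baseline_error = retrain_and_validate([[a] for a in existing])
augmented_error = retrain_and_validate(
    [[a, b] for a, b in zip(existing, candidate)]
)
improvement = baseline_error - augmented_error
print(f"baseline={baseline_error:.4f} augmented={augmented_error:.4f}")
```

Note that every candidate feature requires its own call to `retrain_and_validate` on the full augmented representation. With this toy learner the call is cheap; with an expensive learner and a very large labeled dataset, that per-candidate retraining is the cost at issue.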
Traditional approaches for computing the accuracy improvement obtained from adding a potential feature re-run the learning algorithm on labeled training data augmented by the potential feature, which can be computationally, logistically, and monetarily costly. These costs can hinder rapid experimentation in the design and evaluation of potential features.
For example, many domains, such as web search and advertising, utilize sophisticated, computationally expensive learning algorithms and very large labeled datasets, imposing an experimentation latency that is a barrier to rapid feature design. Thus, traditional approaches that re-run the learning algorithm on the labeled data augmented by the potential feature can be computationally costly and time-consuming. According to another example, industrial implementations of learning algorithms are typically components within large infrastructure pipelines, which can require significant domain expertise to run. Following this example, potential feature contributors who lack such expertise can be deterred from evaluating their features (e.g., features developed for a different application in the same organization) by the complexity of adding the potential feature to the training pipeline (e.g., due to logistical costs). Pursuant to yet another example, in some domains, such as medical or marketing applications, values of the potential feature may be unavailable for the complete training set or may carry non-negligible costs, which encourages evaluating feature relevance on a data subset before committing to obtaining values of the potential feature for all data (e.g., due to monetary costs).