Aspects of the disclosure relate to efficiently selecting explanatory or predictive features in a parallelized computing environment.
Feature selection is an effective technique for dimensionality reduction and relevance detection. It may be used to improve the accuracy, efficiency, and interpretability of learning models. Feature selection has been a valuable component of successful data mining applications in a variety of fields, including text mining, image processing, and genetic analysis.
Continual advances in computer-based technologies have enabled corporations and organizations to collect data at a rapidly increasing pace. Accumulated business and scientific data from many fields, such as finance, genomics, and physics, are often measured in terabytes (10^12 bytes). The enormous proliferation of large-scale data sets brings new challenges to data mining techniques and may require novel approaches to address the big-data problem in feature selection.