This disclosure relates to machine learning.
In recent years, machine-learning approaches for data analysis have been widely explored for recognizing patterns which, in turn, allow extraction of significant information contained within large datasets. Learning algorithms include models that may be trained to generalize using data with known outcomes. Trained learning machine algorithms may then be applied to predict the outcome in cases of unknown outcome, i.e., to classify the data according to learned patterns.
In data mining problems, features (i.e., quantities that describe the data in a model), are typically selected from a pool of features. The choice of which features to use in the model can have a significant effect on the accuracy of the learned model. Peculiar problems arise when the number of features is large, e.g., thousands of genes in a microarray. For example, data-overfitting may occur if the number of training records, e.g., number of patients, is smaller compared to the number of genes. In some situations, the large number of features can also make the learning model expensive and labor intensive.