In machine learning, users are often interested in understanding how to classify data given examples of previously classified data (a training set). Many algorithms have been devised to tackle this problem and each has its strengths and weaknesses. Nonetheless, many if not all machine learning systems process features rather than the raw data. For example, a piece of fruit can be classified as an orange or an apple. A salient feature for classification might be the color of the fruit.
Despite the importance of features in classification, most algorithms require that the user specify the features that should be utilized. While this may not appear to be an obstacle, people are typically bad at selecting important irredundant features from large complex data sets. Similarly, feature selection (removal of redundant features) is typically accomplished after feature generation, not in tandem with it. Furthermore, human selection is time consuming and very subjective—often performed by selecting features that a human may use to perform classification rather than what might be useful to the underlying classification algorithm.