Regression analysis models the relationship between a dependent variable and one or more independent variables. Regression analysis can determine how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables.
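The following Python is a minimal sketch of estimating a conditional expectation. It assumes that E[y | x] is linear in the independent variables and fits it by ordinary least squares. Linear least squares is only one of many regression methods, and the function names `fit_linear_regression` and `solve` are hypothetical. Each fitted slope estimates how the expected value of the dependent variable changes per unit change in one independent variable while the others are held fixed.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bp*xp; returns [b0, ..., bp].

    Each slope bj estimates the change in E[y | x] per unit change in xj
    while the other independent variables are held fixed.
    """
    # Prepend a column of ones so that b0 acts as the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Build an augmented copy so the caller's matrices are not modified.
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b


# Noise-free illustrative data generated from y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
coef = fit_linear_regression(X, y)  # approximately [1.0, 2.0, -3.0]
```

Because the illustrative data contain no noise, the fit recovers the generating coefficients. With real measurements, the slopes are estimates of the conditional expectation rather than exact values.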
Of particular interest to this invention is the selection of features used in continuous-valued regression analysis. Procedures for regression analysis include neural networks and support vector machines (SVM). Typical applications of regression analysis include time series prediction, e.g., predicting future values of electrical power demand from past values, and prediction of an unknown quantity of interest from available measurements, e.g., predicting a person's lifespan from measurements of height, weight, blood pressure, and hair length.
Feature selection determines the subset of the available features that is used in regression analysis. In the above example of predicting lifespan, the useful features can include height, weight, and blood pressure, while hair length is not useful. In this application, the feature selection procedure should select only the useful features, e.g., height, weight, and blood pressure, and should exclude the useless feature, e.g., hair length. Eliminating useless features can reduce the time required for subsequent predictions, can improve the accuracy of those predictions, and can lead to models that are easier to interpret.
Many feature selection procedures use simple measures of linear dependence, such as correlation, to select useful features. Those approaches can fail when the relationships among the variables are nonlinear. Wrapper techniques greedily select a small number of features at a time by evaluating a specific, potentially nonlinear, regression analysis problem. Because wrapper techniques select features greedily, they are not guaranteed to find the best overall combination of features. Wrapper techniques are also often computationally intensive, and because they incorporate a particular regression method directly as a subroutine, they are tied to that regression analysis method.
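The following small Python example, built on assumed data, shows how correlation-based selection can fail on a nonlinear relationship. When x is symmetric about zero and y = x², y is completely determined by x, yet the Pearson correlation between them is zero. A selector that keeps only features whose absolute correlation with the target exceeds some threshold would therefore discard x. The helper `pearson_correlation` is written out here for illustration only.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# x is symmetric about zero; y is a deterministic but nonlinear function of x.
x = [i / 10 for i in range(-20, 21)]
y = [xi ** 2 for xi in x]
r = pearson_correlation(x, y)  # approximately 0.0, although y depends entirely on x
```

For comparison, a linear relationship such as y = 3x + 1 on the same x gives a correlation of 1, which is why correlation works well for linear dependence but not in general.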
The well-known RELIEF feature selection procedure avoids most of the undesirable properties of other feature selection methods; see generally U.S. Pat. No. 7,233,931 issued to Lee, et al. on Jun. 19, 2007, “Feature regulation for hierarchical decision learning,” incorporated herein by reference. That method is not greedy, not computationally intensive, and not tied to a specific regression analysis method. However, the RELIEF procedure works only for classification and categorical problems, i.e., problems in which the dependent variable takes a value from a small set of discrete values. An example of a categorical problem is disease detection, where the dependent variable takes one of two possible values indicating the presence or absence of the disease. In contrast, continuous-valued problems have dependent variables that can take values from an infinite set, for example, all real numbers. In this case, we refer to the values taken on by the dependent variable as “target values.”
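The following Python is a minimal sketch of the basic RELIEF weight update as it is commonly described, not the procedure of the cited patent. It shows why RELIEF is limited to categorical problems. For each sampled instance, RELIEF finds the nearest "hit," the closest sample with the same class label, and the nearest "miss," the closest sample with a different label. It then rewards features that differ from the miss and penalizes features that differ from the hit. Both definitions rely on testing labels for equality, and with continuous target values two samples almost never share exactly the same value, so hits and misses are no longer meaningful. The sketch assumes numeric features scaled to [0, 1], Manhattan distance, and a single nearest hit and miss per sample; variants such as ReliefF use several neighbors. The function name `relief` and the example data are hypothetical.

```python
import random


def relief(X, labels, num_iterations, seed=0):
    """Return one relevance weight per feature; higher means more useful.

    Assumes features are numeric and scaled to [0, 1], and that there are
    at least two classes, each with at least two samples.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p

    def distance(a, b):
        # Manhattan distance over all features.
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    for _ in range(num_iterations):
        i = rng.randrange(n)
        # The equality tests on labels below are what tie RELIEF to
        # categorical dependent variables.
        hits = [j for j in range(n) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(n) if labels[j] != labels[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(p):
            # Reward separation from the nearest miss; penalize separation
            # from the nearest hit.
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / num_iterations
    return w


# Hypothetical data: feature 0 separates the two classes; feature 1 does not.
X = [(0.0, 0.0), (0.0, 1.0), (0.1, 0.5), (1.0, 0.0), (1.0, 1.0), (0.9, 0.5)]
labels = [0, 0, 0, 1, 1, 1]
weights = relief(X, labels, num_iterations=20)  # weights[0] > 0 > weights[1]
```

On this data, every sampled instance increases the weight of feature 0 and decreases the weight of feature 1, so the ranking does not depend on the random seed.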