Statistical models have been used to explain and predict various phenomena, such as natural phenomena and social phenomena. An example of a statistical model is given by:
$$
Z = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots \tag{1}
$$
$$
F\bigl(E[Y]\bigr) = Z \tag{2}
$$
where $x_1, x_2, \ldots$ are variables called “explanatory variables”; $\beta_1, \beta_2, \ldots$ are the coefficients corresponding respectively to the explanatory variables $x_1, x_2, \ldots$; and $\alpha$ is a constant.
In equation (1), Z, defined as the sum of the constant α and a linear combination of the explanatory variables weighted by their coefficients, is called a linear predictor, and Y is a variable called a response variable. As equation (2) shows, the function F defines the relationship between the linear predictor Z and the expectation value E[Y] of the response variable Y. The function F is not always given by a simple equation: it is sometimes expressed as a composite of several functions, or as a function that must be solved numerically because it cannot be written in analytic form.
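As a minimal sketch of equations (1) and (2), the Python code below evaluates the linear predictor Z and recovers E[Y] by inverting F. The identity and logit links are illustrative choices only, and `numeric_inverse` is a hypothetical helper showing how E[Y] could be obtained by bisection when F has no analytic inverse; it assumes F is increasing on the bracket supplied by the caller.

```python
import math


def linear_predictor(alpha, betas, xs):
    """Equation (1): Z = alpha + beta_1*x_1 + beta_2*x_2 + ..."""
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per explanatory variable")
    return alpha + sum(b * x for b, x in zip(betas, xs))


def logit_link(mu):
    """An example F: log(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))


def logit_inverse(z):
    """Closed-form inverse of the logit link (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity_inverse(z):
    """Inverse of the identity link used by linear regression: E[Y] = Z."""
    return z


def numeric_inverse(link, z, lo, hi, tol=1e-12, max_iter=200):
    """Hypothetical helper: solve link(mu) = z for mu by bisection.

    Assumes `link` is increasing on [lo, hi] and the solution lies in that bracket.
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if link(mid) < z:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def expected_response(alpha, betas, xs, inverse_link):
    """Equation (2), F(E[Y]) = Z, solved for E[Y]."""
    return inverse_link(linear_predictor(alpha, betas, xs))


z = linear_predictor(0.5, [2.0, -1.0], [1.0, 3.0])   # 0.5 + 2.0 - 3.0 = -0.5
mu_closed = logit_inverse(z)                           # analytic inverse of F
mu_numeric = numeric_inverse(logit_link, z, 1e-12, 1.0 - 1e-12)  # numerical inverse of F
```

Both routes give the same E[Y] for the logit link; the numerical route is the one that would remain available when F is, for example, a composite of several functions with no closed-form inverse.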
For example, when body weight is to be predicted, the weight is the response variable Y, and height and waist size can serve as the explanatory variables x1 and x2.
One class of such statistical models is the generalized linear model. Examples of generalized linear models include the linear regression model, the binomial logit model, and the ordered logit model.
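To make two of these models concrete, the sketch below fits a linear regression model (identity link) by least squares and a binomial logit model (logit link) by Newton-Raphson, using NumPy. The data are synthetic: the weight/height/waist relationship and the logit coefficients are hypothetical values chosen only for illustration, and the logit fitter has no safeguard against perfectly separable data.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so that the first coefficient plays the role of alpha."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_linear_regression(X, y):
    """Identity link, E[Y] = Z: least-squares estimate of (alpha, beta_1, beta_2, ...)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def fit_binomial_logit(X, y, n_iter=50, tol=1e-10):
    """Logit link, log(E[Y] / (1 - E[Y])) = Z, fitted by Newton-Raphson.

    For this canonical link, Newton-Raphson coincides with iteratively
    reweighted least squares.
    """
    A = add_intercept(X)
    y = np.asarray(y, dtype=float)
    coef = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ coef)))          # E[Y] at the current coefficients
        grad = A.T @ (y - p)                             # gradient of the log-likelihood
        hess = A.T @ (A * (p * (1.0 - p))[:, None])      # negative Hessian (information matrix)
        step = np.linalg.solve(hess, grad)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef


# Synthetic data only: a hypothetical, noise-free weight/height/waist relationship.
rng = np.random.default_rng(0)
height = rng.normal(170.0, 8.0, size=200)
waist = rng.normal(80.0, 6.0, size=200)
weight = -100.0 + 0.6 * height + 0.5 * waist
lin_coef = fit_linear_regression(np.column_stack([height, waist]), weight)

# Synthetic binary outcome drawn from a known logit model.
Xb = rng.normal(size=(300, 2))
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * Xb[:, 0] - 2.0 * Xb[:, 1])))
yb = (rng.random(300) < p_true).astype(float)
logit_coef = fit_binomial_logit(Xb, yb)
```

Because the weight data are noise-free, `lin_coef` recovers (α, β1, β2) = (-100, 0.6, 0.5) up to rounding; `logit_coef` is the maximum-likelihood estimate for the sampled binary data.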
With the above statistical models, it is difficult to select appropriate indicators as explanatory variables; this is the well-known problem of variable selection. The choice of variables greatly affects the precision and usability of the statistical model.
So-called “brute-force regression” is one approach to selecting appropriate explanatory variables. With this approach, every possible set of candidate explanatory variables is examined to find an optimum one. Here, p candidate explanatory variables yield $2^p - 1$ non-empty sets in total. Because it tests every possible set, this approach is guaranteed to find the best set of variables under the chosen criterion, but it imposes a very large computational load. If the number of candidate variables p is large, the number of possible sets increases explosively, making the calculation impractical.
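The text does not fix a selection criterion, so the sketch below assumes the Akaike information criterion (AIC) of an ordinary least-squares fit as the score. It enumerates all $2^p - 1$ non-empty subsets with `itertools.combinations` and keeps the lowest-scoring one; the synthetic data, in which only columns 0 and 2 influence y, are hypothetical.

```python
import itertools

import numpy as np


def gaussian_aic(X_subset, y):
    """AIC of a least-squares fit with an intercept (Gaussian errors, up to an additive constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X_subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def brute_force_selection(X, y):
    """Score every non-empty subset of the p candidate columns and keep the best one."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    best_subset, best_score, n_tested = None, np.inf, 0
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            score = gaussian_aic(X[:, list(subset)], y)
            n_tested += 1
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, n_tested


# Synthetic example: 5 candidates, of which only columns 0 and 2 affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
best_subset, best_score, n_tested = brute_force_selection(X, y)

# The number of fits grows as 2**p - 1: 31, 1023, 1048575, 1073741823.
for p in (5, 10, 20, 30):
    print(p, 2 ** p - 1)
```

With 5 candidates the search needs only 31 fits, but at p = 30 it would already need more than a billion, which is the computational explosion described above.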
Stepwise regression is another approach to variable selection. With this approach, explanatory variables are sequentially added to or removed from a model based on a criterion, such as the F value used in regression analysis, so as to find a more descriptive set of variables. This approach requires a relatively low computational load and can therefore handle many candidate variables. However, it is not guaranteed to find an optimum set of explanatory variables.
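The sketch below illustrates bidirectional stepwise selection driven by partial F values for a linear regression model. The thresholds `f_in = 4.0` and `f_out = 3.9` are hypothetical defaults (setting `f_out` below `f_in` is a common way to discourage cycling), and the synthetic data are the same kind as in a brute-force search, with only columns 0 and 2 influencing y.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on an intercept plus the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))


def partial_f(rss_small, rss_big, n, k_big):
    """Partial F statistic for one extra variable; k_big counts predictors in the larger model."""
    return (rss_small - rss_big) / (rss_big / (n - k_big - 1))


def stepwise_selection(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Bidirectional stepwise selection with hypothetical F-to-enter / F-to-remove thresholds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the unused variable with the largest partial F, if it exceeds f_in.
        candidates = [j for j in range(p) if j not in selected]
        if candidates:
            base = rss(X, y, selected)
            f_add = {j: partial_f(base, rss(X, y, selected + [j]), n, len(selected) + 1)
                     for j in candidates}
            best = max(f_add, key=f_add.get)
            if f_add[best] > f_in:
                selected.append(best)
                changed = True
        # Backward step: remove the selected variable with the smallest partial F, if below f_out.
        if len(selected) > 1:
            full = rss(X, y, selected)
            f_drop = {j: partial_f(rss(X, y, [c for c in selected if c != j]), full, n, len(selected))
                      for j in selected}
            worst = min(f_drop, key=f_drop.get)
            if f_drop[worst] < f_out:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)


# Synthetic example: 5 candidates, of which only columns 0 and 2 affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
chosen = stepwise_selection(X, y)
```

Each pass fits at most p + |selected| models instead of $2^p - 1$, which is why the approach scales to many candidates; the price is that the greedy path can miss the set a full search would find.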
In addition, Non-Patent Literature 1 discloses a variable selection method called “Lasso regression”, and Non-Patent Literature 2 discloses one called “elastic-net”. Both methods use a function obtained by adding a coefficient-dependent penalty term to a likelihood function, and select as explanatory variables those variables whose coefficients are non-zero when this function is maximized. In these methods, the selected explanatory variables depend on a hyperparameter that regulates the penalty, yet this hyperparameter can be chosen freely. Moreover, the selected set of explanatory variables does not, in general, maximize the likelihood function itself.
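Lasso and elastic-net can be computed in several ways. The sketch below assumes a Gaussian likelihood and uses coordinate descent with soft-thresholding, one common approach, rather than the specific procedures of Non-Patent Literature 1 or 2. Maximizing the penalized likelihood is written here as minimizing $\frac{1}{2n}\lVert y - \alpha - X\beta\rVert^2 + \lambda\bigl(\rho\lVert\beta\rVert_1 + \frac{1-\rho}{2}\lVert\beta\rVert_2^2\bigr)$, where `lam` ($\lambda$) and `l1_ratio` ($\rho$) are hypothetical names for the hyperparameters; `l1_ratio = 1` gives the Lasso.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X, y, lam, l1_ratio=1.0, n_iter=1000, tol=1e-8):
    """Coordinate descent for the penalized least-squares objective described above."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring leaves the intercept unpenalized
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()                         # resid = yc - Xc @ beta at all times
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho_j = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho_j, lam * l1_ratio) / (col_sq[j] + lam * (1.0 - l1_ratio))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    alpha = y_mean - x_mean @ beta
    return alpha, beta


# Synthetic example: 6 candidates, of which only columns 0 and 2 affect y.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

# The selected set (non-zero coefficients) changes with the hyperparameter lam.
for lam in (0.01, 0.1, 0.5, 1.0, 5.0):
    _, b = elastic_net(X, y, lam)
    print(lam, np.flatnonzero(b))
```

Sweeping `lam` shows the dependence on the hyperparameter noted above. Refitting ordinary least squares on the selected columns always gives a residual sum of squares no larger than the penalized fit, which is one way to see that the penalized solution does not itself maximize the likelihood.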