Currently, the methods available for fitting multivariate data to an equation are limited in scope. The current state of the art includes software packages that allow a user to fit one independent variable to a variety of functions. Unfortunately, many types of data have more than one independent variable. Also, such analyses require the user to test each function by trial and error, and the software makes no decisions. Other software packages currently available take the approach of fitting one or two independent variables to complex equations by testing the data against thousands of pre-defined formulas. This type of “blind” analysis can result in models that include extraneous terms (e.g., multiple terms for equations requiring only one or two independent variables). Such techniques are not efficient for analyzing data sets requiring a large number of independent variables, since the number and size of the equations to evaluate grow exponentially. Nor do the available packages identify which variable is most important.
Also, many data sets contain missing data due to sampling problems or, in the case of surveys, deliberate omission. One way to handle missing data is to delete an entire record if a value for a single independent variable is missing. This is often not optimal, as the information carried by the values that are available for the other independent variables is lost.
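The cost of deleting whole records can be illustrated with a minimal sketch. The data set below is entirely hypothetical (two independent variables `x1` and `x2`, one dependent variable `y`, with `None` marking a missing value), and the helper names `listwise_delete` and `count_observed` are assumptions introduced only for this illustration; they do not come from any existing package.

```python
# Hypothetical records of the form (x1, x2, y); None marks a missing value.
records = [
    (1.0, 2.0, 3.1),
    (2.0, None, 4.9),
    (3.0, 1.0, 4.2),
    (None, 4.0, 6.8),
    (5.0, 3.0, 8.1),
]


def listwise_delete(rows):
    """Keep only the rows in which every field is present."""
    return [row for row in rows if all(v is not None for v in row)]


def count_observed(rows):
    """Count the non-missing values across all rows."""
    return sum(v is not None for row in rows for v in row)


complete = listwise_delete(records)
observed_before = count_observed(records)
observed_after = count_observed(complete)

# Valid measurements thrown away together with the incomplete records.
discarded = observed_before - observed_after

print(f"records kept: {len(complete)} of {len(records)}")
print(f"valid values discarded: {discarded} of {observed_before}")
```

In this made-up example, each incomplete record still holds valid measurements for the remaining variables, and those measurements are discarded along with the single missing value.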
Thus, there is a need for a computer-implemented statistical modeling program that is flexible enough to analyze data sets comprising a plurality of independent variables, yet still provides a meaningful mathematical description of the data set. For example, it would be desirable to have the statistical modeling analysis describe the data using a minimum number of terms, so that the significance of each independent variable can be evaluated in a meaningful manner. There is also a need for software that can automatically approximate values for missing data. It would also be beneficial to have a statistical modeling method that provides a series of increasingly complex equations, so that a user can apply the data set to real-world problems and evaluate the models provided by the analysis in light of known physical parameters.