As the world becomes more digitized and computers become more integrated into daily life titanic data growth is occurring. Data researchers have long struggled to keep pace with data analysis and to extract meaningful information from this huge volume of data. Conventionally some data analysis techniques include “regression” operations that are performed to identify variables (inputs) in the data that are related to the changes in control variables (outputs). More traditional regression techniques express these relationships as mathematical models, and provide analysis of the quality and generality of the constructed models.
Classical regression techniques can efficiently optimize the parameters in assumed models (e.g., by using ordinary or generalized least squares method for the given model structure). However, classical regression techniques require a priori domain knowledge and/or human intuition for identifying the appropriate input variables and the appropriate form of the functional relationship between inputs and the output in order to do so.
Symbolic regression is a promising technique for analyzing and identifying relationships in complex and/or expansive data sets. Typically data sets being analyzed can include physically observable numerical data or data derivable from real world phenomenon. Complex systems yield correspondingly complex and expansive data sets. Symbolic regression provides one avenue on which to attempt to tackle analysis of such data sets. Symbolic regression can include a function discovery approach for analysis and modeling of numeric multivariate data sets. Insights about data and data generating systems can be determined by discovering a variety of symbolic expressions of functions that fit a given data set.
Symbolic regression, as opposed to classical regression techniques, enables discovery of both the form of the model and its parameters. Conventional implementations of symbolic regression systems proceed by asking a user to select a set of primitive functional operators allowed to generate mathematical models to evaluate data sets and then by applying learning algorithms to deriver model structures and model parameters. However, even symbolic regression analysis can require a user to specify what expressions to look for. The need for a human user can be a severe bottleneck even in symbolic regression settings.