The problem of extracting nonlinear relationships from large, high-dimensional, scattered data sets is of central importance across fields of science, engineering and mathematics. In particular, diverse areas such as machine learning, optimal control and mathematical modeling of physical systems often rely significantly on the ability to construct relationships from data. Consequently, there have been a multitude of applications, including financial time-series analysis, voice recognition, failure prediction and artificial intelligence, all of which provide evidence of the importance of nonlinear function approximation algorithms.
The beginnings of empirical data fitting may be traced to Gauss's work on using least squares to construct linear models. Over the last two decades there has been tremendous growth in this area, motivated by new ideas for computing nonlinear models. Numerous prior art references are cited herein. In most cases, when a reference is referred to herein, it is cited by its number (in square brackets) from the References section hereinbelow. Thus, e.g., for computing nonlinear models, references [11, 45, 40, 37, 38] in the References section disclose exemplary prior art techniques.
The ability to construct relationships from data in these areas depends on constructing robust approximation models. Our interest in this problem relates to representing data on a manifold as the graph of a function [8, 9] and to the reduction of dynamical systems (see, e.g., [10]).
A common element in empirical data fitting applications is that the complexity of the required model, including the number and scale of representation functions, is not known a priori and must be determined as efficiently as possible. A variety of approaches have been proposed to determine the number of model functions, i.e., to solve the model order problem. A generally accepted measure of quality of such data fitting algorithms is that the resulting models generalize well to testing data, i.e., data associated with the same process but not used to construct the model. This requirement is essentially that the data be neither overfit by a model with too many parameters nor underfit by a model with too few parameters.
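For illustration only, the overfitting and underfitting behavior described above may be sketched as follows. The sketch assumes a synthetic one-dimensional process (a noisy sine) and uses polynomial degree as a stand-in for model order; the function names `score_model_orders` and `sample`, the noise level and the sample sizes are hypothetical choices and are not taken from any cited reference.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


def score_model_orders(x_train, y_train, x_test, y_test, degrees):
    """Fit one least-squares polynomial per degree.

    Returns a list of (degree, training MSE, testing MSE) tuples.
    """
    rows = []
    for d in degrees:
        coeffs = P.polyfit(x_train, y_train, d)
        rows.append((d,
                     mse(y_train, P.polyval(x_train, coeffs)),
                     mse(y_test, P.polyval(x_test, coeffs))))
    return rows


# Assumed synthetic process: y = sin(3x) plus Gaussian noise.
rng = np.random.default_rng(0)


def sample(n):
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + 0.1 * rng.normal(size=n)
    return x, y


x_train, y_train = sample(40)    # data used to construct the model
x_test, y_test = sample(200)     # data from the same process, held out

rows = score_model_orders(x_train, y_train, x_test, y_test, range(0, 13))
best_degree = min(rows, key=lambda r: r[2])[0]  # lowest testing error

for d, train_err, test_err in rows:
    print(f"degree={d:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
print("degree with lowest testing error:", best_degree)
```

In such a sketch the training error cannot increase as the degree grows, because each model contains the previous one, whereas the testing error is what indicates whether the added parameters generalize.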
One general approach to this problem is known as regularization, i.e., fitting a smooth function through the data set using a modified optimization problem that penalizes variation. A standard technique for setting the regularization constraints is cross-validation [18, 44]. Such methods involve partitioning the data into subsets of training, validation and testing data; for details see, e.g., [19].
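A minimal sketch of this approach is given below for illustration. It assumes Gaussian radial basis features and substitutes a simple weight-norm (ridge) penalty for a penalty on function variation; the single training/validation/testing split, the candidate penalty values, and the names `rbf_features`, `ridge_fit` and `select_lambda` are assumptions made for the example rather than a method disclosed in the cited references.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian radial basis features, one column per center."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))


def ridge_fit(Phi, y, lam):
    """Minimize ||Phi w - y||^2 + lam * ||w||^2 via the normal equations."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


def select_lambda(Phi_tr, y_tr, Phi_val, y_val, lambdas):
    """Return the penalty with the lowest validation MSE, plus all errors."""
    errors = []
    for lam in lambdas:
        w = ridge_fit(Phi_tr, y_tr, lam)
        errors.append(float(np.mean((Phi_val @ w - y_val) ** 2)))
    return lambdas[int(np.argmin(errors))], errors


# Assumed synthetic data, partitioned into training, validation and testing.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 90)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=90)
x_tr, x_val, x_te = x[:30], x[30:60], x[60:]
y_tr, y_val, y_te = y[:30], y[30:60], y[60:]

centers = np.linspace(-1.0, 1.0, 15)   # hypothetical fixed centers
width = 0.2                            # hypothetical fixed scale
lambdas = [1e-6, 1e-4, 1e-2, 1.0, 100.0]

Phi_tr = rbf_features(x_tr, centers, width)
Phi_val = rbf_features(x_val, centers, width)
Phi_te = rbf_features(x_te, centers, width)

best_lam, val_errors = select_lambda(Phi_tr, y_tr, Phi_val, y_val, lambdas)
w_best = ridge_fit(Phi_tr, y_tr, best_lam)
test_mse = float(np.mean((Phi_te @ w_best - y_te) ** 2))

print("validation MSE per lambda:", dict(zip(lambdas, val_errors)))
print("selected lambda:", best_lam, " testing MSE:", test_mse)
```

The testing subset is used only once, after the penalty has been selected, so that the reported error is not biased by the selection step.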
Additionally, a variety of model growing and pruning algorithms have been suggested, e.g.,
(1) the upstart algorithm by M. Frean, “The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks,” Neural Computation, 2(2):198-209, 1990;
(2) the cascade correlation algorithm by S. E. Fahlman and C. Lebiere, “The cascade-correlation learning architecture,” In D. S. Touretzky, editor, Proceedings of the Connectionist Models Summer School, volume 2, pages 524-532, San Mateo, Calif., 1988;
(3) the optimal brain damage algorithm by Y. Le Cun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” In D. S. Touretzky, editor, Advances in Neural Information Processing Systems, volume 2, pages 598-605. Morgan Kaufmann, San Mateo, Calif., 1990; and
(4) the resource allocating network (RAN) proposed by Platt in “A Resource Allocating Network for Function Interpolation,” Neural Computation, 3:213-225, 1991.
Statistical methods have also been proposed that include, e.g., the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and minimum description length (MDL) [42], [2], and Bayesian model comparison [29]. In [39], [31] and [20] the issue of selecting the number of basis functions with growing and pruning algorithms has been studied from a Bayesian perspective. In [5], a hierarchical full Bayesian model for radial basis functions (RBFs) is proposed. The maximum marginal likelihood of the data has also been used to determine RBF parameters [34]. For a more complete list of references the reader is referred to [21].
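By way of a hedged example, model order selection with AIC and BIC may be sketched as follows. The sketch assumes least-squares fitting with Gaussian noise, for which the criteria reduce (up to additive constants) to n ln(RSS/n) + 2k and n ln(RSS/n) + k ln(n), where RSS is the residual sum of squares, n is the number of samples and k is the number of fitted parameters. Polynomial degree again stands in for the number of basis functions, and the names `aic`, `bic` and `select_order` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def aic(rss, n, k):
    """Akaike information criterion for Gaussian least squares (up to a constant)."""
    return n * np.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Bayesian information criterion for Gaussian least squares (up to a constant)."""
    return n * np.log(rss / n) + k * np.log(n)


def select_order(x, y, max_degree):
    """Score polynomial models of degree 0..max_degree.

    Returns (degree minimizing AIC, degree minimizing BIC, all scores).
    """
    n = len(x)
    scores = []
    for d in range(max_degree + 1):
        coeffs = P.polyfit(x, y, d)
        rss = float(np.sum((P.polyval(x, coeffs) - y) ** 2))
        k = d + 1  # number of fitted coefficients (noise variance not counted)
        scores.append((d, aic(rss, n, k), bic(rss, n, k)))
    best_aic = min(scores, key=lambda s: s[1])[0]
    best_bic = min(scores, key=lambda s: s[2])[0]
    return best_aic, best_bic, scores


# Assumed synthetic process: y = sin(3x) plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=60)

best_aic, best_bic, scores = select_order(x, y, max_degree=10)
print("degree selected by AIC:", best_aic)
print("degree selected by BIC:", best_bic)
```

Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is greater than about 7, BIC never selects a larger model than AIC in this setting. Unlike cross-validation, neither criterion requires data to be withheld from training.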
In general, model order determination via both regularization and growing and pruning algorithms can be computationally intensive and data hungry. More importantly, however, these algorithms do not explicitly exploit the geometric and statistical structure of the residuals (see Terms and Descriptions section hereinbelow) during the training procedure. In addition, many algorithms in the literature require that anywhere from a few to a dozen ad hoc parameters be tuned for each data set under consideration.
Accordingly, it is desirable to alleviate the modeling difficulties in the prior art and, in particular, to at least provide model generating methods and systems that are computationally less intensive and that reduce (preferably to zero) the number of model generation parameters required for generating a model of appropriate accuracy.