The present invention relates generally to function approximation or regression analysis, the process of deriving from empirical data a function, or a series of summed functions, which reasonably describes the real process creating the empirical data, and more specifically to an improved orthogonal least squares (OLS) method for mapping or relating variables that can distinguish signal from noise.
Controlling physical processes frequently requires a method for experimentally determining, or approximating, functions or equations which describe the physical process, particularly for complex processes where theoretical descriptions of the process are difficult to derive.
To learn a function from empirical data, a method of mapping or relating variables which is capable of distinguishing signal from noise is necessary. A number of statistical and neural network methods are used to regress empirical data. Unfortunately, not all such methods are able to successfully distinguish signal from noise.
Traditional methods for function approximation or regression involve a linear combination of the products of single-variable, or fixed, basis functions (e.g., polynomial, spline, and/or trigonometric expansions). As described in Barron, A. R., and Xiao, X., 1991, Discussion of "Multivariable adaptive regression splines" by J. H. Friedman, Ann. Stat. 19, pp. 67-82, the problem with traditional methods is that there are exponentially many orthonormal functions, and unless all of these orthonormal functions are used in the fixed basis, there will remain functions that are not well approximated; i.e., the order of the squared approximation error is 1/n^(2/d), where n is the number of basis functions and d is the number of input variables. This problem is avoided by tuning or adapting the parameters of multi-variable basis functions to fit the target function, as in the case of neural networks, wherein the order of the squared approximation error is 1/n.
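Stated compactly (with $f$ the target function and $\hat{f}_n$ its $n$-term approximation, generic symbols introduced here for illustration and not used elsewhere in this description), the two error rates contrasted above are:

```latex
\underbrace{\mathbb{E}\,\lVert f - \hat{f}_n \rVert^2 = O\!\bigl(n^{-2/d}\bigr)}_{\text{fixed basis}}
\qquad \text{versus} \qquad
\underbrace{\mathbb{E}\,\lVert f - \hat{f}_n \rVert^2 = O\!\bigl(n^{-1}\bigr)}_{\text{tuned multi-variable basis}}
```

For even moderate input dimension $d$, the fixed-basis rate degrades rapidly, while the tuned-basis rate is independent of $d$; this is the motivation for adaptive bases such as those used in neural networks.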
The biological origins of neural networks, as chronicled in Pao, Y. H., 1996, "Memory based computational intelligence for materials processing and design," Wright Laboratory Technical Report WL-TR-96-4062, Wright-Patterson AFB, OH, pp. 1-14, established the multi-variable sigmoid as "the" basis function for neural networks. Today the suite of multi-variable basis functions employed in neural networks is without bound, but the most commonly used are the sigmoid and radial basis functions. Radial basis function neural networks typically employ subset selection to identify a set of Gaussian basis functions. Broomhead, D. S., and Lowe, D., 1988, "Multivariable functional interpolation and adaptive networks," Complex Syst. 2, pp. 321-355, have tried to choose such a subset randomly from the entire given set. In lieu of random selection, Rawlings, J. O., 1988, Applied Regression Analysis, Wadsworth and Brooks/Cole, Pacific Grove, Calif., has proposed a systematic approach that employs forward selection to choose, incrementally, the subset that best explains the variation in the dependent variable. Based on this concept, Chen, S., Cowan, C. F. N., and Grant, P. M., 1991, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. on Neural Networks, Vol. 2, No. 2, pp. 302-309, presented an efficient implementation of forward selection using the orthogonal least squares (OLS) method. Subset selection can also be used to avoid overfitting by limiting the complexity of the network. The literature suggests that overfitting may be avoided by combining subset selection with other methods such as regularization, as in Barron, A. R., and Xiao, X., 1991, Discussion of "Multivariable adaptive regression splines" by J. H. Friedman, Ann. Stat. 19, pp. 67-82; in Breiman, L., 1992, Stacked Regression, Tech. Rep. TR-367, Department of Statistics, University of California, Berkeley; and, as contributed by Mark Orr, by combining OLS and regularization, in Orr, M. J. L., 1995, "Regularization in the Selection of Radial Basis Function Centers," Neural Computation, 7, pp. 606-623.
Unfortunately, as described in the Detailed Description, the traditional approach of subset selection is insufficient: it is computationally complex and converges more slowly than desired.
Thus it is seen that there is a need for improved methods of subset selection as part of the orthogonal least squares (OLS) method for function approximation.
It is, therefore, a principal object of the present invention to provide an improved OLS method for training neural networks.
It is a feature of the present invention that it provides increased computational tractability over prior art methods.
It is another feature of the present invention that it provides faster convergence than prior art methods.
It is an advantage of the present invention that it provides more accurate function approximations than prior art methods.
These and other objects, features and advantages of the present invention will become apparent as the description of certain representative embodiments proceeds.
In accordance with the foregoing principles and objects of the present invention, a new method for using the Orthogonal Least Squares method for training neural networks is described. Instead of selecting a subset of orthogonal bases from a selected subset of the given regressors, the method of the present invention finds the subset of orthogonal bases from an orthogonal combination of the entire given regressor set. The benefit of this approach is that it avoids discarding useful information and avoids excessive weight enlargement in the linear links of a neural network.
With a unique transformation of the basis functions used to effect a mapping of variables, a functional mapping is now achievable. A functional mapping means that the dependent variable(s) can be explained in terms of the independent variables alone. Any additional variables and/or environmental noise that contribute to the dependent variable values are not explained by this mapping, because of the unique transformation of the basis functions.
Accordingly, the present invention is directed to a method for choosing a set of orthogonal basis functions for a function approximation from empirical data described as $\{x_t, y_t\}_{t=1}^{P}$,
comprising the steps of constructing a heterogeneous regressor set $F = \{f_i\}_{i=1}^{N}$
from a set of randomly selected basis functions; defining $\Psi \equiv [\varphi_1, \varphi_2, \ldots, \varphi_N] = \mathrm{rearrangement}(F)$ by, at a first step $k = 1$, denoting a first column of the $\Psi$ matrix, $\varphi_1 \equiv f_t^{(1)}$, selected from the $f_i^{(1)}$ where
$$\| f_t^{(1)} \|^2 = \max \left\{ \| f_i^{(1)} \|^2 \right\}_{i=1}^{N}$$
and the first orthogonal basis is
$$h_1 = \sum_{i=1}^{N} \frac{\langle f_t^{(1)}, f_i^{(1)} \rangle}{\| f_t^{(1)} \|^2} \, f_t^{(1)},$$
building an orthogonal basis matrix $H$ by, at a $k$th step, where $k \geq 2$, calculating $f_i^{(k)}$ and $h_k$ as
$$f_i^{(k)} = f_i^{(k-1)} - \frac{\langle f_i^{(k-1)}, f_t^{(k-1)} \rangle}{\| f_t^{(k-1)} \|^2} \, f_t^{(k-1)},$$
$$h_k = \sum_{i=1}^{N} \frac{\langle f_t^{(k)}, f_i^{(k)} \rangle}{\| f_t^{(k)} \|^2} \, f_t^{(k)},$$
such that $h_k$ can be simplified as
$$h_k = \sum_{m=1}^{N} \frac{\left\langle \varphi_k - \displaystyle\sum_{i=1}^{k-1} \frac{\langle \varphi_k, \varphi_i \rangle}{\| \varphi_i \|^2} \varphi_i ,\; \varphi_m \right\rangle}{\left\| \varphi_k - \displaystyle\sum_{i=1}^{k-1} \frac{\langle \varphi_k, \varphi_i \rangle}{\| \varphi_i \|^2} \varphi_i \right\|^2} \left( \varphi_k - \sum_{i=1}^{k-1} \frac{\langle \varphi_k, \varphi_i \rangle}{\| \varphi_i \|^2} \varphi_i \right), \quad \text{where } \varphi_k = f_t^{(k)},$$
initializing by letting $H_{\mathrm{subset}} = \varnothing$, where $\varnothing$ is the empty set, and letting $k = 1$,
finding $h_i$ such that
$$\max_i \left\{ \frac{(y^T h_i)^2}{\lambda + (h_i)^T h_i} \right\},$$ and
including $h_i$ as an element of $H_{\mathrm{subset}}$ such that $H_{\mathrm{subset}} = H_{\mathrm{subset}} \cup \{ h_i \}$,
regularizing by modifying the generalized cross-validation variable $\lambda$, letting the index of the selected $f_t^{(k)}$ in the original $F$ matrix be $j$, where
$$\| f_t^{(k)} \|^2 = \max \left\{ \| f_i^{(k)} \|^2 \right\}_{i=1}^{N}, \quad \text{such that } \varphi_k = f_j^{(1)},$$ and
stopping if $\| f_t^{(k)} \|^2 \leq \epsilon$, where $\epsilon$ is a preselected minimum value, otherwise letting $k = k + 1$ and repeating beginning at step (e).
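For illustration, the sequence of steps above can be sketched numerically. The two-phase organization, the function and variable names, and the tolerances below are assumptions made for the sketch, not the patent's prescribed implementation:

```python
import numpy as np

def build_orthogonal_bases(F, eps=1e-8):
    """Sketch of basis construction: at each step k, pick the remaining
    regressor of largest norm as pivot f_t^(k), form h_k as the sum of the
    projections of every deflated regressor onto that pivot, then deflate
    all regressors against the pivot (Gram-Schmidt)."""
    Fk = np.array(F, dtype=float)          # columns are the f_i^(k)
    H = []
    while True:
        norms = np.sum(Fk ** 2, axis=0)    # ||f_i^(k)||^2 for every column
        t = int(np.argmax(norms))          # pivot index
        if norms[t] <= eps:                # stopping rule: pivot norm too small
            break
        ft = Fk[:, t].copy()
        coeff = Fk.T @ ft / norms[t]       # <f_t^(k), f_i^(k)> / ||f_t^(k)||^2
        H.append(coeff.sum() * ft)         # h_k lies along the pivot direction
        Fk = Fk - np.outer(ft, coeff)      # f_i^(k+1): remove pivot component
    return H

def select_subset(H, y, lam=1e-3, n_select=None):
    """Sketch of greedy forward selection: repeatedly include the basis
    maximizing the regularized criterion (y^T h)^2 / (lam + h^T h)."""
    n_select = len(H) if n_select is None else min(n_select, len(H))
    remaining = list(range(len(H)))
    chosen = []
    for _ in range(n_select):
        best = max(remaining,
                   key=lambda j: (y @ H[j]) ** 2 / (lam + H[j] @ H[j]))
        remaining.remove(best)
        chosen.append(H[best])
    return chosen
```

Because each pivot is deflated out of every remaining regressor before the next step, the resulting bases $h_k$ are mutually orthogonal, which is the property the selection criterion relies upon.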
The present invention is also directed to a method for controlling a physical process, comprising the steps of obtaining a set of empirical data from the physical process, determining a function approximation of the physical process from the empirical data by the just described method, and using the determined function approximation to choose process parameters for obtaining preselected physical results from the physical process.
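A minimal sketch of the final step, choosing process parameters from a learned approximation: the quadratic `fhat`, the target value, and the candidate grid below are hypothetical stand-ins, not outputs of the method described above.

```python
import numpy as np

def fhat(x):
    # Hypothetical fitted response of the physical process to parameter x
    # (a stand-in for a function approximation learned from empirical data).
    return 2.0 * x - 0.1 * x ** 2

target = 7.5                                # preselected physical result
candidates = np.linspace(0.0, 10.0, 1001)   # feasible parameter settings
# Pick the parameter whose predicted response is closest to the target.
best = candidates[np.argmin(np.abs(fhat(candidates) - target))]
```

In practice the grid search over `candidates` could be replaced by any optimizer appropriate to the process constraints.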