1. Field of the Invention
This invention relates to a learning process for a neural network.
2. Background Books and Articles
The following books and articles are useful for understanding the technical background of this invention, and each item is incorporated by reference in its entirety for its background information. Each item has an item identifier which is used in the discussions below.
    i. B. L. Bowerman and R. T. O'Connell, Time Series Forecasting, New York: PWS, 1987.
    ii. G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting, and Control, San Francisco, Calif.: Holden-Day, 1976.
    iii. A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing, New York: Wiley, 1993.
    iv. A. S. Weigend and N. A. Gershenfeld, Eds., Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, Mass.: Addison-Wesley, 1994.
    v. A. Lapedes and R. Farber, “Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling,” Los Alamos Nat. Lab. Tech. Rep. LA-UR 87-2662, 1987.
    vi. K. Hornik, “Approximation Capabilities of Multilayer Feedforward Networks,” Neural Networks, vol. 4, 1991.
    vii. M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer Feedforward Networks with a Nonpolynomial Activation Function Can Approximate Any Function,” Neural Networks, vol. 6, pp. 861-867, 1993.
    viii. S. G. Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674-693, July 1989.
    ix. E. B. Baum and D. Haussler, “What Size Net Gives Valid Generalization,” Neural Comput., vol. 1, pp. 151-160, 1989.
    x. S. Geman, E. Bienenstock, and R. Doursat, “Neural Networks and the Bias/Variance Dilemma,” Neural Comput., vol. 4, pp. 1-58, 1992.
    xi. K. J. Lang, A. H. Waibel, and G. E. Hinton, “A Time-Delay Neural Network Architecture for Isolated Word Recognition,” Neural Networks, vol. 3, pp. 23-43, 1990.
    xii. Y. LeCun, “Generalization and Network Design Strategies,” Univ. Toronto, Toronto, Ont., Canada, Tech. Rep. CRG-TR-89-4, 1989.
    xiii. E. A. Wan, “Time Series Prediction by Using a Connectionist Network With Internal Delay Lines,” in Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, Mass.: Addison-Wesley, 1994, pp. 195-218.
    xiv. D. C. Plaut, S. J. Nowlan, and G. E. Hinton, “Experiments on Learning by Back Propagation,” Carnegie Mellon Univ., Pittsburgh, Pa., Tech. Rep. CMU-CS-86-126, 1986.
    xv. A. Krogh and J. A. Hertz, “A Simple Weight Decay Can Improve Generalization,” Adv. Neural Inform. Process. Syst., vol. 4, pp. 950-957.
    xvi. A. S. Weigend, D. E. Rumelhart, and B. A. Huberman, “Back-Propagation, Weight-Elimination and Time Series Prediction,” in Proc. Connectionist Models Summer Sch., 1990, pp. 105-116.
    xvii. A. S. Weigend, B. A. Huberman, and D. E. Rumelhart, “Predicting the Future: A Connectionist Approach,” Int. J. Neural Syst., vol. 1, no. 3, pp. 193-209, 1990.
    xviii. M. Cottrell, B. Girard, Y. Girard, M. Mangeas, and C. Muller, “Neural Modeling for Time Series: A Statistical Stepwise Method for Weight Elimination,” IEEE Trans. Neural Networks, vol. 6, pp. 1355-1364, November 1995.
    xix. R. Reed, “Pruning Algorithms—A Survey,” IEEE Trans. Neural Networks, vol. 4, pp. 740-747, 1993.
    xx. M. B. Priestley, Non-Linear and Non-Stationary Time Series Analysis, New York: Academic, 1988.
    xxi. Y. R. Park, T. J. Murray, and C. Chen, “Predicting Sun Spots Using a Layered Perceptron Neural Network,” IEEE Trans. Neural Networks, vol. 7, pp. 501-505, March 1996.
    xxii. W. E. Leland and D. V. Wilson, “High Time-Resolution Measurement and Analysis of LAN Traffic: Implications for LAN Interconnection,” in Proc. IEEE INFOCOM, 1991, pp. 1360-1366.
    xxiii. W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the Self-Similar Nature of Ethernet Traffic,” in Proc. ACM SIGCOMM, 1993, pp. 183-192.
    xxiv. W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the Self-Similar Nature of Ethernet Traffic (Extended Version),” IEEE/ACM Trans. Networking, vol. 2, pp. 1-15, Feb. 1994.
3. Related Work
Traditional time-series forecasting techniques can be represented as autoregressive integrated moving average (ARIMA) models (see items i and ii, above). These traditional models can provide good results when the dynamic system under investigation is linear or nearly linear. However, when the system dynamics are highly nonlinear, the performance of traditional models can be very poor (see items iii and iv, above). Neural networks have demonstrated great potential for time-series prediction. Lapedes and Farber (see item v) first proposed using multilayer feedforward neural networks for nonlinear signal prediction in 1987. Since then, research examining the approximation capabilities of multilayer feedforward neural networks (see items vi and vii) has justified their use for nonlinear time-series forecasting and has resulted in the rapid development of neural network models for signal prediction.
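By way of illustration only, the linear character of a traditional model can be seen in a minimal sketch of an ARIMA(1,1,0)-style forecaster: the series is differenced once (the "integrated" part) and a single autoregressive coefficient is fitted to the differences by least squares. The function names, the omission of an intercept and moving-average term, and the choice of order are assumptions made for this sketch and are not taken from items i or ii.

```python
def difference(series):
    """First-order differencing: d[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares AR(1) coefficient phi for x[t] ~ phi * x[t-1] (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    if den == 0:
        raise ValueError("series has no energy; AR(1) coefficient is undefined")
    return num / den


def forecast_arima_110(series):
    """One-step forecast: last value plus phi times the last difference."""
    d = difference(series)
    phi = fit_ar1(d)
    return series[-1] + phi * d[-1]
```

Because the forecast is a fixed linear combination of past values, a model of this kind cannot follow dynamics in which the dependence on past values is itself nonlinear, which is the limitation noted above.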
A major challenge in neural network learning is to ensure that trained networks possess good generalization ability, i.e., that they generalize well to cases that were not included in the training set. Some research results have suggested that, in order to get good generalization, the training set should form a substantial subset of the sample space (see items ix and x). However, obtaining a sufficiently large training set is often impossible in practical real-world problems, where only a relatively small number of samples is available for training.
Recent approaches to improving generalization attempt to reduce the number of free weight parameters in the network. One approach is weight sharing, as employed in certain time-delay neural networks (TDNNs) (see items xi and xii) and finite impulse response (FIR) networks (see item xiii). However, this approach usually requires that the nature of the problem be well understood so that designers know how weights should be shared. Another approach is to start network training with an excessive number of weights and then remove the excess weights during training. This approach leads to a family of pruning algorithms including weight decay (see item xv), weight-elimination (see items xvi and xvii), and the statistical stepwise method (SSM, see item xviii). For a survey of pruning techniques, see item xix. While pruning techniques might offer some benefit, this approach remains inadequate for difficult learning problems. As mentioned in item xix, for example, it is difficult to handle multi-step prediction with the statistical stepwise method.
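For illustration, the penalty terms commonly associated with weight decay and weight-elimination can be sketched as follows. The quadratic penalty and the ratio-form penalty shown here are the forms usually given in the literature for these techniques; the function names, the scale parameter w0, and the simple magnitude-threshold pruning step are assumptions introduced for this sketch and do not reproduce the statistical stepwise method or any specific procedure of the cited items.

```python
def weight_decay_penalty(weights, lam):
    """Quadratic penalty lam * sum(w^2), added to the training error."""
    return lam * sum(w * w for w in weights)


def weight_elimination_penalty(weights, lam, w0=1.0):
    """Penalty lam * sum((w/w0)^2 / (1 + (w/w0)^2)).

    Each term saturates at 1 for |w| >> w0, so the penalty roughly counts
    the large weights while still shrinking the small ones.
    """
    total = 0.0
    for w in weights:
        r = (w / w0) ** 2
        total += r / (1.0 + r)
    return lam * total


def decay_step(weights, grads, lr, lam):
    """One gradient-descent step on E + lam * sum(w^2).

    grads holds dE/dw for each weight; the penalty contributes 2 * lam * w.
    """
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


def prune(weights, threshold):
    """Hypothetical pruning rule: zero out weights whose magnitude is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]
```

In a sketch of this kind, training would alternate calls to decay_step with occasional calls to prune, so that weights driven toward zero by the penalty are removed from the network.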
There is therefore a need for a neural network learning process that yields a trained network with good generalization ability, so that the network provides good results even when the dynamic system under investigation is highly nonlinear.