1. Field of the Invention
This invention relates to a learning process for a neural network.
2. Background Books and Articles
The following books and articles are useful for understanding the technical background of this invention, and each item is incorporated in its entirety by reference for its background information. Each item has an item identifier which is used in the discussions below.
i. B. L. Bowerman and R. T. O'Connell, Time Series Forecasting, New York: PWS, 1987.
ii. G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting, and Control, San Francisco, Calif.: Holden-Day, 1976.
iii. A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing, New York: Wiley, 1993.
iv. A. S. Weigend and N. A. Gershenfeld, Eds., Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, Mass.: Addison-Wesley, 1994.
v. A. Lapedes and R. Farber, "Nonlinear signal processing using neural networks: Prediction and system modeling," Los Alamos Nat. Lab. Tech. Rep. LA-UR 87-2662, 1987.
vi. K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, vol. 4, 1991.
vii. M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, pp. 861-867, 1993.
viii. S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674-693, July 1989.
ix. E. B. Baum and D. Haussler, "What Size Net Gives Valid Generalization?" Neural Comput., vol. 1, pp. 151-160, 1989.
x. S. Geman, E. Bienenstock, and R. Doursat, "Neural Networks and the Bias/Variance Dilemma," Neural Comput., vol. 4, pp. 1-58, 1992.
xi. K. J. Lang, A. H. Waibel, and G. E. Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Networks, vol. 3, pp. 23-43, 1990.
xii. Y. LeCun, "Generalization and network design strategies," Univ. Toronto, Toronto, Ont., Canada, Tech. Rep. CRG-TR-89-4, 1989.
xiii. E. A. Wan, "Time Series Prediction by Using a Connectionist Network With Internal Delay Lines," in Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, Mass.: Addison-Wesley, 1994, pp. 195-218.
xiv. D. C. Plaut, S. J. Nowlan, and G. E. Hinton, "Experiments on Learning by Back-Propagation," Carnegie Mellon Univ., Pittsburgh, Pa., Tech. Rep. CMU-CS-86-126, 1986.
xv. A. Krogh and J. A. Hertz, "A Simple Weight Decay Can Improve Generalization," Adv. Neural Inform. Process. Syst., vol. 4, pp. 950-957.
xvi. A. S. Weigend, D. E. Rumelhart, and B. A. Huberman, "Back-propagation, weight-elimination and time series prediction," in Proc. Connectionist Models Summer Sch., 1990, pp. 105-116.
xvii. A. S. Weigend, B. A. Huberman, and D. E. Rumelhart, "Predicting the Future: A Connectionist Approach," Int. J. Neural Syst., vol. 1, no. 3, pp. 193-209, 1990.
xviii. M. Cottrell, B. Girard, Y. Girard, M. Mangeas, and C. Muller, "Neural Modeling for Time Series: A Statistical Stepwise Method for Weight Elimination," IEEE Trans. Neural Networks, vol. 6, pp. 1355-1364, November 1995.
xix. R. Reed, "Pruning Algorithms: A Survey," IEEE Trans. Neural Networks, vol. 4, pp. 740-747, 1993.
xx. M. B. Priestley, Non-Linear and Non-Stationary Time Series Analysis, New York: Academic, 1988.
xxi. Y. R. Park, T. J. Murray, and C. Chen, "Predicting Sun Spots Using a Layered Perceptron Neural Network," IEEE Trans. Neural Networks, vol. 7, pp. 501-505, March 1996.
xxii. W. E. Leland and D. V. Wilson, "High Time-Resolution Measurement and Analysis of LAN Traffic: Implications for LAN Interconnection," in Proc. IEEE INFOCOM, 1991, pp. 1360-1366.
xxiii. W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, "On the Self-Similar Nature of Ethernet Traffic," in Proc. ACM SIGCOMM, 1993, pp. 183-192.
xxiv. W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)," IEEE/ACM Trans. Networking, vol. 2, pp. 1-15, February 1994.
Related Work
Traditional time-series forecasting techniques can be represented as autoregressive integrated moving average models (see items i and ii, above). The traditional models can provide good results when the dynamic system under investigation is linear or nearly linear. However, for cases in which the system dynamics are highly nonlinear, the performance of traditional models can be very poor (see items iii and iv, above). Neural networks have demonstrated great potential for time-series prediction. Lapedes and Farber (see item v) first proposed using multilayer feedforward neural networks for nonlinear signal prediction in 1987. Since then, research examining the approximation capabilities of multilayer feedforward neural networks (see items vi and vii) has justified their use for nonlinear time-series forecasting and has resulted in the rapid development of neural network models for signal prediction.
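For illustration only, the following is a minimal sketch of the kind of linear autoregressive fit that the traditional models of items i and ii generalize; the function fit_ar, the synthetic sine-wave series, and the order p=4 are assumptions made for this sketch and appear in none of the cited items.

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p): x[t] ~ a1*x[t-1] + ... + ap*x[t-p] by least squares."""
    # Each row holds the p values preceding time t, most recent first.
    X = np.array([series[t - p:t][::-1] for t in range(p, len(series))])
    coeffs, *_ = np.linalg.lstsq(X, series[p:], rcond=None)
    return coeffs

series = np.sin(np.linspace(0, 20, 200))   # illustrative near-linear signal
a = fit_ar(series, p=4)
forecast = a @ series[-1:-5:-1]            # one-step-ahead prediction
print(forecast)
```

Such a linear fit is adequate only when the dynamics are near-linear, which is precisely the limitation noted above for highly nonlinear systems.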
A major challenge in neural network learning is to ensure that trained networks possess good generalization ability, i.e., that they generalize well to cases that were not included in the training set. Some research results suggest that, in order to obtain good generalization, the training set should form a substantial subset of the sample space (see items ix and x). However, obtaining a sufficiently large training set is impossible in many practical real-world problems, where only a relatively small number of samples is available for training.
Recent approaches to improving generalization attempt to reduce the number of free weight parameters in the network. One approach is weight sharing, as employed in certain time-delay neural networks (TDNNs) (see items xi and xii) and finite impulse response (FIR) networks (see item xiii). However, this approach usually requires that the nature of the problem be well understood so that designers know how weights should be shared. Another approach is to start network training with an excessive number of weights and then remove the excess weights during training. This approach leads to a family of pruning algorithms including weight decay (see item xv), weight-elimination (see items xvi and xvii), and the statistical stepwise method (SSM, see item xviii). For a survey of pruning techniques, see item xix. While pruning techniques can offer some benefit, this approach remains inadequate for difficult learning problems. As mentioned in item xix, for example, it is difficult to handle multi-step prediction with the statistical stepwise method.
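By way of illustration, the penalty terms behind two of the regularizers named above can be sketched as follows; the hyperparameters lam and w0 are illustrative assumptions, and the functions are simplified renderings of the cost terms described in items xv through xvii, not the cited authors' code.

```python
import numpy as np

def weight_decay_penalty(weights, lam=1e-3):
    # Quadratic penalty that shrinks all weights toward zero (cf. item xv).
    return lam * np.sum(weights ** 2)

def weight_elimination_penalty(weights, lam=1e-3, w0=1.0):
    # Saturating penalty that drives small weights toward zero while leaving
    # large weights comparatively untouched (cf. items xvi and xvii).
    sq = (weights / w0) ** 2
    return lam * np.sum(sq / (1.0 + sq))

w = np.array([0.01, 0.5, 2.0])
print(weight_decay_penalty(w), weight_elimination_penalty(w))
```

Either penalty is added to the network's error function during training, so that weights which contribute little are pushed toward zero and can then be pruned.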
There is therefore a need for a neural network learning process that gives a trained network possessing good generalization ability so as to provide good results even when the dynamic system under investigation is highly nonlinear.
It is an object of this invention to provide a neural network learning process that yields a trained network having good generalization ability even for highly nonlinear dynamic systems. In one embodiment, the object is realized in a method of predicting a value in a series of values. According to this method, several approximations of a signal are obtained, each at a different respective resolution, using the wavelet transform. A neural network is then trained on the approximations successively, beginning with the lowest-resolution approximation and continuing up through the higher-resolution approximations. The trained neural network is used to predict values, and it generalizes well even for highly nonlinear dynamic systems. In a preferred embodiment of the invention, the trained neural network is used to predict network traffic patterns.
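The following is a minimal sketch of this coarse-to-fine training idea, assuming a Haar-style averaging decomposition in place of a general wavelet transform and scikit-learn's MLPRegressor in place of a purpose-built network; the window width, network size, decomposition depth, and synthetic signal are all assumptions of the sketch, not the claimed implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def haar_approximation(x, level):
    """Approximate x at a coarser resolution via `level` rounds of pairwise
    averaging (the Haar scaling step), then upsample to the original length."""
    a = x.copy()
    for _ in range(level):
        if len(a) % 2:
            a = np.append(a, a[-1])          # pad to even length
        a = 0.5 * (a[0::2] + a[1::2])
    return np.repeat(a, 2 ** level)[: len(x)]

def make_windows(series, width):
    """Sliding windows: predict series[t] from the `width` preceding values."""
    X = np.array([series[t - width:t] for t in range(width, len(series))])
    return X, series[width:]

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 40, 512)) + 0.1 * rng.standard_normal(512)

# warm_start=True makes each fit() call continue from the current weights,
# so training proceeds from the coarsest approximation up to the full signal.
net = MLPRegressor(hidden_layer_sizes=(16,), warm_start=True, max_iter=500)
for level in (3, 2, 1, 0):      # coarse -> fine; level 0 is the signal itself
    X, y = make_windows(haar_approximation(signal, level), width=8)
    net.fit(X, y)

print(net.predict(signal[-8:].reshape(1, -1)))   # one-step prediction
```

Training on the smoothest approximation first lets the network capture the gross shape of the signal before it is exposed to fine-scale detail, which is the intuition behind the resolution-ordered training schedule described above.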