1. Field of the Invention
The present invention relates to a recognition apparatus using a neural network and to a learning method therefor. Unlike the prior art, the present invention neither requires the start and end edges of the input data to be given when time series data is processed, nor processes all possible combinations of start and end edges. Instead, the present invention makes it possible to process time series data precisely, using simplified hardware comprising neuron elements capable of holding the past history of the input data.
The present invention also relates to a learning method that enables a neural network to perform such processing.
2. Description of the Related Art
Several data recognition methods have been put to practical use, particularly for learning and recognizing the category of time series data. Such methods include the Dynamic Programming (DP) method, the Hidden Markov Model (HMM) method, and the Multi-Layered Perceptron (MLP) neural network method, which uses the back propagation learning rule. These methods are described, for example, in NAKAGAWA Seiichi, "Speech Recognition by Stochastic Model" published by the Institute of Electronics, Information and Communication Engineers and in NAKAGAWA, SHIKANO and TOHKURA, "Speech, Auditory Perception and Neural Network Model" published by Ohm Co., Ltd.
A problem common to the DP and HMM methods is that they require the start and end edges of both the teacher data and the input data to be recognized. One technique for processing data without apparent dependence on its start and end edges is to search, by trial and error, for the start and end edges that give the best result. When data parts belonging to a category are to be detected in input data of length N, there are N possible start edges and N possible end edges. That is, the number of possible combinations of start and end edges is of the order of N². Such a technique must therefore perform recognition processing for every one of this large number of combinations, which consumes a huge amount of processing time.
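The quadratic growth in the number of candidate segments can be illustrated by the following minimal sketch. It assumes that a candidate segment is any pair (start, end) of frame indices with start not later than end, which gives N(N+1)/2 pairs and is therefore of the order of N². The function names `count_candidate_segments` and `closed_form` are hypothetical and do not appear in the references cited above.

```python
from itertools import combinations_with_replacement


def count_candidate_segments(n: int) -> int:
    """Count (start, end) pairs with 0 <= start <= end < n by enumeration.

    Assumption for illustration: every such pair is a candidate segment
    that a trial-and-error search would have to evaluate.
    """
    return sum(1 for _ in combinations_with_replacement(range(n), 2))


def closed_form(n: int) -> int:
    """Closed-form count of the same pairs: n(n+1)/2, i.e. order n^2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The enumerated count and the closed form agree for each n.
        print(n, count_candidate_segments(n), closed_form(n))
```

Under this assumption, increasing the input length by a factor of ten increases the number of segments to be evaluated by roughly a factor of one hundred.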
Beyond the quantitative problem of the huge number of combinations, the aforementioned technique has a more fundamental problem: it assumes that the input data has start and end edges at all. The start and end edges of the input data are self-evident if the input data contains only a single data item belonging to one category. However, the edges cannot be determined easily or clearly if the input data includes successive data parts belonging to more than one category. In particular, time series data such as speech data does not have definite boundaries at the start and end edges, because data parts belonging to two adjacent categories are connected to each other through an overlapping transition region. Accordingly, assuming start and end edges for the data raises a very serious problem in accuracy.
On the other hand, the MLP method does not require such an assumption. Instead, the MLP method raises another problem with respect to the start and end edges of the input data, in that the range of the input data must be specified. In other words, the MLP method is basically a method for recognizing static data. Thus, the MLP method can recognize time series data only when input data within a given length of time is supplied as a whole, with the time information handled in the same way as static information. This length of time must be fixed because of the structure of the MLP.
However, the length of time series data varies greatly from one category to another, and also within the same category. For example, the average length of vowels, which are long phonemes, is ten or more times that of plosives, which are short phonemes. Even for the same phoneme, the length can vary by more than a factor of two in actual speech. Even if the input range is set to the average length, the input data for a short phoneme will include a large amount of data other than the data to be recognized, and the input data for a long phoneme will include only a part of the data to be recognized. Both effects reduce the recognition ability. Even if the input length is set appropriately for each phoneme, the problem is not solved, since the length of each phoneme itself varies. Such problems are generally found in time series information.
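The mismatch between a fixed input range and variable phoneme lengths can be made concrete with the following minimal sketch. The frame counts used here (a 2-frame plosive, a 20-frame vowel, and an 11-frame window equal to their average) are assumed values chosen only to reflect the tenfold ratio mentioned above; they are not measured data. The function name `window_coverage` is likewise hypothetical, and the sketch assumes the window is ideally aligned with the segment, which is the most favorable case.

```python
def window_coverage(segment_len: int, window_len: int) -> tuple[float, float]:
    """Return (covered, extraneous) for a fixed window over one segment.

    covered:    fraction of the segment that fits inside the window
    extraneous: fraction of the window occupied by data outside the segment
    Assumes the window is aligned as favorably as possible with the segment.
    """
    overlap = min(segment_len, window_len)
    covered = overlap / segment_len
    extraneous = (window_len - overlap) / window_len
    return covered, extraneous


if __name__ == "__main__":
    plosive_len, vowel_len = 2, 20               # assumed lengths in frames
    window_len = (plosive_len + vowel_len) // 2  # window set to the average length

    for name, length in (("plosive", plosive_len), ("vowel", vowel_len)):
        covered, extraneous = window_coverage(length, window_len)
        print(f"{name}: covered={covered:.2f}, extraneous={extraneous:.2f}")
```

With these assumed values, most of the window for the short phoneme is filled with data other than the data to be recognized, while the window for the long phoneme contains only about half of the phoneme, even under ideal alignment.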