1. Field of the Invention
The present invention relates to a time series signal analyzer comprising a new neural network suitable for pattern recognition in voice recognition applications and other time series signal recognition applications.
2. Prior Art
FIG. 2 is an example of the hierarchical neural network used in the present invention. The arrows between neuron units indicate the direction of signal flow. A specific weighting coefficient is applied to the output from the unit from which each arrow starts, and the weighted output is then input to the destination neuron unit to which the arrow points. Each row of neuron units is called a layer, and there are thus three layers shown in this sample neural network. The layer of neuron units directly connected to the inputs to the neural network is called the "input layer," the layer that produces the output of the neural network is called the "output layer," and all other layers are called "hidden layers" or "intermediate layers."
In this example the first layer is the input layer, the second layer is the intermediate layer, and the third layer is the output layer. The relationship between the sum of the inputs to a neuron unit and the output from that unit is typically defined by a so-called sigmoid function. FIG. 3 is an example of this sigmoid function. A linear function, rather than a sigmoid function, is normally assigned to the input layer so that the input is simply passed through the input layer unchanged.
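The signal flow through such a three-layer network can be sketched as follows. This is a minimal illustration only: the logistic function is assumed as the sigmoid (the specific curve of FIG. 3 is not reproduced here), and the names `sigmoid`, `layer_output`, `forward` and the weight values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, assumed here as one common choice of sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, activation):
    """Compute the outputs of one layer.

    Each row of `weights` holds the weighting coefficients on the arrows
    entering one unit; the unit sums its weighted inputs and applies
    `activation` to the sum.
    """
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]


def forward(pattern, hidden_weights, output_weights):
    # Input layer: linear function, so the pattern passes through unchanged.
    input_out = list(pattern)
    # Intermediate and output layers: sigmoid of the weighted sum.
    hidden_out = layer_output(input_out, hidden_weights, sigmoid)
    return layer_output(hidden_out, output_weights, sigmoid)


# Illustrative values: 2 input units, 2 intermediate units, 1 output unit.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
output_weights = [[1.0, -1.0]]
result = forward([1.0, 0.0], hidden_weights, output_weights)
```

With this choice of sigmoid every output lies strictly between 0 and 1, which is why the output units can be read as graded responses rather than hard 0/1 values.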
By adjusting the weighting coefficients of the neural network, the desired output can be obtained from the output layer for a given pattern input to the input layer. To illustrate, consider how the letters "A," "B," and "C" are distinguished. In its simplest form, a neural network works by overlaying a grid of, for example, 32×32 squares on each character to be recognized, where each square of the grid corresponds to one neuron unit (in this case there are 32×32=1024 input units). Each square containing part of a line in the character outputs a 1, and each square not containing a line component outputs a 0, to the corresponding unit of the neural network input layer. The output layer consists of three units such that when the letter "A" is input to the input layer the first output layer unit outputs "1" and the other units output "0"; similarly, when a "B" is input, the second output layer unit outputs "1" and the other units "0," and when "C" is input, the third output layer unit outputs "1" and the other units "0." The neural network is then trained by inputting many different samples and adjusting the weighting coefficients until these results are obtained. Once the neural network is trained, an unknown input, such as "X," can be recognized as either "A," "B," or "C" based on which output unit has the highest value. In a hierarchical neural network, there are methods of estimating the weighting coefficients from plural training patterns, and a high recognition rate can be obtained in practice in conventional character recognition applications.
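The input encoding, the desired outputs, and the decision rule of this character example can be sketched as follows. The helper names `encode_grid`, `target_for` and `classify` are hypothetical, and the weight adjustment step itself is not shown because the text does not specify a particular training method.

```python
GRID = 32                   # 32x32 grid, as in the example
LABELS = ["A", "B", "C"]    # one output unit per letter


def encode_grid(grid):
    """Flatten a 32x32 grid of 0/1 squares into 1024 input values."""
    if len(grid) != GRID or any(len(row) != GRID for row in grid):
        raise ValueError("grid must be 32x32")
    return [float(cell) for row in grid for cell in row]


def target_for(label):
    """Desired output pattern: 1 for the matching unit, 0 for the others."""
    return [1.0 if label == name else 0.0 for name in LABELS]


def classify(outputs):
    """Recognize the input as the letter whose output unit is highest."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return LABELS[best]


# A nearly blank grid with a single marked square, for illustration only.
grid = [[0] * GRID for _ in range(GRID)]
grid[0][0] = 1
inputs = encode_grid(grid)
```

For an unknown input, the trained network's three output values would be passed to `classify`, so that an output such as `[0.2, 0.7, 0.1]` is recognized as "B".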
Problem to be Solved
This recognition process has been shown to be effective with input patterns of a fixed size (1024 bits in this example), but problems arise with patterns of variable size, e.g., voice patterns, which have a variable time base. The feature quantities of a voice are often expressed as a so-called feature vector series, in which the voice is converted to a set of approximately 10 to 20 parameters every 10 msec. Thus, if the voice is converted to a 10-dimension feature vector every 10 msec, the period required to express a phoneme /b/ may be 20 or possibly 30 frames. Consequently, even if each of the parameters defining the feature vector corresponds to one input unit of the neural network, the total number of inputs required for pattern recognition is variable and may range from 20×10=200 to 30×10=300 input units. In addition, it is extremely difficult to apply a conventional neural network to voice recognition because compression and expansion of the voice along the time axis are non-linear.
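The arithmetic behind the variable input size can be checked with a short sketch. The function names and the zero-valued placeholder frames are hypothetical; only the 10 msec frame period, the 10-dimension feature vector, and the 20 and 30 frame counts come from the text.

```python
FRAME_PERIOD_MS = 10   # one feature vector every 10 msec
FEATURE_DIMS = 10      # 10-dimension feature vector


def input_units_needed(num_frames, dims=FEATURE_DIMS):
    """Input units required if every parameter of every frame gets one unit."""
    return num_frames * dims


def flatten_frames(frames):
    """Concatenate a feature vector series into one flat input pattern."""
    return [value for frame in frames for value in frame]


# The same phoneme spoken at two different speeds (placeholder values).
short_utterance = [[0.0] * FEATURE_DIMS for _ in range(20)]
long_utterance = [[0.0] * FEATURE_DIMS for _ in range(30)]

short_inputs = flatten_frames(short_utterance)   # 200 values
long_inputs = flatten_frames(long_utterance)     # 300 values
```

A network with a fixed input layer of 200 units cannot accept the 300-value pattern directly, which is the mismatch described above.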