Recently, research has been actively conducted on applying the effective pattern recognition methods of human beings to actual computers, as a solution to the problem, frequently encountered in engineering fields, of classifying input patterns into particular groups.
Among the various computer-application studies is the study of the artificial neural network, which models, from an engineering standpoint, the cell structures of human beings in which effective pattern recognition takes place. In order to solve the problem of classifying input patterns into particular groups, an artificial neural network uses an algorithm that imitates the human capability of learning. Through this algorithm, an artificial neural network may generate a mapping between an input pattern and an output pattern, which is expressed by saying that the artificial neural network has learning ability. In addition, the artificial neural network has a generalization capability: on the basis of its learning results, it generates a relatively proper output for an input pattern that was not used for learning. Owing to these two typical capabilities, learning and generalization, the artificial neural network is applied to problems that are difficult to solve with existing sequential programming methods. The artificial neural network has a wide range of uses and is actively applied to pattern classification, continuous mapping, non-linear system identification, non-linear control, robot control, and the like.
The artificial neural network is an operation model, implemented in software or hardware, that imitates the computation capability of a biological system by using a large number of artificial neurons connected by connection lines. The artificial neural network uses artificial neurons formed by simplifying the functions of biological neurons. The artificial neurons are connected through connection lines having a connection strength, and thereby perform the cognitive operations or learning processes of human beings. The connection strength is a particular value assigned to a connection line and is also called a connection weight. Learning of the artificial neural network may be divided into supervised learning and unsupervised learning. Supervised learning refers to a method of presenting input data together with the corresponding output data to a neural network and updating the connection strengths of the connection lines such that the output data corresponding to the input data is output. Typical supervised learning algorithms include the delta rule and back propagation learning.
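As a minimal sketch of the supervised learning described above, the following code trains a single linear artificial neuron with the delta rule. The function names `train_delta_rule` and `predict`, the learning rate, the epoch count, and the toy target mapping y = 2·x1 − x2 + 0.5 are all illustrative assumptions and do not come from the text.

```python
def train_delta_rule(samples, lr=0.1, epochs=500):
    """Train one linear neuron with the delta rule (illustrative sketch)."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs   # connection weights, one per connection line
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - output
            # Delta rule: move each weight in proportion to error * input.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Output of the trained neuron for one input pattern."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical training pairs (input pattern, target output) sampled from
# the assumed mapping y = 2*x1 - x2 + 0.5.
training_set = [
    ((0.0, 0.0), 0.5),
    ((1.0, 0.0), 2.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 1.0), 1.5),
]

weights, bias = train_delta_rule(training_set)

# Generalization: an input pattern that was not used for learning.
unseen_output = predict(weights, bias, (2.0, 1.0))
```

Because the toy mapping is exactly linear, the neuron can also produce a suitable output for the unseen pattern (2.0, 1.0), which illustrates the generalization capability in a very simple setting.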
Unsupervised learning refers to a method in which an artificial neural network learns the connection strengths by itself, using only input data without target values. In unsupervised learning, the connection weights are updated based on the correlation between input patterns.
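One way to picture a correlation-based update without target values is Oja's rule, a normalized form of Hebbian learning. The text does not name a specific unsupervised algorithm, so the choice of Oja's rule, the function name `train_oja`, the learning rate, and the synthetic two-dimensional data below are all assumptions made only for illustration.

```python
import math
import random


def train_oja(patterns, lr=0.01, initial_weights=(1.0, 0.0)):
    """Update connection weights from input patterns only (Oja's rule sketch)."""
    w = list(initial_weights)
    for x in patterns:
        y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
        # Hebbian term y*x plus a decay term -y*y*w that keeps |w| near 1.
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Synthetic input patterns (assumed): strongly correlated components, so most
# of the variance lies along the direction (1, 1).
rng = random.Random(0)
patterns = []
for _ in range(3000):
    t = rng.gauss(0.0, 1.0)   # shared component
    n = rng.gauss(0.0, 0.1)   # small uncorrelated component
    patterns.append((t + n, t - n))

weights = train_oja(patterns)
norm = math.sqrt(sum(wi * wi for wi in weights))
# Cosine between the learned weights and the dominant direction (1, 1).
alignment = abs(weights[0] + weights[1]) / (math.sqrt(2.0) * norm)
```

No target output appears anywhere in the update. The weights drift toward the direction along which the input components are most strongly correlated.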
As illustrated in FIG. 1, a speech signal processing procedure in a general speech recognition system includes a step of extracting a feature parameter (O) from an input speech signal (X) in the time domain through noise cancellation, feature extraction, and normalization, and a step of obtaining the word (W) having the maximum likelihood with respect to the feature parameter (O). This may be expressed as Equation (1) below.
$$W^{*} = \arg\max_{w} P(w \mid O) \qquad (1)$$
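The search in Equation (1) can be sketched as follows. This sketch assumes that the posterior P(w|O) is scored through Bayes' rule as P(O|w)P(w), which the text does not state. The function name `recognize`, the three-word vocabulary, and all score values are hypothetical.

```python
import math


def recognize(log_likelihoods, log_priors):
    """Return the word w maximizing log P(O|w) + log P(w).

    Under Bayes' rule this sum equals log P(w|O) up to a constant that does
    not depend on w, so the argmax matches Equation (1).
    """
    scores = {w: log_likelihoods[w] + log_priors[w] for w in log_likelihoods}
    best_word = max(scores, key=scores.get)
    return best_word, scores


# Hypothetical scores for one feature parameter O over a tiny vocabulary.
log_likelihoods = {"yes": -12.0, "no": -10.5, "stop": -11.0}   # log P(O|w)
log_priors = {w: math.log(p)
              for w, p in {"yes": 0.5, "no": 0.3, "stop": 0.2}.items()}

best_word, scores = recognize(log_likelihoods, log_priors)
```

Working in the log domain avoids numerical underflow when the likelihoods are products of many small probabilities.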
Namely, current speech recognition amounts to searching for the word (W) having the maximum likelihood with respect to the converted feature parameter (O), rather than obtaining the word having the maximum likelihood with respect to the speech input signal (X) in the time domain. The series of signal processing procedures for extracting features appropriate for recognition from the speech input signal (X) in the time domain may be expressed as Equation (2) below.
$$O = f(X; \theta_p) \qquad (2)$$
where O denotes the feature parameter, X denotes the speech input signal, and θp denotes a model parameter of the signal processing algorithm.
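To make the role of θp in Equation (2) concrete, the following sketch treats the front end as a function of the signal and a small parameter set. The particular steps (pre-emphasis, framing, log energy, mean normalization), the class `FrontEndParams`, the function `extract_features`, and every default value are assumptions chosen for illustration. The text specifies only noise cancellation, feature extraction, and normalization in general terms.

```python
import math
from dataclasses import dataclass


@dataclass
class FrontEndParams:
    """Hypothetical theta_p: settings of the signal processing front end."""
    pre_emphasis: float = 0.97
    frame_length: int = 4
    hop_length: int = 2


def extract_features(x, theta_p):
    """O = f(X; theta_p): time-domain signal to normalized log-energy features."""
    if len(x) < theta_p.frame_length:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis filter: y[n] = x[n] - a * x[n-1].
    a = theta_p.pre_emphasis
    emphasized = [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]
    # Split into overlapping frames and take the log energy of each frame.
    features = []
    last_start = len(emphasized) - theta_p.frame_length
    for start in range(0, last_start + 1, theta_p.hop_length):
        frame = emphasized[start:start + theta_p.frame_length]
        energy = sum(s * s for s in frame)
        features.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    # Normalization: subtract the mean over all frames.
    mean = sum(features) / len(features)
    return [f - mean for f in features]


# Hypothetical input signal X: ten samples of a sine wave.
signal = [math.sin(0.3 * n) for n in range(10)]

features_default = extract_features(signal, FrontEndParams())
features_no_emphasis = extract_features(signal, FrontEndParams(pre_emphasis=0.0))
```

The same signal X yields different feature parameters O under different θp, so the choice of θp affects everything the recognizer sees downstream.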
However, in the general speech recognition system described above, the model parameter (θp), which controls the signal processing procedure for obtaining the feature parameter (O) from the speech input signal (X) in the time domain, is not explicitly set to a value that maximizes speech recognition performance. In general, the signal processing function is derived by modeling the speech generation process or the speech recognition process. It may therefore be set implicitly in a direction that improves speech recognition performance, but it is not set explicitly to satisfy a criterion for maximizing speech recognition performance.