1. Field of the Invention
The present invention relates to a neural network and a learning method thereof, e.g., the error back-propagation method, and, more particularly, to a selective attention method using neural networks in which the error back-propagation method is applied to the input layer to change the input value rather than the weight values of the neural network, and the difference between the new input value and the original one is used as a criterion for recognition. A selective filter is added before the conventional recognition network in order to change the input value and technologically simulate the selective attention mechanism occurring in the human brain, thus implementing a system that applies the selective attention mechanism to the perception of patterns such as voices or characters.
2. Description of the Related Art
Generally, selective attention means concentrating attention on a specific one of at least two simultaneous input signals or pieces of information, based on the significance of the signal or information. Selective attention is a phenomenon that occurs naturally in the human brain.
For example, a person can recognize a desired voice signal when many people talk at the same time, using differences between the voice signals in frequency or in the location of the voice source. Selective attention is a subject whose principles have long been studied in psychology with respect to both humans and animals.
Many studies of the selective attention mechanism have been made in the fields of psychology and neuroscience, and they are classified into two categories: one is the initial selection theory, in which unwanted signals among several stimuli are filtered out through a selective filter before the stimuli are processed in the brain; the other is the theory that all signals are transferred to the processor of the brain but the brain responds more strongly to the more important signals.
These theories remain the subject of heated controversy, and the prevailing opinion is that a combination of the two theories explains the selective attention mechanism in the human brain.
Attempts have been made to technologically simulate the selective attention mechanism for more effective recognition of actual voices or characters. Although the existing studies have simulated the selective attention mechanism of the human brain well from the standpoint of neuroscience, they are meaningful only from a biological standpoint and are very hard to apply to actual recognition. Also, it is difficult to implement the results of these studies in software or hardware for actual recognition because of the extreme complexity of the structure.
A representative system developed to overcome the above-stated problems is the multi-layer perceptron system, which is widely applied to the mechanical brain called "neuron" or "neural network". The multi-layer perceptron system enables recognition or judgment of potential information (a pattern) through iterative learning of a defined pattern.
Although neural networks in the form of a mechanical brain according to the multi-layer perceptron system have excellent adaptability to a repeatedly acquired pattern, they are disadvantageous in that the recognition performance deteriorates abruptly for input patterns different from the patterns acquired during the training phase.
A description will now be made in detail with reference to FIG. 1 as to the drawbacks of the prior art.
Referring to FIG. 1, a typical multi-layer perceptron is a neural network having a layer structure that includes at least one intermediate layer between input and output layers, i.e., it is a series of several single-layer perceptrons.
An input value applied to the input layer is multiplied by the weighted value of the synapse linked to each neuron, and the resulting values are summed at the neurons of the adjacent intermediate layer. The output value of each neuron is transferred to the next intermediate layer. This process is repeated up to the output layer. That is, the input value of the j'th neuron of the l'th intermediate layer, denoted by $\hat{h}_j^l$, is calculated according to Equation 1.

$$\hat{h}_j^l = w_{j0}^l + \sum_{k=1}^{N} w_{jk}^l h_k^{l-1} = \sum_{k=0}^{N} w_{jk}^l h_k^{l-1} \qquad \text{[Equation 1]}$$
where $w_{j0}^l$ represents the bias of $\hat{h}_j^l$; $w_{jk}^l$ represents the weighted value of the synapse linking the k'th neuron of the (l−1)'th intermediate layer to the j'th neuron of the l'th intermediate layer; $h_k^{l-1}$ represents the output value of the k'th neuron of the (l−1)'th intermediate layer; and the variable N represents the number of neurons of the (l−1)'th intermediate layer. In the second form of the sum, $h_0^{l-1} = 1$, so that the bias is the k = 0 term.
Thus the output value of the j'th neuron in the l'th intermediate layer, obtained from its input $\hat{h}_j^l$, is defined as Equation 2.

$$h_j^l = f(\hat{h}_j^l) = \frac{2}{1 + \exp(-\hat{h}_j^l)} \qquad \text{[Equation 2]}$$
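By way of illustration only, the feed-forward calculation of Equations 1 and 2 may be sketched in Python with NumPy as follows. The function names `f` and `forward`, and the storage of each layer as a matrix whose column 0 holds the bias, are assumptions of this sketch and are not taken from the specification.

```python
import numpy as np

def f(h_hat):
    # Activation exactly as written in Equation 2: 2 / (1 + exp(-h_hat)).
    return 2.0 / (1.0 + np.exp(-h_hat))

def forward(x, weights):
    """Feed-forward propagation through a multi-layer perceptron.

    weights[l] has shape (n_out, n_in + 1); column 0 holds the bias w_j0
    (assumed layout for this sketch).
    """
    h = np.asarray(x, dtype=float)
    for W in weights:
        h_ext = np.concatenate(([1.0], h))  # prepend h_0 = 1 so the bias is the k = 0 term
        h_hat = W @ h_ext                   # Equation 1: weighted sum at each neuron
        h = f(h_hat)                        # Equation 2: neuron output
    return h
```

With all weighted values set to zero, every neuron input is zero and every output equals f(0) = 1, which gives a quick sanity check of the sketch.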
For the above-structured multi-layer perceptron to operate correctly as a perception means, the synapses linking the individual neurons must have adequate weighted values. These values are determined through a learning process of the multi-layer perceptron, which is performed layer by layer according to the error back-propagation algorithm.
The learning process of the multi-layer perceptron involves receiving P learning patterns at the input layer, determining a desired output value corresponding to each learning pattern as a target value, and calculating the weighted values of the synapses that minimize the mean squared error (MSE) between the actual output value and the target value of the output layer.
Accordingly, the MSE can be calculated according to Equation 3.

$$E = \frac{1}{2} \sum_{p=1}^{P} \lVert t_p - y_p \rVert^2 \qquad \text{[Equation 3]}$$
where $x_p$ (p = 1, 2, . . . , P) are the P learning patterns; $y_p$ is the output vector; and $t_p$ is the target vector.
In the error back-propagation system, the weighted value of the output layer is iteratively updated according to Equation 4 in order to minimize the MSE of Equation 3.

$$w_{ji(new)}^l = w_{ji(old)}^l + \eta\, \delta_j^l\, h_i^{l-1} \qquad \text{[Equation 4]}$$
where the constant $\eta$ represents a learning rate; and $\delta_j^l$ represents the differential value of the error of the output layer with respect to the neuron value of the individual intermediate layer. The differential value of the error for the output layer can be defined as Equation 5.

$$\begin{aligned} \delta_i^L &= (t_i - y_i)\, f'(\hat{y}_i), && i \in \text{output layer; and} \\ \delta_j^l &= f'(\hat{h}_j^l) \sum_i \delta_i^{l+1} w_{ij}^{l+1}, && j \in \text{intermediate layer.} \end{aligned} \qquad \text{[Equation 5]}$$
In summary, the conventional error back-propagation system according to the above equations is an algorithm repeated for the P learning patterns. It involves calculating the total error of the output layer as in Equation 3 from given input and target vectors through feed-forward propagation according to Equation 1, and differentiating the error of the output layer with respect to the neuron values of the individual intermediate layers as defined in Equation 5, so as to change the weighted values of the synapses and thus minimize the total error of the output layer.
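The conventional procedure summarized above may be sketched, for a single learning pattern, roughly as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the names `backprop_step` and `f_prime_from_output`, the bias-in-column-0 weight layout, and the default learning rate are all assumptions of the sketch. The derivative expression follows from the activation of Equation 2, for which f' = f(1 − f/2).

```python
import numpy as np

def f(h_hat):
    # Activation as written in Equation 2.
    return 2.0 / (1.0 + np.exp(-h_hat))

def f_prime_from_output(h):
    # For f = 2 / (1 + exp(-h_hat)), the derivative is f * (1 - f / 2).
    return h * (1.0 - h / 2.0)

def backprop_step(x, t, weights, eta=0.05):
    """One weight update for one pattern; returns (new_weights, error before update)."""
    t = np.asarray(t, dtype=float)
    acts = [np.asarray(x, dtype=float)]          # acts[l] is the output of layer l
    for W in weights:                            # feed-forward pass (Equations 1 and 2)
        acts.append(f(W @ np.concatenate(([1.0], acts[-1]))))
    y = acts[-1]
    error = 0.5 * np.sum((t - y) ** 2)           # Equation 3 for a single pattern
    delta = (t - y) * f_prime_from_output(y)     # output-layer delta
    new_weights = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        h_prev = np.concatenate(([1.0], acts[l]))
        # Weight update: eta * delta_j * h_i, applied to every synapse of the layer.
        new_weights[l] = weights[l] + eta * np.outer(delta, h_prev)
        if l > 0:
            # Propagate delta one layer back through the old weights (bias column skipped).
            delta = f_prime_from_output(acts[l]) * (weights[l][:, 1:].T @ delta)
    return new_weights, error
```

Calling `backprop_step` repeatedly on the same pattern with a small learning rate would be expected to reduce the returned error over the iterations.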
Such a multi-layer perceptron, which involves simple iterative calculations to divide given input patterns into several classes, is a traditional neural network popular for solving problems relating to pattern recognition. However, the multi-layer perceptron is problematic in that its perception performance may rapidly deteriorate for inputs different from the already learned patterns. Accordingly, the present invention is directed to adding a selective filter using the error back-propagation process before the conventional perception means to realize selective attention technologically, and to applying this ability for selective attention to the perception means, thus enhancing perception performance in a noisy environment. In particular, the present invention extends the back propagation of errors to the input layer, in contrast to the conventional error back-propagation system, which propagates the errors of the output layer only as far as the first intermediate layer in order to learn the weighted values of the intermediate layers.
It is, therefore, an object of the present invention to provide a selective attention method using neural networks in which a selective filter acquired through an error back-propagation process is added before the conventional perception means to technologically implement the selective attention mechanism naturally occurring in the human brain and to apply the selective attention mechanism to the perception means, thus enhancing perception performance in a noisy environment.
It is another object of the present invention to provide a selective attention method using neural networks in which the errors of the output layer are propagated in the reverse direction as far as the input layer, in contrast to the conventional error back-propagation system, which propagates the errors of the output layer only to the first intermediate layer to acquire the weighted values of the intermediate layers.
To achieve the object of the present invention, there is provided a selective attention method using a neural network, in a learning pattern of a weighted value of the neural network in a pattern recognition method using a multi-layer perceptron which is a feed-forward neural network, the selective attention method including the steps of: (1) optionally selecting a target value $t = [t_1, \ldots, t_k, \ldots, t_M]$ of an output layer with respect to a given input pattern $x = [x_1, \ldots, x_k, \ldots, x_N]$; (2) calculating an output error E between an output value of the input pattern and the target value, according to an equation defined as $E = \lVert t - y \rVert^2$, wherein y represents the output value of the input pattern; and (3) learning the input pattern iteratively until the output error calculated in step (2) is less than a predetermined threshold value, thus learning the input value so as to perceive only a desired signal from the input pattern mixed with noise.
During the steps (2) and (3), the selective attention method further includes the steps of: calculating $-\partial E / \partial x_k$ for a given input pattern x; and iteratively learning the input pattern x to satisfy the relationship defined as

$$x_{i(new)} = x_{i(old)} + \left( -\frac{\partial E}{\partial x_i} \right),$$

wherein, assuming that the output value of the i'th neuron of the l'th intermediate layer of a feed-forward neural network is $h_i^l$, that the error value for back propagation of the neuron is

$$\delta_i^l = -\frac{\partial E}{\partial h_i^l},$$

and that the weighted value of the synapse between the i'th neuron of the (l−1)'th intermediate layer and the j'th neuron of the l'th intermediate layer is $w_{ji}^l$, the error back-propagation method, which is a conventional learning method for the weighted values of the feed-forward neural network, is applied to the learning of the input pattern x to define

$$\delta_i^0 = -\frac{\partial E}{\partial x_i}$$

and calculate

$$\delta_i^0 = \sum_j w_{ji}^1\, \delta_j^1.$$
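The extension of back propagation to the input layer described above may be illustrated by the following sketch, in which the weighted values stay fixed and only the input pattern is adapted. Several points are assumptions of the sketch: the names `input_delta` and `attend_input`, the learning rate `eta`, the iteration cap `max_iter`, the bias-in-column-0 weight layout, and the use of E = ½‖t − y‖² (the factor ½ only rescales the step relative to E = ‖t − y‖²).

```python
import numpy as np

def f(h_hat):
    # Activation as written in Equation 2.
    return 2.0 / (1.0 + np.exp(-h_hat))

def input_delta(x, t, weights):
    """Return (delta0, error), where delta0 = -dE/dx for E = 0.5 * ||t - y||^2."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    acts = [x]
    for W in weights:                                   # feed-forward pass
        acts.append(f(W @ np.concatenate(([1.0], acts[-1]))))
    y = acts[-1]
    delta = (t - y) * y * (1.0 - y / 2.0)               # output-layer delta
    for l in range(len(weights) - 1, 0, -1):            # back through intermediate layers
        h = acts[l]
        delta = h * (1.0 - h / 2.0) * (weights[l][:, 1:].T @ delta)
    delta0 = weights[0][:, 1:].T @ delta                # delta at the input layer
    return delta0, 0.5 * np.sum((t - y) ** 2)

def attend_input(x, t, weights, eta=0.1, threshold=1e-3, max_iter=1000):
    """Adapt the input pattern until the output error falls below the threshold."""
    x = np.array(x, dtype=float)
    for _ in range(max_iter):
        delta0, error = input_delta(x, t, weights)
        if error < threshold:
            break
        x = x + eta * delta0                            # x(new) = x(old) + eta * delta0
    return x, input_delta(x, t, weights)[1]
```

The returned pattern can then be compared with the original input, since the difference between the two is what the method uses as a criterion for recognition.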
In another aspect of the present invention, there is provided a selective attention method using a neural network, in a learning pattern of a weighted value of the neural network in a pattern perception method using a multi-layer perceptron which is a feed-forward neural network, the selective attention method including the steps of: (1) optionally selecting a target value $t = [t_1, \ldots, t_k, \ldots, t_M]$ of an output layer with respect to a given input pattern $x = [x_1, \ldots, x_k, \ldots, x_N]$; (2) calculating an output error E between an output value of the input pattern and the target value, according to an equation defined as $E = \lVert t - y \rVert^2$, wherein y represents the output value of the input pattern; and (3) calculating

$$\delta_i^0 = -\frac{\partial E}{\partial x_i}$$

and adapting an input value with a learning rate $\eta$ to satisfy the relationship defined as $x_{i(new)} = x_{i(old)} + \eta\, \delta_i^0$, wherein instead of directly changing the input pattern x, a filter having an attention gain $a_k$ is added between an input terminal of the network and the input terminal of the multi-layer perceptron, wherein an output value of the filter $\hat{x} = [\hat{x}_1, \ldots, \hat{x}_k, \ldots, \hat{x}_N]$ (wherein x represents a pattern to be perceived as the input of the filter) is defined as $\hat{x}_k = a_k \cdot x_k$, wherein the value $\hat{x}$ is applied as the input of the multi-layer perceptron to substitute learning of the input pattern with a filter design through learning of the weighted value $a_k$ of a synapse having a local link, thus calculating $a_{i(new)} = a_{i(old)} + \eta\, \delta_i^0 x_i$ in the same manner as in learning the weighted value of a synapse with the conventional multi-layer perceptron.
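As a rough illustration of the filter design described above, the attention gains may be learned as sketched below while the weighted values of the multi-layer perceptron stay fixed. The names `delta_at_input` and `train_attention_gain`, the initial gain of 1, the stopping parameters, the bias-in-column-0 weight layout, and the use of E = ½‖t − y‖² are assumptions of this sketch.

```python
import numpy as np

def f(h_hat):
    # Activation as written in Equation 2.
    return 2.0 / (1.0 + np.exp(-h_hat))

def delta_at_input(x_hat, t, weights):
    """Return (delta0, error) with delta0 = -dE/dx_hat for E = 0.5 * ||t - y||^2."""
    acts = [np.asarray(x_hat, dtype=float)]
    for W in weights:                                   # feed-forward pass
        acts.append(f(W @ np.concatenate(([1.0], acts[-1]))))
    y = acts[-1]
    delta = (t - y) * y * (1.0 - y / 2.0)               # output-layer delta
    for l in range(len(weights) - 1, 0, -1):            # back through intermediate layers
        h = acts[l]
        delta = h * (1.0 - h / 2.0) * (weights[l][:, 1:].T @ delta)
    return weights[0][:, 1:].T @ delta, 0.5 * np.sum((t - y) ** 2)

def train_attention_gain(x, t, weights, eta=0.1, threshold=1e-3, max_iter=500):
    """Learn one gain a_k per input component; returns (a, x_hat, final error)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    a = np.ones_like(x)                                 # gains start at 1 (filter is transparent)
    for _ in range(max_iter):
        delta0, error = delta_at_input(a * x, t, weights)
        if error < threshold:
            break
        a = a + eta * delta0 * x                        # a(new) = a(old) + eta * delta0 * x
    x_hat = a * x                                       # filter output fed to the perceptron
    return a, x_hat, delta_at_input(x_hat, t, weights)[1]
```

Because each gain is tied to a single input component, the filter can only rescale components of x; a component that is exactly zero is left unchanged whatever its gain.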
Furthermore, in perceiving patterns such as voices or characters by a selective attention method using a neural network, in a learning pattern of a weighted value of the neural network in a pattern recognition method using a multi-layer perceptron which is a feed-forward neural network, wherein the selective attention method comprises the steps of: (i) optionally selecting a target value $t = [t_1, \ldots, t_k, \ldots, t_M]$ of an output layer with respect to a given input pattern $x = [x_1, \ldots, x_k, \ldots, x_N]$; (ii) calculating an output error E between an output value of the input pattern and the target value, according to an equation defined as $E = \lVert t - y \rVert^2$, wherein y represents the output value of the input pattern; and (iii) learning the input pattern iteratively until the output error calculated in step (ii) is less than a predetermined threshold value, thus learning the input value so as to perceive only a desired signal from the input pattern mixed with noise, a method for applying the selective attention with the neural network includes the steps of: (1) selecting at least two candidate classes from the outputs of the multi-layer perceptron and applying the selective attention method to the individual candidate classes to design a selective filter; (2) defining a new criterion for perception using a distance between the output $\hat{x}$ and the input x of the selective filter designed in step (1); and (3) applying the input pattern x to be perceived with the attention gain $a_k$ fixed at 1 to calculate the output of the feed-forward neural network, and determining a target value for each of the C classes having the highest output values to calculate $a_{i(new)} = a_{i(old)} + \eta\, \delta_i^0 x_i$.
Furthermore, in perceiving patterns such as voices or characters by a selective attention method using a neural network, in a learning pattern of a weighted value of the neural network in a pattern perception method using a multi-layer perceptron which is a feed-forward neural network, wherein the selective attention method comprises the steps of: (i) optionally selecting a target value $t = [t_1, \ldots, t_k, \ldots, t_M]$ of an output layer with respect to a given input pattern $x = [x_1, \ldots, x_k, \ldots, x_N]$; (ii) calculating an output error E between an output value of the input pattern and the target value, according to an equation defined as $E = \lVert t - y \rVert^2$, wherein y represents the output value of the input pattern; and (iii) learning the input pattern iteratively until the output error calculated in step (ii) is less than a predetermined threshold value, thus learning the input value so as to perceive only a desired signal from the input pattern mixed with noise, a method for applying the selective attention with the neural network includes the steps of: (1) selecting at least two candidate classes from the outputs of the multi-layer perceptron and applying the selective attention method to the individual candidate classes to design a selective filter; (2) calculating a Euclidean distance between the output $\hat{x}$ of the filter calculated in step (ii) and the original input x according to an equation defined as $D_c = \lVert x - \hat{x}_c \rVert^2$, determining an error of the output layer after learning of $a_k$ in the step (iii) according to an equation defined as $E_c = \lVert t_c - y(\hat{x}_c) \rVert^2$, and defining a criterion for perception as

$$M_c = \frac{O_c}{(D_c)^{\alpha} \cdot (E_c)^{\beta}},$$

wherein $O_c$ represents the original output value of the corresponding output class of the multi-layer perceptron; and (3) applying the input pattern x to be perceived with the attention gain $a_k$ fixed at 1 to calculate the output of the feed-forward neural network, and determining a target value for each of the C classes having the highest output values to calculate $a_{i(new)} = a_{i(old)} + \eta\, \delta_i^0 x_i$.
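The criterion $M_c$ defined above may be evaluated for each candidate class as in the following sketch. The names `attention_measure` and `select_class`, the default exponents, and the rule of choosing the candidate with the largest $M_c$ are assumptions made for illustration; the text defines the criterion but does not state the selection rule or the values of α and β.

```python
import numpy as np

def attention_measure(O_c, D_c, E_c, alpha=1.0, beta=1.0):
    """M_c = O_c / (D_c**alpha * E_c**beta), evaluated per candidate class."""
    O_c = np.asarray(O_c, dtype=float)
    D_c = np.asarray(D_c, dtype=float)
    E_c = np.asarray(E_c, dtype=float)
    return O_c / (D_c ** alpha * E_c ** beta)

def select_class(O_c, D_c, E_c, alpha=1.0, beta=1.0):
    """Return (index of the best candidate, all M_c values).

    Assumed rule: the candidate with the largest M_c is selected.
    """
    M = attention_measure(O_c, D_c, E_c, alpha, beta)
    return int(np.argmax(M)), M
```

Under this assumed rule, a candidate is favored when its original output is high, its filtered input stays close to the original input, and its residual output error after learning the gains is small.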
Additional objects, advantages, and novel features of this invention will become apparent to those skilled in the art upon examination of the following examples thereof, which are not intended to be limiting.