The present invention relates to a data processing system in a hierarchical (i.e., layered) network configuration for flexibly processing data in a comprehensible and executable form, and, more specifically, to a neuro-fuzzy-integrated data processing system in a neuro-fuzzy-integrated hierarchical network configuration for establishing high speed, high precision data processing capabilities.
In a conventional serial processing computer (von Neumann computer), it is very difficult to adjust data processing capabilities to changes in usage and environment. Accordingly, more flexible data processing units have been demanded and developed to process data using a parallel and distributed method in a new hierarchical network configuration, specifically in technical fields such as pattern recognition and adaptive filtering. In this hierarchical network data processing unit, programs are not written explicitly. Instead, a weight value of an internal connection of the hierarchical network structure is determined according to a predetermined learning algorithm so that the output signal (output pattern) provided by the network structure in response to an input signal (input pattern) presented for learning corresponds to a teaching signal (teacher pattern). When the weight value is determined by the above described learning process, a "flexible" data processing function can be realized such that the hierarchical network structure outputs a probable output signal even though an unexpected input signal is inputted.
In the data processing unit in this hierarchical network configuration, there is an advantage in that a weight value of an internal connection can be determined automatically as long as a learning signal is provided. However, there is also a disadvantage in that the content of a data process, which depends on the weight values, is not comprehensible. Solving this problem would allow this data processing unit in a network structure to be put into practical use. To practically control this data processing unit in a hierarchical network configuration, an effective means must be presented for high speed, high precision data processing.
In a data processing unit comprising a hierarchical network configuration, a hierarchical network comprises one kind of node called a "basic unit" and an internal connection having a weight value corresponding to an internal state value. FIG. 1 shows a basic configuration of a basic unit 1. The basic unit 1 is a multi-input/output (i/o) system. That is, it comprises a multiplier 2 for multiplying a plurality of inputs by respective weight values of an internal connection, an accumulator 3 for adding all products of the above described multiplications performed by the multiplier, and a function converter 4 for applying function conversion such as a non-linear threshold process to the resultant accumulation value so that a final output can be obtained.
Assuming that a layer h is a preprocess layer and a layer i is a post-process layer, the accumulator 3 of the i-th basic unit 1 in the layer i executes the following operation using expression (1), and the function converter 4 executes a threshold operation using expression (2).
$$x_{pi} = \sum_{h} y_{ph} W_{ih} \qquad (1)$$
$$y_{pi} = \frac{1}{1 + \exp(-x_{pi} + \theta_{i})} \qquad (2)$$
where
h: a unit number in the layer h
p: a pattern number of an input signal
θi: a threshold of the i-th unit in the layer i
Wih: a weight value of an internal connection between the layers h and i
yph: output from the h-th unit in the layer h in response to an input signal in the p-th pattern.
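The operation of a single basic unit described by expressions (1) and (2) can be sketched as follows. This is a minimal illustration only; the function name `basic_unit` and the sample values are assumptions introduced here and do not appear in the specification.

```python
import math

def basic_unit(inputs, weights, theta):
    """Forward computation of one basic unit (hypothetical helper)."""
    # Expression (1): multiplier 2 and accumulator 3 form the weighted sum
    # of the outputs y_ph of the preceding layer h.
    x = sum(y_h * w_ih for y_h, w_ih in zip(inputs, weights))
    # Expression (2): function converter 4 applies the sigmoid threshold
    # conversion with threshold theta.
    return 1.0 / (1.0 + math.exp(-x + theta))

# Illustrative values only: the weighted sum is 0.5*0.8 + 1.0*(-0.4) = 0.
y = basic_unit([0.5, 1.0], [0.8, -0.4], theta=0.0)
```

With a weighted sum of zero and a zero threshold, the sigmoid sits at its midpoint, so the unit outputs 0.5.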
The data processing unit in a hierarchical network configuration configures its hierarchical network such that a plurality of the above described basic units 1 are hierarchically connected as shown in FIG. 2 (with input signal values distributed and outputted as an input layer 1′), thus performing a parallel data process by converting input signals into corresponding output signals.
The data processing unit in a hierarchical network configuration requires obtaining, by a learning process, a weight value of a hierarchical network structure which determines the data conversion. Specifically, a back propagation method attracts special attention for its practicality as an algorithm of the learning process. In the back propagation method, a learning process is performed by automatically adjusting a weight value Wih and a threshold θi through feedback of an error. As indicated by expressions (1) and (2), the weight value Wih and the threshold θi must be adjusted simultaneously, but the adjustment is very difficult because these values must be carefully balanced. Therefore, the threshold θi is included in the weight value Wih by providing a unit for constantly outputting "1" in the layer h on the input side and assigning the threshold θi as a weight value to the output, thus allowing it to be processed as a part of the weight value. Accordingly, expressions (1) and (2) are represented as follows:
$$x_{pi} = \sum_{h} y_{ph} W_{ih} \qquad (3)$$
$$y_{pi} = \frac{1}{1 + \exp(-x_{pi})} \qquad (4)$$
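The absorption of the threshold into the weights can be checked numerically. The sketch below compares the explicit-threshold form of expressions (1) and (2) with the constant-"1" unit form of expressions (3) and (4). The function names are hypothetical, and the sign of the added weight (−θi) is an assumption made here so that the two forms agree; the text states only that the threshold is assigned as a weight value.

```python
import math

def unit_with_threshold(y_h, w_ih, theta_i):
    # Expressions (1) and (2): explicit threshold theta_i.
    x = sum(y * w for y, w in zip(y_h, w_ih))
    return 1.0 / (1.0 + math.exp(-x + theta_i))

def unit_with_constant_input(y_h, w_ih, theta_i):
    # Expressions (3) and (4): layer h gains a unit that always outputs 1.
    y_ext = list(y_h) + [1.0]
    # Assumption: the weight on that unit is -theta_i, matching the sign
    # of theta_i inside the exponential of expression (2).
    w_ext = list(w_ih) + [-theta_i]
    x = sum(y * w for y, w in zip(y_ext, w_ext))
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values only.
a = unit_with_threshold([0.2, 0.7], [1.5, -0.3], theta_i=0.4)
b = unit_with_constant_input([0.2, 0.7], [1.5, -0.3], theta_i=0.4)
```

Both calls compute the same output, so a learning rule that adjusts only weights also adjusts the threshold.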
In the back propagation method, as shown in the three-layer structure comprising a layer h, a layer i and a layer j in FIG. 2, the difference (dpj − ypj) between an output signal ypj and a teaching signal dpj is calculated when the output signal ypj outputted from the output layer in response to an input signal presented for learning and a teaching signal dpj for matching the output signal ypj are given. Then, the following operation is performed:
$$\alpha_{pj} = y_{pj}(1 - y_{pj})(d_{pj} - y_{pj}) \qquad (5)$$
followed by:
$$\Delta W_{ji}(t) = \epsilon \sum_{p} \alpha_{pj} y_{pi} + \zeta \, \Delta W_{ji}(t-1) \qquad (6)$$
Thus, an update amount ΔWji(t) of the weight value between the layers i and j is calculated, where t indicates the number of learning iterations.
Then, using the resultant αpj, the following operation is performed:
$$\beta_{pi} = y_{pi}(1 - y_{pi}) \sum_{j} \alpha_{pj} W_{ji}(t-1) \qquad (7)$$
followed by:
$$\Delta W_{ih}(t) = \epsilon \sum_{p} \beta_{pi} y_{ph} + \zeta \, \Delta W_{ih}(t-1) \qquad (8)$$
Thus, an update amount ΔWih(t) of the weight value between the layers h and i is calculated.
Then, weight values are determined for the following update cycles according to the update amounts calculated as described above:
$$\left.\begin{aligned} W_{ji}(t) &= W_{ji}(t-1) + \Delta W_{ji}(t) \\ W_{ih}(t) &= W_{ih}(t-1) + \Delta W_{ih}(t) \end{aligned}\right\} \qquad (8a)$$
By repeating the procedure above, learning is completed when the weight values Wji and Wih are obtained where an output signal ypj outputted from the output layer in response to an input signal presented for learning corresponds to a teaching signal dpj, a target of the output signal ypj.
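The update cycle of expressions (5) through (8a) for the three-layer case can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the specification: the function names, the toy learning data (logical OR with a constant-"1" input unit), the initial weights, and the values chosen for ε and ζ are all hypothetical. ε and ζ are treated here as a learning constant and a momentum constant, which is the usual reading of expression (6).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(y_h, W_ih, W_ji):
    # Expressions (3) and (4), applied to layer i and then to layer j.
    y_i = [sigmoid(sum(w * y for w, y in zip(row, y_h))) for row in W_ih]
    y_j = [sigmoid(sum(w * y for w, y in zip(row, y_i))) for row in W_ji]
    return y_i, y_j

def update_cycle(patterns, W_ih, W_ji, dW_ih, dW_ji, eps, zeta):
    """Run one update cycle t in place; patterns holds (y_h, d_j) pairs."""
    sum_ji = [[0.0] * len(row) for row in W_ji]
    sum_ih = [[0.0] * len(row) for row in W_ih]
    for y_h, d_j in patterns:
        y_i, y_j = forward(y_h, W_ih, W_ji)
        # Expression (5): error term of each output unit j.
        alpha = [y * (1.0 - y) * (d - y) for y, d in zip(y_j, d_j)]
        # Expression (7): error term of each unit i, using W_ji(t-1).
        beta = [y_i[i] * (1.0 - y_i[i]) *
                sum(alpha[j] * W_ji[j][i] for j in range(len(W_ji)))
                for i in range(len(y_i))]
        # Accumulate the sums over p that appear in expressions (6) and (8).
        for j in range(len(W_ji)):
            for i in range(len(y_i)):
                sum_ji[j][i] += alpha[j] * y_i[i]
        for i in range(len(W_ih)):
            for h in range(len(y_h)):
                sum_ih[i][h] += beta[i] * y_h[h]
    # Expressions (6), (8) and (8a): compute each update amount from the
    # accumulated sum and the previous update amount, then apply it.
    for W, dW, s in ((W_ji, dW_ji, sum_ji), (W_ih, dW_ih, sum_ih)):
        for r in range(len(W)):
            for c in range(len(W[r])):
                dW[r][c] = eps * s[r][c] + zeta * dW[r][c]
                W[r][c] += dW[r][c]

def total_error(patterns, W_ih, W_ji):
    # Half the summed squared difference between teaching and output signals.
    return 0.5 * sum((d - y) ** 2
                     for y_h, d_j in patterns
                     for y, d in zip(forward(y_h, W_ih, W_ji)[1], d_j))

# Hypothetical toy data: logical OR, with a constant-1 unit as third input.
patterns = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
            ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]
W_ih = [[0.1, -0.2, 0.05], [-0.15, 0.1, 0.2]]  # two units in layer i
W_ji = [[0.2, -0.1]]                           # one unit in layer j
dW_ih = [[0.0] * 3 for _ in range(2)]
dW_ji = [[0.0] * 2]

error_before = total_error(patterns, W_ih, W_ji)
for t in range(200):
    update_cycle(patterns, W_ih, W_ji, dW_ih, dW_ji, eps=0.5, zeta=0.5)
error_after = total_error(patterns, W_ih, W_ji)
```

All error terms in a cycle are computed before any weight is changed, so expression (7) sees the weights Wji(t−1) as the text requires.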
When the hierarchical network has a four-layer configuration comprising layers g, h, i, and j, the following operation is performed:
$$\gamma_{ph} = y_{ph}(1 - y_{ph}) \sum_{i} \beta_{pi} W_{ih}(t-1) \qquad (9)$$
followed by:
$$\Delta W_{hg}(t) = \epsilon \sum_{p} \gamma_{ph} y_{pg} + \zeta \, \Delta W_{hg}(t-1) \qquad (10)$$
Thus, the update amount ΔWhg(t) of a weight value between the layers g and h can be calculated. That is, an update amount ΔW of a weight value between the preceding layers can be determined from the value obtained at the last step at the output side and the network output data.
If the function converter 4 of the basic unit 1 performs linear conversion, the expression (5) above is represented as follows:
$$\alpha_{pj} = (d_{pj} - y_{pj}) \qquad (11)$$
the expression (7) above is represented as follows:
$$\beta_{pi} = \sum_{j} \alpha_{pj} W_{ji}(t-1) \qquad (12)$$
and the expression (9) above is represented as follows:
$$\gamma_{ph} = \sum_{i} \beta_{pi} W_{ih}(t-1)$$
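The linear forms follow because the factor y(1 − y) in expressions (5), (7), and (9) is the derivative of the sigmoid conversion of expression (4), whereas the derivative of a linear conversion is 1. A brief derivation, writing f for the conversion performed by the function converter 4 (a symbol introduced here for illustration only):

```latex
f(x) = \frac{1}{1 + e^{-x}}
  \;\Rightarrow\;
  f'(x) = f(x)\,\bigl(1 - f(x)\bigr) = y\,(1 - y),
\qquad
f(x) = x
  \;\Rightarrow\;
  f'(x) = 1 .
```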
Thus, an expected teaching signal is outputted from the output layer in response to an input signal presented for learning in the data processing unit in a hierarchical network configuration by assigning a learned weight value to an internal connection in the hierarchical network. Therefore, a data processing function can be realized such that the hierarchical network structure outputs a probable output signal even though an unexpected input signal is inputted.
It is certain that, in the data processing unit in a hierarchical network configuration, data can be appropriately converted with a desirable input-output function, and a more precise weight value of an internal connection can be mechanically learned if an additional learning signal is provided. However, there is also a problem in that the content of data conversion executed in the hierarchical network structure is not comprehensible, and that a reliable output signal cannot be assured in response to data other than a learning signal. Therefore, an operator feels uneasy when data are controlled, even in the normal operation of the data processing unit in a hierarchical network configuration, because an abnormal condition is very hard to correct properly. Furthermore, as a learning signal is indispensable for establishing a data processing unit in a hierarchical network, a desired data processing function may not be realized when sufficient learning signals cannot be provided.
On the other hand, a "fuzzy controller" has recently been developed and put into practical use for control targets which are difficult to model. A fuzzy controller controls data after calculating the extent of the controlling operation from a detected control state value by representing, in the if-then form, a control algorithm comprising ambiguity (such as human judgment) and executing this control algorithm based on a fuzzy presumption. A fuzzy presumption enables the establishment of an executable teacher for use with a complicated data processing function by grouping the combinations of input/output signals and connecting them ambiguously according to attribute information called a "membership relation." A fuzzy teacher generated by the fuzzy presumption has an advantage in that it is comparatively comprehensible, but has a difficult problem in that a value of a membership function cannot be determined precisely, and the exact relation of the connection between membership functions cannot be determined mechanically, thus requiring enormous labor and time to put desired data processing capabilities into practical use.
The present invention has been developed in the above described background, with the objectives of realizing highly precise data processing capabilities; providing a data processing unit in a hierarchical network configuration where the executable form is very comprehensible; and providing a high speed, high precision data processing system for establishing data processing capabilities using the data processing unit in a hierarchical network configuration by flexibly combining a data processing unit in a hierarchical network configuration and a fuzzy teacher.
FIG. 3 shows a configuration of the principle of the present invention.
In FIG. 3, reference numeral 10 denotes a fuzzy teacher described in a fuzzy presumption form. It performs a complicated data process according to an antecedent membership function for representing an ambiguous linguistic expression of an input signal in numerals, a consequent membership function for representing an ambiguous linguistic expression of an output signal in numerals, and rules for developing the connection relation between these membership functions in the if-then form. The fuzzy teacher 10 has a merit in that it is generated rather easily if it is a rough teacher. However, it is very difficult to determine a precise value of a membership function or an exact description of the rules.
An applicable type data processing unit 11 processes data according to the hierarchical network structure with a complete connection as shown in FIG. 2. It is referred to as a "pure neuro" in the present invention.
The applicable type data processing unit 11 has a merit in that an internal state assigned to an internal connection in a hierarchical network structure can be learned mechanically by the above described back propagation method. However, the content of data conversion is incomprehensible in this unit.
In FIG. 3, a pre-wired-rule-part neuro 12 and a completely-connected-rule-part neuro 13 show characteristics of the present invention. FIG. 4 shows a basic configuration of a pre-wired-rule-part neuro 12 and a completely-connected-rule-part neuro 13. In FIG. 4, the pre-wired-rule-part neuro and the completely-connected-rule-part neuro, or a consequent membership function realizer/non-fuzzy part 18 (except a part of the final output side) comprise a hierarchical neural network. Each part of the network is viewed from the input side of the hierarchical neural network and is divided according to each operation of the network.
In FIG. 4, an input unit 15 receives more than one input signal indicating the control state value of data to be controlled.
An antecedent membership function realizer 16 outputs a grade value indicating the applicability of one or more antecedent membership functions in response to one or more input signals distributed by the input unit 15.
A rule part 17 often comprises a hierarchical neural network having a plurality of layers. Using the grade value of the antecedent membership function outputted from the antecedent membership function realizer 16, it outputs, as a grade value of a fuzzy rule, an enlargement or reduction rate of one or more consequent membership functions corresponding to one or more output signals.
The consequent membership function realizer/non-fuzzy part 18 enlarges or reduces a consequent membership function using the enlargement or reduction rate outputted by the rule part 17, performs a non-fuzzy calculation, and outputs an output signal. The non-fuzzy calculation here means the center-of-gravity calculation generally performed at the final step of the fuzzy presumption.
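A center-of-gravity calculation of this kind can be sketched as follows. The specification does not give the formula at this point, so the sketch assumes the common discrete centroid: each sampled consequent membership function is scaled by its enlargement or reduction rate, the scaled functions are summed, and the centroid of the sum is taken. The function name, the sample points, and the triangular shapes for "small", "middle", and "large" are all hypothetical.

```python
def center_of_gravity(z_points, consequent_mfs, rates):
    """Centroid of rate-scaled consequent membership functions.

    Assumption: the scaled functions are combined by summation.
    """
    combined = [sum(rate * mf[k] for mf, rate in zip(consequent_mfs, rates))
                for k in range(len(z_points))]
    total = sum(combined)
    if total == 0.0:
        raise ValueError("all rates are zero; the centroid is undefined")
    return sum(z * m for z, m in zip(z_points, combined)) / total

# Hypothetical membership functions sampled at five points of the output Z.
z_points = [0.0, 0.25, 0.5, 0.75, 1.0]
small = [1.0, 0.5, 0.0, 0.0, 0.0]
middle = [0.0, 0.5, 1.0, 0.5, 0.0]
large = [0.0, 0.0, 0.0, 0.5, 1.0]

# Only the "Z is middle" rule fires, at full rate.
z = center_of_gravity(z_points, [small, middle, large], [0.0, 1.0, 0.0])
```

Because only the symmetric "middle" function contributes, the centroid falls at its peak, 0.5.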
FIG. 5 shows a configuration of a typical example of a pre-wired-rule-part neuro. FIG. 5 nearly corresponds to FIG. 4, but differs in that the consequent membership function realizer/non-fuzzy part 18 of FIG. 4 is divided into a consequent membership function realizer 18a and a center-of-gravity calculation realizer 18b. In FIG. 5, the pre-wired-rule-part neuro comprises a hierarchical neural network except for a center-of-gravity calculator 27 in the center-of-gravity calculation realizer 18b. The input units of the antecedent membership function realizer 16, rule part 17, consequent membership function realizer 18a and the center-of-gravity calculation realizer 18b are connected to respective units in the preceding layer, but they are not shown in FIG. 5.
In FIG. 5, linear function units 22a-22d in the antecedent membership function realizer 16 output a grade value of an antecedent membership function. For example, the unit 21a outputs a grade value indicating the applicability of a membership function indicating "The input x is small."
Sigmoid function units 23a-23e in the rule part 17 connect the output units 22a-22e of the antecedent membership function realizer 16 to output units 24a, 24b, and 24c in the rule part. For example, when a rule 1 of a fuzzy teacher indicates "if (X is small) and (Y is small) then Z is middle," the units 22a and 22d are connected to the unit 23a, the unit 23a is connected to the unit 24b, and the connection of the units 22b, 22c, and 22e respectively to the unit 23a is not required.
The output units 24a, 24b, and 24c of the rule part 17 output an enlargement or a reduction rate of a consequent membership function. For example, the unit 24b outputs an enlargement or a reduction rate of a consequent membership function indicating "Z is middle." The enlargement or reduction result of the consequent membership function is outputted by linear units 25a-25n in the consequent membership function realizer 18a using the output of the units 24a, 24b, and 24c. According to these results, two linear units 26a and 26b in a center-of-gravity determining element output unit 26 output two center-of-gravity determining elements za and zb for calculating the center of gravity. Using the result, a center-of-gravity calculator 27 obtains the output Z of the system as a center-of-gravity value.
In FIG. 5, for example, layers are not completely connected in the antecedent membership function realizer 16, but the connection is made corresponding to an antecedent membership function. When a rule of a fuzzy teacher is obvious, the pre-wired-rule-part neuro is applicable.
When the rule 1 indicates:
"if (X is small) and (Y is small) then Z is middle," the connection of the units 22b, 22c, and 22e respectively to the unit 23a is not required. Thus, the pre-wired-rule-part neuro is defined as a data processing system where only necessary parts are connected between the antecedent membership function realizer 16 and the rule part 17 (in the rule part 17) and between the rule part 17 and the consequent membership function realizer, according to the rules of a fuzzy teacher.
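The pre-wiring for rule 1 can be sketched as a connection mask derived from the rule's antecedents. The text fixes only that units 22a and 22d carry "X is small" and "Y is small"; the labels assigned below to units 22b, 22c, and 22e, as well as the function and variable names, are hypothetical and serve only to make the example complete.

```python
# Antecedent membership function carried by each output unit of the
# antecedent membership function realizer 16.
# Labels for 22b, 22c, and 22e are assumptions made for illustration.
ANTECEDENT_UNITS = {
    "22a": ("X", "small"),
    "22b": ("X", "middle"),
    "22c": ("X", "large"),
    "22d": ("Y", "small"),
    "22e": ("Y", "large"),
}

def prewired_mask(rules, antecedent_units):
    """Map each rule unit to {antecedent unit: True if the connection is kept}."""
    mask = {}
    for rule_unit, conditions in rules.items():
        mask[rule_unit] = {name: label in conditions
                           for name, label in antecedent_units.items()}
    return mask

# Rule 1: if (X is small) and (Y is small) then Z is middle, held by unit 23a.
rules = {"23a": {("X", "small"), ("Y", "small")}}
mask = prewired_mask(rules, ANTECEDENT_UNITS)
connected = sorted(name for name, keep in mask["23a"].items() if keep)
```

Here `connected` lists units 22a and 22d, matching the connections the text describes for unit 23a; the remaining three entries of the mask are False.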
In FIG. 3, when rules as well as a membership function of the fuzzy teacher 10 are definite, a hierarchical pre-wired neural network having only necessary parts connected permits conversion of the fuzzy teacher 10 to the pre-wired-rule-part neuro 12. If rules are not definite, the fuzzy teacher 10 can be converted to the completely-connected-rule-part neuro 13 where, for example, only the antecedent and consequent membership function realizers are pre-wired.
As the present neuro is a completely-connected-rule-part neuro, the respective layers in the output units 22a-22e of the antecedent membership function realizer 16, in the units 23a-23e of the rule part, and in the output units 24a-24c of the rule part are completely connected.
In FIG. 5, the complete connection is performed between the output units 22a-22e of the antecedent membership function realizer 16 and the units 23a-23e of the rule part 17, and between the units 23a-23e of the rule part and the output units 24a, 24b, and 24c of the rule part. Thus, the data processing system in the hierarchical network configuration is called a completely connected neuro.
Furthermore, for example, the fuzzy teacher 10 can be converted to a pure neuro 11 by providing input/output data of the fuzzy teacher 10 for an applicable type of data processing unit; that is, a pure neuro 11 comprising a hierarchical neural network completely connected between each of adjacent layers.
Next, the applicable type of data processing unit, that is, the pure neuro 11, learns input/output data to be controlled. After the learning, a comparatively less significant connection in the pure neuro 11 (a connection having a smaller weight value) is disconnected, or the network structure is modified to convert the pure neuro 11 to the pre-wired-rule-part neuro 12 or the completely-connected-rule-part neuro 13. Then, the fuzzy teacher 10 of the antecedent membership function, consequent membership function, fuzzy rule part, etc. can be extracted by checking the structure of the pre-wired-rule-part neuro 12. By checking the structure of the completely-connected-rule-part neuro 13, the fuzzy teacher 10 of the antecedent membership function and the consequent membership function can be extracted.
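The disconnection of less significant connections can be sketched as a magnitude threshold applied to a learned weight matrix. The text says only that connections with smaller weight values are removed; the use of a fixed absolute-value threshold, the threshold value, the function name, and the sample weights are assumptions made here for illustration.

```python
def prune_connections(weights, threshold):
    """Disconnect connections whose weight magnitude is below threshold.

    Returns the pruned weights (disconnected entries set to 0.0) and a
    mask that is True where a connection is kept.
    """
    mask = [[abs(w) >= threshold for w in row] for row in weights]
    pruned = [[w if keep else 0.0 for w, keep in zip(row, mask_row)]
              for row, mask_row in zip(weights, mask)]
    return pruned, mask

# Hypothetical learned weights between two adjacent layers of a pure neuro.
learned = [[0.9, -0.02, 0.4],
           [0.01, -0.7, 0.03]]
pruned, mask = prune_connections(learned, threshold=0.05)
```

The resulting mask shows which connections survive, which is the structural information that would then be inspected when extracting rules and membership functions.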