The present invention relates generally to data processing systems and methods. More specifically, it relates to an artificial neural network-generated fuzzy expert system from which an accurate, compact, interpretable, and meaningful set of rules may be extracted.
There are many approaches to data processing for developing rule sets for pattern recognition from provided data. Typical approaches utilize artificial neural networks (ANNs) or decision tree methods such as C5. The basic structure of an ANN comprises many layers of processing elements, which are referred to as neurons. The neurons in these many layers are interconnected by links that are assigned weight values during training. These weight values are then interpreted to form rules that approximate the data. Data processing approaches such as the aforementioned find many uses in pattern recognition operations such as automotive occupant sensing and recognition, facial pattern recognition, gesture recognition, and object recognition, among others.
In some applications, such as automotive occupant sensing and recognition in particular, efficient operation is a key factor in success. In order to be practical, these methods must satisfy four critical constraints. The first constraint is that the methods must be extremely accurate so that they can correctly handle the large number of finely differentiated possible geometric configurations for a vehicle occupant. The second constraint is that the method must have a fast response time. This is required to provide sufficient time for deployment of mechanical hardware, such as airbag systems, during collisions/accidents. The third constraint is that the method must allow the rationale for its actions under various situations to be understood and interpreted by humans. Human understanding and interpretation of occupant sensing and recognition methods is very important for product development, support, and analysis purposes. The last constraint is that the method must be inexpensive to implement in hardware. This is necessary to allow feasible implementation in an automobile and to provide an economic competitive advantage in the marketplace.
ANNs and C5 decision tree networks have previously been applied to pattern recognition operations. With regard to ANNs, the main disadvantage is the inability to explain learned knowledge from ANNs in a manner that can be easily understood by humans. As stated before, the ability to generate explainable rules is important for product development, support, and analysis purposes. The C5 decision tree network satisfies the aforementioned constraints to a degree. However, it is still desirable to provide a greater degree of accuracy and a more compact rule set.
ANNs, while capable of providing compact, highly accurate rule sets, have been criticized as being "black boxes" because their behavior has historically been unexplainable. In the article entitled "Are Artificial Neural Networks Black Boxes?", IEEE Transactions on Neural Networks, Vol. 8, No. 5, September 1997, incorporated herein by reference, Benitez, Castro, and Requena attempted to solve this problem by developing a new fuzzy-logic operator termed the interactive-or, or I-OR, operator. The interactive-or operator may be used to derive fuzzy rules from a fully trained ANN. While the method developed by Benitez et al. is able to extract fuzzy rules, the rules are not easily interpretable by humans because there is no assurance that the values of the input features, as reflected in the antecedents, of each fuzzy rule will fall within the allowable range of each input feature. In fact, although a particular antecedent may be unimportant to a particular rule, in many cases, all of the antecedents may exceed the range used to train the neural network. Finally, the output values, or consequents, are expressed as numeric values, further reducing the interpretability of the extracted rules.
A simplified example of a three-layered ANN, comprising an input layer 100, a hidden layer 102, and an output layer 104, is shown in FIG. 1. As shown, the input layer 100 includes two input nodes, X1 106 and X2 108, which provide data to the network. The hidden layer 102 includes two hidden layer nodes, H1 110 and H2 112. The hidden layer nodes H1 110 and H2 112 each correspond to a unique fuzzy rule where, in the general case, the total number of hidden layer nodes Hj corresponds to the total number of rules j in the system. As shown in the diagram, the hidden layer nodes H1 110 and H2 112 also provide the output variables Y1 and Y2 for the generation of the rule base. In the example of FIG. 1, therefore, there are two rules in the rule base because there are two hidden layer nodes, H1 110 and H2 112. Specifically, there are as many rules j as there are nodes in the hidden layer 102. According to the work of Benitez et al., the rules j for the hidden layer nodes H1 110 and H2 112 may be formulated as:
Rule 1: IF {X1 is A1} I-OR {X2 is B1} THEN {Y1 is C1}
and
Rule 2: IF {X1 is A2} I-OR {X2 is B2} THEN {Y2 is C2},
where {Ai, Bi, Ci} represent the fuzzy sets that describe the input variables {X1, X2} and the output variables {Y1, Y2} for each rule. The terms in brackets { } to the left of THEN correspond to the "antecedents" for each rule. The terms in brackets { } to the right of THEN correspond to the "consequents" for each rule. In general, there are as many antecedents as inputs Xi in the input layer 100, with i - 1 interactive-or terms between them. Thus, given two inputs X1 and X2, two antecedents would be combined into rules as shown above for the two-input case, with one I-OR term between them.
The rules above appear similar to rules found in traditional fuzzy logic systems, except for the presence of the I-OR terms. Two important features in the rule formulation above add clarity to the similarity between traditional fuzzy logic systems and the I-OR function. The first feature relates to the explainability of the rules. In a traditional fuzzy logic system, fuzzy sets are expressed in terms of linguistic labels such as SMALL, TALL, etc., and not with numeric values. Thus, they are more readily understandable and interpretable by humans. The analogous interpretation for the fuzzy set of each antecedent (e.g., A1 and B1 in Rule 1, above) for a given rule j was derived from the neural network described in the work of Benitez et al. to be of the form "Xi is greater/lesser than approximately (2.2 - Tj/2)/Wij". The value of 2.2 was obtained by inverting the unipolar sigmoidal activation function

f(x) = 1 / (1 + e^(-WijXi + Tj))
at an activation value (chosen at 0.9). The unipolar sigmoidal activation function serves as the membership function for each fuzzy set, similar to the trapezoidal/triangular membership functions found in fuzzy logic systems. It is important to note that the sigmoidal function may take any applicable form, and may be unipolar or bipolar as desired. The term Wij corresponds to the weight between the input node Xi and the hidden layer node Hj used in the generation of the rule j, and the appearance of the "greater" or "lesser" term depends on whether Wij is positive or negative, respectively. The threshold Tj for a given rule j is equally partitioned between all of its antecedents. The consequents Cj are directly set to the weight values Zj (i.e., no linguistic label). The second feature concerns the manner in which the antecedents of a rule are combined to form a fuzzy rule. In fuzzy logic-based systems, the antecedents are combined using the AND/OR operators. However, as discussed in the article "The Representation of Fuzzy Relational Production Rules", Applied Intelligence, Vol. 1, Issue 1, pp. 35-42, 1991, by R. R. Yager, it has been proven that AND/OR operators are unsuitable for combining the antecedents of rules derived from an artificial neural network. Instead, a new fuzzy logic operator called the interactive-or (I-OR) operator has been derived. The I-OR of N input features (X1, . . . , XN) is of the form:
X1 * X2 * . . . * XN = (X1·X2· . . . ·XN) / (X1·X2· . . . ·XN + (1 - X1)(1 - X2) . . . (1 - XN)),
where the asterisks (*) represent the I-OR operation.
An I-OR between two input features X1 and X2 is characterized by the truth table shown in FIG. 2. The input features are in the range (0, 1) and the resulting I-OR of the two inputs is also between 0 and 1. The truth table shows the I-OR operator for specific examples of a two input case. Rows one to three from the top of the table demonstrate that when two inputs are biased to opposite extremes, the I-OR is indecisive (0.5). Rows four through seven from the top of the table show that if two inputs are biased in a fuzzy way towards one extreme, then the I-OR result is biased towards the more extreme of the two inputs.
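The I-OR combination described above can be sketched in Python as follows. This is an illustrative sketch only, not an implementation from the patent; the function name interactive_or is our own:

```python
from functools import reduce

def interactive_or(inputs):
    """Interactive-or (I-OR) of membership values in (0, 1):
    prod(Xi) / (prod(Xi) + prod(1 - Xi))."""
    num = reduce(lambda a, b: a * b, inputs)                     # X1 * X2 * ... * XN
    den = num + reduce(lambda a, b: a * b, (1 - x for x in inputs))
    return num / den

# Inputs biased to opposite extremes leave the I-OR indecisive (about 0.5);
# inputs biased toward the same extreme push the result past the more
# extreme of the two, consistent with the truth table of FIG. 2.
indecisive = interactive_or([0.9, 0.1])   # about 0.5
reinforced = interactive_or([0.8, 0.6])   # greater than 0.8
```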
An undesirable effect of rule extraction using the method developed by Benitez et al. is that the values of inputs in each of the antecedents are not constrained to be within the range of the input values that were used to train the neural network. For example, if the input X1 106 in FIG. 1 was in the range of 0 to 140 during training of the neural network, the antecedent of an extracted rule could read something like "If {X1 is greater than approximately 156} . . . " While it is arguable that the particular antecedent in question may be unimportant to the rule, it was observed that for several rules extracted, all of the antecedents exceeded the range used to train the neural network. Furthermore, the consequents are simply set to numeric values based on the weights that connect each rule to the output layer. This further reduces the clarity of the rules. In order to make the rules meaningful, it is desirable to provide a means of interpretation for the rules which constrains the antecedents to the range used to train the neural network, and which provides interpretable consequents.
Therefore, it is an object of the present invention to overcome the aforementioned limitations by providing a method for developing a readily interpretable and compact rule set which yields a high degree of accuracy for pattern recognition.
The present invention provides a method and an apparatus for automatically generating a fuzzy expert system for occupant sensing and recognition in fixed spaces such as vehicles, as well as fuzzy rule sets extracted therefrom. The expert system is derived in the form of fuzzy rules that are extracted from an artificial neural network. The artificial neural network is trained on data collected in an automobile using a multi-beam range profile sensor. The rules derived for the fuzzy expert system can explain the learned knowledge from the neural network in a comprehensible manner. Additionally, the developed rule set/base is compact in size and has a prediction accuracy that is better than, or at worst equal to, the prediction accuracy of the neural network from which it was derived.
Specifically, the method of the present invention comprises the following steps: providing a neural network having a latent variable space and an error rate, with the neural network further including a sigmoid activation function having an adjustable gain parameter λ; iteratively adjusting the adjustable gain parameter λ to minimize the error rate of the neural network, producing an estimated minimum gain parameter value λest; using the estimated minimum gain parameter value λest and a set of training data to train the neural network; and projecting the training data onto the latent variable space to generate output clusters having cluster membership levels and cluster centers, with the cluster membership levels being determined as a function of proximity with respect to the cluster centers.
The iterative adjustment of the adjustable gain parameter λ may be further defined by the sub-steps of:
i. providing a validation data set;
ii. setting an initial gain parameter value λinit, a current gain parameter value λcurr, a final gain parameter value λfinal, a gain incrementing value Δλ, and an estimated minimum gain parameter value λest;
iii. setting the current gain parameter value λcurr equal to the initial gain parameter value λinit;
iv. setting the estimated minimum gain parameter value λest equal to the initial gain parameter value λinit;
v. training the neural network using the current gain parameter value λcurr to provide a trained neural network;
vi. inputting the validation data set into the trained neural network to generate an output data set;
vii. comparing the output data set generated by the trained neural network to the validation data set to determine the prediction error rate of the trained neural network;
viii. resetting the current gain parameter value λcurr equal to the current gain parameter value λcurr plus the gain incrementing value Δλ;
ix. after each repetition of steps v through viii, setting the estimated minimum gain parameter value λest equal to whichever of the current value of the estimated minimum gain parameter value λest and the current gain parameter value λcurr generated a lesser prediction error rate; and
x. repeating steps v through ix until the current gain parameter value λcurr is equal to the final gain parameter value λfinal.
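The coarse sweep over the gain parameter λ described in sub-steps i through x above can be sketched as follows. This is an illustrative Python sketch only; the helper callables train_network and error_rate are our own assumptions standing in for the training and validation steps, and are not defined by the present description:

```python
def coarse_gain_search(train_network, error_rate, validation_set,
                       gain_init, gain_final, gain_step):
    """Sweep the sigmoid gain from gain_init to gain_final in increments
    of gain_step, keeping the gain with the lowest validation error."""
    gain_curr = gain_init            # iii. current value starts at the initial value
    gain_est = gain_init             # iv.  best-so-far estimate
    best_err = float("inf")
    while gain_curr <= gain_final:   # x.   repeat until the final value is reached
        net = train_network(gain_curr)           # v.       train with current gain
        err = error_rate(net, validation_set)    # vi-vii.  validate and score
        if err < best_err:                       # ix.      keep the lower-error gain
            best_err, gain_est = err, gain_curr
        gain_curr += gain_step                   # viii.    increment the gain
    return gain_est, best_err
```

With a toy error surface whose minimum lies at a swept value, the search returns that value, which illustrates the intent of the sub-steps without any actual network training.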
Additionally, the method of the present invention may further include the step of fine-tuning the adjustable gain parameter λ by performing at least one repetition of the sub-steps of:
i. setting the initial gain parameter value λinit equal to the estimated minimum gain parameter value λest minus the gain incrementing value Δλ;
ii. setting the final gain parameter value λfinal equal to the estimated minimum gain parameter value λest plus the gain incrementing value Δλ;
iii. generating a new gain incrementing value Δλ, with the new gain incrementing value Δλ being smaller than the previous gain incrementing value Δλ;
iv. setting the current gain parameter value λcurr equal to the initial gain parameter value λinit;
v. setting the estimated minimum gain parameter value λest equal to the initial gain parameter value λinit;
vi. training the neural network using the current gain parameter value λcurr to provide a trained neural network;
vii. inputting the validation data set into the trained neural network to generate an output data set;
viii. comparing the output data set generated by the trained neural network to the validation data set to determine the prediction error rate of the trained neural network;
ix. resetting the current gain parameter value λcurr equal to the current gain parameter value λcurr plus the gain incrementing value Δλ;
x. after each repetition of steps vi through ix, setting the estimated minimum gain parameter value λest equal to whichever of the current value of the estimated minimum gain parameter value λest and the current gain parameter value λcurr generated a lesser prediction error rate; and
xi. using the value of the estimated minimum gain parameter value λest resulting from the step of fine-tuning the adjustable gain parameter λ for training the neural network.
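The fine-tuning sub-steps above re-center the search window on the best gain found so far, shrink the increment, and re-sweep. The following self-contained Python sketch illustrates this; the helper callables train_network and error_rate, and the pass count and shrink factor, are our own assumptions rather than values taken from the present description:

```python
def fine_tune_gain(train_network, error_rate, validation_set,
                   gain_est, gain_step, passes=3, shrink=0.5):
    """Refine the estimated minimum gain by repeated narrowed sweeps."""
    for _ in range(passes):
        gain_init = gain_est - gain_step    # i.   lower edge of the new window
        gain_final = gain_est + gain_step   # ii.  upper edge of the new window
        gain_step *= shrink                 # iii. smaller increment
        gain_curr = gain_init               # iv-v. restart the sweep
        best_err = float("inf")
        while gain_curr <= gain_final:
            net = train_network(gain_curr)            # vi.       train
            err = error_rate(net, validation_set)     # vii-viii. validate and score
            if err < best_err:                        # x.        keep lower error
                best_err, gain_est = err, gain_curr
            gain_curr += gain_step                    # ix.       increment
    return gain_est                                   # xi. refined estimate
```

On a toy quadratic error surface, each pass halves the window and the returned estimate converges toward the true minimum, which is the intended effect of the fine-tuning step.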
Furthermore, the neural network may also include a plurality i of input nodes Xi for receiving inputs having a plurality N of input features, and a plurality j of hidden layer nodes Hj, with each of the plurality j of hidden layer nodes Hj corresponding to one of a plurality j of rules, with each of the plurality j of rules including a plurality of antecedents A, and the sigmoid activation function f(x) is of the form:

f(x) = 1 / (1 + e^(-λWijXi)),
where λ represents the adjustable gain parameter; Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj; and where each of the plurality of antecedents A of each one of the plurality j of rules is of the form:

A = 2.2 / (N·λest·Wij),

where N represents the number of input features of the inputs; λest represents the estimated minimum gain parameter value; and Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj. Linguistic labels may additionally be provided for the clusters and cluster membership levels.
Also, the sigmoid activation function of the neural network provided may be further defined as including an adjustable bias threshold Tj, which is iteratively adjusted to minimize the error rate of the neural network, producing an estimated minimum bias threshold Tj,est; and the estimated minimum bias threshold Tj,est may be used, along with the estimated minimum gain parameter value λest, to train the neural network. Sub-steps similar to those described above for adjusting the adjustable gain parameter λ may be used to adjust the adjustable bias threshold Tj to find the estimated minimum bias threshold Tj,est. In order to take into account the adjustable bias threshold Tj, the sigmoid activation function f(x) may take the form:

f(x) = 1 / (1 + e^(-λWijXi + Tj)),
where λ represents the adjustable gain parameter; Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj; and where Tj represents the adjustable bias threshold; and where each of the plurality of antecedents A of each rule is of the form:

A = (2.2 - Tj,est) / (N·λest·Wij),

where Tj,est represents the estimated minimum bias threshold; N represents the number of input features of the inputs; λest represents the estimated minimum gain parameter value; and Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj.
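The antecedent form above can be illustrated numerically. The sketch below uses made-up weight, gain, and threshold values purely for illustration; the function name is our own:

```python
def antecedent_threshold(w_ij, gain_est, n_features, t_j_est=0.0):
    """Threshold A in an antecedent 'Xi is greater/lesser than approximately A':
    A = (2.2 - Tj,est) / (N * lambda_est * Wij).  With Tj,est = 0 this reduces
    to the bias-free form A = 2.2 / (N * lambda_est * Wij).  The sign of Wij
    selects 'greater' (positive) versus 'lesser' (negative)."""
    a = (2.2 - t_j_est) / (n_features * gain_est * w_ij)
    direction = "greater" if w_ij > 0 else "lesser"
    return a, direction

# Hypothetical trained values: two input features, gain 1.1, weight 0.5.
a, d = antecedent_threshold(w_ij=0.5, gain_est=1.1, n_features=2)
# a is approximately 2.2 / (2 * 1.1 * 0.5), i.e. about 2.0, and d is "greater"
```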
The system, or apparatus, of the present invention includes a neural network having a latent variable space and an error rate, with the neural network further including a sigmoid activation function having an adjustable gain parameter λ, with the gain parameter λ iteratively adjusted to minimize the error rate of the neural network and to produce an estimated minimum gain parameter value λest; a set of training data used, along with the estimated minimum gain parameter value λest, to train the neural network; and output clusters generated by projection of the training data set onto the latent variable space of the neural network, each of said output clusters having cluster membership levels and cluster centers, with the cluster membership levels determined as a function of proximity with respect to the cluster centers. Linguistic labels may be applied to the output clusters and cluster membership levels. Additionally, the sigmoid activation function of the neural network may further include an adjustable bias threshold Tj, with the adjustable bias threshold Tj iteratively adjusted to minimize the error rate of the neural network and to produce an estimated minimum bias threshold Tj,est, and wherein the training data set is used, along with the estimated minimum bias threshold Tj,est and the estimated minimum gain parameter value λest, to train the neural network.
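The statement that cluster membership levels are determined as a function of proximity to the cluster centers can be illustrated as below. The inverse-distance membership function is our own assumption for illustration; the present description states only that membership is a function of proximity to the centers:

```python
import math

def cluster_memberships(point, centers):
    """Assign a membership level in each output cluster from the point's
    proximity to that cluster's center (closer center -> higher level).
    Inverse-distance weighting, normalized to sum to 1, is assumed here."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):             # point lies exactly on a center
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv]
```

A point one unit from the first center and three units from the second receives memberships of roughly 0.75 and 0.25, which could then carry linguistic labels such as HIGH and LOW.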