The present invention relates to neural networks used for pattern recognition. More particularly, the invention disclosed herein provides a highly accurate classification method and apparatus for position and recognition sensing which uses a boosting and pruning approach for adaptive resonance theory (ART) based neural networks. The present invention is chiefly applicable to pattern recognition problems such as automobile occupant type and position recognition and hand signal recognition.
In the recent past, research has been applied to the use of artificial neural networks (ANN) as a nonparametric regression tool for function approximation of noisy mappings. ANNs have been successfully applied in a large variety of function approximation applications including pattern recognition, adaptive signal processing, and the control of highly nonlinear dynamic systems. In pattern recognition applications, ANNs are used to construct pattern classifiers that are capable of separating patterns into distinct classes. In signal processing and control applications, ANNs are used to build a model of a physical system based on data in the form of examples that characterize the behavior of the system. In this case, the ANN is essentially used as a tool to extract the mapping between the inputs and outputs of the system without making assumptions about its functional form.
The most common type of ANN used in function approximation problems is the feedforward type. Although these networks have been successfully used in various applications, their performance is dependent on a problem-specific crafting of network architecture (e.g. the number of hidden layers and the number of nodes in each hidden layer) and network parameters (e.g. learning rate). These networks operate in a batch-processing mode (or an off-line mode), where the entire training data set is presented in training epochs until the mean square energy of the network is minimized to a user-defined level by adjusting the weights of the network. These weight adjustments (or learning) are typically based on some form of gradient descent and are prone to becoming stuck in local minima. Thus, there is no guarantee of network convergence to the desired solution. Further, once the network has been trained, the only way to accommodate new training data is to retrain the network with the old and new training data combined.
Adaptive resonance architectures are neural networks that self-organize stable recognition categories in real time in response to arbitrary sequences of input patterns. The basic principles of adaptive resonance theory (ART) were introduced in Grossberg, “Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions.” Biological Cybernetics 23 (1976) 187-202. A class of adaptive resonance architectures has since been characterized as a system of ordinary differential equations by Carpenter and Grossberg, “Category learning and adaptive pattern recognition: A neural network model,” Proceedings of the Third Army Conference on Applied Mathematics and Computing, ARO Report 86-1 (1985) 37-56, and “A massively parallel architecture for a self-organizing neural pattern recognition machine.” Computer Vision, Graphics, and Image Processing, 37 (1987) 54-115. One implementation of an ART system is presented in U.S. application Ser. No. PCT/US86/02553, filed Nov. 26, 1986 by Carpenter and Grossberg for “Pattern Recognition System”.
More recently, a novel neural network called the fuzzy ARTMAP that is capable of incremental approximation of nonlinear functions was proposed by Carpenter et al. in G. A. Carpenter and S. Grossberg, “A massively parallel architecture for a self-organizing neural pattern recognition machine,” Computer Vision Graphics, Image Process., Vol. 37, pp. 54-115, 1987. Nodes in this network are recruited in a dynamic and automatic fashion depending on the complexity of the function. Further, the network guarantees stable convergence and can learn new training data without the need for retraining on previously presented data. While the fuzzy ARTMAP and its variants have performed very well for classification problems, as well as for extraction of rules from large databases, they do not perform very well for function approximation tasks in highly noisy conditions. This problem was addressed by Marriott and Harrison in S. Marriott and R. F. Harrison, “A modified fuzzy artmap architecture for the approximation of noisy mappings,” Neural Networks, Vol. 8, pp. 619-41, 1995, by designing a new variant of the fuzzy ARTMAP called the PROBART to handle incremental function approximation problems under noisy conditions. The PROBART retains all of the desirable properties of fuzzy ARTMAP but requires fewer nodes to approximate functions.
Another desirable property of the ANN is its ability to generalize to previously untrained data. While the PROBART network is capable of incremental function approximation under noisy conditions, it does not generalize very well to previously untrained data. The PROBART network has been modified by Srinivasa in N. Srinivasa, “Learning and Generalization of Noisy Mappings Using a Modified PROBART Neural Network,” IEEE Transactions on Signal Processing, Vol. 45, No. 10, October 1997, pp. 2533-2550, to achieve a reliable generalization capability. The modified PROBART (M-PROBART) considerably improved the prediction accuracy of the PROBART network on previously untrained data even for highly noisy function approximation tasks. Furthermore, the M-PROBART allows for a relatively small number of training samples to approximate a given mapping, thereby improving the learning speed.
The modified probability adaptive resonance theory (M-PROBART) neural network algorithm is a variant of the Fuzzy ARTMAP, and was developed to overcome the deficiency of incrementally approximating nonlinear functions under noisy conditions. The M-PROBART neural network is a variant of the adaptive resonance theory (ART) network concept, and consists of two clustering networks connected by an associative learning network. The basic M-PROBART structure is shown in FIG. 1. For any given input-output data pair, the first clustering network 100 clusters the input features, shown in the figure as an input feature space 102 having N features, in the form of hyper-rectangles. The vertices of each hyper-rectangle are defined by the values of the input features, and the dimensionality of the hyper-rectangle is equal to the number of input features. The size of the hyper-rectangle is defined based on the outlier members for each cluster. The corresponding output, shown in the figure as an output feature space 104 having M features, is also clustered by the second clustering network 106 in the form of a hyper-rectangle. An associative learning network 108 then correlates these clusters. The clustering networks 100 and 106 are represented by a series of nodes 110. In the original Fuzzy ARTMAP network, only many-to-one functional mappings were allowed. This implies that many hyper-rectangles that form input clusters could be associated with a single hyper-rectangle on the output side but not the other way around. Further, for any given input, only one cluster (i.e., the maximally active or the best match cluster) was allowed to be active on the input side and a prediction was based on the associated output cluster for that maximally active input cluster. This mode of prediction is called the winner-take-all (WTA) mode of prediction.
It has been shown that, by replacing the WTA mode of prediction with a distributed mode of prediction and by allowing one-to-many mappings between the input and output clusters, the M-PROBART predicts more accurately than Fuzzy ARTMAP under noisy conditions.
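The hyper-rectangle clustering described above can be illustrated with a minimal sketch. The class name `HyperRectangle`, its methods, and the use of the sum of side lengths as the size measure are assumptions made for illustration and are not taken from the M-PROBART references; the sketch shows only how a cluster box grows to enclose its outlying members.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HyperRectangle:
    """Axis-aligned box with one lower and one upper bound per feature."""
    lower: List[float]
    upper: List[float]

    @classmethod
    def from_point(cls, point: List[float]) -> "HyperRectangle":
        # A new cluster starts as a zero-size box at its first sample.
        return cls(list(point), list(point))

    def expand(self, point: List[float]) -> None:
        # Grow the box just enough to enclose the new sample, so the
        # bounds are always set by the outermost members of the cluster.
        self.lower = [min(lo, p) for lo, p in zip(self.lower, point)]
        self.upper = [max(hi, p) for hi, p in zip(self.upper, point)]

    def size(self) -> float:
        # Assumed size measure: sum of the side lengths of the box.
        return sum(hi - lo for lo, hi in zip(self.lower, self.upper))

    def contains(self, point: List[float]) -> bool:
        return all(lo <= p <= hi
                   for lo, p, hi in zip(self.lower, point, self.upper))


# Example with N = 3 input features (values are illustrative only).
box = HyperRectangle.from_point([0.2, 0.5, 0.1])
box.expand([0.4, 0.3, 0.1])
print(box.lower, box.upper, box.contains([0.3, 0.4, 0.1]))
```

The dimensionality of the box equals the number of features, and a point lying inside the box leaves its bounds unchanged when passed to `expand`.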
The associative learning network in the M-PROBART has the simple function of counting the frequency of co-occurrence of an input and output cluster. Thus, if an input cluster is very frequently co-active with an output cluster, then the frequency of their association (or the connection between the two clusters) is increased by the associative network to reflect the importance of the association. During prediction, each test input activates several clusters in the input clustering network with activity proportional to the degree of match between the input and each cluster center. This forms a distributed pattern of activity at the input clustering network. This activity is weighted by the strength (or frequency) of its association to a given output cluster to arrive at the most probable output cluster prediction. Another interesting aspect of the M-PROBART algorithm is that each association between an input cluster and output cluster can be directly interpreted as a rule. The firing strength of each rule is given by the product of the cluster activity and the frequency of association between the input and output clusters of that rule. While the M-PROBART algorithm is able to outperform the Fuzzy ARTMAP algorithm for both function approximation and classification tasks, it shares a key drawback with the Fuzzy ARTMAP algorithm: when high prediction accuracy is required, the number of input and output clusters formed becomes prohibitively large (on the order of thousands of rules). Thus, it is impractical to implement for real-world problems.
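A minimal sketch of the co-occurrence counting and of the two prediction modes may help make the contrast concrete. The class `AssociativeCounter`, its method names, and the choice to sum firing strengths over all input clusters that point to the same output cluster are assumptions for illustration; the text above states only that activity is weighted by association frequency.

```python
from collections import defaultdict
from typing import Dict, Tuple


class AssociativeCounter:
    """Counts how often an input cluster is co-active with an output cluster."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[int, int], int] = defaultdict(int)

    def observe(self, input_cluster: int, output_cluster: int) -> None:
        self.counts[(input_cluster, output_cluster)] += 1

    def firing_strengths(self, activities: Dict[int, float]) -> Dict[Tuple[int, int], float]:
        # Each (input, output) association is a rule; its firing strength is
        # the input cluster activity times the association frequency.
        return {(i, o): activities.get(i, 0.0) * n
                for (i, o), n in self.counts.items()}

    def predict_distributed(self, activities: Dict[int, float]) -> int:
        # Assumed aggregation: sum rule strengths per output cluster.
        totals: Dict[int, float] = defaultdict(float)
        for (_, o), s in self.firing_strengths(activities).items():
            totals[o] += s
        return max(totals, key=totals.get)

    def predict_wta(self, activities: Dict[int, float]) -> int:
        # Winner-take-all: only the maximally active input cluster votes.
        winner = max(activities, key=activities.get)
        linked = {o: n for (i, o), n in self.counts.items() if i == winner}
        return max(linked, key=linked.get)


net = AssociativeCounter()
for pair in [(0, 0)] * 3 + [(0, 1)] + [(1, 1)] * 4:
    net.observe(*pair)

activities = {0: 0.9, 1: 0.5}  # degree of match of a test input to each cluster
print(net.predict_distributed(activities))  # 1 (totals: 2.7 for output 0, 2.9 for output 1)
print(net.predict_wta(activities))          # 0 (input cluster 0 wins and maps mostly to output 0)
```

In this made-up example the two modes disagree: the less active input cluster carries enough association strength to change the distributed prediction, which the winner-take-all mode ignores.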
With the increasing functionality of neural networks, the number and variety of applications to which they are applied are also expanding. Neural networks may be applied to pattern recognition applications such as character recognition, speech recognition, remote sensing, automotive occupant sensing, recognition of an object via physical feature sensing, and medical analysis, to name a few. For each of these applications, classification algorithms are available based on different theories and methodologies used in the particular area. In applying a classifier to a specific problem, varying degrees of success with any one of the classifiers may be obtained. To improve the accuracy and success of the classification results, different techniques for combining classifiers have been studied. Nevertheless, the present classifier combination techniques still have difficulty obtaining high classification accuracy within a reasonable time, and an optimal integration of different types of information is therefore desired to achieve high success and efficiency. This need is particularly strong in situations that, by their nature, require a high degree of accuracy and a fast classification response, such as automobile safety systems.
To generate a faster, more accurate classification system, combinations of multiple classifiers have been employed in a technique known as boosting. The boosting technique essentially converts a neural network with a non-zero error rate into an ensemble of neural networks with a significantly lower error rate than that of a single neural network. In early combination techniques, a variety of complementary classifiers were developed and the results of each individual classifier were analyzed by three basic approaches. One approach uses a majority voting principle, where each individual classifier represents a score that may be assigned to one label or divided among several labels. Thereafter, the label receiving the highest total score is taken as the final result. A second approach uses a candidate subset combining and re-ranking approach, where each individual classifier produces a subset of ranked candidate labels, and the labels in the union of all subsets are re-ranked based on their old ranks in each subset. A third approach uses Dempster-Shafer (D-S) theory to combine several individual classifiers. However, none of these approaches achieves the desired accuracy and efficiency in obtaining the combined classification result.
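The majority voting approach described above can be sketched as follows. The function name `majority_vote` and the occupant labels are hypothetical and chosen only for illustration; each classifier contributes a score that is either assigned to one label or divided among several.

```python
from collections import defaultdict
from typing import Dict, List


def majority_vote(votes: List[Dict[str, float]]) -> str:
    """Return the label with the highest total score across all classifiers.

    Each element of `votes` maps labels to the share of one classifier's
    score given to that label.
    """
    totals: Dict[str, float] = defaultdict(float)
    for vote in votes:
        for label, score in vote.items():
            totals[label] += score
    return max(totals, key=totals.get)


votes = [
    {"adult": 1.0},                # classifier 1 puts its whole score on one label
    {"child": 0.6, "adult": 0.4},  # classifier 2 divides its score
    {"child": 1.0},                # classifier 3
]
print(majority_vote(votes))  # child (total 1.6 against 1.4 for adult)
```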
Therefore, it is an object of the present invention to provide a two-stage boosting and pruning approach to both reduce the number of rules formed by the M-PROBART and also considerably improve its prediction accuracy.
References of interest relative to M-PROBART, Fuzzy ARTMAP, and boosting of neural networks include the following:
1) N. Srinivasa, “Learning and Generalization of Noisy Mappings Using a Modified PROBART Neural Network,” IEEE Transactions on Signal Processing, vol. 45, no. 10, pp. 2533-2550, October 1997;
2) G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds and D. B. Rosen, “Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps,” IEEE Transactions on Neural Networks, vol. 3, pp. 698-712, 1992;
3) H. Drucker, R. Schapire and P. Simard, “Boosting Performance in Neural Networks,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 7, no. 4, pp. 705-719, 1993.
In accordance with the present invention, a neural network boosting and pruning method is presented, which improves the accuracy of adaptive resonance theory (ART) based networks. A set of training data having inputs with correct classifications corresponding to the inputs is provided. Next, the data is ordered into a plurality of differently ordered data sets, and each of the differently ordered data sets is divided into a plurality of data portions. A plurality of booster networks Bx,y is then associated with each of the plurality of differently ordered data sets, with x representing a particular booster type and y representing the particular data set with which the particular booster is associated. The first booster network in each data set is then trained using one of the data portions. After training, the first booster network is tested using the data in another data set. Next, a series of booster networks is trained and tested, with each subsequent booster network receiving the mistakes of the previous booster network along with a portion of the correct decisions made by all of the previous booster networks. The number of booster networks utilized depends on the particular application and the necessary classification accuracy. The rules from all of the booster networks of a particular booster type x are then pruned in an intra-booster pruning process, where rules having a sufficient overlap with other rules are eliminated, resulting in a series of intra-booster pruned networks. The rules from the intra-booster pruned networks are then pruned in an inter-booster pruning process, similar in operation to the intra-booster pruning process, resulting in a single residual booster. The present invention is preferably embodied using M-PROBART-type booster networks, and utilizes a pair-wise Fuzzy AND operator in the intra-booster and inter-booster pruning processes to eliminate rules having sufficient overlap.
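The pruning step can be sketched under stated assumptions. The pair-wise Fuzzy AND is taken here as the component-wise minimum, which is its usual definition in fuzzy ART. The overlap measure |a AND b| / |b|, the threshold value, the greedy keep-first order, and the representation of each rule as a vector of weights in [0, 1] are all assumptions for illustration and are not specified in the summary above.

```python
from typing import List, Sequence


def fuzzy_and(a: Sequence[float], b: Sequence[float]) -> List[float]:
    # Pair-wise fuzzy AND: component-wise minimum of two rule vectors.
    return [min(x, y) for x, y in zip(a, b)]


def overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed overlap measure: fraction of rule b covered by rule a."""
    norm_b = sum(b)
    if norm_b == 0:
        return 0.0
    return sum(fuzzy_and(a, b)) / norm_b


def prune(rules: Sequence[Sequence[float]], threshold: float = 0.9) -> List[List[float]]:
    # Greedy pass: keep a rule only if no already-kept rule overlaps it
    # by at least the (hypothetical) threshold.
    kept: List[List[float]] = []
    for rule in rules:
        if all(overlap(k, rule) < threshold for k in kept):
            kept.append(list(rule))
    return kept


# Intra-booster pruning over the pooled rules of one booster type.
rules = [
    [0.8, 0.6, 0.2],
    [0.8, 0.5, 0.2],  # fully covered by the first rule, so it is eliminated
    [0.1, 0.9, 0.7],
]
print(prune(rules))  # [[0.8, 0.6, 0.2], [0.1, 0.9, 0.7]]
```

Under the same assumptions, inter-booster pruning would apply `prune` once more to the concatenated rule lists of the intra-booster pruned networks, leaving a single residual rule set.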