1. Field of the Invention
The invention relates in general to classification of objects based upon information from multiple information sources and, more particularly, to incremental data fusion and decision making based upon multiple information sources that become available incrementally.
2. Description of the Related Art
Consider the scenario in which a number of surveillance modules are available (e.g., face recognition, fingerprint, DNA profile). These modules have varying accuracy, speed, and cost of deployment. They provide different or similar “views” of the same surveillance subject, and hence, information provided by different sources may be correlated or uncorrelated, corroborating or contradicting, etc. The challenge is to use these sources in a judicious and synergistic manner to aid decision making. Similar application scenarios arise, for example, in determining the creditworthiness of a loan or credit card applicant by examining multiple indicators, such as income, debt, profession, etc., and in detecting network intrusion by examining network traffic, IP addresses, port numbers, etc.
Two problems can arise when considering data fusion and decision making involving the addition of a new data source and associated classifier to a decision process. A first problem involves determining whether it is even desirable to select an additional information source and, if so, which information source to use. A second problem involves determining the optimal way to incorporate the new information to update a current decision process. At first glance, these problems might appear to be a straightforward application of designing an ensemble classifier or a committee machine from component classifiers, and prior techniques such as arcing, bagging, and boosting would seem to be applicable. However, such prior techniques do not adequately address these problems.
Generally speaking, fusion frameworks differ in two important aspects: the composition of the training data and the mechanism of decision reporting. They can be roughly divided into two categories: those based on decision fusion and those based on measurement (or data) fusion. As used herein, a classifier is a function module that has been trained to determine output classification prediction information based upon a prescribed set of input information. An ensemble classifier comprises multiple component classifiers.
FIG. 1 is an illustrative diagram of a prior decision fusion classifier 10. The process involves use of multiple classifiers Cdf1-Cdfn and multiple associated sets of measurement data Mdf1-Mdfn. Each classifier performs a corresponding classification process producing a classification result, using its associated measurement data. For example, classifier Cdf1 performs classification process 12 based upon measurement data Mdf1, and produces a classification result Rdf1. Similarly, classifiers Cdf2-Cdfn perform corresponding classification processes 14-18 to produce classification results Rdf2-Rdfn using corresponding measurement data Mdf2-Mdfn. A decision Ddf is “fused” based upon the results Rdf1-Rdfn.
When component classifiers Cdf1-Cdfn are trained on different measurement data Mdf1-Mdfn (e.g., some use fingerprints and some use face images), individual component classifiers are trained and consulted more or less independently. Information fusion often occurs at the decision level (hence the term decision fusion). If, for a given test subject, a component classifier (e.g., Cdf1) reports only a result (e.g., Rdf1), that is, a class label with no supporting evidence, the best one can hope for is some kind of majority rule. On the other hand, if the classifier reports a result with an associated confidence measure, then an aggregate decision Ddf can be some weighted combination of the individual decisions (e.g., sum, max, min, median, and majority). It has been reported, somewhat surprisingly, that the simple sum rule outperforms many others in real-world experiments.
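As a minimal sketch of the combination rules listed above (not a method taken from any cited work), the Python function below assumes that each component classifier reports a vector of per-class confidences; `fuse` and its rule names are hypothetical. The rules are applied without per-classifier weights; a weighted variant would scale each classifier's row before combining.

```python
from collections import Counter
from statistics import median


def fuse(confidences, rule="sum"):
    """Fuse per-classifier confidences into a single class index.

    confidences[j][c] is component classifier j's confidence that the
    subject belongs to class c (hypothetical input format).
    """
    n_classes = len(confidences[0])
    if rule == "majority":
        # Each classifier votes for its top class; ties favor the lower index.
        votes = Counter(
            max(range(n_classes), key=lambda c: row[c]) for row in confidences
        )
        return max(range(n_classes), key=lambda c: (votes[c], -c))
    # Unweighted combination of the per-class confidences across classifiers.
    combine = {"sum": sum, "max": max, "min": min, "median": median}[rule]
    scores = [combine([row[c] for row in confidences]) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])


# Three hypothetical classifiers, two classes (0 = intruder, 1 = employee).
confs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
for rule in ("sum", "max", "min", "median", "majority"):
    print(rule, fuse(confs, rule))
```

With these three hypothetical classifiers, the sum, max, and min rules favor class 0 while the median and majority rules favor class 1, which illustrates how strongly the fused decision Ddf can depend on the rule chosen.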
Unfortunately, simple decision fusion schemes typically do not address the problems identified above, which arise when not all component classifiers and not all data sets are available at the same time. For example, gathering data from all possible sources can be very expensive, both in time and in cost. Often, a decision of high confidence can be reached without consulting all information sources. Hence, it may be desirable to add a new data source and associated classifier to the decision process only when needed. The problems discussed above are whether to add a new classifier and new measurement data, and if so, which ones to add, and how to modify the decision process to accommodate the results produced by the new classifier.
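The following naive baseline, assumed purely for illustration and not the approach of the present invention, makes the cost argument concrete: sources are consulted in a fixed order, their confidences are averaged (a sum rule scaled by the number of sources), and querying stops once the leading class reaches a hypothetical confidence `threshold`. Each source is modeled as a hypothetical zero-argument callable standing in for data acquisition plus classification.

```python
def sequential_decision(sources, threshold):
    """Naive fixed-order cascade (illustrative baseline only).

    sources: non-empty list of zero-argument callables; each stands in for
    acquiring one source's measurement and running its classifier, and
    returns a list of per-class confidences.
    Returns (decided_class, number_of_sources_consulted).
    """
    if not sources:
        raise ValueError("at least one source is required")
    totals = None
    used = 0
    for query in sources:
        conf = query()  # "acquire" the next source only when needed
        used += 1
        totals = list(conf) if totals is None else [t + c for t, c in zip(totals, conf)]
        mean = [t / used for t in totals]
        best = max(range(len(mean)), key=lambda c: mean[c])
        if mean[best] >= threshold:
            break  # confident enough; remaining sources are never queried
    return best, used


# The first source alone is decisive here, so the second is never consulted.
print(sequential_decision([lambda: [0.95, 0.05], lambda: [0.5, 0.5]], 0.9))
```

This baseline leaves both problems identified above open: the order of sources is fixed in advance rather than chosen, and each new result is folded in with the same unweighted rule regardless of which source produced it.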
While decision fusion schemes are attractive for their simplicity, they represent a greedy, component-wise optimization strategy. More recently, other ensemble classifiers (such as bagging and boosting) that are based on measurement fusion principles and use a joint optimization strategy have become more popular. Intuitively speaking, in a measurement fusion scheme all component classifiers are trained on data sets that comprise all available measurements, to better exploit their joint statistical dependency. This represents a more global view of the measurement space and hence motivates the term measurement fusion.
As a more concrete example of a measurement fusion scenario, denote a surveillance module (e.g., fingerprint) as $s$, denote its measurement as a multi-dimensional random variable $x_s$, and denote its class assignment (e.g., a known employee who should be allowed access, or an intruder) over some sample space as the random variable $y_s$. Furthermore, denote $y_t$ as the ground truth (correct) label, which may or may not be the same as $y_s$. Traditional ensemble classifiers, such as AdaBoost, operate in a batch mode by concatenating all source data into a big aggregate $(X, y_t)$, with $X = (x_0, x_1, \ldots, x_{k-1})$, where $k$ sources $s_0, s_1, \ldots, s_{k-1}$ are used and $x_i$ is the measurement from the $i$th source. Hence, if $n$ training data points are available, the training data set is represented as $D = \{(X_0, y_{t,0}), (X_1, y_{t,1}), \ldots, (X_{n-1}, y_{t,n-1})\}$, where $X_i = (x_{0,i}, x_{1,i}, \ldots, x_{k-1,i})$. Component classifiers are then trained on this aggregate representation.
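To make the batch aggregate concrete, the sketch below builds D from per-source measurements, assuming (hypothetically) that each measurement $x_{j,i}$ is a tuple of floats and that labels are integers; `build_aggregate` is an illustrative name. It refuses to build D if any source is missing a point, mirroring the batch assumption that every training point carries measurements from all k sources.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def build_aggregate(
    per_source: Sequence[Sequence[Vector]], labels: Sequence[int]
) -> List[Tuple[Vector, int]]:
    """Build D = [(X_0, y_t0), ..., (X_{n-1}, y_t{n-1})].

    per_source[j][i] is the measurement x_{j,i} of source s_j for data point i,
    and labels[i] is the ground-truth label y_t for data point i.
    Each X_i concatenates x_{0,i}, x_{1,i}, ..., x_{k-1,i}.
    """
    n = len(labels)
    # Batch assumption: every training point needs a measurement from every source.
    if any(len(src) != n for src in per_source):
        raise ValueError("every source must supply a measurement for every point")
    return [
        (tuple(v for src in per_source for v in src[i]), labels[i])
        for i in range(n)
    ]


# Hypothetical two-source example: 2-D face features plus a 1-D fingerprint score.
face = [(0.1, 0.2), (0.3, 0.4)]
finger = [(5.0,), (6.0,)]
print(build_aggregate([face, finger], [1, 0]))
```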
Different measurement fusion training strategies can be used, but traditionally they all involve altering the training data sets of the component classifiers in some principled way. For example, bagging draws subsets of the n data points (in the standard formulation, bootstrap samples drawn with replacement) and gives a different subset to each component classifier for training. The ensemble label is then assigned based on a simple voting scheme. This represents a naive “parallel processing” approach to recognition and employs classifiers in a batch mode. Known boosting techniques, such as boosting-by-filtering and AdaBoost (Freund, Y. & Schapire, R. E. (1995), A decision-theoretic generalization of on-line learning and an application to boosting, in Proceedings of the 2nd European Conference on Computational Learning Theory (Eurocolt95), Barcelona, Spain, pp. 23-37), by contrast, adopt a strategy of employing component classifiers incrementally and only when needed. The composition of the training data set for later component classifiers is altered based on how difficult each data point is to classify using the ensemble classifier constructed so far (data that are easier to classify have a smaller chance of being selected for future training).
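One standard way to realize the reweighting described above is the discrete AdaBoost update as it is commonly presented in textbooks, sketched below for labels in {-1, +1}; the function name and interface are illustrative assumptions. Points the current component classifier gets wrong gain weight, so under boosting-by-resampling they are more likely to be drawn for the next classifier's training set.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One round of discrete AdaBoost (textbook form).

    weights: current sample weights, assumed to sum to 1.
    labels, predictions: ground truth and component-classifier output in {-1, +1}.
    Returns (alpha, new_weights), where alpha is the classifier's vote weight.
    """
    # Weighted training error of the current component classifier.
    eps = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < eps < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Misclassified points (y * h = -1) are scaled up, correct ones scaled down.
    new = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, labels, predictions)]
    z = sum(new)  # normalize so the weights again sum to 1
    return alpha, [w / z for w in new]


# Four points with uniform weights; only the last one is misclassified.
alpha, w = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 4), [round(v, 4) for v in w])
```

In this example the single misclassified point ends up carrying half of the total weight (0.5) and each correctly classified point 1/6, so later component classifiers concentrate on it.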
FIG. 2 is an illustrative diagram of a prior measurement fusion classifier 20 using a “bagging” technique. In general terms, during a bagging classifier process, measurement data from multiple different sources 22-28 is classified using multiple different classifiers Cbag1-Cbagn. Unlike in decision fusion, however, individual classifiers are not necessarily associated with measurement data from individual sources 22-28. Rather, the totality of the measurement data 30 is divided into data “bags” Mbag1-Mbagn, and each classifier Cbag1-Cbagn is trained on its associated “bag” of data. This association of different classifiers with different bags of data is represented by the different cross-hatching in the arrows and the corresponding cross-hatching of regions of the measurement data 22. For example, classifier Cbag1 classifies using data bag Mbag1; classifier Cbag2 classifies using data bag Mbag2; and classifier Cbagn classifies using data bag Mbagn. A decision is reached based upon results Rbag1-Rbagn produced by classifiers Cbag1-Cbagn.
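The sketch below shows bagging with a majority vote. It follows the standard bootstrap formulation (each bag is drawn with replacement from the full training set) rather than a strict split into disjoint bags; the toy nearest-class-mean learner and all function names are hypothetical and stand in for the component classifiers Cbag1-Cbagn.

```python
import random
from collections import Counter


def bagging_fit(data, train, n_bags, seed=0):
    """Train n_bags component classifiers, each on a bootstrap sample.

    data: list of (x, y) pairs. train: callable mapping a list of (x, y)
    pairs to a classifier (a callable x -> label).
    """
    rng = random.Random(seed)
    # Each bag has len(data) points drawn with replacement from the full set.
    return [train([rng.choice(data) for _ in data]) for _ in range(n_bags)]


def bagging_predict(classifiers, x):
    """Ensemble label by simple (unweighted) majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def train_nearest_mean(bag):
    """Hypothetical toy learner for 1-D inputs: predict the nearest class mean."""
    groups = {}
    for x, y in bag:
        groups.setdefault(y, []).append(x)
    means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


data = [(0.0, 0), (0.2, 0), (0.4, 0), (1.0, 1), (1.2, 1), (1.4, 1)]
clfs = bagging_fit(data, train_nearest_mean, n_bags=11)
print(bagging_predict(clfs, 0.1), bagging_predict(clfs, 1.3))
```

An odd number of bags avoids ties in a two-class vote.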
FIG. 3 is an illustrative diagram of a prior measurement fusion classifier 30 using a “boosting” technique. In this example boosting process, boosting classifiers Cboost1-Cboostn are trained on filtered measurement data. To understand the purpose of a boosting process, consider the following scenario. The decision to be made is whether or not to issue a credit card to an applicant. Classifiers Cboost1-Cboost3 are trained to make a recommendation for any given applicant. Assume, for example, that ten million data sources 30-1 to 30-10M, each representing a credit card candidate, are considered during training of the classifiers. Assume that classifier Cboost1 is trained using all of the source data 32 to identify the 500,000 best and 500,000 worst candidates. Assume that a first filtered data set 34 is produced by removing these one million data sources (or by making them less likely to be selected again for training). In this example, classifier Cboost2 is trained using only the remaining nine million data sources 36-1 to 36-9M, to identify the next 1,000,000 best and the next 1,000,000 worst candidates. Assume that a second filtered data set 38 is produced by removing these next two million data sources (or by making them less likely to be selected again for training). In this example, classifier Cboost3 is trained using only the remaining seven million data sources 40-1 to 40-7M. Each of the boost classifiers Cboost1-Cboost3 produces a corresponding classification result Rboost1-Rboost3 in response to measurement data from a subject, such as a credit card applicant. A decision Dboost is arrived at based upon the classification results.
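The arithmetic of the FIG. 3 example (10M candidates, then 9M, then 7M) can be reproduced at a 1/100,000 scale with the sketch below. A hypothetical `score` function stands in for a trained stage classifier's ranking of candidates, and only the removal variant is shown, not the down-weighting variant.

```python
def filter_stage(pool, score, k):
    """One filtering step in the spirit of the FIG. 3 example.

    Ranks the pool by a hypothetical `score` (a stand-in for the stage
    classifier's assessment), sets aside the k lowest- and k highest-scoring
    candidates, and returns (set_aside, remaining).
    """
    if k < 0 or 2 * k > len(pool):
        raise ValueError("k must be between 0 and half the pool size")
    ranked = sorted(pool, key=score)
    set_aside = ranked[:k] + ranked[len(ranked) - k:]
    remaining = ranked[k:len(ranked) - k]
    return set_aside, remaining


# 100 candidates stand in for 10M; k = 5 and k = 10 stand in for 500,000 and 1,000,000.
pool = list(range(100))
_, stage2_pool = filter_stage(pool, score=lambda c: c, k=5)          # 90 remain (cf. 9M)
_, stage3_pool = filter_stage(stage2_pool, score=lambda c: c, k=10)  # 70 remain (cf. 7M)
print(len(stage2_pool), len(stage3_pool))
```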
While classification involving bagging or boosting has generally been successful, there have been shortcomings with its use. For example, boosting techniques require fine-tuning of the parameters of the component classifiers, which is expensive in terms of processing effort and involves joint optimization of all component classifiers. This often is not feasible due to cost and time constraints in surveillance applications, even when the training process is performed off-line. Furthermore, regardless of how data are filtered (as in boosting) or partitioned (as in bagging), data from all sources are assumed to be available in a batch for training, which is not the scenario considered by this particular application, where data are acquired incrementally and sources are queried as needed.
Decision-tree methods provide another decision-making approach. These methods operate in a way similar to boosting and bagging, by iteratively partitioning a measurement space into homogeneous cells. The partition is recorded in a tree structure. FIG. 4 is an illustrative drawing showing cells of a recorded tree structure in accordance with a prior decision-tree technique. During a decision-tree process using such a tree structure, an unknown sample is filtered down the decision tree and assumes the majority label of the training samples in the particular leaf cell in which it ends up. In the illustrated example of FIG. 4, the decision tree is used for binary (yes/no) decisions. Samples that fall outside a cell (indicated as R1) are labeled “no”, and samples that fall inside a cell (indicated as R2) are labeled “yes”. Unfortunately, like bagging and boosting, a decision tree operates on the complete measurement space and does not consider the case where measurements are made available incrementally.
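A minimal sketch of the lookup step, assuming a hand-built tree rather than a trained one: each internal node tests one measurement component against a threshold, and each leaf stores the majority training label of its cell. The tree below is a hypothetical encoding of a single rectangular “yes” cell (1 ≤ x0 < 3, 1 ≤ x1 < 3) with “no” everywhere else, in the spirit of FIG. 4; the coordinates are assumptions, not read from the figure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # majority label of the training samples in this cell


@dataclass
class Node:
    feature: int      # index of the measurement component tested here
    threshold: float  # go left if x[feature] < threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def make_leaf(training_labels):
    """A leaf assumes the majority label of the training samples in its cell."""
    return Leaf(Counter(training_labels).most_common(1)[0][0])


def classify(tree, x):
    """Filter sample x down the tree and return the label of its leaf cell."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] < tree.threshold else tree.right
    return tree.label


# Hypothetical tree encoding one "yes" cell, 1 <= x0 < 3 and 1 <= x1 < 3.
no = make_leaf(["no", "no", "yes"])
yes = make_leaf(["yes", "yes", "no"])
tree = Node(0, 1.0, no, Node(0, 3.0, Node(1, 1.0, no, Node(1, 3.0, yes, no)), no))
print(classify(tree, (2.0, 2.0)), classify(tree, (0.5, 2.0)))
```

Because `classify` reads `x[tree.feature]` at every internal node on its path, it cannot proceed if a measurement tested along that path has not yet been acquired, which is the limitation noted above.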
Thus, typical prior classification schemes are incremental either in the classifier dimension or the data dimension, but not both. Decision fusion schemes ordinarily partition data into subsets but aggregate information in a batch manner, with or without proper weighting. Measurement fusion schemes ordinarily introduce classifiers one at a time, but the data they train on are nonetheless “complete.” That is, regardless of whether the global composition of the training set changes as a result of the training done so far, each training point in the set comprises measurements from all surveillance modules.
However, in reality, surveillance modules are employed incrementally (e.g., only when the face recognition module detects a suspicious person will that person be asked to undergo a fingerprint exam). Hence, decisions are made when only partial data are available. Furthermore, a current decision may be used not only for subject discrimination but also for selecting new sources to query in an incremental manner. This raises doubt as to whether training results, derived under the assumption that all decisions or all data are available, are applicable when only partial decisions or partial data are available for decision making.
Thus, there has been a need for improved incremental data fusion and decision making. The present invention meets this need.