Various automatic target recognition (ATR) systems have been designed to produce accurate target classifications from imagery obtained by one or more sensors. Such systems generally attempt to predict a target type from a set of target types based on sensor data and/or fused data (e.g., data from multiple sensors and/or multiple looks).
To classify the target type from a set of target types, at least one image or dataset must be processed and data must be extracted. Depending on the system requirements and/or parameters, the data may include confidence values (e.g., a percentage, estimate or likelihood) and a corresponding pose for each of the confidence values.
These systems have utilized decision level, hypothesis level and feature level fusion in attempts to determine the best ATR evidence to be fused. Decision level fusion ATR systems try to arrive at a classification decision by combining decisions from two or more ATR systems. There is an inevitable loss of information in the process of making decisions prior to fusion. Some decision level systems may determine the best ATR evidence over a range of azimuth angles for each look, but ignore consistency or relationship criteria between pose information, again with a consequent loss of information due to making decisions prior to fusion. Moreover, such systems fuse ATR scores after significant portions of the data are pruned by the individual ATR systems prior to fusion. These ATR systems may prune portions of the data for reasons of efficiency, or a specific ATR system may be unable to provide accurate predictive data without pruning the data. For example, an ATR system may be designed to combine specific portions of data, but not other portions, such that specific portions of data are required for that specific ATR system. An ATR system may also be designed such that data must be pruned prior to fusion if the data does not have sufficient information (e.g., points of data).
Furthermore, accumulation of evidence is a long-standing and much addressed area in pattern recognition. Typically, in pattern recognition, the evidence is represented as scores, usually real valued, from pattern classifiers for each possible object or category type. Bayesian methods seek to represent degrees of belief about the state of the world based on both prior knowledge and measurements. Other methods (e.g., frequentist) may not include prior probabilities in their approaches. Methods of adding evidence over time include likelihood based methods that sequentially add log likelihoods or log likelihood ratios. A known method is the Sequential Probability Ratio Test, which adds log likelihood ratios and tests against a threshold. Centralized fusion methods can add weighted decisions using weights that are derived from the expected performance of the two or more classification systems, with weights being defined in terms of the relative odds of the probability of correct decisions for true and false decisions. Other methods include conditional estimates of probability densities such as Markov models. However, most of these methods ignore all but the second order information contained in the variance and then propagate densities based on second order models; the Kalman filter is a well known example. Voting methods are also used for accumulation of evidence in pattern recognition. However, these known methods simply add votes or evidence based on preliminary classifier results.
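The Sequential Probability Ratio Test mentioned above can be sketched in a few lines. The following is a minimal illustration, not the system described herein: a running sum of log likelihood ratios is compared against Wald's approximate thresholds for chosen error rates (the threshold formulas and sample values are standard textbook assumptions).

```python
import math

def sprt(log_likelihood_ratios, upper, lower):
    """Sequential Probability Ratio Test: accumulate log likelihood
    ratios and stop as soon as the running sum crosses a threshold."""
    total = 0.0
    for step, llr in enumerate(log_likelihood_ratios, start=1):
        total += llr
        if total >= upper:          # strong evidence for H1
            return "accept H1", step
        if total <= lower:          # strong evidence for H0
            return "accept H0", step
    return "continue", len(log_likelihood_ratios)  # evidence inconclusive

# Wald's approximate thresholds for error rates alpha and beta.
alpha, beta = 0.05, 0.05
upper = math.log((1 - beta) / alpha)   # about +2.944
lower = math.log(beta / (1 - alpha))   # about -2.944

# Hypothetical per-look evidence favoring H1.
decision, n = sprt([0.8, 0.7, 0.9, 0.8], upper, lower)
```

Note that the test terminates as soon as either threshold is crossed, so the number of observations consumed is itself data dependent.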
To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new, relevant data. The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation. Bayesian probability interprets the concept of probability as “a probability p is an abstract concept, a quantity that the invention assigns theoretically, for the purpose of representing a state of knowledge, or that the invention calculates from previously assigned probabilities,” in contrast to interpreting it as a frequency or a “propensity” of some phenomenon. In the Bayesian interpretation, Bayes' theorem expresses how a subjective degree of belief should rationally change to account for evidence.
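The update rule described above, in which a prior probability is revised in light of new data, can be shown concretely for a discrete set of hypotheses. This is a generic sketch of Bayes' theorem with illustrative numbers, not any particular ATR configuration:

```python
def bayes_update(prior, likelihoods):
    """Bayes' theorem over a discrete set of hypotheses:
    posterior_i is proportional to prior_i * likelihood_i,
    normalized so the posterior sums to 1."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypotheses with equal prior belief; the new measurement is
# four times as likely under the first hypothesis as the second.
posterior = bayes_update([0.5, 0.5], [0.8, 0.2])
# posterior -> [0.8, 0.2]
```

Because the posterior from one measurement can serve as the prior for the next, the same function applied repeatedly accumulates evidence across successive looks.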
The Binomial probability density describes the probability of observing a specific number of each of two mutually exclusive outcomes as the result of an experiment or set of observations. The Multinomial theorem may be considered as the generalization of the binomial theorem to more than two possible mutually exclusive outcomes of a number of observations or experiments. The Multinomial density reduces to the binomial density in the case of exactly two possible types of outcomes or “categories”. The Multinomial density assumes specific probabilities of each of the possible categories as input parameters to the density. It does not revise these probabilities when operating alone.
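The Multinomial density and its reduction to the binomial case can be computed directly. The following sketch uses only the standard closed form (multinomial coefficient times the product of category probabilities raised to their counts); the numbers are illustrative:

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts of each mutually
    exclusive category, with fixed category probabilities supplied
    as input parameters (the density does not revise them)."""
    n = sum(counts)
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)
    p = float(coeff)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

# With exactly two categories this reduces to the binomial density:
# P(3 successes in 5 trials at p = 0.5) = C(5,3) * 0.5^5 = 0.3125
prob = multinomial_pmf([3, 2], [0.5, 0.5])
```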
The Dirichlet distribution, typically denoted Dir(α), is a family of continuous multivariate probability distributions parametrized by a vector α of positive real values. It is the multivariate generalization of the beta distribution. Dirichlet distributions may be used in Bayesian statistics as prior distributions for the Multinomial distribution, and in such cases the result is also a posterior Dirichlet distribution; that is, the Dirichlet distribution is the conjugate prior of the categorical distribution (a multinomial with a single observation) and the multinomial distribution. In other words, the probability density function of a Dirichlet distribution returns the belief that the probabilities of K rival events are x_i, given that each event has been observed α_i times, by taking the relative proportion of each α_i to their total.