1. Field of Invention
This invention relates generally to a system and method for correlating two images for the purpose of identifying a target in an image where templates are provided a priori only for the target. Information on the other objects in the image being searched may be unavailable or difficult to obtain. This invention treats the problem of designing target matching-templates and target matched-filters for image correlation as a statistical pattern recognition problem. By minimizing a suitable criterion, a target matching-template or a target matched-filter is estimated which approximates the optimal Bayes discriminant function in a least-squares sense. When applied to an image, both Bayesian image correlation methods are capable of identifying the target in a search image with minimum probability of error while requiring no a priori knowledge of other objects that may exist in the image being searched. The system and method are adaptive in the sense that they can be readily re-optimized (adapted) to provide optimal discrimination between the target and any unknown objects which may exist in a new search image, using only information from the new image being searched.
2. Prior Art—FIGS. 1, 2, 3, 4, and 5
Image correlation is the process of comparing a sensor image and a reference image in order to identify the presence and location of the reference image in the sensor image. Accurate identification of the presence of a reference image in another image, and accurate and unambiguous measurement of its location, are important in many practical applications.
Image correlation is used in a number of military applications, such as guiding a missile to a pre-selected target. It is also used in medical image registration, robotics, automated manufacturing, GIS, Homeland Security/Law Enforcement (fingerprint recognition, face recognition, iris recognition), and content-based image retrieval. Registration of multi-spectral images, acquired from earth-observing satellites, is another important application.
The basic elements of image correlation are shown in FIG. 1. Referencing FIG. 1, it can be seen that the inputs to the Correlator 10 are the Search Image and a Target Reference Image. The outputs from the Correlator 10 are the Target Locations in the Search Image and an estimate of the Probability of a Correct Fix.
A number of techniques for image correlation have been reported in the literature, the simplest being various forms of the spatial template-matching algorithm [R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973, p. 278], [Ormsby, C. C.; “Advanced scene Matching Techniques,” Proc. IEEE Nat'l Aerospace and Electronic Conf., Dayton, Ohio pp. 68-76, May 15-17, 1979], and [Pratt, W. K.; Digital Image Processing, John Wiley, 1978, p. 552].
The analog of the spatial template-matching algorithm in the spatial frequency domain is the matched-filter algorithm [R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973, p. 307] and [Pratt, W. K.; Digital Image Processing, John Wiley, 1978, pp. 553-560]. The frequency-domain matched filter has the desirable characteristic that it can be implemented using the fast Fourier transform (FFT) algorithm.
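As an illustration of the equivalence between spatial template matching and frequency-domain matched filtering, the following one-dimensional sketch computes a circular cross-correlation both ways. It is a toy example under stated assumptions and is not taken from the cited references: the function names, the 8-sample signal, and the naive O(N²) DFT (used here in place of an FFT so the block needs only the standard library) are all hypothetical choices.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def correlate_spatial(search, template):
    """Circular cross-correlation computed directly in the spatial domain."""
    n = len(search)
    return [sum(search[(m + u) % n] * template[m] for m in range(len(template)))
            for u in range(n)]

def correlate_frequency(search, template):
    """Same correlation via the frequency domain: IDFT(S * conj(T))."""
    n = len(search)
    padded = list(template) + [0.0] * (n - len(template))  # zero-pad template
    S, T = dft(search), dft(padded)
    product = [s * t.conjugate() for s, t in zip(S, T)]
    return [c.real for c in dft(product, inverse=True)]

template = [1.0, 3.0, 2.0]
search = [0.0, 0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0]  # template embedded at offset 3

spatial = correlate_spatial(search, template)
frequency = correlate_frequency(search, template)
peak = max(range(len(frequency)), key=lambda u: frequency[u])
```

Both routes yield the same correlation values up to floating-point rounding, and the correlation peak falls at the offset where the template was embedded.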
A number of more sophisticated correlation techniques based on matched-filtering have been developed in recent years. These include the phase correlation method [Chen, Q.; Defrise, M.; and Deconinck, F.: “Symmetric Phase-Only Matched Filtering of Fourier-Mellin Transforms for Image Registration and Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 12, Dec. 1994], [Pearson, J. J.; Hines, D. C., Jr.; Golosman, S.; and Kuglin, C. D.: “Video-Rate Image Correlation Processor,” SPIE, vol. 119, Application of Digital Image Processing, pp. 197-205, IOCC 1977], modeling images using feature vectors [Chen, X.; Cham, T.: “Discriminative Distance Measures for Image Matching”, Proceedings of the International Conference on Pattern Recognition (ICPR), Cambridge, England, vol. 3, 691-695, 2004], and optimum filters [Steding, T. L.; and Smith, F. W.: “Optimum Filters for Image Registration,” IEEE Trans. on Aerospace and Electronic Systems, vol. ASE-15, no. 6, pp. 849-860, November 1979].
Belsher, Williams, and Kin [Belsher, J. F.; Williams, H. F.; Kin, R. H.: “Scene Matching With Feature Detection,” SPIE, vol. 186, Digital Process of Aerial Images, 1979] were able to treat template-matching as a classical pattern recognition problem by assuming independence between adjacent picture elements (pixels) in the image and by assuming the form of the probability density function to be Gaussian, Laplacian, or Cauchy. Minter [Minter, T. C., “Minimum Bayes risk image correlation”, SPIE, vol. 238, Image Processing for Missile Guidance, pp. 200-208, 1980] was also able to treat matched-filtering as a classical pattern recognition problem by assuming independence between adjacent picture elements (pixels) in the image but without assuming a form for the probability density function. However, the assumption of independence between pixels reduces the effectiveness of these image correlation approaches, since correlation between pixels is a potentially important source of information in target recognition.
The performance of image correlation techniques is usually characterized in terms of the probability of a false fix on the target (or alternatively, the probability of a correct fix on the target) and registration accuracy. The probability of a correct fix on the target refers to the probability that the correlation peak truly identifies a target in the search image. A related quantity is the peak-to-side-lobe ratio (PSR), which is defined as the value of the match-point correlation peak divided by the root mean square (RMS) value of the correlation function at points excluding this peak. Once the correct peak has been identified, the accuracy with which the true position of the peak is located can be quantified by the variance of the correlation peak location. Webber and Deloshmit [Webber, R. F.; and Deloshmit, W. H.: “Product Correlation Performance for Gaussian Random Scenes,” IEEE Trans. on Aerospace and Electronics Systems, vol. AES-10, no. 4, pp. 516-520, July 1974] have shown that the probability of a false fix is a monotonically decreasing function of the PSR.
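The PSR definition above can be sketched in a few lines. This is an illustrative reading of the definition, not code from the cited work: the function name and the 3×3 correlation surface are hypothetical, and only the single peak sample is excluded from the RMS (some practical implementations exclude a small neighborhood around the peak instead).

```python
import math

def peak_to_sidelobe_ratio(surface):
    """PSR = peak value / RMS of all other correlation values."""
    values = [v for row in surface for v in row]
    peak_index = max(range(len(values)), key=lambda i: values[i])
    peak = values[peak_index]
    # Exclude only the peak sample itself (an assumption of this sketch).
    side = values[:peak_index] + values[peak_index + 1:]
    rms = math.sqrt(sum(v * v for v in side) / len(side))
    return peak / rms

# Hypothetical correlation surface with a clear match point at the center.
surface = [
    [1.0, 2.0, 1.0],
    [2.0, 10.0, 2.0],
    [1.0, 2.0, 1.0],
]
psr = peak_to_sidelobe_ratio(surface)
```

A sharper, more isolated peak gives a larger PSR, which by the result cited above corresponds to a lower probability of a false fix.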
This invention treats the problem of designing target matching-templates and target matched-filters for image correlation as a statistical pattern recognition problem. It is shown that by minimizing a suitable criterion, a target matching-template or a target matched-filter can be estimated which approximates the optimum Bayes discriminant function in a least-squares sense. It is well known that the use of the Bayes discriminant function in target classification minimizes the probability of a false fix [R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973, pp.11].
However, the use of pattern recognition techniques in image correlation applications presents several unique problems. The target's characteristics are usually well known, and an appropriate reference image (or images) is available which provides an appropriate statistical representation of the target. However, often little prior information is available on the statistical characteristics of the other objects (or classes) present in the image which might be confused with the target. This lack of information about the confusion objects (or classes) presents a serious problem in designing an optimal discriminant procedure for recognizing the target.
Most of the literature on Bayes discriminant procedures is restricted to the two-class problem where class-conditional probability distribution functions are either available for each class (or object) or can be estimated from labeled training samples available for each class. Below, the Bayes decision rule is reformulated to make it suitable for use in the unique environment associated with target discrimination. The reformulated decision rule only requires an estimate of the target's posterior probability function. It will be shown that the target's posterior probability function can be estimated using labeled training samples from the target and unlabeled samples from the image being searched. Its application to Bayes image correlation will be discussed.
It will also be shown that the proposed image correlator is adaptive in the sense that it is capable of adapting both the target matching-template and the target matched-filter to provide optimal discrimination between the target and any unknown objects which may exist in the image being searched. If a new search image, with a possibly different set of unknown objects is to be searched, the target matching-template and the target matched-filter can be readily re-optimized (adapted), for the new set of unknown objects before searching the new image. Adaptation is accomplished using only unlabeled patterns from the new image being searched.
Creating Vector Representations of Target Templates and the Search Image
In target discrimination, it is assumed that two classes of objects are in the search image—target and not-target objects. Let Ct be the class (or object) label for target and Cnt be the class (or object) label for not-target. The problem of classification arises when an event described by a set of measurements is to be associated with these two classes. The event cannot be identified directly with a class, so a decision must be made on the basis of the set of observed measurements as to which class the event belongs (i.e., target or not-target). The set of measurements can be represented as a vector in the measurement space. This measurement will be called the measurement vector, or simply a sample, and will be denoted as X=(x1,x2, . . . , xd)T, where the T denotes the transpose and d is the number of measurements or the dimensionality of the measurement space.
In the context of target identification in images:
Let
S(y,z)—be the image to be searched
Tj(y,z)—be a target template
D—be the domain of definition of the target template
(D, for example, might be 20×20 pixels square, whereas the search image might be 500×500 pixels square.)
Labeled measurement vectors for the target can be formed in the following manner: As shown in FIG. 2, target reference images T1(y,z) 16 through TKt(y,z) 18 contain examples of the target, which are M×M pixels in size. It is assumed, for the sake of generality, that Kt reference images, Tj(y,z), j=1, 2, . . . , Kt, are available, each containing an example of the target. A target measurement vector Xt(j) is constructed from Tj(y,z) by stacking the M columns of Tj(y,z). The first M elements of the target measurement vector Xt(j) are the elements of the first column of Tj(y,z), the next M elements of Xt(j) are from the second column of Tj(y,z), and so forth, for all M columns of Tj(y,z). The dimension, d, of Xt(j) is d=M². This procedure produces Kt measurement vectors, Xt(j), j=1, 2, . . . , Kt, one for each target template.
Unlabeled measurement vectors from the search image S(y,z) can be formed in a similar manner. Again referencing FIG. 2, the image to be searched, S(y,z) 12, is considered, for simplicity, to be N×N in size, where N>M. We select an M×M sub-area S(y−u,z−v) 14 which is equal to the target template in size. As shown in FIG. 2, the center of sub-area S(y−u,z−v) is located at (u,v) 15. An unlabeled measurement vector, X(i), is constructed from the M×M sub-area S(y−u,z−v) 14 by stacking the columns of the M×M sub-area. The dimension, d, of the unlabeled measurement vector, X(i), is d=M². The search image is systematically scanned by incrementing u and v. The index i for X(i) is indexed to u and v. From the search image, K unlabeled measurement vectors, X(i), i=1, 2, . . . , K, are constructed, where K=(N−M+1)².
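The column-stacking procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy 2×2 template and 4×4 search image are hypothetical, images are held as lists of rows, and each sub-area is addressed by its top-left corner rather than by its center (u,v), which changes only the indexing, not the set of vectors produced.

```python
def stack_columns(image):
    """Stack the columns of a square image (list of rows) into one vector."""
    size = len(image)
    return [image[row][col] for col in range(size) for row in range(size)]

def unlabeled_vectors(search, m):
    """Scan every M x M sub-area of an N x N search image and vectorize it."""
    n = len(search)
    vectors = []
    for top in range(n - m + 1):
        for left in range(n - m + 1):
            window = [row[left:left + m] for row in search[top:top + m]]
            vectors.append(stack_columns(window))
    return vectors

template = [[1, 2],
            [3, 4]]
search = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 toy image

x_t = stack_columns(template)        # one labeled target vector, d = M^2 = 4
xs = unlabeled_vectors(search, m=2)  # K = (N - M + 1)^2 = 9 unlabeled vectors
```

With M=2 and N=4, each vector has d=M²=4 elements and the scan yields K=(N−M+1)²=9 unlabeled vectors, matching the counts given above.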
The Adaptive Bayes Approach to Pattern Recognition
Bayes decision theory is a fundamental approach to the problem of pattern recognition [R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973, pp. 11-17]. The approach is based on the assumption that the decision problem can be posed in probabilistic terms where all of the relevant probability values are known. Specifically, the application of a standard Bayes classifier requires estimation of the posterior probabilities of each class. If information about the probability distributions of classes is available, the posterior probability can be calculated for each measurement and each measurement can be attributed to the class with the highest posterior probability.
However, this traditional approach is not feasible when unknown “not-target” classes are present in the search image. Traditional approaches require that the number of “not-target” classes be known in advance and a training set (or class distributions) be available for all classes in the image to be classified to ensure optimal classifier performance. A modification to the standard Bayes decision rule is presented below to address this problem.
The decision making process for Bayes pattern recognition can be summarized as follows: Given a set of K measurement vectors, X(i), i=1, 2, . . . , K, it is desired to associate the measurements with either the “target” or the “not-target” class with minimum probability of error where X is a d-dimensional vector in the measurement space or X=(x1,x2, . . . , xd)T.
For the moment, let us assume that complete information, in the form of labeled training samples, is available for the “target” and the “not-target” classes. Using training samples from these two classes, we can estimate conditional probability density functions for the two classes, where P(X/Ct) is the conditional probability density function (pdf) for the “target” class and P(X/Cnt) is the pdf for the “not-target” class. We will assume that the associated prior probabilities for the two classes, PCt and PCnt, are known. Using these probability estimates, the standard maximum likelihood decision rule for two-class pattern recognition is:

$$\text{If: } P_{C_t} P(X/C_t) \ge P_{C_{nt}} P(X/C_{nt}), \tag{1}$$

Classify X as target
Otherwise, classify X as not-target

where
P(X/Ct) = conditional probability density function of the “target” class
P(X/Cnt) = conditional probability density function of the “not-target” class
PCt = prior probability of the “target” class
PCnt = prior probability of the “not-target” class
The maximum likelihood classifier is illustrated in FIG. 3 where normality is assumed for the target and not-target class-conditional probability density functions P(X/Ct) 20 and P(X/Cnt) 24. The decision boundary 22, where PCtP(X/Ct)=PCntP(X/Cnt), is also shown.
The univariate Gaussian density functions, 20 and 24, in FIG. 3 are defined as
$$P(X/C_i) = \frac{1}{(2\pi)^{1/2}\,\sigma_i}\; e^{-\frac{1}{2}\left(\frac{x-\mu_i}{\sigma_i}\right)^{2}} \tag{2}$$
The parameters of the density functions in FIG. 3 are μCt=7, μCnt=13, σCt=3, and σCnt=3. The prior probabilities are PCt=0.5 and PCnt=0.5.
Again referencing FIG. 3, it can be seen that
RCt = region where samples are classified as “target” 26, i.e., where P(X/Ct)≧P(X/Cnt)
RCnt = region where samples are classified as “not-target” 28, i.e., where P(X/Ct)<P(X/Cnt)
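The two-class rule of eq. (1) with the FIG. 3 parameters can be sketched as below. The function and constant names are hypothetical; the means, standard deviations, and priors are the values stated for FIG. 3. Because the two classes have equal priors and equal standard deviations, the decision boundary falls midway between the means, at x=10.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, as in eq. (2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# Parameters stated for FIG. 3.
P_CT, P_CNT = 0.5, 0.5
MU_T, SIGMA_T = 7.0, 3.0
MU_NT, SIGMA_NT = 13.0, 3.0

def classify(x):
    """Decision rule of eq. (1): compare prior-weighted class densities."""
    target_score = P_CT * gaussian_pdf(x, MU_T, SIGMA_T)
    not_target_score = P_CNT * gaussian_pdf(x, MU_NT, SIGMA_NT)
    return "target" if target_score >= not_target_score else "not-target"

labels = [classify(x) for x in (4.0, 9.9, 10.1, 16.0)]
```

Note that this rule needs the not-target density P(X/Cnt), which is exactly the information that is usually unavailable in the target recognition setting.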
An equivalent decision rule, to that in eq. (1), can be obtained by dividing both sides of eq. (1) by the unconditional probability of X, which is P(X), or
$$\text{If: } \frac{P_{C_t} P(X/C_t)}{P(X)} \ge \frac{P_{C_{nt}} P(X/C_{nt})}{P(X)}; \tag{3}$$

Classify X as target
Otherwise, classify X as not-target

where

$$P(X) = P_{C_t} P(X/C_t) + P_{C_{nt}} P(X/C_{nt}) \tag{4}$$

A graph of P(X) 30, eq. (4), is shown in FIG. 4.

The Bayes decision rule can be defined in terms of posterior probabilities as:

$$\text{If: } P(C_t/X) \ge P(C_{nt}/X); \tag{5}$$

Classify X as target
Otherwise, classify X as not-target

where P(Ct/X) and P(Cnt/X) are the conditional posterior probability functions for the “target” and the “not-target” classes, respectively. These posterior probability functions are defined as:
$$P(C_t/X) = \frac{P_{C_t} P(X/C_t)}{P(X)} \tag{6}$$

$$P(C_{nt}/X) = \frac{P_{C_{nt}} P(X/C_{nt})}{P(X)} \tag{7}$$
Minter [T. C. Minter, “A Discriminant Procedure for Target Recognition in Imagery Data”, Proceedings of the IEEE 1980 National Aerospace and Electronic Conference—NAECON 1980, May 20-22, 1980] proposed an alternative Bayes decision rule that can be derived by noting that the two posterior probability functions sum to 1, namely

$$P(C_t/X) + P(C_{nt}/X) = 1 \tag{8}$$

Rearranging eq. (8), we get

$$P(C_{nt}/X) = 1 - P(C_t/X) \tag{9}$$
Substituting eq. (9) into eq. (5) and simplifying, we obtain an alternative Bayes decision rule which only involves the target posterior distribution function, namely

$$\text{If: } P(C_t/X) \ge \tfrac{1}{2}, \tag{10}$$

Classify X as target
Otherwise, classify X as not-target

where

$$P(C_t/X) = \frac{P_{C_t} P(X/C_t)}{P(X)} \tag{11}$$
Equation (10) is referred to as the adaptive Bayes decision rule. The prior probability, PCt, of the target, in eq. (10), is assumed to be known.
FIG. 5 shows a graph of the target posterior distribution function, P(Ct/X) 32. Also shown in FIG. 5 are the decision boundary 34 (where P(Ct/X)=½) and the decision threshold 36, located at the point where P(Ct/X)=0.5. Again referencing FIG. 5, it can be seen that
RCt = region where samples are classified as “target” 38, i.e., where P(Ct/X)≧½
RCnt = region where samples are classified as “not-target” 40, i.e., where P(Ct/X)<½
Again referencing FIG. 5, a useful observation is that the function P(Ct/X) 32 is a transformation from the d-dimensional measurement space to the one-dimensional line. The function P(Ct/X) 32 maps the target samples as close as possible to “1” in region RCt 38, and it maps the not-target samples as close as possible to “0” in region RCnt 40.
The adaptive Bayes decision rule, eq. (10), is adaptive for the following reason: it is capable of adapting the decision boundary to provide optimal discrimination between the “target” class and any unknown “not-target” class that may exist in the data set to be classified. In particular, the target class-conditional probability density function, P(X/Ct), in the numerator of eq. (11), can be estimated using labeled samples from the “target” class. For example, if P(X/Ct) is normally distributed, its mean and variance can be estimated from training samples. The unconditional probability density function, P(X), in the denominator of eq. (11), is not conditioned on a class and can be estimated using unlabeled samples from the data set to be classified. A number of nonparametric density function estimation techniques are available for estimating P(X). For example, P(X) can be estimated using Kth Nearest Neighbor [R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973, p. 87]. Using estimates for P(X/Ct) and P(X), the target posterior probability function, P(Ct/X), eq. (11), can be estimated and used to classify data using the adaptive Bayes decision rule, eq. (10). If a new data set is to be classified, P(X) can be re-estimated using unlabeled data from this new data set and a new target posterior probability function, P(Ct/X), derived. This new estimator for the posterior probability function P(Ct/X) re-optimizes (adapts) the decision boundary to accommodate any changes in the distribution of the “not-target” class in the new data set.
Gorte [B. Gorte and N. Gorte-Kroupnova, “Non-parametric classification algorithm with an unknown class”, Proceedings of the International Symposium on Computer Vision, 1995, pp. 443-448], Mantero [P. Mantero, “Partially supervised classification of remote sensing images using SVM-based probability density estimation”, IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, March 2005, pp. 559-570], and Guerrero-Curieses [A. Guerrero-Curieses, A. Biasiotto, S. B. Serpico, and G. Moser, “Supervised Classification of Remote Sensing Images with Unknown Classes,” Proceedings of IGARSS-2002 Conference, Toronto, Canada, June 2002] investigated using Kth Nearest Neighbor probability estimation techniques to estimate P(Ct/X), eq. (11), and classify data using the adaptive Bayes decision rule, eq. (10). They demonstrated that it can be used successfully in crop identification using remotely sensed multi-spectral satellite imagery.
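The estimation scheme just described can be sketched in one dimension. This is an illustrative toy, not the method of any cited reference: the helper names, the sample counts, and the neighbor count k=40 are hypothetical, and evenly spaced Gaussian quantiles are used as a deterministic stand-in for real labeled and unlabeled data. P(X/Ct) is fitted as a Gaussian from labeled target samples only, while P(X) is estimated by a Kth Nearest Neighbor density estimate from unlabeled samples, so no not-target labels are ever used.

```python
from statistics import NormalDist

def quantile_samples(mu, sigma, count):
    """Deterministic stand-in for random draws: evenly spaced quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / count) for i in range(count)]

def knn_density(x, samples, k):
    """1-D k-th nearest neighbor density estimate: k / (n * 2 * r_k)."""
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]  # distance to the k-th nearest sample
    return k / (len(samples) * 2.0 * radius)

P_CT = 0.5  # target prior, assumed known

# Labeled target samples (used only to fit P(X/Ct)).
labeled_target = quantile_samples(7.0, 3.0, 200)
# Unlabeled samples from the data to be classified: a mixture of target and
# an "unknown" not-target class; the labels are never used.
unlabeled = quantile_samples(7.0, 3.0, 1000) + quantile_samples(13.0, 3.0, 1000)

target_pdf = NormalDist.from_samples(labeled_target)

def target_posterior(x, k=40):
    """Estimate of eq. (11): P_Ct * P(X/Ct) / P(X)."""
    return P_CT * target_pdf.pdf(x) / knn_density(x, unlabeled, k)

def classify(x):
    """Adaptive Bayes decision rule, eq. (10)."""
    return "target" if target_posterior(x) >= 0.5 else "not-target"

labels = [classify(x) for x in (4.0, 16.0)]
```

To adapt to a new data set, only `unlabeled` would be replaced; the fitted target density is reused unchanged. Because the two densities are estimated separately, the estimated posterior is not guaranteed to lie in [0, 1].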
Minter [T. C. Minter, “A Discriminant Procedure for Target Recognition in Imagery Data”, Proceedings of the IEEE 1980 National Aerospace and Electronic Conference—NAECON 1980, May 20-22, 1980] proposed an alternative least squares criterion for approximating the target posterior probability, P(Ct/X). This least squares algorithm uses labeled samples from the target-class and unlabeled samples from the image to be classified, to approximate the target posterior probability function, P(Ct/X), with a function. This least squares criterion is described below.
Least Squares Estimation of the Adaptive Bayes Decision Function
The target posterior probability function, P(Ct/X), eq. (11), can be approximated by minimizing the mean square difference between the estimated target posterior probability function, P̂(Ct/X), and the true target posterior probability function, P(Ct/X). The least squares criterion is:

$$J = \int \left(\hat{P}(C_t/X) - P(C_t/X)\right)^{2} P(X)\,dX + C \tag{12}$$

where

$$P(C_t/X) = \frac{P_{C_t} P(X/C_t)}{P(X)} \tag{13}$$

and C, in eq. (12), is an arbitrary constant.
The least squares criterion, eq. (12), cannot be minimized directly since the true target posterior probability function, P(Ct/X), is unknown.
The least squares criterion, eq. (12), is reformulated below to provide an equivalent criterion, with no unknowns, which can be minimized to estimate the parameters of a function which approximates the true target posterior probability function, P(Ct/X), in a least-squares sense.
First, expanding the least squares criterion, eq. (12), we get

$$J = \int \left(\hat{P}(C_t/X)^{2} - 2\hat{P}(C_t/X)\,P(C_t/X) + P(C_t/X)^{2}\right) P(X)\,dX + C \tag{15}$$

Rearranging, we get:

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - \int 2\hat{P}(C_t/X)\,P(C_t/X)\,P(X)\,dX + \int P(C_t/X)^{2} P(X)\,dX + C \tag{16}$$

Substituting the definition of P(Ct/X), from eq. (11), into the second term of eq. (16), we get:

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - \int 2\hat{P}(C_t/X)\,\frac{P_{C_t} P(X/C_t)}{P(X)}\,P(X)\,dX + \int P(C_t/X)^{2} P(X)\,dX + C \tag{17}$$

Noting that P(X) can be canceled out in the second term of eq. (17), we get

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - \int 2\hat{P}(C_t/X)\,P_{C_t} P(X/C_t)\,dX + \int P(C_t/X)^{2} P(X)\,dX + C \tag{18}$$

Let us choose C in eq. (18) as, multiplying by 1:

$$C = 2P_{C_t} = 2P_{C_t}\cdot 1 \tag{19}$$

and noting that

$$\int P(X/C_t)\,dX = 1 \tag{20}$$

we can substitute eq. (20) into eq. (19) and rewrite eq. (19) as

$$C = 2P_{C_t}\int P(X/C_t)\,dX \tag{21}$$

Substituting this definition of C into eq. (18), we get:

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - 2P_{C_t}\int \hat{P}(C_t/X)\,P(X/C_t)\,dX + 2P_{C_t}\int P(X/C_t)\,dX + \int P(C_t/X)^{2} P(X)\,dX \tag{22}$$

Combining terms and rearranging, we get:

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - 2P_{C_t}\int \left(\hat{P}(C_t/X) - 1\right) P(X/C_t)\,dX + \int P(C_t/X)^{2} P(X)\,dX \tag{23}$$

Since the third term in eq. (23) is not a function of the estimated target posterior distribution, P̂(Ct/X), it can be considered a constant. Let us define a new constant C′:

$$C' = \int P(C_t/X)^{2} P(X)\,dX \tag{24}$$

Substituting C′ into eq. (23), we get

$$J = \int \hat{P}(C_t/X)^{2} P(X)\,dX - 2P_{C_t}\int \left[\hat{P}(C_t/X) - 1\right] P(X/C_t)\,dX + C' \tag{25}$$
The expected value of a function (∘) with respect to the labeled samples from the target class is defined as:
$$E_{C_t}(\circ)=\int(\circ)P(X/C_t)\,dX \tag{26}$$
The expected value with respect to the unlabeled samples from P(X) (the data to be classified) is defined as:
$$E(\circ)=\int(\circ)P(X)\,dX \tag{27}$$
Using these definitions, the least-squares criterion, eq. (25), can be rewritten as:
$$J=E\left[\hat{P}(C_t/X)^2\right]-2P_{C_t}E_{C_t}\left[\hat{P}(C_t/X)-1\right]+C' \tag{28}$$
We will approximate the posterior probability of the class-of-interest, $\hat{P}(C_t/X)$, using a linear combination of "scalar functions of the measurements", or
$$\hat{P}(C_t/X)\cong A^TF(X) \tag{29}$$
where F(X) is a vector containing "scalar functions of the measurements", or
$$F(X)=(f(X)_1,f(X)_2,\ldots,f(X)_d)^T \tag{30}$$
and the vector A is a vector of weights for the scalar functions, f(X), or
$$A=(a_1,a_2,\ldots,a_d)^T \tag{31}$$
Since the parameter weighting vector, $A=(a_1,a_2,\ldots,a_d)^T$, is used to approximate the target posterior distribution function $\hat{P}(C_t/X)$, we will often refer to A as the Bayes parameter weighting vector.
Now, substituting $A^TF(X)$, from eq. (29), for $\hat{P}(C_t/X)$ in eq. (28) we get:
$$J=E\left[(A^TF(X))^2\right]-2P_{C_t}E_{C_t}\left[A^TF(X)-1\right]+C' \tag{32}$$
For a finite set of K unlabeled training samples, X(i), i=1, 2, . . . , K, from the search image and Kt labeled samples, Xt(j), j=1, 2, . . . , Kt, from the target, we can rewrite eq. (32), as
$$J=\frac{1}{K}\sum_{i=1}^{K}\left[\left(A^TF(X(i))\right)^2\right]-2P_{C_t}\frac{1}{K_t}\sum_{j=1}^{K_t}\left[A^TF(X_t(j))-1\right]+C' \tag{33}$$
This formulation of the least-squares criterion, eq. (33), is equivalent to the original least-squares criterion, eq. (12); however, eq. (33) is the preferable form, since it differs from eq. (12) only by a constant and, apart from the constant C′, which does not depend on A, it contains no unknowns. In addition, it is shown below that eq. (33) can be minimized to obtain an estimator of the Bayes parameter weighting vector, A=(a1,a2, . . . ad)T.
However, the most important attribute of the least-squares criterion, eq. (33), is that it can be evaluated using only unlabeled samples from the search image and labeled samples from the target.
Another useful observation is that the function ATF(X) is a transformation from the d-dimensional measurement space to the one-dimensional real line. The least-squares criterion in eq. (33) is minimized if ATF(X) maps the target samples as close to one as possible and the unlabeled samples as close to zero as possible.
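As an illustration of how the sample form of the criterion might be evaluated, the following Python sketch computes J from arrays of unlabeled and target samples. It is only a sketch under stated assumptions: the function name, the array layout (one row per sample, already mapped through F), and the synthetic data are hypothetical; the constant C′ is omitted because it does not depend on A; and the target term carries the minus sign that appears in eq. (25).

```python
import numpy as np

def least_squares_criterion(A, F_unlabeled, F_target, prior):
    """Sample estimate of J with the constant C' omitted.

    A           : (d,) weight vector
    F_unlabeled : (K, d) array, row i is F(X(i)) from the search image
    F_target    : (Kt, d) array, row j is F(Xt(j)) from the target
    prior       : assumed value of the target prior probability P_Ct
    """
    # (1/K) * sum over unlabeled samples of (A^T F(X(i)))^2
    unlabeled_term = np.mean((F_unlabeled @ A) ** 2)
    # (1/Kt) * sum over target samples of (A^T F(Xt(j)) - 1)
    target_term = np.mean(F_target @ A - 1.0)
    return unlabeled_term - 2.0 * prior * target_term

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
F_u = rng.normal(size=(200, 3))
F_t = rng.normal(loc=1.0, size=(20, 3))
print(least_squares_criterion(np.zeros(3), F_u, F_t, prior=0.1))
```

Because only sample averages over the two sets appear, no labels for the non-target objects in the search image are needed to evaluate the expression.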
Estimating the Bayes Parameter Weighting Vector
In this section, an estimator for the Bayes parameter weighting vector A=(a1,a2, . . . ad)T, is obtained by minimization of the least-square criterion, eq. (32).
Differentiating J, in eq. (32), with respect to the Bayes parameter weighting vector, A, and setting the result to zero, we get:
$$\frac{\partial J}{\partial A}=2E\left[F(X)F(X)^TA\right]-2P_{C_t}E_{C_t}\left[F(X)\right]=0 \tag{34}$$
Rearranging eq. (34) yields
$$E\left[F(X)F(X)^T\right]A=P_{C_t}E_{C_t}\left[F(X)\right] \tag{35}$$
and solving for A we get
$$A=P_{C_t}E\left[F(X)F(X)^T\right]^{-1}\cdot E_{C_t}\left[F(X)\right] \tag{36}$$
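For readers who want the intermediate step written out, the following sketch starts from eq. (25) with $\hat{P}(C_t/X)$ replaced by $A^TF(X)$, as in eq. (29), and expands the quadratic form before differentiating. It assumes that the matrix $E[F(X)F(X)^T]$ is nonsingular, a condition the text does not state explicitly.

```latex
\begin{aligned}
J(A) &= A^{T}\,E\!\left[F(X)F(X)^{T}\right]A
        \;-\; 2P_{C_t}\left(A^{T}E_{C_t}\!\left[F(X)\right]-1\right) + C' \\
\frac{\partial J}{\partial A}
     &= 2\,E\!\left[F(X)F(X)^{T}\right]A \;-\; 2P_{C_t}\,E_{C_t}\!\left[F(X)\right] = 0 \\
\Longrightarrow\quad
A    &= P_{C_t}\,E\!\left[F(X)F(X)^{T}\right]^{-1}E_{C_t}\!\left[F(X)\right]
\end{aligned}
```

Since $E[F(X)F(X)^T]$ is positive semidefinite, J is a convex quadratic in A, so under the nonsingularity assumption this stationary point is the minimizer.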
Given a set of K unlabeled samples X(i), i=1, 2, . . . , K from the search image and Kt labeled samples Xt(j), j=1, 2, . . . , Kt from the target, the Bayes parameter weighting vector A=(a1,a2, . . . ad)T may be estimated as follows:
$$A=P_{C_t}\left\{\frac{1}{K}\sum_{i=1}^{K}\left[F(X(i))F(X(i))^T\right]\right\}^{-1}\cdot\frac{1}{K_t}\sum_{j=1}^{K_t}\left[F(X_t(j))\right] \tag{37}$$
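The estimator in eq. (37) can be sketched in a few lines of NumPy. This is an illustrative sketch, not a definitive implementation: the function name and array layout are hypothetical, the prior P_Ct is assumed to be supplied by the user, and the sample matrix is assumed to be nonsingular. A linear solve is used in place of an explicit matrix inverse, which gives the same result for a nonsingular matrix.

```python
import numpy as np

def estimate_bayes_weights(F_unlabeled, F_target, prior):
    """Estimate the weight vector A from samples, following eq. (37).

    F_unlabeled : (K, d) array, row i is F(X(i)) from the search image
    F_target    : (Kt, d) array, row j is F(Xt(j)) from the target
    prior       : assumed value of the target prior probability P_Ct
    """
    K = F_unlabeled.shape[0]
    # (1/K) * sum_i F(X(i)) F(X(i))^T, a d x d matrix
    second_moment = F_unlabeled.T @ F_unlabeled / K
    # (1/Kt) * sum_j F(Xt(j)), a d-vector
    target_mean = F_target.mean(axis=0)
    # Solve second_moment @ A = prior * target_mean for A
    return prior * np.linalg.solve(second_moment, target_mean)
```

If the unlabeled samples do not span the d-dimensional space (for example, when K is smaller than d), the matrix is singular and the solve fails; how to handle that case is outside the scope of this sketch.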
Approximating the Target Posterior Probability Function
Below, a method is presented for approximating the target posterior probability function, $\hat{P}(C_t/X)$, using a linear combination of scalar functions of the measurements.
For template-matching and matched-filtering, a particularly useful form for the "scalar functions of the measurements" F(X) in eq. (30) is simply the measurement vector $X=(x_1,x_2,\ldots,x_d)^T$, or
$$F(X)=(x_1,x_2,\ldots,x_d)^T \tag{38}$$
Since
$$\hat{P}(C_t/X)=A^TF(X) \tag{39}$$
the target posterior probability, $\hat{P}(C_t/X)$, can be approximated using a linear combination of weighted scalar functions of the measurements of the form
$$\hat{P}(C_t/X)\cong a_1x_1+a_2x_2+\ldots+a_dx_d \tag{40}$$
where
$$A=(a_1,a_2,\ldots,a_d)^T \tag{41}$$
and
$$F(X)=(x_1,x_2,\ldots,x_d)^T \tag{42}$$
and since $X=(x_1,x_2,\ldots,x_d)^T$, then
$$F(X)=X \tag{43}$$
Substituting this definition of F(X) into eq. (37) we obtain the following estimator for the Bayes parameter weighting vector $A=(a_1,a_2,\ldots,a_d)^T$:
$$A=P_{C_t}\left\{\frac{1}{K}\sum_{i=1}^{K}\left[X(i)X(i)^T\right]\right\}^{-1}\cdot\frac{1}{K_t}\sum_{j=1}^{K_t}\left[X_t(j)\right] \tag{44}$$
Substituting eq. (43) into eq. (39) we get
$$\hat{P}(C_t/X)\cong A^TX \tag{45}$$
The Adaptive Bayes decision rule for classifying the measurement vector X becomes
If: $A^TX\geq\tfrac{1}{2}$;  (46)
        Classify X as target,
        Otherwise, classify X as not-target
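The decision rule in eq. (46) amounts to thresholding the inner product of A with each measurement vector. The sketch below illustrates this under stated assumptions: the function name is hypothetical, and the weight vector and measurement vectors are made-up numbers chosen only to exercise the threshold, not values estimated from any image.

```python
import numpy as np

def classify_targets(A, X, threshold=0.5):
    """Apply the rule of eq. (46): True where A^T X >= threshold.

    A : (d,) weight vector
    X : (d,) single measurement vector or (n, d) array of them
    """
    X = np.atleast_2d(X)
    return X @ A >= threshold

# Illustrative values only; the scores A^T X are 0.75, 0.0 and 0.5.
A = np.array([0.25, 0.5])
X = np.array([[1.0, 1.0],
              [0.0, 0.0],
              [2.0, 0.0]])
print(classify_targets(A, X))
```

The first and third vectors are classified as target (the third sits exactly on the threshold, which eq. (46) assigns to the target class), and the second as not-target.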
Interpreting the Adaptive Bayes Parameter Weighting Vector Estimator
It is important to note that the second factor of the estimator for the Bayes parameter weighting vector, $A=(a_1,a_2,\ldots,a_d)^T$, eq. (44), is simply the mean, $\mu_t$, of the target measurement vectors, i.e.
$$\mu_t=\frac{1}{K_t}\sum_{j=1}^{K_t}X_t(j) \tag{47}$$
Therefore we can re-write eq. (44) as
$$A=P_{C_t}\left\{\frac{1}{K}\sum_{i=1}^{K}\left[X(i)X(i)^T\right]\right\}^{-1}\cdot\mu_t \tag{48}$$
Equation (47) implies that the target mean, $\mu_t$, can be estimated from a single target template ($K_t=1$). In many target recognition applications, only a single target template is available for training the classifier. If that single target template is truly representative of the mean of the target, a valid approximation of the target posterior probability function $\hat{P}(C_t/X)$ can be obtained by estimating the Bayes parameter weighting vector, $A=(a_1,a_2,\ldots,a_d)^T$, with eq. (48) and then evaluating eq. (45). This, in turn, implies that we can obtain optimal discrimination between the target and not-target objects in the search image using a single target template. Other statistical decision rules, such as the Gaussian maximum likelihood decision rule and Fisher's linear discriminant, require enough target templates to estimate both the mean and the covariance matrix of the target, which is a significant limitation in target recognition applications.
Again referring to eq. (48), it is also important to note that the M×M matrix,
$$\left\{\frac{1}{K}\sum_{i=1}^{K}\left[X(i)X(i)^T\right]\right\}^{-1},$$
is formed using unlabeled samples from the image being searched. If a new image is to be searched, a new set of unlabeled samples can be obtained from the new search image and used to obtain a new estimate of this M×M matrix. An updated estimate of the Bayes parameter weighting vector $A=(a_1,a_2,\ldots,a_d)^T$ can then be obtained using eq. (48), and a new approximation of the target posterior probability function $\hat{P}(C_t/X)$ obtained using eq. (45). Using this updated estimate of $\hat{P}(C_t/X)$, we can obtain optimal discrimination between the target and not-target objects in the new search image. Thus the decision boundary is re-optimized (adapted) for discriminating between the target and unknown objects in the new search image using only unlabeled samples from the new search image.
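The adaptation step described above can be sketched as follows: the target mean is kept fixed, and only the matrix built from unlabeled samples is recomputed for each new search image, as in eq. (48). This is a minimal sketch under stated assumptions. The function name, the synthetic "images" (random measurement vectors with different statistics), and the prior value are hypothetical, and the prior is assumed to be supplied separately for each image.

```python
import numpy as np

def adapt_weights(target_mean, X_search, prior):
    """Re-estimate A for a search image, reusing a stored target mean.

    target_mean : (d,) target mean, e.g. a single target template
    X_search    : (K, d) unlabeled measurement vectors from the image
    prior       : assumed value of the target prior probability P_Ct
    """
    K = X_search.shape[0]
    # (1/K) * sum_i X(i) X(i)^T, recomputed from the new image only
    second_moment = X_search.T @ X_search / K
    return prior * np.linalg.solve(second_moment, target_mean)

rng = np.random.default_rng(0)
mu_t = np.array([1.0, 0.5, -0.5])               # stands in for one target template
image_a = rng.normal(size=(400, 3))             # synthetic first search image
image_b = rng.normal(scale=3.0, size=(400, 3))  # second image, different statistics

A_a = adapt_weights(mu_t, image_a, prior=0.1)
A_b = adapt_weights(mu_t, image_b, prior=0.1)
print(A_a, A_b)
```

The two weight vectors differ even though the same target mean is used, which reflects how the decision boundary changes with the statistics of the image being searched.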