(1) Field of the Invention
This invention relates to a method for classifying data and more particularly to a training method for a Bayesian Data Reduction Algorithm classifier that enables the identification of data clusters.
Classification systems are a type of artificial intelligence system implemented on digital computers. These systems are implemented using neural networks or statistical measures. Implementation on a neural network involves training the neural network to recognize the given classes. As an example, when given an input X, the classification system decides to which class the input X belongs. If known, measurable characteristics separate the classes, the classification decision is straightforward. However, for most applications, such characteristics are unknown, and the classification system must decide which output class the input X most closely resembles. In such applications, the output classes and their characteristics are modeled (estimated) using statistics for the classes derived from training data belonging to known classes. Thus, the standard classification approach is to first estimate the statistics from the given training data belonging to known classes and then to apply a decision rule using these estimated or modeled statistics.
(2) Description of the Prior Art
In many real world classification problems the domain of the observed data, or features, describing each class can be complicated, obscure, and highly overlapped. The result is that the task of discriminating amongst the classes with standard supervised training techniques can be nearly impossible. However, within these difficult domains, it can often be the case that the target class of interest (e.g., data that produce a desired yield and are thus categorized as the target class) contains isolated unknown clusters (subgroups of data points), where the observations within each cluster have similar statistical properties. In these situations classification performance (or, the average yield) can be significantly improved if one develops a classifier to recognize, or mine, observations within the clusters as the target class, and where all other nonclustered observations (i.e., both with and without a desired yield) are considered the alternative class (the non-target class). A benefit of such a classifier is that subsets of target data points, producing a consistent desired average yield, can be recognized with a minimum probability of error. This is in contrast to a traditional classification approach to this problem (i.e., trained in a completely supervised manner) that has the potential to produce a much higher probability of error and a lower average yield.
Bayesian networks, also known as belief networks, are known in the art for use as filtering systems. The belief network is initially learned by the system from data provided by an expert, user data and user preference data. The belief network is relearned when additional attributes are identified as having an effect. The belief network can then be accessed to predict the effect.
These benefits can be achieved in diverse fields having multi-dimensional data. Large quantities of data are available in the securities market, and it would be valuable to find groups of securities having predefined characteristics such as a certain yield from the available data. Other fields for using such a classification system are target identification, medical diagnosis, speech recognition, digital communications and quality control systems.
FIG. 1A illustrates the problem of interest with a straightforward example containing one thousand samples of one dimensional domain data (a single feature). Each data point for the target class, 10, is shown with an “O”, and each data point for the non-target class, 12, is shown with a “+”. A data cluster 14 is apparent in the figure. (The data for this figure was generated, for each dimension of each class (i.e., except those within the cluster), to be uniform, independent, and identically distributed. However, with respect to the features each data cluster was generated as Gaussian distributed, with a randomly generated mean, and constrained to be located around the specified “center” yield value.)
In this case, the ordinate that defines the yield of each data point is plotted versus the domain, where a yield value of 0.5 is used to separate and define the five hundred samples of the target class (i.e., yield>0.5), and the five hundred samples of the non-target class (yield<0.5). It can clearly be seen in this figure that the two classes are highly overlapped with respect to the range of the single feature. In fact, later it will be shown that traditional supervised classification approaches with this data produce nearly a 0.5 probability of error, and an overall average yield of just slightly more than 0.5. However, notice in FIG. 1A that a cluster 14 of data points also exists in the target class 10 with an average yield of approximately 0.6. Thus, it would be advantageous to develop a classifier for this data that can essentially mine and recognize the positive yielding cluster 14 from all other data points contained in FIG. 1A. One obvious technique to classify the cluster points in this data would be to visually determine threshold points from FIG. 1A; however, typical problems involve multi-dimensional feature spaces that prevent visual determination of thresholds. Any developed technique should be applicable to multi-dimensional feature spaces.
FIG. 1B shows a more generalized illustration of the problem. In FIG. 1B there is a plot containing one thousand samples of one dimensional domain data (a single feature). The data for this figure was generated, for each dimension of each class (i.e., except those within the cluster), to be uniform, independent, and identically distributed. However, with respect to the features each data cluster was generated as Gaussian distributed, with a randomly generated mean, and constrained to be located around the specified center yield value. In FIG. 1B, the ordinate that defines the yield of each data point is plotted versus the domain, where a yield value of 0.5 is used to separate and define the five hundred samples of the target class (i.e., yield>0.5) identified as 10, and the five hundred samples of the nontarget class (yield<0.5) identified as 12.
It can clearly be seen that the two classes contain many commonly distributed points with respect to the range of the single feature. This case differs from the case shown in FIG. 1A in that three clusters of data points, 18A, 18B and 18C, exist within the target class containing actual respective yields of 0.6, 0.75, and 0.9. In this example, each data cluster was randomly placed to be centered somewhere between the yield values of 0.5 and 1, where, as stated previously, the focus of the general embodiment of the method is on mining each of these clusters.
Prior art methods for classifying data are provided in U.S. Pat. Nos. 6,397,200 and 6,789,070. These are incorporated by reference herein. U.S. Pat. No. 6,397,200 provides a data reduction method for a classification system using quantized feature vectors for each class with a plurality of features and levels. The method applies a Bayesian data reduction algorithm to the classification system to develop reduced feature vectors. Test data is then quantized into the reduced feature vectors. The reduced classification system is then tested using the quantized test data. The Bayesian data reduction algorithm begins by computing an initial probability of error for the classification system. Adjacent levels are merged for each feature in the quantized feature vectors. Level-based probabilities of error are then calculated for these merged levels among the plurality of features. The system then selects and applies the merged adjacent levels having the minimum level-based probability of error to create an intermediate classification system. The steps of merging, selecting and applying are performed until either the probability of error stops improving or the features and levels are incapable of further reduction.
U.S. Pat. No. 6,789,070 provides an automatic feature selection system for test data with data (including the test data and/or the training data) containing missing values in order to improve classifier performance. The missing features for such data are selected in one of two ways: the first approach assumes each missing feature is uniformly distributed over its range of values, and the second approach increases the number of discrete levels for each feature by one for the missing features. These two choices modify the Bayesian Data Reduction Algorithm for automatic feature selection.
This method for solving the problem in FIG. 1A builds upon and utilizes the previously introduced Mean-Field Bayesian Data Reduction Algorithm (Mean-Field BDRA) based classifier. The Mean-Field BDRA classifier was developed to mitigate the effects of the curse of dimensionality by eliminating irrelevant feature information in the training data (i.e., lowering M), while simultaneously dealing with the missing feature information problem. The mean-field BDRA was first introduced in R. S. Lynch, Jr. and P. K. Willett, “Adaptive Classification by Maximizing the Class Separability with Respect to the Unlabeled Data,” Proceedings of the 2003 SPIE Symposium on Security and Defense, Orlando, Fla., April 2003. This paper discloses a method of Bayesian Data Reduction which assigns an assumed uniform Dirichlet (completely non-informative) prior for the symbol probabilities of each class. In other words, the Dirichlet is used to model the situation in which the true probabilistic structure of each class is unknown and has to be inferred from the training data.
The Modified Mean-Field BDRA was developed to better deal with problems in which the class-labeling feature is the primary missing attribute in the training data. In general, this problem greatly complicates the modeling of each class, and to deal with it the mean-field BDRA was created that encourages dissimilar distributions with respect to all missing value data.
The primary aspect of the Mean-Field BDRA (that is, in addition to its data model that incorporates a class-labeling feature) that differentiates it from the original BDRA is its method of dealing with the missing features problem. In the Mean-Field BDRA the missing feature information is adapted by estimating the missing feature from the available training data. The following model provides further detail. Specifically, let z be an N-dimensional vector containing the entire collection of training data for all k classes, and using the Dirichlet distribution based model, this is written as
$$f(z) = \int_{p} \prod_{i=1}^{N} \left[ \sum_{l \in w_i} p_l \right] f(p)\, dp \qquad (1)$$
where $p_l$ is the probability of the lth discrete symbol out of a total of M (with p representing all M symbols), f(p) is the Dirichlet distribution prior on the symbol probabilities given by:
$$f(p) = (M-1)!\; I_{\left\{ \sum_{l=1}^{M} p_l = 1 \right\}}, \qquad (1A)$$
and $w_i$ is the set of all discrete symbols that observation $z_i$ could take on if all possible outcomes of its missing features are substituted in. The notation I{x} is the indicator function that has a value of one when “x” is true, and a value of zero otherwise.
Equation (1) represents the optimal approach to solving this problem. However, when expanded, and after integration, Equation (1) results in a sum of products whose number of terms depends upon the number of missing features in the data. That is, there are
$$\prod_{i=1}^{N} |w_i|$$
terms in the sum, where $|w_i|$ is the cardinality of $w_i$, the set of symbols the ith feature vector could take on. Thus, with no missing features in any of the data only one term is left over. On the other hand, if N=20 and each feature vector has one missing binary valued feature then Equation (1) would contain $2^{20}$, or approximately one million, terms. This of course makes any implementation of this equation impractical.
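As a rough illustration of this growth, the following sketch counts the terms for the N=20 example in the text. The helper name `count_terms` and the list-of-sets representation of the sets $w_i$ are assumptions made for illustration only; they are not part of the described method.

```python
from math import prod

def count_terms(w):
    """Number of terms in the expanded sum: the product of |w_i| over all data."""
    return prod(len(w_i) for w_i in w)

# N = 20 data, each missing one binary valued feature,
# so each w_i holds two candidate symbols.
w_missing = [{0, 1} for _ in range(20)]

# No missing features: every w_i holds exactly one symbol.
w_complete = [{0} for _ in range(20)]

print(count_terms(w_missing))   # 2**20 terms
print(count_terms(w_complete))  # a single term
```

Each additional datum with a missing binary feature doubles the count, which is why the exact expansion is abandoned in favor of an approximation.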
As an alternative to Equation (1), the distribution contained in it, f(z|p), and given by
$$f(z \mid p) = \prod_{i=1}^{N} \left[ \sum_{l \in w_i} p_l \right]$$
is replaced with
$$f(z \mid p) = \prod_{j=1}^{M} p_j^{x_j}$$
in which
$$x_j = \sum_{i=1}^{N} \beta_{i,j}\, I_{\{ j \in w_i \}}, \quad \text{(i) } \beta_{i,j} = 0 \text{ iff } j \notin w_i, \text{ and (ii) } \sum_{j=1}^{M} \beta_{i,j} = 1.$$
It is appropriate to think of each symbol-uncertain datum (i.e., each feature vector missing features) in these equations as being separated into small quanta, with respect to the remaining training data, and apportioned amongst the possible symbols the datum can take on. However, it is preferred here to think of the above equations as a mean-field approximation of the unknowable probability sum.
In general, under mean-field theory the expectation E(f(x)) is replaced by f(E(x)). Thus, identifying “f(x)” as a particular term in the sum of products in Equation (1), meaning a particular configuration of the actual symbols of the symbol-uncertain data, the expected value of this data is added to the appropriate symbol's total number of observations. To accomplish this, the following iterative steps are used (these steps will be referred to as the mean-field recursion):
(i) Begin with
$$n = 1, \quad \beta_{i,j}^{(1)} = 0 \;\; \forall j \notin w_i, \quad \text{and} \quad \beta_{i,j}^{(1)} = \pi_{i,j} \;\; \forall j \in w_i$$
where, for the ith datum, an equal initial probability is assigned to all possible uncertain symbols,
$$\pi_{i,j} = \frac{1}{|w_i|}.$$
(ii) Take the expectation value to update
$$\beta_{i,j}^{(n+1)} = 0 \;\; \forall j \notin w_i$$
and
$$\beta_{i,j}^{(n+1)} = \frac{\left( 1 + \sum_{l=1,\, l \ne i}^{N} \beta_{l,j}^{(n)} \right) \pi_{i,j}}{\sum_{j' \in w_i} \left( 1 + \sum_{l=1,\, l \ne i}^{N} \beta_{l,j'}^{(n)} \right) \pi_{i,j'}} \;\; \forall j \in w_i$$
(iii) If
$$\sum_{i=1}^{N} \sum_{j=1}^{M} \left( \beta_{i,j}^{(n+1)} - \beta_{i,j}^{(n)} \right)^2 > (\text{Tolerance})$$
then set n=n+1 and go to (ii).
At convergence,
$$x_j = \sum_{i=1}^{N} \beta_{i,j}^{(n)}$$
is computed for the number of outcomes of the jth symbol. In general, if the iterative steps given are not utilized (i.e., only step (i) is used), then this amounts to assigning for the ith datum a hard outcome to all possible uncertain symbols it can be, the jth of which is assigned $\pi_{i,j}$.
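The mean-field recursion of steps (i) through (iii) can be sketched as follows. This is a minimal illustration rather than the implementation used in the method: the function name `mean_field_counts`, the list-of-sets input, the default tolerance, and the iteration cap are all assumptions. It also assumes that the inner sum in step (ii) runs over the weights of the other data (index l, with l not equal to i), and that every set $w_i$ is non-empty.

```python
def mean_field_counts(w, M, tol=1e-9, max_iter=1000):
    """Sketch of the mean-field recursion, steps (i)-(iii).

    w : list of sets; w[i] holds the symbols (0..M-1) that datum i could be.
    M : total number of discrete symbols.
    Returns (x, beta), where x[j] is the effective count of symbol j.
    """
    N = len(w)
    # Step (i): equal weight 1/|w_i| on each admissible symbol, zero elsewhere.
    pi = [[1.0 / len(w[i]) if j in w[i] else 0.0 for j in range(M)]
          for i in range(N)]
    beta = [row[:] for row in pi]

    for _ in range(max_iter):
        # Column totals give the leave-one-out sum over l != i by subtraction.
        totals = [sum(beta[i][j] for i in range(N)) for j in range(M)]
        new_beta = []
        for i in range(N):
            # Step (ii): unnormalized weight (1 + sum_{l != i} beta[l][j]) * pi[i][j].
            # pi[i][j] is zero outside w_i, so those entries stay zero.
            raw = [(1.0 + totals[j] - beta[i][j]) * pi[i][j] for j in range(M)]
            norm = sum(raw)
            new_beta.append([r / norm for r in raw])
        # Step (iii): stop once the summed squared change is within tolerance.
        change = sum((new_beta[i][j] - beta[i][j]) ** 2
                     for i in range(N) for j in range(M))
        beta = new_beta
        if change <= tol:
            break

    # At convergence, the effective number of outcomes of each symbol.
    x = [sum(beta[i][j] for i in range(N)) for j in range(M)]
    return x, beta


# Four fully observed data (three of symbol 0, one of symbol 1)
# and one datum that could be either symbol.
w_example = [{0}, {0}, {0}, {1}, {0, 1}]
x_example, beta_example = mean_field_counts(w_example, M=2)
print(x_example, beta_example[4])
```

In this small example the uncertain datum is apportioned toward the symbol that the remaining training data favor, which is the behavior the quanta interpretation above describes.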
Notice that steps (i) through (iii) shown above are similar to the recursive steps utilized in the Expectation Maximization (EM) algorithm. A typical implementation of EM involves using the available data to estimate, or “plug-in,” the components of a Gaussian mixture density. However, the recursive steps, above, involve estimation of the βi,j's for an algorithm that is approximately Bayesian. In any case, as the EM algorithm has been shown to converge to a solution, it is expected that due to its similar form, the Mean-Field BDRA will also converge.
In seeking best performance for a given data set the dimensionality reduction steps of the BDRA are used after each application of the mean-field recursion described above. That is, the Mean-Field BDRA alternates between reducing irrelevant feature information and “filling-in” missing feature values. The steps of the basic BDRA have been modified to include a class-labeling feature in augmentation to each datum. Recall, the algorithm reduces the quantization complexity to the level that minimizes the average conditional probability of error, P(e|X), and in its modified form it appears as
$$P(e \mid X) = \sum_{k=1}^{C} \sum_{y} P(H_k)\, I_{\{ f_k \le f_l,\ \text{for all } k \ne l \}}\, f_k \qquad (2)$$
where
$$f_k = f(y \mid x_k, H_k) = \frac{N_y!\,(N_k + M - 1)!}{(N_k + N_y + M - 1)!} \prod_{j \in H_k} \frac{(x_j + y_j)!}{x_j!\, y_j!};$$
C is the total number of classes with $k \in \{1, \ldots, C\}$;
M is the number of discrete symbols;
$j \in H_k$ is defined as all discrete symbols, j, associated with class k, and with the class-labeling feature equal to k;
$H_k$ is the hypothesis defined as $p_y = p_{\text{all } j \in H_k}$, and $\left\{ \sum_{k=1}^{C} \sum_{j \in H_k}^{M} p_j = 1 \right\}$;
X is the entire collection of training data from all C classes;
$x_{j \in H_k}$ is the number of occurrences of the jth symbol in the training data defined for all $j \in H_k$;
$N \left\{ N = \sum_{j=1}^{M} x_j \right\}$ is the total number of training data, where the fraction belonging to the kth class is given by $\left\{ N_k = \sum_{j \in H_k} x_j \right\}$;
$y_j$ is the number of occurrences of the jth symbol in the test data;
$N_y \left\{ N_y = \sum_{j=1}^{M} y_j \right\}$ is the total number of the test data; and
I{x} is the indicator function such that I{x}=1 when x is true and I{x}=0 when x is false.
Note, the typical situation considered involves one observation of test data (i.e., $N_y = 1$), thus, $f(y \mid x, H_k)$ of Equation (2) becomes
$$f(y_i = 1 \mid x, H_k) = \frac{x_{j \in H_k} + 1}{N + M}. \qquad (3)$$
Given the above equations, dimensionality reduction (i.e., feature selection) is implemented on the training data using the following iterative steps, which are analogous to backward sequential feature selection.
(i) Apply mean-field recursive steps to the data.
(ii) Using the initial training data with quantization complexity M (e.g., in the case of all binary valued features $M = 2^{N_f}$, where $N_f$ is the number of features), Equation (2) is used to compute P(e|X;M).
(iii) Beginning with the first feature (selection is arbitrary), and excluding the class labeling feature, reduce this feature by summing or merging (i.e., marginalizing) the numbers of occurrences of those quantized symbols that correspond to joining adjacent discrete levels of that feature.
(iv) Re-apply mean-field recursive steps to the data.
(v) Use the newly merged training data (it is referred to as X′) and the new quantization complexity (e.g., $M' = 2^{N_f - 1}$ in the binary feature case), and use Equation (2) to compute P(e|X′;M′).
(vi) Repeat items (iii), (iv) and (v) for all $N_f$ features.
(vii) From item (vi) select the minimum of all computed P(e|X′;M′) (in the event of a tie use an arbitrary selection), and choose this as the new training data configuration. (This corresponds to permanently reducing, or removing, the associated feature.)
(viii) Repeat items (iii) through (vii) until the probability of error does not decrease any further, or until M′=2, at which point the final quantization complexity has been found.
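One pass of the merge-and-select portion of these steps (items (iii), (v), (vi) and (vii)) can be sketched for the all-binary case as follows. This is an illustration under stated assumptions, not the method itself: the names `merge_feature`, `plugin_error`, and `best_single_merge` are hypothetical, the dictionary layout of the counts is an assumption, the mean-field re-application of items (i) and (iv) is omitted, and `plugin_error` is a simple training-error stand-in rather than Equation (2).

```python
from collections import defaultdict

def merge_feature(counts, f):
    """Marginalize out binary feature f by summing the counts of symbols
    that differ only in that feature.

    counts maps (class_label, features_tuple) -> number of occurrences.
    The class label is kept, since the class-labeling feature is never reduced.
    """
    merged = defaultdict(float)
    for (label, feats), c in counts.items():
        reduced = feats[:f] + feats[f + 1:]
        merged[(label, reduced)] += c
    return dict(merged)

def plugin_error(counts):
    """Stand-in error measure (NOT Equation (2)): within each feature cell,
    every observation outside the most frequent class counts as an error."""
    by_cell = defaultdict(list)
    for (label, feats), c in counts.items():
        by_cell[feats].append(c)
    total = sum(counts.values())
    return sum(sum(cs) - max(cs) for cs in by_cell.values()) / total

def best_single_merge(counts, n_features, error_fn):
    """Try merging each feature in turn and keep the one with the lowest error.
    Ties are resolved by taking the first minimum found."""
    trials = []
    for f in range(n_features):
        merged = merge_feature(counts, f)
        trials.append((error_fn(merged), f, merged))
    return min(trials, key=lambda t: t[0])

# Feature 0 separates the two classes; feature 1 carries no class information.
counts_example = {
    (0, (0, 0)): 10, (0, (0, 1)): 10,
    (1, (1, 0)): 10, (1, (1, 1)): 10,
}
err, removed, reduced_counts = best_single_merge(counts_example, 2, plugin_error)
print(err, removed, reduced_counts)
```

Repeating this pass on the reduced counts, and stopping when the error no longer decreases or only one binary feature remains, would correspond to item (viii).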
The Mean-Field BDRA is modified in this section to improve its performance. Its performance is particularly improved when the adapted training data is missing the class labeling feature. The idea behind the method of the current invention is based on developing a model that encourages dissimilar distributions amongst the classes with respect to all missing feature information. Therefore, given the missing feature values, the new method is designed to give more likelihood to those feature vectors that have dissimilar values.
The modified Mean-Field BDRA is based on the assumptions that the true discrete symbol probabilities, $p_{k,i}$, for the ith discrete symbol of the kth class, are uniformly Dirichlet distributed, and that the underlying new distributional model has the form,
$$f(p_{1,i}, p_{2,i}, \ldots, p_{C,i} \mid p_i) = \frac{K}{p_i}\left(\frac{p_{1,i}}{p_i}\right)^{\alpha-1}\left(\frac{p_{2,i}}{p_i}\right)^{\alpha-1}\cdots\left(\frac{p_{C,i}}{p_i}\right)^{\alpha-1} \tag{4}$$
where $\sum_{k=1}^{C} p_{k,i} = p_i$, C is the total number of classes, K is a normalizing constant, and α is a constant that controls the shape of the distribution. Typically, a smaller value of α means more dissimilarity between the distributions of each class.
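The effect of α can be illustrated numerically for the two-class case (C = 2), where the second class carries the remainder of p_i, so the model above reduces, up to the constant K/p_i, to a function of the ratio r = p_{1,i}/p_i alone. The following Python sketch is offered only as an illustration; the function names and the comparison points r = 0.05 and r = 0.5 are assumptions chosen for the example.

```python
def ratio_density_unnormalized(r, alpha):
    """Two-class form of the model in r = p_{1,i} / p_i, dropping K / p_i.

    With C = 2 the second class has ratio 1 - r, so the product of the two
    power terms is (r * (1 - r)) ** (alpha - 1).
    """
    return (r * (1.0 - r)) ** (alpha - 1.0)


def extremeness(alpha, r_extreme=0.05, r_equal=0.5):
    """Density at a very unequal split relative to the density at an equal split."""
    return (ratio_density_unnormalized(r_extreme, alpha)
            / ratio_density_unnormalized(r_equal, alpha))


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, extremeness(alpha))
```

A ratio above one means the model favors unequal (dissimilar) class probabilities over equal ones; in this sketch that happens for α below one, while α = 1 weights all splits equally.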
Given Equation (4), Equation (3) is now redeveloped by writing it as,
$$f(y_i = 1 \mid x, H_k) = \int_0^1 \int_0^{p_i} f(y_i = 1, p_i, p_{k,i} \mid x, H_k)\, dp_i\, dp_{k,i} = \int_0^1 \int_0^{p_i} f(y_i = 1 \mid p_i, p_{k,i}, x, H_k)\, f(p_{k,i} \mid p_i, x, H_k)\, f(p_i \mid x, H_k)\, dp_i\, dp_{k,i} \tag{5}$$
Equation (5) can also be written as,
$$f(y_i = 1 \mid x, H_k) = \int_0^1 \int_0^{p_i} f(y_i = 1 \mid p_{k,i}, H_k)\, f(x \mid p_{k,i}, H_k)\, f(p_{k,i} \mid p_i, H_k)\, f(p_i \mid x, H_k)\, dp_i\, dp_{k,i} \tag{6}$$
where,
$$f(y_i = 1 \mid p_{k,i}, H_k) = p_{k,i};$$
$$f(x \mid p_{k,i}, H_k) = \binom{\sum_{j=1}^{C} x_{j \in H_{j,i}}}{x_{j \in H_{k,i}}} \left(\frac{p_{k,i}}{p_i}\right)^{x_{j \in H_{k,i}}} \left(1 - \frac{p_{k,i}}{p_i}\right)^{\sum_{j=1}^{C} x_{j \notin H_{k,i}}};$$
$$f(p_{k,i} \mid p_i, H_k) = \frac{1}{p_i} \left(\frac{p_{k,i}}{p_i}\right)^{\alpha-1} \left(1 - \frac{p_{k,i}}{p_i}\right)^{\alpha-1};$$
$$f(p_i \mid x, H_k) = \frac{\Gamma(N + M)\, p_i^{\sum_{j=1}^{C} x_{j \in H_{j,i}}} (1 - p_i)^{N - \sum_{j=1}^{C} x_{j \in H_{j,i}} + M - 2}}{\Gamma\!\left(\sum_{j=1}^{C} x_{j \in H_{j,i}} + 1\right) \Gamma\!\left(N - \sum_{j=1}^{C} x_{j \in H_{j,i}} + M - 1\right)}.$$
Using these equations, Equation (6) can now be solved, which produces the result,
$$f(y_i = 1 \mid x, H_k) = \frac{\Gamma\!\left(x_{j \in H_{k,i}} + \alpha + 1\right) \Gamma\!\left(\sum_{j=1}^{C} x_{j \notin H_{k,i}} + \alpha\right) \Gamma\!\left(\sum_{j=1}^{C} x_{j \in H_{j,i}} + 2\right)}{\Gamma(N + M)\, \Gamma\!\left(x_{j \in H_{k,i}} + 1\right) \Gamma\!\left(\sum_{j=1}^{C} x_{j \notin H_{k,i}} + 1\right) \Gamma\!\left(\sum_{j=1}^{C} x_{j \in H_{j,i}} + 2\alpha + 1\right)}. \tag{7}$$
In the results that follow, only the value α=1 is considered, which produces the following for Equation (7),
$$f(y_i = 1 \mid x, H_k; \alpha = 1) = \frac{x_{j \in H_{k,i}} + 1}{(N + M)\left(\sum_{j=1}^{C} x_{j \in H_{j,i}} + 2\right)}. \tag{8}$$
In comparing the previous model of Equation (2) to Equation (8), it is apparent that under the new model shown above more emphasis is now placed on dissimilar probabilities for the training data of each class.
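As a small numerical illustration, Equations (3) and (8) can be evaluated side by side in Python. The sketch below is not part of the disclosed method: the function names and the example counts are hypothetical, and N and M are simply passed in by the caller with the meanings given in the text.

```python
def symbol_prob_eq3(x_ki, N, M):
    """Equation (3): (x_{j in H_k} + 1) / (N + M)."""
    return (x_ki + 1) / (N + M)


def symbol_prob_eq8(x_ki, x_all_i, N, M):
    """Equation (8), the alpha = 1 case.

    x_ki    : count of symbol i in the training data of class k.
    x_all_i : count of symbol i summed over all C classes.
    """
    return (x_ki + 1) / ((N + M) * (x_all_i + 2))


if __name__ == "__main__":
    # Hypothetical counts: symbol i seen 3 times in class k, 5 times overall.
    print(symbol_prob_eq3(3, N=20, M=4))
    print(symbol_prob_eq8(3, 5, N=20, M=4))
```

Written this way, Equation (8) is Equation (3) divided by the all-class count of symbol i plus two, so the value for a given class now also depends on how often the same symbol occurs in the other classes.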
The prior art does not disclose a method for training a Mean-Field Bayesian Reduction classifier for detecting clusters in unknown data.