Text classification is an important problem for many tasks in natural language processing, such as user interfaces for command and control. In text classification methods, training data derived from a number of classes of text are used to optimize the parameters that the method uses to estimate the most likely class for a text.
Multinomial Logistic Regression (MLR) Classifiers for Text Classification
Text classification estimates a class y from an input text x, where y is a label of the class. The text can be derived from a speech signal.
In prior art multinomial logistic regression, information about the input text is encoded using a feature function $f_{j,k} : (x, y) \to \{0, 1\}$, typically defined such that
$$
f_{j,k}(x, y) =
\begin{cases}
1 & t_j \in x \text{ and } y = I_k \\
0 & \text{otherwise.}
\end{cases}
$$
In other words, the feature is 1 if a term $t_j$ is contained in the text x and the class label y is equal to category $I_k$, and 0 otherwise.
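The indicator feature described above can be illustrated with a minimal Python sketch. The function name `feature`, the whitespace tokenization, and the example labels `"audio"` and `"navigation"` are assumptions made for illustration only and do not come from the source.

```python
def feature(term, category, text_terms, label):
    """Indicator feature f_{j,k}(x, y).

    Returns 1 if `term` (t_j) occurs in the text and `label` (y)
    equals `category` (I_k), and 0 otherwise.
    """
    return 1 if (term in text_terms and label == category) else 0


# Hypothetical command-and-control utterance, split on whitespace.
text = "turn on the radio".split()

print(feature("radio", "audio", text, "audio"))       # term present, label matches
print(feature("radio", "audio", text, "navigation"))  # label does not match
print(feature("map", "audio", text, "audio"))         # term absent
```

Each pair of a term and a category yields one such feature, so the full feature set grows with the product of the vocabulary size and the number of classes.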
A model used for the classification is a conditional exponential model of the form
$$
p_\Lambda(y \mid x) = \frac{1}{Z_\Lambda(x)} \, e^{\sum_{j,k} \lambda_{j,k} f_{j,k}(x, y)},
$$
where
$$
Z_\Lambda(x) = \sum_{y} e^{\sum_{j,k} \lambda_{j,k} f_{j,k}(x, y)},
$$
and $\lambda_{j,k}$ and $\Lambda$ are the classification parameters.
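As a rough sketch of how the conditional exponential model can be evaluated, the following Python code computes the class probabilities for one text. It assumes the indicator features defined earlier, so the exponent for a label reduces to the sum of the weights of the terms present in the text for that label. The name `class_posterior`, the dictionary layout of the weights, and the example values are hypothetical.

```python
import math


def class_posterior(text_terms, labels, weights):
    """Return p(y | x) for every label y as a dict.

    `weights` maps (term, label) pairs to lambda_{j,k}; pairs that are
    missing from the dict are treated as having weight 0 (an assumption).
    """
    present = set(text_terms)
    # Exponent for each label: sum of lambda_{j,k} over active features.
    scores = {
        y: sum(w for (t, k), w in weights.items() if k == y and t in present)
        for y in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability;
    # this leaves the normalized probabilities unchanged.
    m = max(scores.values())
    exp_scores = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(x), up to the factor exp(m)
    return {y: e / z for y, e in exp_scores.items()}


# Hypothetical weights and input text.
weights = {("radio", "audio"): 2.0, ("map", "navigation"): 2.0}
probs = class_posterior("turn on the radio".split(), ["audio", "navigation"], weights)
print(max(probs, key=probs.get))
```

The most likely class is then the label with the largest probability, which is also the label with the largest exponent because the normalizer is shared by all labels.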
The parameters are optimized on training pairs of texts $x_i$ and labels $y_i$, using an objective function
$$
L_\Lambda = \sum_{i,j,k} \lambda_{j,k} f_{j,k}(x_i, y_i) - \log \sum_{y'} e^{\sum_{j,k} \lambda_{j,k} f_{j,k}(x_i, y')},
$$
which is to be maximized with respect to $\Lambda$.
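The objective can be sketched in Python as follows. This sketch assumes the indicator features defined earlier and accumulates both the score of the correct label and the log-normalizer for every training pair, which is the usual conditional log-likelihood; that per-pair accumulation is an interpretive assumption. The name `log_likelihood` and the toy data are hypothetical.

```python
import math


def log_likelihood(data, labels, weights):
    """Conditional log-likelihood of labeled texts under the exponential model.

    `data` is a list of (text_terms, true_label) pairs and `weights` maps
    (term, label) pairs to lambda_{j,k}, with missing pairs treated as 0.
    """
    total = 0.0
    for text_terms, y_true in data:
        present = set(text_terms)
        scores = {
            y: sum(w for (t, k), w in weights.items() if k == y and t in present)
            for y in labels
        }
        # Numerically stable log of the sum over y' of exp(score(y')).
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        total += scores[y_true] - log_z
    return total


# Hypothetical training pairs and weights.
data = [
    ("turn on the radio".split(), "audio"),
    ("show the map".split(), "navigation"),
]
labels = ["audio", "navigation"]
weights = {("radio", "audio"): 2.0, ("map", "navigation"): 2.0}
print(log_likelihood(data, labels, weights) > log_likelihood(data, labels, {}))
```

With all weights at zero every class is equally likely, so the objective equals minus the number of training pairs times the logarithm of the number of classes; weights that favor the correct labels raise it toward zero.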
Regularized Multinomial Logistic Regression Classifiers
Regularization terms on the classification parameters can be added to the logistic regression objective to improve generalization.
In regularized multinomial logistic regression classifiers, a general formulation using both the L1-norm and the L2-norm regularizers is
$$
L_\Lambda = \sum_{i,j,k} \lambda_{j,k} f_{j,k}(x_i, y_i) - \log \sum_{y'} e^{\sum_{j,k} \lambda_{j,k} f_{j,k}(x_i, y')} - \alpha \sum_{j,k} \lVert \lambda_{j,k} \rVert^{2} - \beta \sum_{j,k} \lvert \lambda_{j,k} \rvert,
$$
where $\alpha \sum_{j,k} \lVert \lambda_{j,k} \rVert^{2}$ is the L2-norm regularizer, $\beta \sum_{j,k} \lvert \lambda_{j,k} \rvert$ is the L1-norm regularizer, and $\alpha$ and $\beta$ are weighting factors. This objective function is again to be maximized with respect to $\Lambda$.
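The effect of the two regularizers can be sketched in Python by subtracting the weighted penalties from an unregularized objective value. The function name `regularized_objective`, the scalar treatment of each weight, and the numeric values are assumptions for illustration; the unregularized value is passed in rather than recomputed to keep the sketch short.

```python
def regularized_objective(unregularized_value, weights, alpha, beta):
    """Subtract the L2 and L1 penalties from an unregularized objective.

    `weights` maps (term, label) pairs to scalar parameters lambda_{j,k};
    `alpha` and `beta` are the non-negative weighting factors.
    """
    l2_penalty = sum(w * w for w in weights.values())   # sum of squared weights
    l1_penalty = sum(abs(w) for w in weights.values())  # sum of absolute weights
    return unregularized_value - alpha * l2_penalty - beta * l1_penalty


# Hypothetical parameter values and weighting factors.
weights = {("radio", "audio"): 3.0, ("map", "navigation"): -4.0}
print(regularized_objective(-1.0, weights, alpha=0.1, beta=0.5))
```

Because the objective is maximized, both penalties pull the parameters toward zero. Setting `alpha` or `beta` to zero recovers a purely L1-regularized or purely L2-regularized objective, respectively.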
Various methods can optimize the parameters under these regularizations.
Topic Modeling
In the prior art, probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) are generative topic models in which topics are multinomial latent variables. The distribution of topics depends on the particular document that includes the text, and the words are distributed multinomially given the topics. If the documents are associated with classes, then such models can be used for text classification.
However, with generative topic models, the class-specific parameters and the topic-specific parameters are additive in the logarithmic probability.