Machine learning is a general term for automatically setting the parameters of a system so that the system performs better. One common use of machine learning is training the parameters of a system that predicts the behavior of objects or the relationship between objects. An example of such a system is a language model used to predict the likelihood of a sequence of words in a language.
One problem with current machine learning is that it can require a great deal of time to train a single system. In particular, systems that utilize Maximum Entropy techniques to describe the probability of some event tend to have long training times, especially if the number of different features that the system uses is large.
Conditional maxent probability models are of the form
$$P(y \mid \bar{x}) = \frac{\exp \sum_{i} \lambda_i f_i(\bar{x}, y)}{\sum_{y'} \exp \sum_{i} \lambda_i f_i(\bar{x}, y')} \tag{1}$$

where $\bar{x}$ is an input vector, $y$ is an output, the $f_i$ are feature functions (indicator functions) that take the value 1 if a particular property of $\bar{x}, y$ is true and 0 otherwise, and $\lambda_i$ is a trainable parameter (e.g., a weight) for the feature function $f_i$. For example, in word sense disambiguation for the word “bank”, $\bar{x}$ would be the context around an occurrence of the word; $y$ would be a particular sense, e.g., financial or river; $f_i(\bar{x}, y)$ could be 1 if the context includes the word “money” and $y$ is the financial sense; and $\lambda_i$ would be a large positive number.
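To make the form of the conditional model concrete, the following is a minimal sketch of evaluating the probability for the “bank” example. The function name `maxent_prob`, the two indicator features, the second cue word “water”, and the weight values are all hypothetical and chosen only for illustration; they do not come from a trained system. Subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def maxent_prob(x, y, labels, features, weights):
    """Return P(y | x) for a conditional maxent model (illustrative sketch)."""
    def score(label):
        # Weighted sum of the feature functions for one candidate output.
        return sum(w * f(x, label) for f, w in zip(features, weights))

    scores = {label: score(label) for label in labels}
    m = max(scores.values())  # subtract the max to avoid overflow in exp
    z = sum(math.exp(s - m) for s in scores.values())  # normalizer over all outputs y'
    return math.exp(scores[y] - m) / z

# Hypothetical indicator features: 1 if the property holds, 0 otherwise.
def f_money_financial(x, y):
    return 1 if "money" in x and y == "financial" else 0

def f_water_river(x, y):
    return 1 if "water" in x and y == "river" else 0

labels = ["financial", "river"]
features = [f_money_financial, f_water_river]
weights = [2.0, 1.5]  # assumed values, not trained parameters

context = ["deposit", "money", "at", "the", "bank"]
p_fin = maxent_prob(context, "financial", labels, features, weights)
p_riv = maxent_prob(context, "river", labels, features, weights)
print(p_fin, p_riv)
```

With these assumed weights, only the “money” feature fires for this context, so the financial sense receives the higher probability and the two probabilities sum to 1.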