Estimating probability density functions over sets of random variables can be a central problem in machine learning. For some problems, such as conditional random fields (CRFs) and log-linear models, estimation often requires minimizing a partition function.
Conventional bound-majorization methods were once used to optimize parameters and train models, but were overtaken by faster first-order and then second order methods.