Probabilistic mixture modeling is an important machine learning technique that has been used extensively for density modeling and clustering. For clustering, individual mixture components represent the clusters. Mixture modeling is generally used for clustering data such as media data, documents, signal data, and scientific observations or measurements. The Expectation-Maximization (EM) algorithm is among the most popular methods for this task. The EM algorithm iteratively updates a model estimate until convergence. In practice, an iteration of the EM algorithm for mixture model clustering includes an E-step which, given a current model estimate, calculates cluster-membership probabilities for each data item in order to construct sufficient statistics, followed by an M-step which generates a new model estimate from those statistics. Each E-step has a computational complexity of O(N*C), where N is the number of data cases (samples) and C is the number of mixture components (or clusters) in the model, because a membership probability is calculated for every pairing of a data case with a component. For very large N and C, for example with Internet-scale data, the computational cost of the EM algorithm can be prohibitive. Put another way, the EM algorithm scales well in neither N nor C.
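By way of illustration only, the conventional E-step and M-step described above might be sketched as follows for a one-dimensional Gaussian mixture. The Gaussian component form, the function names e_step, m_step, and fit_em, the variance floor, the caller-supplied initial means, and the convergence test on the component means are all assumptions made for this sketch. The nested loops in e_step are what give rise to the O(N*C) cost per iteration.

```python
import math


def e_step(data, weights, means, variances):
    """Return N rows of C cluster-membership probabilities.

    The outer loop visits N data cases and the inner loop visits
    C components, so the work per call is proportional to N*C.
    """
    resp = []
    for x in data:  # N data cases
        log_p = []
        for w, mu, var in zip(weights, means, variances):  # C components
            log_p.append(
                math.log(w)
                - 0.5 * math.log(2.0 * math.pi * var)
                - 0.5 * (x - mu) ** 2 / var
            )
        # Subtract the largest log value before exponentiating (stability).
        top = max(log_p)
        p = [math.exp(v - top) for v in log_p]
        total = sum(p)
        resp.append([v / total for v in p])
    return resp


def m_step(data, resp, min_var=1e-6):
    """Build a new model estimate from the membership probabilities.

    min_var is an assumed floor that keeps a variance from collapsing to 0.
    """
    n, c = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for k in range(c):
        # Sufficient statistics for component k.
        s0 = max(sum(row[k] for row in resp), 1e-12)
        s1 = sum(row[k] * x for row, x in zip(resp, data))
        mu = s1 / s0
        s2 = sum(row[k] * (x - mu) ** 2 for row, x in zip(resp, data))
        weights.append(s0 / n)
        means.append(mu)
        variances.append(max(s2 / s0, min_var))
    return weights, means, variances


def fit_em(data, init_means, max_iter=200, tol=1e-8):
    """Alternate E-steps and M-steps until the means stop moving."""
    c = len(init_means)
    center = sum(data) / len(data)
    spread = sum((x - center) ** 2 for x in data) / len(data)
    weights = [1.0 / c] * c
    means = list(init_means)
    variances = [max(spread, 1e-6)] * c
    for _ in range(max_iter):
        resp = e_step(data, weights, means, variances)
        new_weights, new_means, new_variances = m_step(data, resp)
        shift = max(abs(a - b) for a, b in zip(new_means, means))
        weights, means, variances = new_weights, new_means, new_variances
        if shift < tol:
            break
    return weights, means, variances


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    samples = [-5.2, -4.9, -5.1, 4.8, 5.0, 5.3]
    print(fit_em(samples, init_means=[-4.0, 4.0]))
```

In this sketch every data case is compared against every component on every iteration, which is the source of the scaling difficulty noted above when both N and C are very large.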
Techniques related to efficient variational EM mixture modeling are discussed below.