Mixture models are common tools in statistical analysis and machine learning. For example, when modeling the distribution of a data set, a single Gaussian may not adequately approximate the data, particularly when the data has multiple modes or clusters (i.e., more than one peak).
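The following minimal Python/NumPy sketch makes the multimodality point concrete. It is illustrative only: the synthetic data (two clusters at -3 and +3), the sample sizes, and the helper names `gaussian_logpdf`, `component_log_terms`, `mixture_loglik`, and `fit_mog_em` are assumptions, not part of the original text. It compares the maximized log-likelihood of a single Gaussian with that of a two-component mixture. The mixture is fitted by plain expectation-maximization (EM), which is one common way to compute a maximum-likelihood fit.

```python
import numpy as np


def gaussian_logpdf(x, mu, var):
    """Log density of N(mu, var), evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def component_log_terms(x, weights, means, variances):
    """Array of shape (K, N): log(pi_k) + log N(x_n | mu_k, var_k)."""
    return np.stack([np.log(w) + gaussian_logpdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)])


def mixture_loglik(x, weights, means, variances):
    """Total log-likelihood of a 1-D Gaussian mixture, using log-sum-exp for stability."""
    terms = component_log_terms(x, weights, means, variances)
    top = terms.max(axis=0)
    return float(np.sum(top + np.log(np.exp(terms - top).sum(axis=0))))


def fit_mog_em(x, k=2, n_iter=200):
    """Maximum-likelihood fit of a k-component 1-D mixture by plain EM (hypothetical helper)."""
    # Deterministic initialization: spread the means over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    weights = np.full(k, 1.0 / k)
    variances = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities r[k, n] = p(component k | x_n).
        terms = component_log_terms(x, weights, means, variances)
        r = np.exp(terms - terms.max(axis=0))
        r /= r.sum(axis=0)
        # M-step: closed-form updates for weights, means and variances.
        nk = r.sum(axis=1)
        weights = nk / x.size
        means = (r @ x) / nk
        variances = (r * (x - means[:, None]) ** 2).sum(axis=1) / nk
    return weights, means, variances


# Synthetic bimodal data: two clusters centred at -3 and +3 (illustrative values).
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# A single Gaussian's maximum-likelihood fit is just the sample mean and variance.
ll_single = float(gaussian_logpdf(x, x.mean(), x.var()).sum())
weights, means, variances = fit_mog_em(x, k=2)
ll_mix = mixture_loglik(x, weights, means, variances)
print(f"single Gaussian: {ll_single:.1f}   two-component mixture: {ll_mix:.1f}")
```

The single Gaussian has to spread its variance across both peaks, so it assigns low density to every point. The two-component mixture places one narrow component on each peak and attains a markedly higher log-likelihood.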
As such, a common approach is to model such data with a mixture of two or more Gaussian components fitted by maximum likelihood. Nevertheless, even a mixture of Gaussians (MOG) presents modeling problems, such as inadequate handling of outliers and severe overfitting. For example, the likelihood function has singularities that arise when a component collapses onto an individual data point (its mean coincides with that point and its variance shrinks toward zero), a pathological result in which the likelihood grows without bound.
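The singularity can be reproduced numerically. The minimal sketch below assumes a hypothetical seven-point data set and a two-component mixture with equal weights, and the helper names are illustrative. Component 1 is a broad Gaussian fitted to all the data. Component 2 is centred exactly on the first data point, and its variance `var2` is progressively shrunk.

```python
import numpy as np


def gaussian_logpdf(x, mu, var):
    """Log density of N(mu, var), evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def mixture_loglik(x, weights, means, variances):
    """Total log-likelihood of a 1-D Gaussian mixture (log-sum-exp for stability)."""
    terms = np.stack([np.log(w) + gaussian_logpdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)])
    top = terms.max(axis=0)
    return float(np.sum(top + np.log(np.exp(terms - top).sum(axis=0))))


# A small illustrative data set.
x = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0])

# Component 1: a broad Gaussian fitted to all the data.
# Component 2: centred exactly on x[0], with its variance shrunk toward zero.
shrinking_vars = [1.0, 1e-2, 1e-4, 1e-8, 1e-16]
lls = []
for var2 in shrinking_vars:
    ll = mixture_loglik(x, [0.5, 0.5], [x.mean(), x[0]], [x.var(), var2])
    lls.append(ll)
    print(f"var2 = {var2:.0e}   log-likelihood = {ll:.2f}")
```

Once component 2 no longer contributes at the other six points, each hundredfold reduction of `var2` adds about ln 10 ≈ 2.3 to the log-likelihood through the -0.5·log(var2) term. The maximum-likelihood objective therefore has no finite maximizer along this path.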
Some problems with a pure MOG can be elegantly addressed by adopting a Bayesian framework that marginalizes over the model parameters with respect to appropriate priors. The resulting Bayesian model likelihood (the marginal likelihood, or evidence) can then be maximized with respect to the number of Gaussian components in the mixture if the goal is model selection, or combined with a prior over the number of components if the goal is model averaging. One benefit of a Bayesian approach using a mixture of Gaussians is the elimination of maximum-likelihood singularities, although the model still lacks robustness to outliers. In addition, in the Bayesian model selection context, outliers or other departures of the empirical distribution from Gaussianity can lead to errors in determining the number of clusters in the data.
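Full marginalization over the mixture parameters does not fit in a short example. The sketch below therefore uses a simpler relative of the Bayesian treatment instead: maximum a posteriori (MAP) scoring, which adds a log prior to the log-likelihood. It shows why a prior removes the collapse singularity. For brevity, it assumes an inverse-gamma prior only on the collapsing component's variance, with hypothetical hyperparameters `a_prior = 2.0` and `b_prior = 0.5`. It reuses the same hypothetical data and collapsing component as a pure maximum-likelihood analysis would. A full Bayesian model would place priors on all parameters and integrate over them.

```python
import math

import numpy as np


def gaussian_logpdf(x, mu, var):
    """Log density of N(mu, var), evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def mixture_loglik(x, weights, means, variances):
    """Total log-likelihood of a 1-D Gaussian mixture (log-sum-exp for stability)."""
    terms = np.stack([np.log(w) + gaussian_logpdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)])
    top = terms.max(axis=0)
    return float(np.sum(top + np.log(np.exp(terms - top).sum(axis=0))))


def inv_gamma_logpdf(var, a, b):
    """Log density of an inverse-gamma(a, b) prior on a variance."""
    return a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(var) - b / var


x = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0])
a_prior, b_prior = 2.0, 0.5  # hypothetical prior hyperparameters

shrinking_vars = [1.0, 1e-2, 1e-4, 1e-8, 1e-16]
log_posts = []
for var2 in shrinking_vars:
    ll = mixture_loglik(x, [0.5, 0.5], [x.mean(), x[0]], [x.var(), var2])
    # Unnormalized log posterior: log-likelihood + log prior on the collapsing variance.
    lp = ll + inv_gamma_logpdf(var2, a_prior, b_prior)
    log_posts.append(lp)
    print(f"var2 = {var2:.0e}   log-likelihood = {ll:.2f}   log-posterior = {lp:.2f}")
```

The likelihood gain from collapse grows only like -0.5·log(var2). The prior's -b/var2 term falls much faster, so the log posterior decreases toward minus infinity as the component collapses instead of diverging upward. Outliers, by contrast, are unaffected by this prior, which is consistent with the remaining lack of robustness noted above.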