In many applications of statistical learning, the objective is not simply to construct an accurate predictive model but to discover meaningful interactions among the variables. This is particularly important in biological applications such as reverse-engineering of gene regulatory networks or reconstruction of brain-activation patterns from functional MRI (fMRI) data. Probabilistic graphical models such as Markov networks (or Markov Random Fields) provide a principled way of modeling multivariate data distributions that is both predictive and interpretable.
A conventional approach to learning Markov network structure is to choose the simplest model, i.e., the sparsest network that adequately explains the data. Formally, this leads to a regularized maximum-likelihood problem with a penalty on the number of parameters, or l0 norm. This problem is generally intractable and has often been solved approximately by greedy search (See D. Heckerman. A tutorial on learning Bayesian networks, Tech. Report MSR-TR-95-06. Microsoft Research, 1995, which is hereby incorporated by reference in its entirety). More recently, better approximation methods have been suggested that exploit the sparsity-enforcing property of l1-norm regularization and yield convex optimization problems that can be solved efficiently (See N. Meinshausen and P. Bühlmann. High dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436-1462, 2006; M. Wainwright, P. Ravikumar, and J. Lafferty. High-Dimensional Graphical Model Selection Using l1-Regularized Logistic Regression. In NIPS 19, pages 1465-1472. 2007; M. Yuan and Y. Lin. Model Selection and Estimation in the Gaussian Graphical Model. Biometrika, 94(1):19-35, 2007; O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485-516, March 2008; and J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2007, which are hereby incorporated by reference in their entireties). However, those approaches are known to be sensitive to the choice of the regularization parameter, i.e., the weight on the l1 penalty, and selecting that parameter remains a difficult task.
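For illustration only, the two formulations can be written out for one common special case, a zero-mean Gaussian Markov network. The Gaussian setting and the symbols below are assumptions introduced here for the sketch: S denotes the empirical covariance matrix of the data, C the inverse covariance (precision) matrix being estimated, and \lambda > 0 the regularization weight. In this setting, variables i and j are not connected in the network exactly when C_{ij} = 0, and the log-likelihood is given up to constants and a positive scale factor.

```latex
% l0-penalized maximum likelihood (combinatorial, generally intractable):
\max_{C \succ 0} \; \log\det C \;-\; \operatorname{tr}(SC) \;-\; \lambda \,\lVert C \rVert_0,
\qquad \lVert C \rVert_0 = \bigl|\{(i,j) : C_{ij} \neq 0\}\bigr|

% l1-penalized relaxation (convex):
\max_{C \succ 0} \; \log\det C \;-\; \operatorname{tr}(SC) \;-\; \lambda \,\lVert C \rVert_1,
\qquad \lVert C \rVert_1 = \sum_{i,j} \lvert C_{ij} \rvert
```

In both forms, a larger \lambda drives more entries of C to zero and thus yields a sparser network, while a smaller \lambda yields a denser one, which is why the recovered structure depends on how \lambda is chosen.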