In a 1948 paper, Shannon considered the formation of text as a stochastic process. He proposed learning the probabilities governing this process by computing histograms of the occurrences and co-occurrences of letters in a sample text. He then validated the estimated model by sampling new texts from it. Integrating successively higher-order terms (occurrences of letter triplets rather than pairs, and so on) leads to increasingly familiar structures emerging in the synthesized text.
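The estimate-then-sample procedure described above can be illustrated with a minimal Python sketch. It is not Shannon's original procedure in detail: the function names `fit_ngram_model` and `sample_text`, the `order` parameter (1 for single letters, 2 for pairs, 3 for triplets), and the restart rule at dead ends are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_ngram_model(text, order):
    """Histogram of which character follows each context of length order - 1."""
    context_len = order - 1
    counts = defaultdict(Counter)
    for i in range(len(text) - context_len):
        context = text[i:i + context_len]
        counts[context][text[i + context_len]] += 1
    return counts


def sample_text(counts, order, length, rng):
    """Draw characters one at a time from the estimated conditional histograms."""
    context_len = order - 1
    contexts = sorted(counts)
    # Start from a context seen in the sample (the empty string when order == 1).
    context = rng.choice(contexts)
    out = list(context)
    while len(out) < length:
        followers = counts.get(context)
        if not followers:
            # Assumed rule: the context never had a successor in the sample,
            # so jump to a randomly chosen observed context and continue.
            context = rng.choice(contexts)
            out.extend(context)
            continue
        chars, weights = zip(*sorted(followers.items()))
        out.append(rng.choices(chars, weights=weights, k=1)[0])
        context = "".join(out[-context_len:]) if context_len else ""
    return "".join(out)[:length]


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 3
    for order in (1, 2, 3):
        model = fit_ngram_model(sample, order)
        print(order, repr(sample_text(model, order, 40, random.Random(0))))
```

Raising `order` makes the synthesized strings look progressively more like the sample, which is the effect attributed above to higher-order terms.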
In the context of images, similar approaches have been proposed in the Markov random field literature. Going back at least as far as the work of Abend et al., Markov random fields have enjoyed sustained interest in the vision community. See, K. Abend, T. Harley, and L. N. Kanal. Classification of binary random patterns. IEEE Transactions on Information Theory, 11:538-544, 1965. Besag applied them in the context of binary image restoration. See, J. Besag. On the statistical analysis of dirty pictures. J. Roy. Statist. Soc., Ser. B., 48(3):259-302, 1986. Derin and Elliott analyzed texture in the context of a Markov random field using learned priors. See, H. Derin and H. Elliott. Modeling and segmentation of noisy and textured images using Gibbs random fields. IEEE PAMI, 9(1):39-55, January 1987. Work has continued through new applications such as texture segmentation and through extensions of the basic model, for example by considering higher-order cliques. See, B. S. Manjunath and R. Chellappa. Unsupervised texture segmentation using Markov random field models. IEEE PAMI, 13(5):478-482, May 1991 and W. Pieczynski, D. Benboudjema, and P. Lanchantin. Statistical image segmentation using triplet Markov fields. In S. B. Serpico, editor, SPIE Int. Symposium on Image and Signal Processing for Remote Sensing VIII, volume 4885, pages 92-101. SPIE, March 2003, respectively.
However, the major computational challenge arising in the application of Markov random fields lies in determining global optima of functions $E : \{0,1\}^n \to \mathbb{R}$ (1) over a large set of binary-valued variables $\{x_1, \ldots, x_n\}$. The optimization of functions of binary-valued variables has a long tradition, going back to the work of Ising on ferromagnetism. See, E. Ising. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 23:253-258, 1925. Numerous methods have been proposed to tackle these combinatorial optimization problems. Geman and Geman showed that the method of Simulated Annealing is guaranteed to find the global optimum of a given function. See, S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE PAMI, 6(6):721-741, 1984, S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983, and N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Physics, 21:1087-1092, 1953. Unfortunately, general-purpose optimization methods such as Simulated Annealing require exponential runtime and can be quite slow for the number of nodes considered in most realistic applications. In contrast, deterministic or approximation algorithms are not guaranteed to find a global optimum.
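To make the trade-off concrete, the following Python sketch minimizes a small binary energy of the kind in (1) in two ways: by exhaustive search over all 2^n labelings, and by simulated annealing with Metropolis acceptance. The particular energy (unary costs plus a penalty for each pair of differing neighbors), the geometric cooling schedule, and all function and parameter names are assumptions chosen for illustration. In particular, the convergence guarantee cited above relies on a sufficiently slow cooling schedule, which the fast geometric schedule used here does not satisfy.

```python
import itertools
import math
import random


def energy(x, unary, pairwise):
    """E(x) = sum_i unary[i] * x_i + sum_{(i,j)} w_ij * [x_i != x_j]."""
    e = sum(u * xi for u, xi in zip(unary, x))
    e += sum(w for (i, j), w in pairwise.items() if x[i] != x[j])
    return e


def brute_force_minimum(unary, pairwise):
    """Exhaustive search over all 2^n labelings; feasible only for tiny n."""
    n = len(unary)
    return min(itertools.product((0, 1), repeat=n),
               key=lambda x: energy(x, unary, pairwise))


def simulated_annealing(unary, pairwise, steps, t_start, t_end, rng):
    """Single-variable flips with Metropolis acceptance and geometric cooling."""
    n = len(unary)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = energy(x, unary, pairwise)
    best_x, best_e = tuple(x), e
    for k in range(steps):
        # Assumed schedule: temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose flipping one binary variable
        e_new = energy(x, unary, pairwise)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new  # accept the proposal
            if e < best_e:
                best_x, best_e = tuple(x), e
        else:
            x[i] ^= 1  # reject the proposal: undo the flip
    return best_x, best_e


if __name__ == "__main__":
    unary = [-1.0, 2.0, -1.0]
    pairwise = {(0, 1): 0.5, (1, 2): 0.5}
    x_opt = brute_force_minimum(unary, pairwise)
    print("exhaustive:", x_opt, energy(x_opt, unary, pairwise))
    print("annealing: ", simulated_annealing(unary, pairwise, 2000, 2.0, 0.01,
                                             random.Random(0)))
```

The exhaustive search always returns a global optimum but its cost doubles with every added variable, while the annealing loop costs a fixed number of steps and returns only the best labeling it happened to visit.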
A key challenge addressed herein is to devise methods to efficiently impose statistically learned knowledge in such combinatorial optimization problems. New and improved optimization schemes are required that learn prior information while maintaining graph representability in image segmentation.