Bayesian Networks are an increasingly important area of research and application in Artificial Intelligence, Medical Sciences, Bioinformatics and other inference tasks in Engineering. The main reason for this popularity is that classical inferential models do not permit the introduction of prior knowledge into the calculations as easily as the Bayesian approach does. Training a Bayesian Network from a training dataset means learning a model that best represents the underlying distribution or relationships in that data. The ultimate goal is for the Bayesian Network to perform correct inference on new data. Since the training data available in the real world is never complete, the trained model is usually only an approximation of the actual underlying distribution.
Cross validation is a method commonly used to estimate how accurately a trained Bayesian Network (or any learning model) will perform in practice. The reasoning behind it is as follows. Training a Bayesian model optimizes the model parameters so that the model fits the training data as closely as possible. Given an independent sample of validation data drawn from the same population as the training data, the model will generally not fit the validation data as well as it fits the training data. This gap is particularly likely when the training data set is small or when the number of parameters in the model is large. Cross validation estimates how well the model will perform on future data by measuring its fit on data that was held out during training.
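To make this reasoning concrete, here is a minimal Python sketch of k-fold cross validation. It is an illustration only, not the procedure used in this work: the model is a single discrete variable (the simplest possible Bayesian Network) fitted with additive smoothing, and the function names, the smoothing constant `alpha`, and the choice of mean log-likelihood as the fit measure are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_categorical(samples, states, alpha=1.0):
    """Estimate P(X = s) for each state with additive smoothing (assumed alpha)."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(states)
    return {s: (counts[s] + alpha) / total for s in states}


def mean_log_likelihood(model, samples):
    """Average log-probability the model assigns to the samples (higher is better)."""
    return sum(math.log(model[x]) for x in samples) / len(samples)


def cross_validate(data, states, k=5, seed=0):
    """Return (mean training fit, mean validation fit) over k folds.

    Assumes len(data) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(data), k, seed)
    train_scores, valid_scores = [], []
    for fold in folds:
        held_out = set(fold)
        # Fit on everything outside the current fold, score on both parts.
        train = [data[j] for j in range(len(data)) if j not in held_out]
        valid = [data[j] for j in fold]
        model = fit_categorical(train, states)
        train_scores.append(mean_log_likelihood(model, train))
        valid_scores.append(mean_log_likelihood(model, valid))
    return sum(train_scores) / k, sum(valid_scores) / k


# Hypothetical data set: 30 draws of a three-state variable.
rng = random.Random(1)
data = [rng.choice(["a", "a", "a", "b", "b", "c"]) for _ in range(30)]
train_fit, valid_fit = cross_validate(data, states=["a", "b", "c"], k=5)
print(f"mean training log-likelihood:   {train_fit:.3f}")
print(f"mean validation log-likelihood: {valid_fit:.3f}")
```

The validation score averaged over the folds is the quantity of interest: it estimates the fit to expect on unseen data, whereas the training score is typically optimistic. The same loop would apply to a full Bayesian Network by replacing `fit_categorical` and `mean_log_likelihood` with the corresponding structure or parameter learning and scoring routines.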