Data mining is a technique by which hidden patterns may be found in a group of data. True data mining does not merely change the presentation of data, but actually discovers previously unknown relationships among the data. Data mining is typically implemented as software in association with database systems. Data mining includes several major steps. First, data mining models are generated based on one or more data analysis algorithms. Initially, the models are “untrained”, but they are “trained” by processing training data and generating information that defines the model. The generated information is then deployed for use in data mining, for example, by providing predictions of future behavior or recommendations for actions to be taken based on specific past behavior.
One particularly useful type of data mining model is based on the Bayesian classification technique. Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Bayesian classification is based on Bayes' theorem. Studies comparing classification algorithms have found a simple Bayesian classifier known as the naive Bayesian classifier to be comparable in performance with decision tree and neural network classifiers. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.
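As an illustration of how a naive Bayesian classifier may produce class membership probabilities, the following is a minimal sketch in Python and is not drawn from any particular implementation. It assumes categorical attributes, treats attribute values as conditionally independent given the class, so that P(C | x) is proportional to P(C) multiplied by the product of P(x_i | C), and applies add-one (Laplace) smoothing. The function names train_naive_bayes and class_probabilities and the toy weather data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """'Train' the model by counting classes and per-class attribute values."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, value, label)] = number of matching rows
    value_counts = Counter()
    # distinct_values[attribute_index] = set of values seen in training
    distinct_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def class_probabilities(model, row):
    """Return P(class | row) for every class, using Bayes' theorem."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        # Prior P(class), accumulated in log space to avoid underflow.
        score = math.log(n_label / total)
        for i, value in enumerate(row):
            k = len(distinct_values[i])
            # Add-one smoothing (an assumption of this sketch) keeps
            # unseen attribute values from zeroing out the product.
            score += math.log((value_counts[(i, value, label)] + 1) / (n_label + k))
        log_scores[label] = score
    # Normalize so that the probabilities sum to 1.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical toy training data: (outlook, temperature) -> play?
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = train_naive_bayes(rows, labels)
    print(class_probabilities(model, ("sunny", "hot")))
```

In this sketch, training amounts to accumulating counts, which is one reason such classifiers can be fast on large tables.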
Users of a data mining predictive model benefit from knowing in advance how accurate a model's predictions will be. Cross-validation is one technique for measuring the accuracy of a predictive model. Leave-one-out cross-validation is an especially accurate special case of cross-validation, but it is ordinarily computationally expensive, since a model is conventionally rebuilt once for each record that is left out. Thus, a need arises for a technique by which leave-one-out cross-validation may be performed that provides a useful measure of the accuracy of a predictive model at reduced computational expense relative to conventional techniques.
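To make the cost of conventional leave-one-out cross-validation concrete, the following is a minimal sketch in Python of the straightforward approach, in which a model is retrained once for every record. It is not the reduced-expense technique whose need is described above. The function names train, predict, and leave_one_out_accuracy are hypothetical, and the small naive Bayesian classifier used here assumes categorical attributes with two possible values each, which fixes the smoothing denominator.

```python
from collections import Counter
import math


def train(rows, labels):
    """Build a small naive Bayesian model by counting."""
    class_counts = Counter(labels)
    value_counts = Counter()
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts


def predict(model, row):
    """Return the most probable class for one row."""
    class_counts, value_counts = model
    total = sum(class_counts.values())

    def log_score(label):
        n = class_counts[label]
        score = math.log(n / total)
        for i, value in enumerate(row):
            # Add-one smoothing; the "+ 2" assumes two-valued attributes
            # (a simplifying assumption of this sketch).
            score += math.log((value_counts[(i, value, label)] + 1) / (n + 2))
        return score

    # Sorting makes tie-breaking deterministic.
    return max(sorted(class_counts), key=log_score)


def leave_one_out_accuracy(rows, labels):
    """Conventional leave-one-out: retrain once per held-out record."""
    correct = 0
    for held_out in range(len(rows)):
        # Train on every record except the one being held out.
        train_rows = rows[:held_out] + rows[held_out + 1:]
        train_labels = labels[:held_out] + labels[held_out + 1:]
        model = train(train_rows, train_labels)
        # Test the model on the single held-out record.
        if predict(model, rows[held_out]) == labels[held_out]:
            correct += 1
    return correct / len(rows)


if __name__ == "__main__":
    # Hypothetical toy data with one attribute and one noisy record.
    rows = [("a",), ("a",), ("a",), ("b",), ("b",), ("b",), ("a",)]
    labels = ["x", "x", "x", "y", "y", "y", "y"]
    print(leave_one_out_accuracy(rows, labels))
```

For n records, the loop trains n separate models, each on n - 1 records, so the total work grows roughly with the square of the data set size when training cost is linear in the number of records.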