Classical statistical methods and data mining have often been viewed as two competing methodologies for drawing conclusions from data. Classical statistics relies on stochastic models and hypothesis testing, whereas data mining is data-driven and makes few or no assumptions about the underlying data. Statistical methods offer established diagnostics in well-defined contexts. Data mining is particularly well suited to exploratory data analysis and model creation using massive, high-dimensional datasets that may not have been compiled using rigorous statistical experimental design, such as data residing in an information warehouse. Data mining is heuristic and algorithmically driven, designed to extract useful patterns automatically. However, because patterns are found automatically, data mining may find patterns that appear interesting but do not represent statistically significant differences in behaviors or outcomes.
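The last point can be illustrated with a minimal sketch. It assumes a hypothetical mined pattern, namely a customer segment whose response rate looks higher than that of the remaining population, and checks it with a standard two-proportion z-test. The function name, the segment counts, and the 5% threshold are illustrative assumptions and are not taken from any particular data mining product or from the approach discussed here.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value, using the
    pooled-proportion standard error under the null hypothesis that
    both groups share the same underlying rate.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: a small mined segment responds at 30%,
# the rest of the population at roughly 23%.
z_small, p_small = two_proportion_z_test(12, 40, 2300, 9960)

# The same apparent rates, but supported by far more records.
z_large, p_large = two_proportion_z_test(1200, 4000, 1380, 6000)

ALPHA = 0.05  # assumed significance level
print("small segment significant:", p_small < ALPHA)
print("large segment significant:", p_large < ALPHA)
```

In this sketch the small segment shows the same apparent lift as the large one, yet only the large segment yields a difference that is significant at the assumed 5% level. A pattern of the first kind may look interesting to an automated search without representing a distinguishable outcome.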
Three problems exist in this context. First, making a high-cost or high-value decision may require a rigorous interpretation of data mining results, and the data mining model may need to be further customized or optimized for a specific application. Second, the trend toward embedding data mining in business applications requires some mechanism to help a business analyst interpret the results correctly. Third, as data mining becomes more operational, the growing need to automate the data mining process requires that the embedded interpretation be more robust and reliable.
Currently, no solution exists that addresses all three of these issues. Commercially available data mining workbenches and other data mining solutions may include certain statistical functionality for ad hoc operations such as variable selection, data exploration, and data preparation. However, these implementations of statistical functionality do not constitute a specific methodology for combining statistical techniques with data mining to address the issues described above. Accordingly, a need exists for more robust analysis of data mining results.