Data mining, also referred to as knowledge discovery, is the process of exploring large quantities of data in order to discover meaningful information about the data, generally in the form of relationships, patterns, and rules. In this process, various forms of analysis can be employed to discern such patterns and rules in historical data for a given application or business scenario, and this information can then be stored as an abstract mathematical model of the historical data, referred to as a data-mining model (DMM). After the DMM is created, new data can be examined with respect to the model to determine whether the data fits a desired pattern or rule. From this information, actions can be taken to improve results in many applications.
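As a rough illustration of the two steps described above (building a model from historical data, then examining new data with respect to it), the following Python sketch derives simple majority-outcome rules and checks a new record against them. The function names `build_model` and `matching_rules`, the sample records, and the 0.75 confidence threshold are all hypothetical and chosen only for illustration; an actual DMM may take many other forms.

```python
from collections import Counter, defaultdict

def build_model(records, target):
    """Build a tiny rule-based model: for each (attribute, value) pair,
    record the most frequent outcome seen in the historical data."""
    counts = defaultdict(Counter)
    for rec in records:
        outcome = rec[target]
        for attr, value in rec.items():
            if attr != target:
                counts[(attr, value)][outcome] += 1
    # Each rule maps (attribute, value) -> (majority outcome, confidence).
    model = {}
    for key, outcomes in counts.items():
        outcome, n = outcomes.most_common(1)[0]
        model[key] = (outcome, n / sum(outcomes.values()))
    return model

def matching_rules(model, record, desired, min_conf=0.75):
    """Return the rules a new record satisfies for the desired outcome."""
    return sorted(
        (attr, value)
        for attr, value in record.items()
        if model.get((attr, value), (None, 0.0))[0] == desired
        and model[(attr, value)][1] >= min_conf
    )

# Hypothetical historical data: customer response to a mailing.
history = [
    {"region": "north", "age_band": "30-45", "responded": "yes"},
    {"region": "north", "age_band": "18-29", "responded": "yes"},
    {"region": "south", "age_band": "30-45", "responded": "no"},
    {"region": "south", "age_band": "18-29", "responded": "no"},
    {"region": "north", "age_band": "30-45", "responded": "yes"},
]
model = build_model(history, target="responded")

# Examine a new record with respect to the stored model.
new_customer = {"region": "north", "age_band": "46-60"}
print(matching_rules(model, new_customer, desired="yes"))
```

The model here is deliberately minimal: it stores only per-attribute majority outcomes, so it ignores interactions between attributes that a practical data-mining model would typically capture.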
Various applications can benefit from data mining techniques. For instance, many organizations can be considered “data rich,” since they collect ever-increasing volumes of data about business processes and resources. Typically, these volumes, or data mountains, are used to provide “facts and figures” such as “there are X categories of occupation,” or “this year's mortgage accounts in arrears,” and so forth. However, merely having information at one's disposal does not necessarily represent knowledge; rather, it represents data to be further analyzed. Thus, patterns in the data are more closely linked to knowledge than the data itself.
In many cases, data mining enables complex business processes to be understood and re-engineered. This can be achieved through the discovery of relationships or patterns in data relating to the past behavior of a business process. Such patterns can be utilized to improve the performance of a process by exploiting favorable patterns and avoiding problematic ones. Examples of business processes where data mining can be useful include customer response to mailings, lapsed insurance policies, energy consumption, sales prediction, product association, and risk assessment. In each of these examples, data mining can reveal which factors affect the outcome of the business event or process, and the patterns relating the outcome to these factors. Such patterns increase understanding of these processes and therefore the ability to predict and affect the outcome.
In recent times, there has been some confusion among potential users of data mining as to which data mining technologies may apply. This confusion has been compounded by some technologies that claim to provide data mining tools when in reality they merely support users in mining the data manually for themselves. For instance, some vendors of query and reporting tools and OLAP (On-Line Analytical Processing) tools claim that their products can be employed for data mining. While it is true that one can discover useful patterns in data using these tools, it is questionable who or what is performing the discovery: the user or the tool. For example, query and reporting tools can interrogate data and report on any pattern (query) requested by the user. This is a manual, validation-driven process of discovery, in the sense that unless the user already suspects a pattern, the user may never be able to find it. A marginally better situation is encountered with OLAP tools, which can be termed “visualization-driven” since they assist the user in the process of pattern discovery by displaying multi-dimensional data graphically. The class of tools that can genuinely be termed “data mining tools,” however, comprises those that support automatic discovery of relationships and/or patterns in data.
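The distinction between validation-driven querying and automatic discovery can be sketched in Python using product association as the example. In this minimal sketch, `query_support` only confirms a pattern the user already suspects, whereas `discover_pairs` enumerates candidate patterns itself. Both function names, the sample transactions, and the 0.4 support threshold are hypothetical and are not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "cereal", "bread"},
]

def query_support(transactions, items):
    """Validation-driven: the user names the itemset, the tool only counts it."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def discover_pairs(transactions, min_support=0.4):
    """Automatic: enumerate every item pair and keep the frequent ones."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each pair a canonical order, e.g. ("bread", "butter").
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# The user must already suspect that bread and butter sell together.
print(query_support(transactions, {"bread", "butter"}))

# The tool surfaces frequent pairs without being told where to look.
for pair, support in sorted(discover_pairs(transactions).items()):
    print(pair, support)
```

Exhaustive pair enumeration is used only because it is short. It grows quadratically with the number of distinct items, so practical tools generally prune the candidate space instead.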