Data mining is a technique by which hidden patterns may be found in a group of data. True data mining does not merely change the presentation of data, but actually discovers previously unknown relationships among the data. Data mining is typically implemented as software in, or in association with, database systems. Data mining includes several major steps. First, data mining models are generated based on one or more data analysis algorithms. Initially, the models are “untrained”; they are then “trained” by processing training data and generating information that defines the model. Finally, the generated information is deployed for use in data mining, for example, by providing predictions of future behavior or recommendations for actions to be taken based on specific past behavior.
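The generate, train, and deploy steps described above may be illustrated by a minimal sketch. The class name `MajorityLabelModel`, the attribute values, and the labels are all hypothetical and chosen only for illustration; the counting rule used here is a deliberately simple stand-in for a data analysis algorithm and is not the algorithm of any particular data mining system.

```python
from collections import Counter, defaultdict


class MajorityLabelModel:
    """Hypothetical toy model: for each attribute value, remember the
    most frequently observed outcome in the training data."""

    def __init__(self):
        # Untrained: the model holds no information yet.
        self.label_counts = defaultdict(Counter)
        self.trained = False

    def train(self, rows):
        """Process training rows of (attribute_value, outcome) pairs and
        generate the counts that define the model."""
        for value, outcome in rows:
            self.label_counts[value][outcome] += 1
        self.trained = True

    def predict(self, value):
        """Deploy the trained model: predict the outcome for a new value,
        or None if the value never appeared in the training data."""
        if not self.trained:
            raise RuntimeError("model must be trained before it is used")
        counts = self.label_counts.get(value)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Hypothetical training data: (past behavior, observed outcome).
training_rows = [
    ("visited_pricing", "buy"),
    ("visited_pricing", "buy"),
    ("visited_pricing", "leave"),
    ("visited_blog", "leave"),
]

model = MajorityLabelModel()       # step 1: generate an untrained model
model.train(training_rows)         # step 2: train it on the training data
prediction = model.predict("visited_pricing")  # step 3: deploy for prediction
```

In this sketch the training step visits every row once, so even this trivial model takes longer to build as the training data grows.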
A Data Mining System (DMS) examines data and constructs models that express predictions about subsequent data. The time and computation resources required to build these models increase with the size of the transactional data set, i.e., the number of rows and the number of attributes (predictors) in the data. Relatively recently developed sources of data used for data mining, such as Internet click streams and enterprise-wide data collection, produce vast numbers of rows of data and contain very large numbers of attributes. As a result, the time required to build models based on such data is excessive and the computation resources needed are very expensive. A need arises for a technique by which the time and computation resources required to build data mining models can be reduced, which would provide a corresponding reduction in the cost of data mining.