Data mining is a technique by which hidden patterns may be found in a group of data. True data mining does not merely change the presentation of data; it discovers previously unknown relationships among the data. Data mining is typically implemented as software within, or in association with, database systems. Data mining involves several major steps. First, data mining models are generated based on one or more data analysis algorithms. Initially the models are “untrained”; they are then “trained” by processing training data and generating information that defines the model. Finally, the generated information is deployed for use in data mining, for example to predict future behavior based on specific past behavior.
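The generate, train, and deploy lifecycle described above can be illustrated with a toy model. The text does not name a specific data analysis algorithm, so this sketch swaps in a simple, hypothetical frequency-count predictor; the class name `NextBehaviorModel` and its methods are illustrative assumptions, not part of any described system.

```python
from collections import Counter, defaultdict


class NextBehaviorModel:
    """Hypothetical model (not the described system's algorithm): predicts a
    future behavior from a past behavior by counting training co-occurrences."""

    def __init__(self):
        # A freshly generated model is "untrained": no defining information yet.
        self.counts = None

    def train(self, examples):
        """Process (past, future) training pairs and generate the information
        that defines the model: counts of each future behavior per past one."""
        counts = defaultdict(Counter)
        for past, future in examples:
            counts[past][future] += 1
        self.counts = counts
        return self

    def predict(self, past):
        """Deployment: return the most frequent future behavior seen after
        `past` in training, or None if `past` never appeared."""
        if self.counts is None:
            raise RuntimeError("model is untrained")
        seen = self.counts.get(past)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


training = [
    ("bought_tent", "bought_stove"),
    ("bought_tent", "bought_stove"),
    ("bought_tent", "bought_lamp"),
    ("bought_bike", "bought_helmet"),
]
model = NextBehaviorModel().train(training)
print(model.predict("bought_tent"))  # prints bought_stove
```

Real data mining algorithms (decision trees, clustering, association rules, and so on) generate far richer defining information, but the separation between an untrained model, a training pass, and a deployed prediction step is the same.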
Data mining typically involves processing large amounts of data, which consumes significant hardware resources. It is therefore desirable to configure the data mining software system to use the hardware resources efficiently, but doing so presents a problem. For example, if a data mining software system is configured to use all of the processors of a given hardware system, it must either perform complex internal allocation of tasks to multiple threads or processes, or allow the operating system to perform the allocation. Internal allocation adds significant complexity to the data mining software system, making it more difficult to develop, debug, and maintain. If the operating system performs the allocation, it will typically use a general-purpose allocation scheme. Such a scheme cannot make optimal use of the resources, because the demands and behavior of data mining differ significantly from those that general-purpose allocation schemes are designed to handle.
An additional problem may arise if, after a data mining processing task has started executing, the hardware system servicing the task becomes overloaded by other tasks. This may degrade the performance of the data mining processing task or, in some cases, prevent the task from executing at all. For example, if a data mining processing task requires a certain minimum number of processors and, because of other tasks, the number of available processors always remains below that minimum, the task will never execute. This is unacceptable from a performance standpoint, since a typical data mining system expects a data mining processing task to run to completion in its current environment without interruption.
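The starvation scenario above reduces to a simple condition: a task needing at least `min_required` processors can run only when the total minus the processors busy with other tasks meets that minimum. The following toy sketch assumes a fixed processor count and discrete time steps; the function name and its parameters are hypothetical and serve only to make the condition concrete.

```python
from typing import Optional, Sequence


def first_start_step(total_processors: int,
                     min_required: int,
                     other_load: Sequence[int]) -> Optional[int]:
    """Return the first time step at which a task needing at least
    `min_required` processors could run, or None if it never can.

    `other_load[t]` is the (hypothetical) number of processors occupied
    by other tasks at step t."""
    for step, busy in enumerate(other_load):
        available = total_processors - busy
        if available >= min_required:
            return step
    return None


# 8 processors, task needs 4; other tasks always leave fewer than 4 free.
print(first_start_step(8, 4, [6, 5, 7, 6]))  # prints None
# Once the other load drops to 3, five processors are free at step 1.
print(first_start_step(8, 4, [6, 3, 7, 6]))  # prints 1
```

In the first call the available counts are 2, 3, 1, and 2, all below the minimum of 4, so the task is starved for the entire observed period.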
A need arises for a technique by which data mining processing tasks can be allocated without complex internal allocation schemes, yet with better performance than general-purpose, operating-system-based schemes can provide.