The data avalanche continues to accelerate, driven by the commoditization of storage, computing devices, network bandwidth and connectivity, parallel processing, and processor speed. Consequently, numerous data mining algorithms are becoming available to sift through massive amounts of information. Businesses and governments that do not embrace advanced data analytics will not survive in an environment of highly connected, intelligent enterprises.
As data mining tools advance, applying the right algorithm to a problem becomes critical. For example, a practitioner might apply a familiar algorithm that yields a suboptimal solution, whereas a highly tuned system continually determines the best algorithm for the problem at hand. Equally important, the growing diversity and dimensionality of data are increasingly challenging and, in some cases, already intractable. Dimensionality reduction and variable selection are required to identify the most important traits of the data from an exhaustive set of features. However, different algorithms perform differently as the feature set changes. Accurately selecting both the algorithm and the feature set is therefore critical to achieving optimal performance.
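As a minimal sketch of the joint selection problem described above, the Python below scores every pairing of a candidate algorithm and a feature subset by leave-one-out accuracy and keeps the best pair. The two classifiers (1-nearest-neighbor and nearest-centroid), the function names, and the six-row toy dataset are hypothetical illustrations chosen for brevity, not a method taken from this work. Exhaustive search over subsets is only feasible for a handful of features; a practical system would substitute a greedy or heuristic search.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(train, query, feats):
    # Hypothetical 1-NN: predict the label of the closest training row,
    # measuring distance only on the selected feature indices.
    best = min(train, key=lambda row: dist([row[0][f] for f in feats],
                                           [query[f] for f in feats]))
    return best[1]


def nearest_centroid(train, query, feats):
    # Hypothetical nearest-centroid: predict the class whose mean vector
    # (on the selected features) is closest to the query.
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    q = [query[f] for f in feats]
    return min(centroids, key=lambda y: dist(centroids[y], q))


def loo_accuracy(model, data, feats):
    # Leave-one-out: train on all rows but one, test on the held-out row.
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += model(train, x, feats) == y
    return hits / len(data)


def select(data, models, n_features, max_size=None):
    # Score every (model, feature subset) pair; keep the first best score.
    max_size = max_size or n_features
    best = None
    for k in range(1, max_size + 1):
        for feats in combinations(range(n_features), k):
            for name, model in models.items():
                score = loo_accuracy(model, data, feats)
                if best is None or score > best[0]:
                    best = (score, name, feats)
    return best


# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
data = [
    ((0.0, 5.0), 0), ((1.0, -3.0), 0), ((0.5, 9.0), 0),
    ((10.0, 4.0), 1), ((9.0, -2.0), 1), ((10.5, 8.0), 1),
]
models = {"1-NN": nearest_neighbor, "nearest-centroid": nearest_centroid}

print(select(data, models, n_features=2))  # → (1.0, '1-NN', (0,))
```

On this toy data, 1-NN scores 1.0 using feature 0 alone but 0.0 using feature 1 alone, because each noisy value sits closest to a row of the opposite class. The same algorithm can thus look excellent or useless depending on the feature set, which is why the sketch searches algorithms and features together rather than fixing one and tuning the other.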