Computers and computer-based devices have become necessary tools for many applications throughout the world. Typewriters and slide rules have become obsolete in light of keyboards coupled with sophisticated word-processing applications and calculators that include advanced mathematical functions and capabilities. Thus, trending applications, analysis applications, and other applications that previously may have required a collection of mathematicians or other high-priced specialists to painstakingly complete by hand can now be accomplished through use of computer technology. For instance, due to ever-increasing processor and memory capabilities, if data is entered properly into an application or wizard, such application or wizard can automatically output a response nearly instantaneously, whereas generating such a response by hand previously required hours or days.
Furthermore, through utilization of computers and computer-related devices, vast quantities of data can be obtained for analysis and predictive purposes. For example, a retail sales establishment can employ a data analysis application to track sales of a particular good given a particular type of customer, income level of customers, a time of year, advertising strategy, and the like. More particularly, patterns within collected structured data can be determined and analyzed, and predictions relating to future events can be generated based upon these patterns. While the above example describes utilizing data in connection with retail sales, it is to be understood that various other applications and contexts can benefit from analysis of accumulated data.
The aforementioned analysis of data, recognition of patterns, and generation of predictions based at least in part upon the recognized patterns can be collectively referred to as data mining. Conventionally, to enable suitable data mining, various models must be programmed and trained by way of training data. For instance, data previously collected can be employed as training data for one or more data mining models. The data mining models can employ various decision tree structures to assist in generating predictions, and can further utilize suitable clustering algorithms to cluster data analyzed by the data mining models. Accordingly, these data mining models can be extremely complex and require significant programming from an expert computer programmer.
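By way of illustration only, the training of a data mining model on previously collected data can be sketched as follows. The sketch assumes a hypothetical set of retail records and trains a one-level decision tree (a "decision stump"), which is far simpler than the decision tree structures used in practice. The names train_stump and predict, the feature names, and the sample records are assumptions introduced for this example and do not come from any particular data mining tool.

```python
from collections import Counter


def train_stump(rows, labels):
    """Find the single equality test (feature == value) with the fewest training errors."""
    best = None
    for feature in rows[0]:
        # sorted() keeps the search order deterministic across runs
        for value in sorted({row[feature] for row in rows}):
            match = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            other = [lab for row, lab in zip(rows, labels) if row[feature] != value]
            # Each branch predicts the majority label of the training rows it receives
            match_label = Counter(match).most_common(1)[0][0]
            other_label = Counter(other).most_common(1)[0][0] if other else match_label
            errors = sum(lab != match_label for lab in match)
            errors += sum(lab != other_label for lab in other)
            if best is None or errors < best[0]:
                best = (errors, feature, value, match_label, other_label)
    _, feature, value, match_label, other_label = best
    return {"feature": feature, "value": value, "match": match_label, "other": other_label}


def predict(stump, row):
    """Apply the learned test to a new record."""
    if row[stump["feature"]] == stump["value"]:
        return stump["match"]
    return stump["other"]


# Hypothetical, previously collected records used as training data
training_rows = [
    {"season": "winter", "income": "high"},
    {"season": "winter", "income": "low"},
    {"season": "summer", "income": "high"},
    {"season": "summer", "income": "low"},
]
training_labels = ["buy", "buy", "skip", "skip"]

stump = train_stump(training_rows, training_labels)
prediction = predict(stump, {"season": "winter", "income": "low"})
```

In this hypothetical data the time of year fully determines the outcome, so the learned test splits on the season feature. Even this reduced case requires explicit programming of the search over candidate tests, which suggests why full data mining models demand significant programming effort.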
Due to the complexity of data mining models and the extent of the computations utilized in connection with such models, various deficiencies currently exist. For example, conventional systems are primarily directed to mining structured (e.g., relational) data. However, unstructured data is oftentimes available in the form of text documents, for example, those related to Internet web pages and the like. Similarly, unstructured data can refer to image data as well as streaming audio and/or video. Today, conventional approaches to mining unstructured data require extensive pre-processing and/or tokenizing of the data external to the data mining tool.
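As a minimal sketch of the kind of external pre-processing referred to above, the following example tokenizes unstructured text and flattens it into structured (document, term, count) rows of the sort a relational mining tool could consume. The function names tokenize and to_term_rows, the tokenization rule (lowercase runs of letters and digits), and the sample documents are assumptions made for illustration; actual pre-processing pipelines are typically far more extensive.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simplistic rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_term_rows(documents):
    """Convert {doc_id: text} into structured (doc_id, term, count) rows."""
    rows = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        for term, count in sorted(counts.items()):
            rows.append((doc_id, term, count))
    return rows


# Hypothetical unstructured text, e.g. extracted from web pages
documents = {
    "page1": "Winter coats on sale. Coats ship free!",
    "page2": "Summer sale",
}

term_rows = to_term_rows(documents)
```

Each resulting row has a fixed shape, so the rows can be loaded into a relational table. The conversion, however, occurs entirely outside of the data mining tool, which is the deficiency noted above.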