Computers and computer-based devices have become a necessary tool for many applications throughout the world. Typewriters and slide rules have become obsolete in light of keyboards coupled with sophisticated word-processing applications and calculators that include advanced mathematical functions/capabilities. Thus, trending applications, analysis applications, and other applications that previously may have required a group of mathematicians or other high-priced specialists to complete painstakingly by hand can now be accomplished through use of computer technology. For instance, due to ever-increasing processor and memory capabilities, if data is entered properly into an application/wizard, such application/wizard can automatically output a response nearly instantaneously (compared with the hours or days previously required to generate such a response by hand).
Furthermore, through utilization of computers and computer-related devices, vast quantities of data can be obtained for analysis and predictive purposes. For example, a retail sales establishment can employ a data analysis application to track sales of a particular good given a particular type of customer, income level of customers, a time of year, advertising strategy, and the like. More particularly, patterns within collected data can be determined and analyzed, and predictions relating to future events can be generated based upon these patterns. While the above example describes utilizing data in connection with retail sales, it is understood that various applications and contexts can benefit from analysis of accumulated data.
The aforementioned analysis of data, recognition of patterns, and generation of predictions based at least in part upon the recognized patterns can be collectively referred to as data mining. Conventionally, to enable suitable data mining, various models must be programmed and trained by way of training data. For instance, data previously collected can be employed as training data for one or more data mining models. The data mining models can employ various decision tree structures to assist in generating predictions, and can further utilize suitable clustering algorithms to cluster data analyzed by the data mining models. Accordingly, these data mining models can be extremely complex and can require significant programming effort from an expert computer programmer.
Due to the complexity of data mining models and the extensiveness of computations utilized in connection with such data mining models, there currently exist various deficiencies associated therewith. For example, once data mining models are created and applied to a particular context, it can be extremely difficult to alter such data mining models. In particular, disparate data mining models can be created to generate predictions relating to particular contexts and/or applications, where, at the time such data mining models were created, it was believed that the models were not substantially related. Over time, however, it can be determined that the disparate models are in fact substantially related, and therefore it is desirable to utilize an output of one model as an input for a second model (e.g., data output from one data mining model can be utilized as input data and/or training data for a second data mining model). Utilizing conventional systems and/or methodologies, enabling an output of one data mining model to be employed as an input to a second data mining model requires significant custom programming as well as a substantial amount of time.
Another deficiency associated with data mining applications is that data mining models often need a significant amount of training data to operate properly. For instance, a new customer at a retail sales establishment will not yet be associated with any data relating to such establishment. Consequently, data mining applications have difficulty providing predictions or other relevant information to assist the customer or the retail establishment in recommending items. Developers of the data mining applications/models must therefore write extensive code for special instances where no data is associated with a subject of a data mining model. An alternative conventional approach is to generate a static rule for all cases where insufficient training data exists. For instance, an individual may be utilizing a web-based retail establishment for the first time. Often, such establishments utilize virtual "shopping baskets" and recommend items to be placed within the basket by way of data mining model(s). A static rule can dictate that no recommendations are to be provided to customers who have not previously viewed and/or purchased items. Such rules, however, are inflexible, as they disregard user context and global statistics.
Moreover, in conventional data mining models/applications, users and/or developers are forced to specify a type of output. For example, if it is desired that a time-series prediction be generated with respect to a data mining model, then a function specific to that prediction must be designated in order to obtain such prediction. Human error in selecting a proper function can inconvenience a user and/or cause a mining model to fail.
Conventional data mining models and/or applications are also quite expensive in terms of network usage as well as processing power usage with respect to clients utilizing a data mining application resident upon a server. For example, training of a data mining model occurs on a server, where particular patterns are recognized. Predictions are generated by mapping input data onto existing recognized patterns, which are typically housed upon a server. Clients generally wish to review visualizations of patterns to determine results of the training. Conventionally, this visualization occurs by delivering the entirety of the mining model content (e.g., pattern content) from the server to the client. The client can then analyze such content and generate a graphical display of results of the analysis. This retrieval and analysis of data is expensive, as client computers typically lack the processing power and memory of servers. Furthermore, networks can be subject to substantial traffic when a significant amount of data is retrieved from a server.
Accordingly, there exists a need in the art for systems and/or methodologies for improving usage and development of data mining models/applications.