The field of the invention relates generally to aiding domain experts in analyzing data using data mining tasks, and more specifically, to methods and systems for template-driven data mining task editing.
Domain experts often have in-depth knowledge about the data and the problem domain, but not about the data mining tools that they utilize. As such, it is a challenge for these domain experts to define exactly where data comes from, how the data can be extracted, what the best parameter settings are in order to use the data mining tool efficiently, how to specify a constraint in the tool's language, and how the discovered results should be processed.
Current data mining approaches require analysts to define data mining tasks from scratch. A simple copy-and-paste-and-modify approach may help reduce the task creation time, but the analysts are still required to understand the full specification of the task at hand. Often, the analysts have to repeatedly build the same, or a similar, specification for data sources and for result handling, as well as for some data/domain specific parameters.
As mentioned above, data mining tasks often require many different parameters to specify where data comes from, how data items are related, what constraints are used in the mining process, what types of domain knowledge are relevant, whether the user has special interest in some particular aspects, and how the discovered results are processed. Even though advanced data mining algorithms may be able to “self-tune” some controlling parameters, analyst entry of other parameters (such as data source and result processing) is still necessary. In addition, controlling parameters might be tuned to different values for different application domains, and a universal set of parameters that suits all purposes, all the time, does not exist.
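As an illustration only, the parameter categories listed above could be gathered into a single task specification. The following Python sketch is hypothetical: the class MiningTaskSpec, its field names, the data source string, the constraint text, and the min_support parameter are assumptions made for illustration and are not taken from any particular data mining tool.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class MiningTaskSpec:
    """Hypothetical task specification; every field name is illustrative."""
    data_source: str                  # where the data comes from
    item_relations: tuple = ()        # how data items are related
    constraints: tuple = ()           # constraints used in the mining process
    domain_knowledge: tuple = ()      # relevant domain knowledge
    user_interests: tuple = ()        # aspects of special interest to the user
    result_handler: str = "print"     # how discovered results are processed
    control_params: dict = field(default_factory=dict)  # tuned per domain


# A base task for one (assumed) application domain.
base = MiningTaskSpec(
    data_source="db://example/activity_log",   # placeholder, not a real source
    constraints=("duration <= 30",),           # placeholder constraint text
    control_params={"min_support": 0.05},
)

# A second run on the same data with a slightly different controlling
# parameter: everything else is carried over from the base specification.
variant = replace(base, control_params={"min_support": 0.01})
```

Even with such a structure, the analyst must still supply every field that has no sensible default and must know which values suit the domain at hand, which is the burden described above.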
For example, within a constraint-based mining of activity patterns (CMAP) system, tasks may be created using an Eclipse-based tool. This task creation process may require extensive knowledge about where data comes from, how each data item (table or predicate) is defined and interpreted, how data items can be used in the patterns, any domain knowledge, user interests or other constraints, and eventually, how discovered patterns are measured and processed. Much of this information cannot be automatically deduced by the tool.
In summary, analysts may need to run data mining tasks on the same or similar data sets many times with slightly different parameter settings. Disadvantages and limitations of the existing solutions include that extensive and comprehensive knowledge of the data mining tool is required to accomplish the task, and that users have to repeatedly specify parameters to run similar (or even the same) portions of mining tasks.