Technological innovations have increased the capability for collecting and retaining large amounts of electronic information that may be accessed and used for various applications, such as educational, scientific, commercial and entertainment applications. As the amount of electronic content continues to increase, it becomes increasingly important to implement automated data analysis tools that allow individuals to organize and utilize such data. For example, data mining methods can be employed to automatically process a large corpus of data to determine useful data associations and patterns within the corpus. Moreover, large enterprises may employ automated business intelligence systems to analyze various types of business data relevant to the particular business, e.g., weekly sales figures, revenues outstanding by region, etc. However, existing solutions for interpreting information typically explain the analytic process rather than the meaning, in the context of a particular domain, of the data involved in the analytic process and of the resulting output of the data analysis. As a result, the data analysis results are often difficult for a non-technical audience to interpret.
For example, although data mining methods can determine associations and patterns of data within a large corpus of data, such methods simply provide a means of discovering previously unknown knowledge in a data set and do not address the question of how to explain that discovery to a non-technical audience. The “translation” of the analytic output into natural language that reflects the context of the problem domain typically requires the assistance of a skilled analyst, thus limiting the ability of the larger population to analyze and interpret information on demand. Moreover, automated business intelligence systems typically produce canned and ad-hoc reports based primarily on simple summarization of underlying data values, e.g., weekly sales figures, revenues outstanding by region, etc., while it is left to the reader of the report to examine the summary values and determine what they imply. More complex analytic procedures, such as applying a statistical test for the presence of a true downward trend in weekly sales, are often not employed, as the results of such statistical tests are difficult to convey in an automated manner. Other conventional data processing methods include automated processes for translating rule evaluation results into natural language, but only for the small set of data used in the rules; such methods do not provide a general mechanism for explaining computations unrelated to the purpose of the rule execution.
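By way of illustration only, the following minimal sketch shows the kind of computation referred to above: a statistical test for a downward trend in weekly sales, followed by a plain-language statement of the result. The sketch assumes a one-sided Mann-Kendall test with a normal approximation and no correction for tied values, which is merely one possible choice of test and is not taken from any particular conventional system. The function names, the 0.05 significance threshold and the sales figures are hypothetical.

```python
import math


def mann_kendall_downward(values):
    """One-sided Mann-Kendall test for a downward trend.

    Assumptions of this sketch: normal approximation, no tie correction.
    Returns (S statistic, Z score, one-sided p-value).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S adds +1 for every later value above an earlier one, -1 for every one below.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no ties assumed).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Lower-tail probability of the standard normal distribution at z.
    p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s, z, p_value


def explain_trend(label, values, alpha=0.05):
    """Translate the test result into a sentence for a non-technical reader."""
    _, _, p_value = mann_kendall_downward(values)
    if p_value < alpha:
        return f"{label} show a statistically significant downward trend."
    return f"{label} show no statistically significant downward trend."


# Hypothetical weekly sales figures, for illustration only.
weekly_sales = [120, 118, 115, 117, 110, 108, 109, 104, 101, 99, 97, 95]
print(explain_trend("Weekly sales", weekly_sales))
```

The sentence produced by such a sketch describes only the outcome of one fixed test. It does not explain the computation in terms of the problem domain, which is the limitation of conventional approaches noted above.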
Predictive Model Markup Language (PMML) is an XML-based language that represents data mining models so as to enable the exchange of standard models based on standard data mining techniques, such as association rules. PMML provides a general-purpose language for describing statistical and data mining models, but does not provide any mechanism for explaining the results of applying those models. A need therefore exists for improved systems and methods that provide a general, domain-independent method of explaining the analytical computations of a process in a human-understandable form, and that overcome the problems associated with conventional methods.