The present invention relates generally to data analysis and, more specifically, to automatically enumerating data analysis options for building statistical models from a given dataset and rapidly analyzing those statistical models.
Exploratory Data Analysis (EDA) is a data analysis approach. In EDA, a given dataset (i.e., a collection of data) is analyzed to build statistical models, and the models are examined to draw useful conclusions or insights about the dataset. EDA therefore differs from model fitting or hypothesis testing in that the data analyst explores the dataset to discover insights rather than to confirm a model or hypothesis chosen in advance. EDA involves slicing and dicing a given dataset and creating different types of models, which are then examined to gain insights. Useful results are found by observing anomalies, outliers, relationships, dependencies, correlations, or other interesting patterns in the models.
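As a minimal illustrative sketch of the slice, model, and examine cycle described above (not the method of the present invention), the following Python fragment groups records by one attribute, summarizes each group by its mean and standard deviation, and flags values that lie far from the group mean. The record layout, the helper names `slice_by` and `find_outliers`, the threshold of 2.0, and all data values are assumptions invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (host, bytes_sent) records; every value is invented for illustration.
records = [
    ("host-a", 100), ("host-a", 110), ("host-a", 90), ("host-a", 105),
    ("host-a", 95), ("host-a", 100), ("host-a", 100), ("host-a", 500),
    ("host-b", 200), ("host-b", 210), ("host-b", 190), ("host-b", 205),
    ("host-b", 195),
]

def slice_by(rows):
    """Slice the dataset: collect the values observed for each key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def find_outliers(values, threshold=2.0):
    """Model one slice by its mean and population standard deviation,
    then return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

if __name__ == "__main__":
    for host, values in slice_by(records).items():
        print(host, find_outliers(values))
```

In practice an analyst would repeat this cycle over many different slicing attributes and model types, which is what makes manual exploration laborious.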
Since its introduction, EDA has been used across a wide variety of domains, including cyber security, online consumer behavior analysis, healthcare, and system failure analysis, to name a few. For instance, analyzing cyber monitoring data allows malicious hosts to be identified or threats in a network to be predicted. Analyzing consumer behavior through user action logs (e.g., browsing histories, search terms, and clicks) often helps the analyst characterize consumer preferences. In healthcare, the similarity of a patient's data to relevant past cases may be an early indicator of a need for further investigation and diagnosis.