The volume of information has been growing at an exponential rate. Since 2003, the new information generated each year has exceeded the amount of information created in all previous years. Digital information now makes up more than 90% of all information produced, vastly exceeding data generated on paper and film. One of the greatest scientific and engineering challenges of the 21st century is to effectively understand and leverage this growing wealth of data. Computational processes are widely used to analyze, understand, integrate, and transform data. For example, to understand trends in multi-dimensional data in a data warehouse, analysts generally go through an often time-consuming process of iteratively drilling down and rolling up through the different axes to find interesting ‘nuggets’ in the data. Often, to mine data, several algorithms are applied and the results are compared, not only among different algorithms, but also among different configurations of a given algorithm. To build data warehouses and data marts that integrate data from disparate data sources within an enterprise, extraction, transformation, and loading (ETL) workflows need to be assembled to create consistent, accurate information. Additionally, to understand and to accurately model the behavior of environmental components, environmental scientists often need to create complex visualization dataflows to compare the visual representations of the actual behavior observed by sensors with the behavior predicted in simulations. Further, to improve the quality of a digital photo, a user may explore different combinations of filters. As a further example, to plan a radiation treatment, a radiation oncologist may create a large number of 3-dimensional (3-D) visualizations to find a visualization that clearly shows the lesion tissue that requires treatment.
Due to their exploratory nature, these tasks sometimes involve large numbers of trial-and-error steps. In an exploratory process, users may need to select data and specify the algorithms and visualization techniques used to process and to analyze the data. The analysis specification is adjusted in an iterative process as the user generates, explores, and evaluates hypotheses associated with the information under study. To successfully analyze and validate various hypotheses, it is necessary to pose queries, correlate disparate data, and create insightful data products of both the simulated processes and observed phenomena. Before users can view and analyze results, they need to assemble and execute complex pipelines (dataflows) by selecting data sets, specifying a series of operations to be performed on the data, and creating an appropriate visual representation. As an additional factor that contributes to the complexity of these tasks, assembling the computational processes may require a combination of loosely coupled resources, including specialized libraries and grid and Web services, which may generate yet more data, adding to the overflow of information users need to process.
Workflows are emerging as a paradigm for representing and managing complex computations. Workflows can capture complex analysis processes at various levels of detail and capture the provenance information necessary for reproducibility, result publication, and result sharing among collaborators. Because of the formalism they provide and the automation they support, workflows have the potential to accelerate and to transform the information analysis process. Workflows are rapidly replacing primitive shell scripts, as evidenced by the release of Automator by Apple®, Data Analysis Foundation by Microsoft®, and Scientific Data Analysis Solution by SGI®.
Often, insight comes from comparing the results of multiple visualizations created during the exploration process. For example, insight can be gained by applying a given visualization process to multiple datasets generated in different simulations; by varying the values of certain visualization parameters; or by applying different variations of a given process (e.g., variations that use different visualization algorithms) to a dataset. The path from “data to insight” requires a laborious, trial-and-error process, where users assemble, iteratively modify, and execute complex workflows, which may include pipelines and/or dataflows.
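By way of illustration only, the kind of exploration described above can be sketched as a small Python program that runs one workflow over several datasets and several parameter values and collects the results for comparison. This is a minimal sketch under stated assumptions, not an implementation of any particular workflow system: the function names (load_dataset, threshold, summarize, run_workflow, explore), the dataset names, and the sample values are all hypothetical, and a simple threshold filter stands in for a real visualization operation such as isosurface extraction.

```python
from itertools import product


def load_dataset(name):
    # Hypothetical stand-in for reading simulation output:
    # each dataset is just a short list of scalar samples.
    datasets = {
        "sim_a": [0.1, 0.4, 0.7, 0.9],
        "sim_b": [0.2, 0.3, 0.8, 0.6],
    }
    return datasets[name]


def threshold(values, isovalue):
    # Toy stand-in for a visualization operation (e.g., isosurface
    # extraction): keep only samples at or above the given isovalue.
    return [v for v in values if v >= isovalue]


def summarize(values):
    # Toy stand-in for rendering: reduce the filtered data to a small
    # summary that can be compared across runs.
    return {"count": len(values), "max": max(values) if values else None}


def run_workflow(dataset_name, isovalue):
    # One workflow: select data, apply an operation, produce a result.
    return summarize(threshold(load_dataset(dataset_name), isovalue))


def explore(dataset_names, isovalues):
    # Run the same workflow for every dataset/parameter combination and
    # key each result by the combination that produced it.
    return {
        (name, iso): run_workflow(name, iso)
        for name, iso in product(dataset_names, isovalues)
    }


results = explore(["sim_a", "sim_b"], [0.3, 0.7])
for key in sorted(results):
    print(key, results[key])
```

Even in this toy form, every added dataset or parameter value multiplies the number of runs the user must set up and compare, which suggests why manual, trial-and-error exploration quickly becomes laborious.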
In the course of exploratory studies, users often build large collections of workflows, which include, for example, different types of visualizations, each of which may help in the understanding of a different aspect of their data. For example, a user working on a new computational fluid dynamics application might need a collection of visualizations such as 3-dimensional (3-D) isosurface plots, 2-dimensional (2-D) plots with relevant quantitative information, and various direct volume rendering images. Although each visualization is generally implemented in a separate workflow, there is a certain amount of overlap between the workflows. For example, each workflow may manipulate the same input dataset(s). Furthermore, for a particular class of visualizations, the users might generate several different versions of each individual workflow while fine-tuning visualization parameters or experimenting with different data sets. Thus, constructing insightful visualizations is a laborious process that requires expertise in both visualization techniques and the domain of the data being explored. Therefore, what is needed is a method and a system for simplifying and semi-automating the construction of new visualizations to allow the rapid development of workflows and to reduce the need to understand both visualization techniques and the data domain.