Data visualization applications enable a user to understand a data set visually, including its distribution, trends, outliers, and other factors that are important to making business decisions. Some data sets are very large or complex, and include many data fields. Various tools can be used to help understand and analyze the data, including dashboards that have multiple data visualizations. In many cases, it is difficult to build data visualizations directly from raw data sources. For example, there may be errors in the raw data, there may be missing data, the data may be structured poorly, or data may be encoded in a non-standard way. Because of this, data analysts commonly use data preparation tools (e.g., an ETL tool) to transform raw data from one or more data sources into a format that is more useful for reporting and building data visualizations. These tools generally build a data flow that specifies how to transform the data one step at a time.
Errors can be detected while a data flow is executing. Some errors arise from problems in the raw data, and others are introduced by the data flow itself (e.g., by specifying a join improperly). Although some tools are able to identify error conditions, they rarely provide enough useful information for a user to understand the error and resolve its root cause. This is particularly problematic for large and/or complex data sets, or for a complex data flow. For example, an error condition may be detected many steps after the root error actually occurred.
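The gap between where an error originates and where it is detected can be illustrated with a minimal sketch. The data, step names, and the `run_flow` helper below are all hypothetical and are not taken from any particular data preparation tool. A malformed value is present in the raw data, passes untouched through two steps, and only surfaces at the third step, where the reported failure names the step that failed rather than the row or source that caused it.

```python
# Hypothetical three-step data flow; every name here is illustrative.
raw_rows = [
    {"order_id": 1, "amount": "19.99", "region": "west"},
    {"order_id": 2, "amount": "N/A", "region": "east"},  # root error: bad raw value
    {"order_id": 3, "amount": "5.00", "region": "west"},
]

def clean_region(rows):
    # Step 1: normalize region names; the amount field is not inspected.
    return [{**r, "region": r["region"].upper()} for r in rows]

def filter_known_regions(rows):
    # Step 2: keep only known regions; the amount field is still not inspected.
    return [r for r in rows if r["region"] in {"WEST", "EAST"}]

def total_by_region(rows):
    # Step 3: aggregate amounts; this is the first step that parses the amount.
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + float(r["amount"])
    return totals

def run_flow(rows, steps):
    # Run each step in order and report the first step that raises ValueError.
    data = rows
    for index, step in enumerate(steps, start=1):
        try:
            data = step(data)
        except ValueError as exc:
            return {"failed_step": index, "step_name": step.__name__, "error": str(exc)}
    return {"result": data}

steps = [clean_region, filter_known_regions, total_by_region]
outcome = run_flow(raw_rows, steps)
print(outcome)
```

In this sketch the failure is attributed to step 3, although the offending value was already present in the raw input. The report does not identify which row was responsible, which is the kind of missing context described above.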