Data wrangling is the process of converting or mapping data from one “raw” form into another format that allows for more convenient consumption of the data. Such consumption may include further wrangling, data visualization, data aggregation, and training a statistical model, as well as many other potential uses. Data wrangling sometimes follows a set of general steps, which include extracting the data in a raw form from the data source, “wrangling” the raw data using various hardware and/or software modules, parsing the data into predefined data structures, and depositing the resulting structured content into an accessible database for storage and future use.
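As a purely illustrative sketch of the general steps described above (extract, wrangle, parse, deposit), the following Python example runs a tiny in-memory pipeline. The names `RAW`, `Record`, `extract`, `wrangle`, `parse`, `deposit`, and `run_pipeline` are hypothetical and chosen only for this illustration, as are the sample data and the choice of SQLite as the accessible database.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass

# Hypothetical raw input with inconsistent whitespace and letter case.
RAW = "name, amount\n alice , 10\nBOB,  7 \n"


@dataclass
class Record:
    """Predefined data structure that parsed rows are mapped into."""
    name: str
    amount: int


def extract(raw_text):
    """Extract: read the raw source into rows of strings."""
    return list(csv.reader(io.StringIO(raw_text)))


def wrangle(rows):
    """Wrangle: drop the header row, trim whitespace, normalize case."""
    return [[cell.strip().lower() for cell in row] for row in rows[1:]]


def parse(rows):
    """Parse: map each cleaned row into a Record."""
    return [Record(name=row[0], amount=int(row[1])) for row in rows]


def deposit(records, conn):
    """Deposit: store the structured records in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(r.name, r.amount) for r in records],
    )
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run all four steps and return the stored rows, ordered by name."""
    deposit(parse(wrangle(extract(raw_text))), conn)
    return conn.execute(
        "SELECT name, amount FROM records ORDER BY name"
    ).fetchall()
```

For example, calling `run_pipeline(RAW, sqlite3.connect(":memory:"))` stores the two cleaned records in an in-memory database and returns them for later use.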
Data wrangling is typically performed on large datasets and may be performed using various operations executable by different types of execution engines. However, an execution engine that is suited to large datasets may run these operations more slowly when they are performed on small datasets. Furthermore, if a user designs a set of wrangling operations, different execution engines may implement the designed set of wrangling operations differently. In other words, while a first execution engine may be suitable for large datasets and a second execution engine may be suitable for smaller datasets, there is no guarantee that the first and second execution engines will behave identically. Thus, the outputs of the first and second execution engines can differ even if the requested data wrangling operations are the same.
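The following hypothetical Python sketch illustrates how the same requested operation can yield different outputs on two engines. It does not model any particular execution engine: `engine_small` and `engine_large` are invented stand-ins that are assumed to implement a "round to the nearest integer" operation with different tie-breaking rules (round-half-to-even versus round-half-up).

```python
from decimal import Decimal, ROUND_HALF_UP


def engine_small(values):
    """Hypothetical engine for small datasets.

    Uses Python's built-in round(), which rounds ties to the nearest
    even integer.
    """
    return [round(v) for v in values]


def engine_large(values):
    """Hypothetical engine for large datasets.

    Assumed here to round ties away from zero (round-half-up).
    """
    return [
        int(Decimal(str(v)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
        for v in values
    ]


def outputs_match(values):
    """Return True only if both engines agree on every value."""
    return engine_small(values) == engine_large(values)


# The same requested operation, applied to the same input.
sample = [0.5, 1.5, 2.5]
small_result = engine_small(sample)  # [0, 2, 2]
large_result = engine_large(sample)  # [1, 2, 3]
```

Both functions satisfy a loosely specified "round" request, yet they disagree on every tie except 1.5, which is the kind of divergence that can arise when a set of wrangling operations is moved from one engine to another.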