As computing devices have become ubiquitous, the volume of data produced by such devices has continuously increased. Organizations often wish to obtain insights about their processes, products, etc., based upon data generated by numerous data sources, where the data from different data sources may have different formats. For these insights to be extracted, the data must first be “cleaned” such that a client application (such as an application that is configured to generate visualizations of the data) can consume the data and produce abstractions over the data.
Currently, data is often serialized into a tree-structured document, such as a JSON document, an XML document, etc. Often, an organization will employ an individual, referred to herein as a “data cleaner”, to extract data encoded in tree-structured documents and place such data in a format (e.g., tabular) that can be consumed by certain applications for processing. Utilizing conventional approaches, the data cleaner writes a customized script that receives the tree-structured document as input, extracts data from the tree-structured document, and constructs a table based upon the extracted data (e.g., where at least some of the extracted data may be further processed before a cell in the table is populated with a value). Writing such a script can be cumbersome and requires programming expertise, particularly when the tree-structured document is not in a relatively simple format and/or when somewhat complex processing is to be undertaken on the extracted data. Therefore, extracting data encoded in a tree-structured document and creating a table based upon the extracted data can be labor-intensive.
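For illustration only, a customized script of the conventional kind described above might resemble the following minimal Python sketch. The sample document, its field names (orders, customer, items, sku, qty, price), and the function name orders_to_rows are hypothetical and are not taken from any particular system; the sketch assumes a JSON document and represents the table as a list of rows.

```python
import json

# Hypothetical tree-structured document; the nesting and field names
# are assumptions made purely for illustration.
SAMPLE = """
{
  "orders": [
    {"id": 1, "customer": {"name": "Ada"},
     "items": [{"sku": "A1", "qty": 2, "price": 3.5},
               {"sku": "B2", "qty": 1, "price": 10.0}]},
    {"id": 2, "customer": {"name": "Lin"},
     "items": [{"sku": "A1", "qty": 4, "price": 3.5}]}
  ]
}
"""


def orders_to_rows(document):
    """Flatten the nested document into one table row per order item."""
    tree = json.loads(document)
    rows = []
    for order in tree["orders"]:
        for item in order["items"]:
            rows.append({
                "order_id": order["id"],
                "customer": order["customer"]["name"],
                "sku": item["sku"],
                # Extracted values are processed before the cell is populated.
                "total": item["qty"] * item["price"],
            })
    return rows


rows = orders_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Even for this simple document, the script is tied to one specific nesting of the data; a different document structure or a different computation requires the script to be rewritten by someone with programming expertise.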