As computing power and data storage grow, computational tools configured to ingest large amounts of source data through data pipelines have become increasingly common. This source data may include, for example, tabular input data comprising multiple rows and columns of data elements.
Unfortunately, because source data is frequently retrieved from multiple sources, the data is often disorganized and may suffer from data quality issues caused by formatting errors and human error. Consequently, the computational tools are often unable to perform their intended functions effectively. Methods therefore exist to analyze the source data and detect anomalies in it, with varying degrees of success.
For example, automated systems exist which monitor streams of source data and are configured to detect specific types of errors and inconsistencies within the data. However, these systems are not entirely effective; in particular, they are not capable of detecting values that are incorrect yet correctly formatted. An alternative solution is to create custom tools for monitoring specific data streams. While such tools may prove effective, the process of creating and testing each tool often proves to be difficult and time-consuming.
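The limitation of format-based detection noted above can be illustrated with a minimal sketch. The date pattern, the `is_well_formed` helper, and the sample values are all hypothetical and are chosen only for illustration; they do not describe any particular monitoring system.

```python
import re

# Hypothetical format rule: an ISO-style date, YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_well_formed(value: str) -> bool:
    """Return True if the value matches the expected date format."""
    return DATE_PATTERN.fullmatch(value) is not None


# Hypothetical column of order dates taken from a tabular source.
order_dates = ["2023-04-17", "17/04/2023", "1023-04-17"]

# A format-only check flags the value written in a different layout...
flagged = [d for d in order_dates if not is_well_formed(d)]

# ...but "1023-04-17" passes the check even though the year is
# almost certainly a data-entry error.
```

In this sketch, only the value with the unexpected layout is flagged. The implausible year is accepted because it satisfies the format rule, which is the kind of correctly formatted but incorrect value that such checks may miss.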