The present disclosure relates generally to data preparation and analysis. More particularly, techniques are disclosed for profiling data sets and providing visualizations of profiling using a visual-interactive interface in a client application.
Before “big data” systems can analyze data to provide useful results, the data needs to be added to the big data system and formatted such that it can be analyzed. This data onboarding presents a challenge for current cloud and “big data” systems. Typically, data being added to a big data system is noisy (e.g., the data is incorrectly formatted, erroneous, or outdated, or includes duplicates). When the data is analyzed (e.g., for reporting or predictive modeling), the poor signal-to-noise ratio of the data means the results are not useful. As a result, current solutions require substantial manual processes to clean and curate the data and/or the analyzed results. However, these manual processes cannot scale. As the amount of data being added and analyzed increases, the manual processes become impossible to implement.
Certain embodiments of the present invention address these and other problems.