The field of the disclosure relates generally to methods for optimizing the processing and graphical display of data. More specifically, the present disclosure relates to systems and methods that enable rapid visualization of very large datasets with minimal computing resources.
Extremely large datasets are generated as a result of many technological activities. For example, component testing for a large set of components and test cases may produce a massive dataset containing millions or even billions of rows. Sensor data from a large set of machines at an industrial location may encompass hundreds of variables and, over time, may reach terabytes of data. Moreover, industrial data is typically heterogeneous, comprising a mixed collection of numbers and strings with numerous missing or corrupted data points. Analysts spend a significant portion of their time cleaning and preprocessing such datasets before the data can even be displayed. Additionally, displaying a large dataset (such as on a computer display) becomes difficult or even impossible when the available computing power is insufficient to properly render the large volume of data or to quickly determine which subset of the larger mass the user has selected for display. It can be similarly cumbersome to determine which portions of the data are missing and where the data is erroneous.
As such, it may be difficult to simply load and review a large dataset without distributing subsections of the data across multiple processors and employing complex parallel processing techniques. Such techniques also suffer from a lack of synchronization and coordination among the various processors, primarily because there may be no scalable algorithm capable of the massive parallelism required. Notably, users can typically review only one screen's worth of visual data at a time, yet known processors process the entire dataset at all times, diverting computing power away from rendering the data actually requested for the screen. Even for data that is on the screen, known processors slow down because their methodology requires them to process and render every pixel, whether or not it corresponds to a data point. Non-data-point pixels are irrelevant to the user, yet rendering them consumes processing power that is valuable when rendering massive datasets.
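To illustrate the inefficiency described above, the following is a minimal sketch of viewport-based culling and per-pixel decimation, in which only the points inside the visible window are retained and at most one point is kept per pixel column, so that rendering cost tracks the screen size rather than the full dataset size. The helper `visible_subset` and its parameters are hypothetical illustrations, not taken from the disclosure itself.

```python
import numpy as np

def visible_subset(x, y, x_min, x_max, screen_width):
    """Keep only points inside the requested x-range, then decimate
    to at most one point per pixel column. (Hypothetical helper for
    illustration; not part of the disclosed systems.)"""
    # Cull everything outside the visible window.
    mask = (x >= x_min) & (x <= x_max)
    xv, yv = x[mask], y[mask]
    if xv.size == 0:
        return xv, yv
    # Map each visible point to a pixel column and keep the first
    # point that lands in each column (simple decimation).
    cols = ((xv - x_min) / (x_max - x_min) * (screen_width - 1)).astype(int)
    _, first = np.unique(cols, return_index=True)
    return xv[first], yv[first]

# One million points, but at most 800 survive for an 800-pixel-wide view.
x = np.linspace(0.0, 1000.0, 1_000_000)
y = np.sin(x)
xs, ys = visible_subset(x, y, 100.0, 200.0, 800)
```

Under this sketch, the renderer touches only the decimated subset, whereas the known methodology described above would process all one million points regardless of the viewport.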