The field of the disclosure relates generally to methods for predicting unknown or missing data in a large dataset and for forecasting future data points for the dataset. More specifically, the present disclosure relates to systems and methods that detect and populate blank or null data points in a dataset, and that further extrapolate from the dataset without requiring massive computing capacity.
In large distributed systems, such as industrial systems, accurately measuring the behavior of equipment is critical. For activities such as design improvement, performance management, and condition-based maintenance, the equipment must be monitored consistently so that a robust and complete record of events (e.g., in the form of a dataset) is produced. Moreover, an accurate and complete operational history is essential for predicting future equipment behavior. In practice, operational data is seldom complete, and large quantities are often missing depending on the specific variable of interest, equipment, operator, location, etc. In some cases individual data entries are missing, and in others entire rows or columns are missing. This adversely affects the ability to build predictive models and to update existing models. Missing data arises frequently where, for example, a timestamp exists for a particular event but the event data is not available, or where event data is available but there is no timestamp, preventing a user from properly placing the event on a timeline. Known methods for handling missing data are limited in that, frequently, specific variables cannot be estimated in isolation. Moreover, known methods are unable to predict future data for an equipment system without at least some a priori knowledge of the past behavior of the equipment.
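By way of illustration only, the two kinds of gap described above (a timestamp without event data, and event data without a timestamp) can be flagged with a short routine such as the following Python sketch. The record layout, the field names "timestamp" and "value", the function name classify_gaps, and the sample readings are all hypothetical and are not taken from the disclosure; the sketch only detects gaps and does not populate them.

```python
# Illustrative sketch only: the record layout and all names are assumptions.

def classify_gaps(events):
    """Return the indices of records with missing fields, grouped by gap type."""
    gaps = {"missing_value": [], "missing_timestamp": [], "missing_both": []}
    for index, event in enumerate(events):
        no_timestamp = event.get("timestamp") is None
        no_value = event.get("value") is None
        if no_timestamp and no_value:
            gaps["missing_both"].append(index)
        elif no_value:
            # Timestamp exists, but the event data was not recorded.
            gaps["missing_value"].append(index)
        elif no_timestamp:
            # Event data exists, but it cannot be placed on a timeline.
            gaps["missing_timestamp"].append(index)
    return gaps


# Hypothetical operational records for one piece of equipment.
events = [
    {"timestamp": "2021-03-01T00:00:00", "value": 71.2},
    {"timestamp": "2021-03-01T01:00:00", "value": None},
    {"timestamp": None, "value": 69.8},
    {"timestamp": None, "value": None},
]

gaps = classify_gaps(events)
print(gaps)
```

Separating the gap types in this way reflects the distinction drawn above: a record lacking only its value still has a known position in the operational history, whereas a record lacking its timestamp cannot be ordered relative to other events.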