Database clusters are becoming increasingly large and complex. The horizontal expansion of computing component resources (e.g., more and more computing nodes, more and more storage-oriented devices, more and more communication paths between components, more and more processes, etc.), coupled with the proliferation of high-performance component instrumentation, results in systems capable of generating extremely high-bandwidth streams of sensed data. Even a session of very short duration to capture such sensed data can result in an accumulation of correspondingly large volumes of raw data. The sheer volume alone presents a huge challenge to system administrators who must perceive the meaning within the data.
Yet, within the raw data are measurements that can be used to determine the current health state of the measured system. In some cases the raw measurements can be used to predict a future health state (e.g., an upcoming problem) of the measured system.
Legacy measurements have often included performance metrics that characterize resource utilization, workload statistics, event logs, etc. Unfortunately, legacy “hands-on” techniques, including trial-and-error techniques, are swamped with the amassed measurements (e.g., sensor data, other sensed data). Legacy methodologies have become inadequate in several aspects:
    Legacy techniques do not have the capacity or the sophistication to discern the interrelationships of the measurements, even though such interrelationships comprise information critical to determination of the state of the measured system;
    Legacy techniques depend on thresholds and other naïve statistical measures to make predictions, which often results in wrong predictions and/or missed predictions.
Analysis of a system's health that is based on legacy techniques is, at best, marred by inaccuracies and shortcomings and, at worst, completely wrong and misleading.
What is needed in order to advance the art are techniques to model a database cluster so as to determine whether the cluster's current behavior is normal or anomalous, and/or whether the cluster's current behavior is predictably headed toward anomalous behavior, and/or whether the cluster's current behavior is predictably headed toward stable or normal behavior. Further, what is needed is a way to determine when an anomaly in the operational state of the cluster has occurred (or to provide early warning that anomalous behavior is about to occur), so that corrective action can be applied to avoid or remediate further anomalous behavior.
The aforementioned legacy technologies do not have the capabilities to generate database cluster health alerts using machine learning. There is a need for an improved approach.