Some embodiments of the present disclosure are directed to an improved approach for analyzing database cluster state and behavior by transforming large volumes of unrefined raw sensory data and large collections of diverse overall-system snapshot data into a small number of critically indicative time series signals and models.
The disclosure relates generally to techniques applicable in database cluster environments, and more particularly to techniques for refining and processing large and diverse volumes of raw overall-system samples and sensory data. Legacy approaches fail to process the massive amounts of time-based measurements into a manageable set of state-indicating signals.
Modern database clusters are massively configured, having hundreds or thousands of processors and even more shared resources. They are extremely complex, and capable of executing trillions of instructions per second. Any one or more processors may need access to a shared resource (e.g., a device, a semaphore, a communication bus, etc.), and the processor (hardware paradigm) or process (software paradigm) may need to enter a wait state before gaining access to the shared resource. Researchers have attempted to observe the behavior of processes within these complex systems by taking a series of time-sampled measurements at multiple test points (e.g., service measurements) in the cluster. Such service measurements frequently include sampling of running processes so as to collect and timestamp data (e.g., events and wait state data) for every active session or process in the cluster. This results in the accumulation of large and diverse volumes of data with important implications for system health state, ranging from expanded performance metrics, to internal and external resource utilization, to workload statistics, to detailed process logs.
While legacy solutions can perform rudimentary filtering and display of a series of time-stamped event data, these legacy solutions cannot keep pace with the amplified levels of sensory data generated in large database clusters. Their main shortcoming is an inability to discern meaningful information buried inside immense and diverse raw data, so that insight and knowledge are obscured. Researchers need to see or infer information from the data, yet legacy solutions do little to process the data in ways that foster human understanding and inference. For example, legacy solutions rely on naïve algorithms (e.g., simple threshold techniques, which can suffer from high false alarm rates and/or high rates of missed alarms) and/or fail to recognize and respond to dynamic changes in target system behavior buried inside the raw sensory data. This can result in various misleading or obfuscating outcomes, ranging from the presentation of wrong or misleading data, to the generation of inaccurate results, to the failure to present critical information antecedent to insight.
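The weakness of a simple threshold technique can be illustrated with a minimal sketch. The function name `threshold_alarms`, the sample values, and the limit of 80 are hypothetical and chosen only for illustration; they are not taken from any particular legacy system.

```python
from typing import List


def threshold_alarms(values: List[float], limit: float) -> List[int]:
    """Return the indices of samples that exceed a fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]


# Hypothetical wait-time samples in milliseconds (illustrative only):
# index 2 is an isolated noise spike, while indices 5 through 8 show a
# sustained upward drift that never crosses the fixed limit.
samples = [10.0, 11.0, 95.0, 10.0, 12.0, 30.0, 45.0, 60.0, 75.0]

alarms = threshold_alarms(samples, limit=80.0)
print(alarms)
```

Under these assumed values, the fixed limit raises an alarm on the isolated spike (a likely false alarm) and raises none during the sustained drift (a likely missed alarm), because it ignores how the signal changes over time.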
Such shortcomings of legacy techniques are further exacerbated in the context of modern database clusters, which are extremely complex and can span large cluster systems capable of producing billions of raw measurements per second. Such volumes are too large to be processed by human labor.
Thus, legacy techniques fail to provide anything more than an impenetrable mountain of raw data, leaving researchers unable to perceive and discern the changing states, performance bottlenecks, and nature of service availability of the entire cluster system. What is needed are techniques to transform raw measurements into various forms of time series that are conducive to applying a robust learning model for the corresponding target system. The needed time series format and model are to be used to predict a system's availability and health state.
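One simple way to picture the transformation of raw measurements into a time series is shown in the sketch below. It assumes raw samples arrive as (timestamp, metric name, value) tuples and reduces them to one mean value per metric per fixed interval. The tuple layout, the function name `to_time_series`, and the use of a mean are assumptions made for illustration, not the method of the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed sample layout: (timestamp in seconds, metric name, measured value).
Sample = Tuple[float, str, float]


def to_time_series(
    samples: Iterable[Sample], interval: float
) -> Dict[str, List[Tuple[float, float]]]:
    """Reduce raw samples to one mean value per metric per time interval."""
    buckets: Dict[Tuple[str, float], List[float]] = defaultdict(list)
    for ts, metric, value in samples:
        # Align each timestamp to the start of its interval.
        bucket_start = (ts // interval) * interval
        buckets[(metric, bucket_start)].append(value)

    series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    # Sorting the keys yields each metric's points in time order.
    for metric, bucket_start in sorted(buckets):
        vals = buckets[(metric, bucket_start)]
        series[metric].append((bucket_start, sum(vals) / len(vals)))
    return dict(series)


# Hypothetical raw samples (illustrative values only).
raw = [
    (0.2, "wait_ms", 10.0),
    (0.7, "wait_ms", 20.0),
    (1.1, "wait_ms", 40.0),
    (0.5, "cpu_pct", 50.0),
]
series = to_time_series(raw, interval=1.0)
print(series)
```

Each metric becomes a short, evenly spaced sequence of points, which is the kind of input a learning model can consume far more readily than the raw samples themselves.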
Moreover, the aforementioned legacy technologies often fail to identify critical information. Worse, legacy techniques produce inaccurate and/or incorrect conclusions regarding the observed cluster system states. Therefore, there is a need for an improved approach.