The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
Modern applications run on distributed computer systems with complex architectures, where component and system statuses are monitored by collecting, at regular intervals, performance metrics such as CPU, memory, disk and network usage, and system service level agreements (SLAs). Further, the advent of cloud computing and online services has led to exponential growth in the size and complexity of data centers. This has created unprecedented challenges for system management and monitoring. Given the scale and scope of such large data centers, network operators and monitoring tools are overwhelmed with monitoring and analyzing performance metrics across several thousand network layers and network elements. Currently, network operators and monitoring tools conduct much of the forensic examination only after anomalous behaviors have already occurred, by examining protocols or log files of past or recently running processes of the affected devices or applications.
It is therefore necessary to automate identification of system behavior changes that are reflected in the performance metrics of various network entities, so as to allow operators to take timely actions that maintain the service level agreements for the data centers. An opportunity arises to increase automation in network monitoring environments. Improved user experience and engagement and higher customer satisfaction and retention may result.