Electronic computing has evolved from primitive, vacuum-tube-based computer systems, initially developed during the 1940s, to modern electronic computing systems in which large numbers of multi-processor computer systems, such as server computers, workstations, and other individual computing systems, are networked together with large-capacity data-storage devices and other electronic devices. The resulting geographically distributed computing systems comprise hundreds of thousands, millions, or more components and provide enormous computational bandwidths and data-storage capacities. These large, distributed computing systems are made possible by advances in computer networking, distributed operating systems and applications, data-storage appliances, computer hardware, and software technologies.
Because distributed computing systems have an enormous number of computational resources, various management systems have been developed to collect performance information about these resources and, based on that information, to detect performance problems and generate alerts when a performance problem occurs. For example, a typical management system may collect hundreds of thousands of streams of metric data to monitor various computational resources of a data center infrastructure. Each data point of a stream of metric data may represent an amount of the resource in use at a point in time. However, the enormous number of metric data streams received by a management system makes it impossible for information technology (“IT”) administrators to manually monitor the metrics, detect performance issues, and respond in real time. Failure to respond in real time to performance problems can interrupt computer services and have enormous cost implications for data center tenants, such as when a tenant's server applications stop running or fail to respond to client requests in a timely manner.
Typical management systems use reactive monitoring to generate an alert when metric data of a corresponding resource violates a usage limit. Although reactive monitoring techniques are useful for identifying current performance problems, these techniques have scalability limitations and force IT administrators to react immediately to performance problems that are imminent or have already started to impact the performance of computational resources. For example, by the time an IT administrator has been alerted by a management system that metric data for memory usage of a server computer has violated a usage limit, applications, virtual machines (“VMs”), and containers running on the server computer may have already stopped running or slowed significantly. As a result, the IT administrator has to immediately execute remedial measures, a process that is error-prone and may only temporarily address the performance problem. IT administrators seek management systems that identify performance problems in advance so that IT administrators have sufficient time to assess the problems and implement appropriate remedial measures that avoid future interruptions in computational services.
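The reactive monitoring described above can be illustrated with a minimal sketch, which is not drawn from any particular management system. It assumes a stream of (timestamp, value) data points and a fixed usage limit; the names `Alert` and `reactive_alerts`, the sample memory-usage values, and the 90 percent limit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Alert:
    """Hypothetical record of a usage-limit violation."""
    timestamp: int
    value: float
    limit: float


def reactive_alerts(stream: Iterable[Tuple[int, float]], limit: float) -> List[Alert]:
    """Return one alert for every data point whose value exceeds the limit."""
    return [Alert(t, v, limit) for t, v in stream if v > limit]


# Hypothetical memory-usage samples: (timestamp, percent of memory in use).
memory_usage = [(0, 62.0), (1, 71.5), (2, 88.0), (3, 93.5), (4, 97.0)]

# Assumed usage limit of 90 percent.
alerts = reactive_alerts(memory_usage, limit=90.0)
first_alert_time = alerts[0].timestamp if alerts else None
```

In this sketch, no alert is produced while usage climbs toward the limit; the first alert corresponds to a data point that has already violated it, which is the timing problem that the preceding paragraph attributes to reactive monitoring.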