Information Technology (IT) systems are becoming increasingly distributed and are often composed of multiple microservices running in parallel. As a result, monitoring the performance of such IT systems has become increasingly challenging. Human-assisted machine learning (ML) solutions are being deployed to monitor and analyze the behavior of such IT systems and associated software applications.
Traditional ML solutions, however, typically provide naïve models, in the sense that they use the data only in its raw form, which often has sparse feature values and other data quality issues. This can result in unnecessarily complicated ML models. In addition, the monitoring solution itself may become difficult to track and maintain.
A need therefore exists for improved techniques for monitoring the performance of IT systems and other monitored systems. A further need exists for improved monitoring techniques that provide actionable insights, enabling monitoring personnel to take action with reduced latency.