The present disclosure relates to information technology (IT) systems, and more specifically, to methods, systems and computer program products for identifying unusual activity in information technology systems.
Today's complex IT systems, such as integrated data centers, require a team of experts to monitor various system messages for abnormal behavior, and to diagnose and fix anomalies before they result in system failures and outages. These tasks are costly and difficult for many reasons, including the fact that a variety of everyday changes can cause anomalies in the operation of the IT system. In typical complex IT systems, the number of status messages created by the components of the IT system far exceeds what can reasonably be read and analyzed by the team of IT experts. As a result, automated systems have been developed for reviewing and filtering these status messages.
Currently available automated systems for reviewing status messages are configured by a domain expert who reviews a log of status messages that are grouped into time intervals. The interval data is then analyzed to build a statistical model that evaluates real-time status messages for potential anomalies in the IT system. In some cases, the domain expert manually determines which time intervals should be used in building the statistical model. This manual selection process is both error prone and expensive. In other cases, the statistical model is created based on the data from all of the time intervals.
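By way of illustration only, the grouping of logged status messages into time intervals and the building of a statistical model from the interval data might be sketched as follows. The sketch is not taken from any particular existing system. The ten-minute interval length, the per-message-ID mean and standard deviation model, the z-score style scoring, and all function and parameter names (group_into_intervals, build_model, score_interval, unseen_penalty) are assumptions chosen for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

INTERVAL_SECONDS = 600  # assumed interval length (10 minutes), not from the source


def group_into_intervals(messages, interval_seconds=INTERVAL_SECONDS):
    """Map each interval index to a Counter of message IDs seen in that interval.

    `messages` is an iterable of (timestamp_seconds, message_id) pairs.
    """
    intervals = defaultdict(Counter)
    for timestamp, message_id in messages:
        intervals[int(timestamp // interval_seconds)][message_id] += 1
    return dict(intervals)


def build_model(intervals):
    """Compute, per message ID, the mean and standard deviation of its
    count across all training intervals (a deliberately simple model)."""
    message_ids = {mid for counts in intervals.values() for mid in counts}
    model = {}
    for mid in message_ids:
        # A Counter returns 0 for message IDs absent from an interval.
        per_interval = [counts[mid] for counts in intervals.values()]
        model[mid] = (mean(per_interval), pstdev(per_interval))
    return model


def score_interval(model, counts, unseen_penalty=3.0):
    """Return the largest absolute z-score over all message IDs.

    Message IDs never seen in training, and deviations from a count that
    never varied in training, receive the fixed `unseen_penalty`.
    """
    score = 0.0
    for mid in set(model) | set(counts):
        observed = counts.get(mid, 0)
        if mid not in model:
            score = max(score, unseen_penalty)
            continue
        mu, sigma = model[mid]
        if sigma == 0:
            z = 0.0 if observed == mu else unseen_penalty
        else:
            z = abs(observed - mu) / sigma
        score = max(score, z)
    return score
```

In this sketch, the choice of which intervals are passed to build_model corresponds to the selection step described above: a domain expert may pick the intervals by hand, or all intervals may be used.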
These automated systems need to be extended to identify not only unusual intervals but also intervals that need immediate attention. Applying classic statistical methods, such as random forest or logistic regression, to identify the intervals that need immediate attention requires an automated method for labeling those intervals. The current methods require that the domain expert label those intervals, which is both costly and error prone.