An anomaly (also known as an outlier, novelty, noise, deviation, rare value or exception) can be defined as anything that differs from expectations. In computer science, anomaly detection refers to identifying data, events or conditions that do not conform to an expected pattern or to other items in a group. Encountering an anomaly may in some cases indicate a processing abnormality and may thus present a starting point for investigation. Traditionally, anomalies are detected by a human being studying a trace. A trace is a log of information that can come from an application, process, operating system, hardware component, and/or a network. This has never been an easy job, and given the complexity of today's computer systems, it is rapidly becoming close to impossible for a human.
Anomaly detection is classified as supervised, semi-supervised or unsupervised, based on the availability of reference data that acts as a baseline to define what is normal and what is an anomaly. Supervised anomaly detection typically involves training a classifier on a first type of data that is labeled “normal” and a second type of data that is labeled “abnormal”. Semi-supervised anomaly detection typically involves constructing a model representing normal behavior from one type of labeled data: either data that is labeled normal or data that is labeled abnormal, but not both. Unsupervised anomaly detection detects anomalies in data that has not been manually labeled by a human.
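To make the distinction concrete, the following is a minimal Python sketch of the semi-supervised and unsupervised cases, using only the standard library. It is an illustration under stated assumptions, not a prescribed method: the function names (fit_normal_model, is_anomaly, unsupervised_anomalies), the thresholds (3.0 and 3.5), the use of a mean/standard-deviation model and of a median-based modified z-score, and the sample values are all hypothetical choices made for this example. The supervised case is not shown, since it would require both "normal" and "abnormal" labeled data and a trained classifier.

```python
from statistics import mean, median, stdev


def fit_normal_model(normal_values):
    """Semi-supervised: build a model of normal behavior from data
    labeled "normal" only. The model here is simply (mean, std)."""
    return mean(normal_values), stdev(normal_values)


def is_anomaly(value, model, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations
    from the mean of the normal model (threshold is an assumption)."""
    mu, sigma = model
    return abs(value - mu) > threshold * sigma


def unsupervised_anomalies(values, threshold=3.5):
    """Unsupervised: no labels at all. Flag values whose modified
    z-score, based on the median and the median absolute deviation
    (MAD), exceeds `threshold` (an assumed cutoff)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical; nothing to flag
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical trace values, e.g. response times taken from a log.
labeled_normal = [100, 102, 98, 101, 99, 103, 97]
model = fit_normal_model(labeled_normal)
semi_supervised_flag = is_anomaly(250, model)

unlabeled = labeled_normal + [250]
unsupervised_flags = unsupervised_anomalies(unlabeled)
```

In the semi-supervised half, the baseline comes entirely from data someone has already labeled normal; in the unsupervised half, the baseline is estimated from the same unlabeled data in which anomalies are sought, which is why a robust statistic such as the median is used in this sketch.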