With the proliferation of social software and platforms, there has been an increase in malicious anomalies such as insider information leakage, the spread of unsolicited email, rumor dissemination, and the planning of inappropriate actions that might raise concerns for law enforcement authorities. Detecting such anomalies is important in many applications. For example, in social media, anomaly detection may provide insight into whether people are propagating truthful or deceptive information. As another example, in organizations, detecting anomalous groups may help to identify poorly performing or malicious personnel, so that organizations can improve performance and protect themselves against insider threats. Anomaly detection may also help to identify beneficial anomalies, such as innovators who behave differently from the majority of “normal” personnel. Other applications may include detecting network intrusions, engine faults, and disease symptoms, as well as epidemic detection and prevention.
In some instances, a sequence of events may seem normal when each event is considered individually, yet appear abnormal only when the events are considered collectively. For example, events indicating stress may lead to downloading confidential information, which may in turn lead to leaking that information to an adversary (abnormal behavior, or an anomaly). Similarly, an insider may log into a system late at night, download files from an infrequently used server, and copy large amounts of data to a USB drive. Each of these events may be normal on its own, but together they may indicate malicious activity. It may be critical to detect such an anomalous sequence before it has a negative impact.
Existing anomaly detection approaches are mostly based on predefined rules and/or pre-labeled instances of anomalies. One-class learning is an anomaly detection technique that uses training data collected from only one known class (typically normal behavior) to predict whether a new sample is drawn from the same distribution.
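To illustrate the one-class idea, the following is a minimal sketch, not the method of this document. It assumes that each event is described by a numeric feature vector and models the single known class with an independent Gaussian per feature, flagging a sample whose z-score on any feature exceeds a threshold. The class name `GaussianOneClass`, the `threshold` parameter, and the example features (login hour, megabytes copied to USB) are hypothetical. Practical systems might instead use other one-class models, such as a one-class support vector machine.

```python
import math
from typing import List, Sequence


class GaussianOneClass:
    """Hypothetical one-class detector: models each feature of the single
    known (normal) class as an independent Gaussian distribution."""

    def __init__(self, threshold: float = 3.0) -> None:
        # Maximum allowed |z-score| on any feature before a sample is flagged.
        self.threshold = threshold
        self.means: List[float] = []
        self.stds: List[float] = []

    def fit(self, normal_samples: Sequence[Sequence[float]]) -> "GaussianOneClass":
        # Training uses samples from one class only; no anomalies are labeled.
        self.means, self.stds = [], []
        n = len(normal_samples)
        for d in range(len(normal_samples[0])):
            column = [s[d] for s in normal_samples]
            mean = sum(column) / n
            var = sum((x - mean) ** 2 for x in column) / n
            self.means.append(mean)
            # Guard against zero variance on constant features.
            self.stds.append(math.sqrt(var) or 1e-9)
        return self

    def is_anomaly(self, sample: Sequence[float]) -> bool:
        # A sample is judged not to come from the training distribution if
        # any feature lies more than `threshold` standard deviations away.
        return any(
            abs(x - m) / s > self.threshold
            for x, m, s in zip(sample, self.means, self.stds)
        )


# Hypothetical normal events: (login hour, MB copied to USB).
normal = [(9, 10), (10, 12), (11, 9), (9, 11), (10, 8)]
model = GaussianOneClass(threshold=3.0).fit(normal)
print(model.is_anomaly((10, 10)))  # False: typical hour and volume
print(model.is_anomaly((3, 500)))  # True: late-night login, large copy
```

Because this sketch scores each event independently, it would not by itself catch the collectively anomalous sequences described earlier, where every individual event looks normal.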