Data asset monitoring is a critical data management and information technology (IT) function, often used by enterprises and cloud service providers, that involves watching the activities occurring on an internal network for problems related to performance, reliability, misbehaving hosts, suspicious user activity, etc.
Anomaly detection is the identification of items, events, or behaviors that differ from an expected, desired, or normal pattern. When studied in the context of data consumers, anomalous behavior detection mechanisms must be capable of distinguishing unusual behavior patterns caused by regular operations, such as data backup to a remote storage device, from behavior patterns caused by the presence of malicious actors performing sensitive data hoarding, scanning, snooping, and legitimate user impersonation.
A 2014 study by Intel Security estimates global economic losses due to cybercrime at between $375 billion and $575 billion and indicates a significant growth trend in the cybercrime industry. Cybercrime affects private businesses, global corporations, individuals, and government and military organizations. Sophos estimates that in 2013 more than 800 million individual data records were compromised.
In order to reduce or eliminate losses from cybercrime operations, anomalous activities triggered by malicious actors must be detected and reported to IT security personnel in a timely manner.
However, detecting anomalous behavior of data users becomes exceptionally difficult as the number of data users and data assets under observation increases and as the complexity of each observed item or event also increases. Detecting anomalous behavior of data users is an extreme example of a complex anomaly detection problem.
Traditionally, detection of anomalous events attributed to data users was in the domain of network security analysts. Typically, a security analyst possesses a collection of tools accumulated over the years while investigating security incidents. A large majority of those investigative tools are suitable for forensic investigations that take place after a security incident has been discovered. However, by the time of discovery, cybercriminals may have already accomplished their objectives and retrieved valuable information from the victim's data assets.
Due to the vast amount of data, the data arrival rate and the number of observed parameters that may be relevant, only machine-learning-based methods are capable of handling user behavior anomaly detection tasks. Machine learning methods capable of providing timely alerting of anomalous events may be classified into two groups: unsupervised machine learning methods and supervised machine learning methods.
Unsupervised machine learning methods operate on “raw” data and do not require input from an expert. Because they run automatically, without expert knowledge to separate benign deviations from malicious ones, unsupervised machine learning methods suffer from a high rate of false positives.
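The false-positive tendency of unsupervised methods can be illustrated with a minimal sketch. It assumes a simple z-score detector, chosen here only for illustration and not described in this document, that learns a per-user baseline of daily transfer volume and flags any later observation that deviates strongly from it. The function name `flag_anomalies`, the threshold of 3.0, and all data values are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, threshold=3.0):
    """Return indices of observed values whose z-score against the
    baseline exceeds the threshold (hypothetical illustrative detector)."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    if sigma == 0:
        # A constant baseline makes any differing value a deviation.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical daily transfer volumes (MB) for one data user.
baseline = [100, 110, 95, 105, 98, 102, 107, 99, 101, 103]

# New observations: a normal day, a scheduled backup to remote storage,
# a data-hoarding event by a malicious actor, and another normal day.
observed = [104, 5000, 4800, 97]

flagged = flag_anomalies(baseline, observed)
print(flagged)
```

Under these assumptions the detector flags both the backup and the hoarding event. Without expert knowledge it has no basis for telling the legitimate operation from the malicious one, so the backup is reported as a false positive.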
Supervised machine learning methods assume a priori knowledge of the universe of discourse and rely on expert information as the foundation of the learning process. While more precise in their findings, supervised machine learning methods require a significant knowledge base and thus are less adaptive to changes in the universe of discourse than unsupervised machine learning methods.
Accordingly, improvements are needed in systems for anomaly detection in order to identify anomalous events in a networked environment in real time and alert operators to a breach-in-progress condition.