Techniques have been disclosed to detect anomalous behavior, for example: insider threats in an enterprise computer network, i.e., anomalous resource access or action behavior by users; financial fraud in a banking system, i.e., anomalous bank account access behavior by customers or fraudsters; etc. One example technique is to analyze a temporal behavior matrix per user, e.g., via subspace learning such as principal component analysis, to model normal behavior. The resulting model is then used to flag as anomalous any later behavior that departs from the historical behavior baseline.
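By way of illustration only, the per-user subspace modeling described above might be sketched as follows. The sketch assumes that each row of the temporal behavior matrix is one time period (e.g., one day) and each column is a count of accesses to one resource. The function names `fit_baseline` and `residual_score`, the use of power iteration to find principal components, and the use of the residual norm as the anomaly score are all assumptions made for this example and are not taken from the disclosed techniques.

```python
import math


def fit_baseline(rows, k=1, iters=200):
    """Fit a PCA baseline: return (mean vector, up to k principal components).

    Hypothetical helper. Components are found by power iteration with
    deflation, which is adequate for small illustrative matrices only.
    """
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    # Sample covariance matrix of the centered behavior vectors.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Deterministic start vector; assumed not orthogonal to the top component.
        v = [float(i + 1) for i in range(d)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                v = None
                break
            v = [x / norm for x in w]
        if v is None:
            break
        # Eigenvalue estimate (Rayleigh quotient), then deflate the covariance.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mu, components


def residual_score(x, mu, components):
    """Norm of the part of x that lies outside the learned subspace."""
    r = [xi - m for xi, m in zip(x, mu)]
    for v in components:
        dot = sum(ri * vi for ri, vi in zip(r, v))
        r = [ri - dot * vi for ri, vi in zip(r, v)]
    return math.sqrt(sum(ri * ri for ri in r))


# Illustrative history: daily access counts for three resources (A, B, C).
history = [[5, 10, 0], [6, 12, 0], [4, 8, 0], [7, 14, 0], [5, 10, 0], [3, 6, 0]]
mu, comps = fit_baseline(history, k=1)
print(residual_score([8, 16, 0], mu, comps))  # busier day, same usage pattern
print(residual_score([5, 10, 9], mu, comps))  # first-time heavy use of resource C
```

In this toy data the first new day follows the historical pattern and scores approximately zero, while the second departs from the baseline subspace and scores well above it. A real deployment would also need a threshold on the score, which this sketch does not address.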
Behavior detected as being anomalous may require investigation or other responsive action. In some cases, a behavioral modeling approach to anomaly detection as described above may generate too many alerts to be investigated in a timely and effective manner, and/or too many “false positives”, i.e., behaviors identified as anomalous that are not of concern. One example of a false positive is a user being observed to use, for the first time, a resource that is in the same group of resources as other resources the user has been observed to access before.
Another challenge is Big Data. A typical large enterprise generates on the order of 100 billion events in its computer network per year. Because data of this volume cannot fit on a single machine for traditional in-memory analytics, we devise several algorithmic mechanisms to parallelize machine learning model training and scoring on a parallel architecture such as MPP (Massively Parallel Processing) or MR (MapReduce).
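The specific parallelization mechanisms are not described in this section. As one common pattern, offered here only as an assumption, the mean and covariance needed for a subspace model can be computed in a map/reduce style: each partition emits small additive sufficient statistics (a row count, a column sum, and a sum of outer products), which are then merged. The names `partition_stats`, `merge_stats`, and `finalize` are hypothetical, and the partitions are simulated in a single process rather than on an actual MPP or MapReduce cluster.

```python
from functools import reduce


def partition_stats(rows):
    """Map step: per-partition count, column sums, and sum of outer products."""
    d = len(rows[0])
    total = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            total[i] += r[i]
            for j in range(d):
                outer[i][j] += r[i] * r[j]
    return len(rows), total, outer


def merge_stats(a, b):
    """Reduce step: the statistics are additive, so merging is elementwise addition."""
    n = a[0] + b[0]
    total = [x + y for x, y in zip(a[1], b[1])]
    outer = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, total, outer


def finalize(stats):
    """Turn merged statistics into the mean vector and sample covariance matrix."""
    n, total, outer = stats
    d = len(total)
    mu = [s / n for s in total]
    cov = [[(outer[i][j] - n * mu[i] * mu[j]) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mu, cov


# Two simulated partitions of one user's behavior matrix (illustrative data).
partitions = [
    [[5, 10, 0], [6, 12, 0], [4, 8, 0]],
    [[7, 14, 0], [5, 10, 0], [3, 6, 0]],
]
mean, cov = finalize(reduce(merge_stats, (partition_stats(p) for p in partitions)))
print(mean)
print(cov)
```

Because only the small merged statistics leave each partition, no single machine needs to hold the full event data. The subtraction in `finalize` can lose precision when values are large relative to their variance, so a numerically stabler merge may be preferable in practice.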