Electronic information exchanged across networks is a crucial aspect of an enterprise or e-commerce system. However, such electronic information may expose these systems to security threats. Hackers constantly change their behavior, learning the rules currently in force and designing newer attacks that can sidestep detection.
In current technology, information security solutions generally fall into two categories: security analyst-driven and unsupervised machine learning-driven. Security analyst-driven solutions rely on rules determined by fraud and security experts, and exhibit high rates of undetected attacks. These solutions also lead to delays between attack detection and implementation of preventative countermeasures. These delays are both costly and time-consuming for the enterprise or e-commerce systems.
Unsupervised machine learning-driven solutions can detect rare or anomalous patterns and may also improve detection of new attacks. However, these solutions trigger more false positive alarms and alerts, each of which requires substantial investigative effort before it can be dismissed.
Existing enterprise or e-commerce systems lack labeled threat examples from previous attacks, undercutting the ability to use supervised learning models. Moreover, because attackers constantly change their behavior, such models become irrelevant.
As a result, many enterprise and e-commerce systems using existing technology remain exposed to security threats, and improved security systems are needed to provide real-time identification of threats.
Another challenge with existing technology results from malicious activities being extremely rare. Attack cases represent a minor fraction of total events, generally <0.1%. To illustrate this fact, FIG. 10 shows the ratio of reported malicious users to the total number of active users in the studied dataset.
The dearth of malicious activities results in extreme class imbalance when learning a supervised model, and increases the difficulty of the detection process. Not all malicious activities are systematically reported, either because their incident responses were inconclusive, or because they were not detected in the first place. This introduces noise into the data, since unreported attacks will be considered legitimate activity. Attack vectors can take a wide variety of shapes. Even when malicious activities are reported, the users are not always aware of the specific vectors involved. Therefore, it is difficult to develop robust defense strategies that are capable of detecting as many attacks as possible.
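The effect of such class imbalance can be illustrated with a small sketch. The event counts below are hypothetical and chosen only to match an attack rate of 0.1%; they are not taken from the studied dataset. The function names are likewise illustrative. The sketch shows that a trivial detector labeling every event as legitimate attains very high accuracy while detecting no attacks, which is why accuracy alone is a poor measure in this setting.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary labels where 1 = malicious, 0 = legitimate."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all events that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    """Fraction of malicious events that were detected (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 100,000 events, of which 100 (0.1%) are malicious.
labels = [1] * 100 + [0] * 99_900

# Trivial detector (an assumption for illustration): every event is deemed legitimate.
always_legitimate = [0] * len(labels)

counts = confusion_counts(labels, always_legitimate)
print("accuracy:", accuracy(*counts))  # 0.999
print("recall:", recall(*counts))      # 0.0
```

Under these assumed counts, the trivial detector is correct on 99.9% of events yet misses every attack. Measures that focus on the rare class, such as recall, expose this failure where accuracy does not.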
Importantly, there is a need for a method and system capable of detecting threats in real time, and collecting analysts' feedback to improve detection rates over time.
There is also a need for an active learning method that uses the feedback so gathered to reduce false positives among the detected threats.
There is, further, a need for a system that incorporates behavioral predictive analytics for network intrusion and internal threat detection.
Now, a method and system capable of addressing real-time security system threats may have application in a broad array of active learning and machine learning applications that are of value and benefit to information system security professionals. Accordingly, the scope of the present disclosure extends beyond the collecting and detecting of threats.