The exchange of electronic information across networks is a crucial aspect of enterprise or e-commerce systems. However, malicious or unauthorized use of these systems is on the rise, as evidenced by daily reports of breaches and fraud, despite the implementation of existing security systems.
Advanced persistent threats (APTs), which may target the exfiltration of critical data, typically comprise a series of steps including: infection, exploitation, command and control, lateral movement, and data exfiltration. The command and control phase, in which an attacker maintains a communication channel between an infected host inside the targeted organization and a remote server controlled by the attacker, may span weeks or months. However, despite its long duration, its detection in real-world organizations remains a great challenge. In fact, to further frustrate detection efforts, some attackers may not only minimize their footprint by combining active with stealthy phases, but also establish communication channels via unblocked services and protocols, thereby blending in with legitimate traffic. Since most organizations allow their employees to freely browse the Internet, web traffic is a very effective channel for attackers to communicate and maintain control over infected machines.
Descriptive studies show that, when analyzed over a period of several weeks, web-based command and control traffic patterns exhibit distinctive network profiles, with the frequency and network profile depending on the specific threat or malware family involved in the attack. For example, infected machines may periodically attempt to communicate with the remote server(s), and may generally establish lightweight connections in which they receive new instructions. In a small fraction of these connections, the infected machine will download a larger amount of data, corresponding to a software update.
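The kind of multi-week profile described above can be illustrated with a minimal Python sketch. It is not a method taken from the studies mentioned; the function name profile_connections, the 100,000-byte threshold for a "large" download, and the synthetic hourly beacon are all assumptions chosen for illustration. The sketch summarizes the connections between one host and one destination by the regularity of the gaps between them and by the fraction of connections that return a large response.

```python
from statistics import mean, pstdev


def profile_connections(timestamps, response_bytes, large_threshold=100_000):
    """Summarize the connections between one host and one destination.

    timestamps: connection start times in seconds (any order).
    response_bytes: bytes received on each connection.
    large_threshold: hypothetical cutoff for a "large" download.
    """
    if len(timestamps) < 2 or len(timestamps) != len(response_bytes):
        raise ValueError("need at least two connections and matching lengths")

    ordered = sorted(timestamps)
    # Time between consecutive connections.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    avg_gap = mean(gaps)
    # Coefficient of variation: values near 0 indicate periodic traffic.
    gap_cv = pstdev(gaps) / avg_gap if avg_gap > 0 else float("inf")
    large = sum(1 for size in response_bytes if size >= large_threshold)

    return {
        "connections": len(ordered),
        "mean_gap_seconds": avg_gap,
        "gap_cv": gap_cv,
        "large_fraction": large / len(response_bytes),
    }


# Synthetic example: one connection per hour for two days, mostly small
# responses, with two larger downloads standing in for software updates.
beacon_times = [hour * 3600 for hour in range(48)]
beacon_sizes = [500] * 48
beacon_sizes[10] = beacon_sizes[30] = 2_000_000

beacon_profile = profile_connections(beacon_times, beacon_sizes)
```

In this synthetic case the gap variation is zero and only 2 of the 48 connections are large, which matches the periodic, mostly lightweight pattern described above. Real traffic would be noisier, and any thresholds would need to be chosen per environment.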
However, most machine learning-based attempts to detect command and control focus on the analysis of individual connections. Given the large volume of data generated today at most organizations' perimeters and the number of entities that need to be monitored and analyzed, it is a great challenge to train models with behavioral patterns observed over weeks of data. In fact, depending on the organization's size and activity, perimeter devices such as next-generation firewalls may generate up to 1 TB of log data and involve tens of millions of entities on a daily basis.
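To make the difference between per-connection analysis and multi-week behavioral analysis concrete, the following Python sketch collapses each day's connection records into small per-pair counters and then merges the daily summaries. This is only one possible way to keep weeks of history tractable, not a description of any particular system; the record layout (source, destination, response bytes) and the function names summarize_day and merge_summaries are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_day(records):
    """Collapse one day's records into counters per (source, destination).

    records: iterable of (source, destination, response_bytes) tuples,
    a hypothetical simplification of a perimeter log line.
    """
    summary = defaultdict(lambda: {"connections": 0, "bytes": 0})
    for source, destination, response_bytes in records:
        entry = summary[(source, destination)]
        entry["connections"] += 1
        entry["bytes"] += response_bytes
    return dict(summary)


def merge_summaries(daily_summaries):
    """Combine daily summaries into one multi-day view per pair."""
    total = defaultdict(
        lambda: {"connections": 0, "bytes": 0, "active_days": 0}
    )
    for day in daily_summaries:
        for pair, counters in day.items():
            entry = total[pair]
            entry["connections"] += counters["connections"]
            entry["bytes"] += counters["bytes"]
            # A pair appears at most once per daily summary.
            entry["active_days"] += 1
    return dict(total)


# Two synthetic days of records with made-up addresses and domains.
day_one = [
    ("10.0.0.5", "example.test", 500),
    ("10.0.0.5", "example.test", 700),
]
day_two = [
    ("10.0.0.5", "example.test", 300),
    ("10.0.0.9", "other.test", 100),
]

merged = merge_summaries([summarize_day(day_one), summarize_day(day_two)])
```

Because each day is reduced before merging, the raw logs need to be read only once, and the multi-week view grows with the number of distinct pairs and not with the number of connections.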
As such, there is a need for improved cyber security systems, and in particular for security systems capable of handling large volumes of data and detecting threat patterns exhibited over extended periods of time.