Network traffic produced by a compute environment (whether from a container, VM, hardware switch, hypervisor, or physical server) is captured by entities called sensors or capture agents, which can be deployed in any of the environments mentioned herein. Sensors export data or metadata describing the observed network activity to collection agents called "collectors." A collector can be a group of processes running on a single machine or on a cluster of machines. For the sake of simplicity, all collectors are treated here as one logical entity and referred to as one collector. In an actual datacenter-scale deployment, there will be more than one collector, each responsible for handling the data exported by a group of sensors. Collectors can preprocess and analyze the data received from sensors, and can send the processed or unprocessed data to a cluster of processes responsible for analyzing network data. The entities that receive data from the collector can be a cluster of processes, and this logical group can be referred to as a pipeline. Note that sensors and collectors are not limited to observing and processing network data; they can also capture other system information, such as currently active processes, active file handles, socket handles, status of I/O devices, memory, etc.
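The sensor, collector, and pipeline roles described above can be sketched as a small data flow. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `FlowRecord`, `Sensor`, `Collector`, and `pipeline`, the flow key of (source IP, destination IP, destination port), and the "group by flow" preprocessing step are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical flow key: (source IP, destination IP, destination port).
FlowKey = Tuple[str, str, int]


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical metadata a sensor exports for one observed flow."""
    sensor_id: str
    key: FlowKey
    byte_count: int
    process: Optional[str] = None  # None if the sensor cannot attribute a process


class Sensor:
    """Capture agent in a container, VM, hypervisor, switch, or physical server."""

    def __init__(self, sensor_id: str) -> None:
        self.sensor_id = sensor_id
        self._buffer: List[FlowRecord] = []

    def observe(self, key: FlowKey, byte_count: int, process: Optional[str] = None) -> None:
        self._buffer.append(FlowRecord(self.sensor_id, key, byte_count, process))

    def export(self) -> List[FlowRecord]:
        """Hand buffered records to a collector and start a fresh buffer."""
        batch, self._buffer = self._buffer, []
        return batch


class Collector:
    """One logical collector serving a group of sensors."""

    def __init__(self) -> None:
        self.records: List[FlowRecord] = []

    def receive(self, batch: List[FlowRecord]) -> None:
        self.records.extend(batch)

    def preprocess(self) -> Dict[FlowKey, List[FlowRecord]]:
        """Group records by flow so every sensor's view of a flow sits together."""
        grouped: Dict[FlowKey, List[FlowRecord]] = defaultdict(list)
        for record in self.records:
            grouped[record.key].append(record)
        return dict(grouped)


def pipeline(grouped: Dict[FlowKey, List[FlowRecord]]) -> Dict[FlowKey, Set[str]]:
    """Stand-in analysis stage: the set of processes attributed to each flow."""
    return {key: {r.process for r in recs if r.process} for key, recs in grouped.items()}


# Usage: a VM sensor and a hypervisor sensor observe overlapping traffic.
vm_sensor, hv_sensor = Sensor("vm-1"), Sensor("hypervisor-1")
vm_sensor.observe(("10.0.0.5", "10.0.1.9", 443), 1200, process="nginx")
hv_sensor.observe(("10.0.0.5", "10.0.1.9", 443), 1200)
hv_sensor.observe(("10.0.0.7", "10.0.1.9", 22), 300)

collector = Collector()
for sensor in (vm_sensor, hv_sensor):
    collector.receive(sensor.export())
attribution = pipeline(collector.preprocess())
```

Grouping by flow, rather than summing bytes across sensors, is deliberate in this sketch: the same flow seen from a VM and from its hypervisor would otherwise be double-counted, whereas grouping keeps each vantage point's view side by side for the pipeline.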
There are deficiencies in the current use of collectors and sensors. Various techniques are used to hide the presence of malware and the network traffic it generates. Malware, or any agent that wishes to send data out of the system, can do so by placing itself at various levels of the Operating System (OS) stack. Data can be sent out by various means, some of which bypass the OS stack altogether. For example, a compromised network device driver or firmware can send data out without being detected by the OS (either the guest or host OS in a virtualized environment). In some cases, a compromised device or service can generate and send packets while hiding the process used by the malware, a process that would otherwise indicate that the system is compromised. Such traffic can often be detected by analyzing packets on the wire or physical medium. A packet analyzer system (hardware or software) placed on the wire can see all packets, but it has no means of identifying the process behind a packet if that process is hidden from the host OS, which makes it difficult to identify processes used or attacked by malware. More precisely, a packet analyzer cannot identify a hidden process when a portion of the flow associated with that process is hidden.
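The attribution gap described above can be made concrete with a toy example. This is a minimal sketch assuming that the host OS exposes a mapping from flow 5-tuples to owning processes (for example, derived from its socket tables) and that the wire analyzer reports only 5-tuples; the function name `attribute_wire_flows` and the sample addresses are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# 5-tuple: (src IP, src port, dst IP, dst port, protocol). Assumed flow identity.
FlowKey = Tuple[str, int, str, int, str]


def attribute_wire_flows(
    wire_flows: Set[FlowKey],
    host_flow_owners: Dict[FlowKey, str],
) -> Dict[FlowKey, Optional[str]]:
    """Map each flow seen on the wire to the owning process reported by the host OS.

    The wire analyzer sees every packet but carries no process information, so
    attribution depends entirely on the host's view. A flow whose process the
    host cannot see (or whose packets bypass the OS stack) maps to None.
    """
    return {flow: host_flow_owners.get(flow) for flow in wire_flows}


wire = {
    ("10.0.0.5", 51000, "198.51.100.10", 443, "tcp"),
    ("10.0.0.5", 51001, "203.0.113.7", 8080, "tcp"),  # e.g. sent by a compromised driver
}
host = {("10.0.0.5", 51000, "198.51.100.10", 443, "tcp"): "curl"}

result = attribute_wire_flows(wire, host)
unattributed = sorted(flow for flow, owner in result.items() if owner is None)
```

In this toy setting the analyzer can tell that the second flow has no known owner, but nothing in the packets themselves names the hidden process, which is exactly the limitation of wire-only analysis.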