Network architectures for observing and capturing information about network traffic in a datacenter are described herein. Network traffic from a compute environment (whether from a container, VM, hardware switch, hypervisor, or physical server) is captured by entities called sensors or capture agents, which can be deployed in different environments. Sensors export data or metadata of the observed network activity to collection agents called “collectors.” Collectors can be a group of processes running on a single machine or a cluster of machines. For the sake of simplicity, collectors can be treated as one logical entity and referred to as one collector. In an actual datacenter-scale deployment, there will be more than one collector, each responsible for handling export data from a group of sensors. Collectors are capable of preprocessing and analyzing the data collected from sensors. The collector is capable of sending the processed or unprocessed data to a cluster of processes responsible for analysis of network data. The entities that receive the data from the collector can be a cluster of processes, and this logical group can be considered or referred to as a “pipeline.” Note that sensors and collectors are not limited to observing and processing just network data, but can also capture other system information such as currently active processes, active file handles, socket handles, status of I/O devices, memory, etc.
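By way of illustration only, the flow of data from sensors to a collector and on to a pipeline can be sketched as follows. This is a minimal, in-process sketch and not the described implementation: the class names (FlowRecord, Sensor, Collector, Pipeline), the record fields, and the byte-count summation used as the preprocessing step are all assumptions introduced here for the example, and a real deployment would export over a network to multiple collectors.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowRecord:
    """Metadata a sensor might export for one observed flow (hypothetical fields)."""
    sensor_id: str
    src: str
    dst: str
    byte_count: int


@dataclass
class Pipeline:
    """Stand-in for the cluster of processes that analyzes network data."""
    received: list = field(default_factory=list)

    def ingest(self, summary: dict) -> None:
        self.received.append(summary)


@dataclass
class Collector:
    """One logical collector handling export data from a group of sensors."""
    pipeline: Pipeline
    records: list = field(default_factory=list)

    def export(self, record: FlowRecord) -> None:
        # Called by sensors to hand over observed data or metadata.
        self.records.append(record)

    def flush(self) -> None:
        # Illustrative preprocessing: sum reported bytes per (src, dst) pair.
        # No deduplication is attempted across sensors that see the same flow.
        totals = defaultdict(int)
        for record in self.records:
            totals[(record.src, record.dst)] += record.byte_count
        self.pipeline.ingest(dict(totals))
        self.records.clear()


@dataclass
class Sensor:
    """Capture agent deployed in a VM, container, hypervisor, switch, or server."""
    sensor_id: str
    collector: Collector

    def observe(self, src: str, dst: str, byte_count: int) -> None:
        self.collector.export(FlowRecord(self.sensor_id, src, dst, byte_count))


def demo() -> list:
    pipeline = Pipeline()
    collector = Collector(pipeline)
    vm_sensor = Sensor("vm-1", collector)
    hv_sensor = Sensor("hypervisor-1", collector)
    vm_sensor.observe("10.0.0.1", "10.0.0.2", 100)
    hv_sensor.observe("10.0.0.1", "10.0.0.2", 200)
    vm_sensor.observe("10.0.0.3", "10.0.0.1", 50)
    collector.flush()
    return pipeline.received
```

In this sketch the collector is treated as one logical entity, consistent with the simplification above; a collector could equally forward the unprocessed records to the pipeline without the summation step.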
A host in a datacenter may at some point interact through a packet flow with a malware-infected host or become infected itself. The infection can be very damaging to data, hardware, software, and/or privacy. What is needed is an improved ability to determine whether a host has been infected with malware.