U.S. Pat. No. 7,882,217, whose disclosure is incorporated herein by reference, describes a computer-implemented method for communication analysis. The method includes monitoring communication sessions, which are conducted by entities in a communication network. Identifiers that identify the entities are extracted from the monitored sessions. The identifiers extracted from the sessions are grouped in respective identity clusters, each identity cluster identifying a respective entity. A subset of the identity clusters, which includes identifiers that identify a target entity, is merged to form a merged identity cluster that identifies the target entity. An activity of the target entity in the communication network is tracked using the merged identity cluster.
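The grouping and merging of identity clusters can be illustrated with a union-find (disjoint-set) structure. This is a minimal sketch under stated assumptions, not the method of U.S. Pat. No. 7,882,217. It assumes, hypothetically, that identifiers extracted from the same session belong to the same entity. It also takes the decision of which clusters to merge as an input rather than deciding it. The class name `IdentityClusters` and the identifier strings are illustrative only.

```python
class IdentityClusters:
    """Hypothetical identity-cluster store built on union-find (disjoint sets)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        # Register unseen identifiers as singleton clusters.
        self._parent.setdefault(x, x)
        # Path halving: point nodes at their grandparents while walking to the root.
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def _union(self, a: str, b: str) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def add_session(self, identifiers: list[str]) -> None:
        # Assumption: identifiers seen in one session identify one entity.
        for ident in identifiers:
            self._find(ident)
        for other in identifiers[1:]:
            self._union(identifiers[0], other)

    def merge_clusters(self, identifiers: list[str]) -> None:
        # Merge every cluster containing one of these identifiers
        # (e.g., clusters judged to identify the same target entity).
        self.add_session(identifiers)

    def cluster_of(self, identifier: str) -> set[str]:
        root = self._find(identifier)
        return {x for x in list(self._parent) if self._find(x) == root}

    def num_clusters(self) -> int:
        return len({self._find(x) for x in list(self._parent)})


ic = IdentityClusters()
ic.add_session(["email:a@example.com", "ip:10.0.0.5"])
ic.add_session(["phone:+1-555-0100", "imei:0001"])
ic.add_session(["email:b@example.com"])
ic.merge_clusters(["ip:10.0.0.5", "phone:+1-555-0100"])
print(ic.num_clusters())  # 2
```

Union-find keeps each merge cheap, which suits a monitoring setting in which sessions and merge decisions arrive incrementally.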
U.S. Pat. No. 8,665,728, whose disclosure is incorporated herein by reference, describes methods and systems for identifying network users who communicate with the network (e.g., the Internet) via a given network connection. The disclosed techniques analyze traffic that flows in the network to determine, for example, whether the given network connection serves a single individual or multiple individuals, and whether it serves a single computer or multiple computers. A Profiling System (PS) acquires copies of the data traffic that flows through the network connections that connect computers to a Wide Area Network (WAN). The PS analyzes the acquired data in an attempt to identify individuals who log in to servers.
Bellovin, Steven M., “A technique for counting NATted hosts,” Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, ACM, 2002, which is incorporated herein by reference, describes a technique for detecting NATs and counting the number of active hosts behind them. The technique is based on the observation that, on many operating systems, the ID field of the IP header is a simple counter. By suitable processing of trace data, packets emanating from individual machines can be isolated, and the number of machines can then be determined.
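The ID-counter observation can be illustrated with a greedy sequence-building sketch. This is a simplification under stated assumptions, not Bellovin's published algorithm. It assumes that every host behind the NAT increments a single global IP ID counter. It ignores hosts that randomize, byte-swap, or zero the field. The gap threshold `max_gap` and the minimum sequence length `min_len` are hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass

ID_SPACE = 1 << 16  # The IP ID field is 16 bits wide.


@dataclass
class Sequence:
    last_id: int   # Most recent ID attributed to this (presumed) host.
    count: int = 1  # Number of packets attributed to it so far.


def forward_gap(prev_id: int, new_id: int) -> int:
    """Forward distance from prev_id to new_id, allowing 16-bit wraparound."""
    return (new_id - prev_id) % ID_SPACE


def count_hosts(ip_ids: list[int], max_gap: int = 64, min_len: int = 2) -> int:
    """Estimate hosts behind a NAT from IP IDs observed in arrival order."""
    sequences: list[Sequence] = []
    for ip_id in ip_ids:
        # Attach the packet to the sequence it most closely continues.
        best, best_gap = None, max_gap + 1
        for seq in sequences:
            gap = forward_gap(seq.last_id, ip_id)
            # gap == 0 (a repeated ID) does not continue a counter.
            if 0 < gap <= max_gap and gap < best_gap:
                best, best_gap = seq, gap
        if best is None:
            sequences.append(Sequence(last_id=ip_id))  # Presumably a new host.
        else:
            best.last_id = ip_id
            best.count += 1
    # Discard very short sequences, which are likely noise.
    return sum(1 for s in sequences if s.count >= min_len)


# Two interleaved counters, one near 1000 and one near 40000.
print(count_hosts([1000, 40000, 1001, 40001, 1003, 40002, 1004]))  # 2
```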
Bursztein, Elie, “Time has something to tell us about network address translation,” Proc. of NordSec 2007, which is incorporated herein by reference, describes a new technique for counting the number of hosts behind a NAT. This technique is based on the TCP timestamp and works with hosts running Linux and BSD systems.
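A similar idea can be sketched for TCP timestamps. Assume, hypothetically, that all hosts advance their timestamp value (TSval) at the same known rate `hz` and differ only in their starting offset. Under that assumption, the quantity `tsval - hz * arrival_time` stays roughly constant for each host, so counting clusters of such offsets estimates the number of hosts. The sketch ignores clock skew, differing tick rates, 32-bit wraparound, and the per-connection timestamp randomization that some stacks apply. It is not the algorithm published by Bursztein, and `hz` and `tolerance` are illustrative parameters.

```python
def count_hosts_by_timestamp(
    packets: list[tuple[float, int]], hz: float = 1000.0, tolerance: float = 50.0
) -> int:
    """Estimate hosts behind a NAT from (arrival_seconds, tsval) pairs.

    Hypothetical simplification: every host's TSval advances at `hz` ticks
    per second from a host-specific starting offset.
    """
    # Offset of each packet's timestamp clock relative to the observer's clock.
    offsets = sorted(tsval - hz * t for t, tsval in packets)
    if not offsets:
        return 0
    clusters = 1
    # Start a new cluster wherever adjacent offsets differ by more than tolerance.
    for prev, cur in zip(offsets, offsets[1:]):
        if cur - prev > tolerance:
            clusters += 1
    return clusters


packets = [(0.0, 5_000_000), (1.0, 5_001_000), (2.5, 5_002_500),
           (0.5, 90_000_500), (3.0, 90_003_000)]
print(count_hosts_by_timestamp(packets))  # 2
```

Sorting the offsets and splitting at large gaps is a simple one-dimensional clustering; a realistic implementation would also have to estimate each host's tick rate rather than assume it.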
Gokcen, Yasemin, and Vahid Aghaei Foroushani, “Can we identify NAT behavior by analyzing Traffic Flows?” 2014 IEEE Security and Privacy Workshops (SPW), IEEE, 2014, which is incorporated herein by reference, describes a machine learning (ML) approach to identifying NAT behavior, which can conceal the source of malicious activity, using only network flows. The proposed approach is evaluated on different traffic data sets and compared against passive fingerprinting approaches.
Gokcen, Yasemin, “A preliminary study for identifying NAT traffic using machine learning,” (2014), which is incorporated herein by reference, describes identifying the presence of NAT devices and, if possible, predicting the number of users behind those devices. To this end, Gokcen proposes a machine learning (ML) based approach for detecting NAT devices, applies several variants of it, and evaluates their performance under different network environments, each characterized by which data fields are available.