The explosive growth in the popularity of peer-to-peer (P2P) networks has created virtual communities of millions of people who communicate through instant messaging, file transfer, and voice and video communications.
P2P networks include a plurality of P2P nodes that are, generally, peers. Peers make a portion of their resources, such as processing power, disk storage, or network bandwidth, directly available to other peers. In a pure P2P network, there is no need for central coordination, for example, by a central server. Peers may act both as suppliers of resources, or servers, and as consumers of resources, or clients. P2P nodes may be dynamically added to or removed from P2P networks, and connections between P2P nodes are largely ad hoc. P2P networks are, generally, implemented as application-layer overlay networks over the internet protocol (IP) network. Overlay networks allow indexing and peer discovery, while content is, typically, exchanged directly over the underlying IP network.
For added security, many P2P networks use encryption, and such networks are referred to as encrypted peer-to-peer (EP2P) networks. For example, the EP2P sessions carried on EP2P networks may be encrypted by randomizing portions of EP2P packets. Often, the EP2P sessions carried on EP2P networks are also obfuscated, for example, by inserting padding into EP2P packets. Therefore, EP2P networks pose substantial challenges to organizations tasked with detecting, intercepting, mapping, and blocking unauthorized communications, such as governments, corporate enterprises, intelligence organizations, lawful intercept entities, and censorship organizations.
With reference to FIG. 1A, a typical EP2P network 100 includes a plurality of EP2P nodes 101, 102, and 103 that are, generally, peers. Typically, the nodes 101, 102, and 103 include directory nodes 101, relay nodes 102, and general nodes 103. In some instances, the EP2P network 100 also includes a certificate authority or key server 104, which provides user authentication services. The directory nodes 101, which have listings of EP2P nodes 101, 102, and 103, route the EP2P sessions carried on the EP2P network 100, and the relay nodes 102 relay the EP2P sessions between the general nodes 103.
The EP2P sessions carried on the EP2P network 100 include EP2P packets having the same source IP address and port number combination, destination IP address and port number combination, and transport protocol. An exemplary user datagram protocol (UDP) EP2P packet 105 and an exemplary transmission control protocol (TCP) EP2P packet 106 are illustrated in FIG. 1B. In some instances, a TCP key exchange packet 107 is used in conjunction with the TCP EP2P packet 106.
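By way of illustration only, the grouping of packets into sessions described above may be sketched as follows. The sketch assumes that each packet has already been parsed into a dictionary; the field names (`src_ip`, `src_port`, `dst_ip`, `dst_port`, `protocol`) and the names `FlowKey` and `group_into_sessions` are hypothetical and do not appear in the referenced figures.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical session key: the five fields shared by all packets of a session."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # transport protocol, e.g. "UDP" or "TCP"


def group_into_sessions(packets):
    """Group parsed packets by their (source, destination, protocol) key.

    Each packet is assumed to be a dict carrying the five key fields;
    any other fields (e.g. a payload) are kept untouched.
    """
    sessions = defaultdict(list)
    for pkt in packets:
        key = FlowKey(
            pkt["src_ip"], pkt["src_port"],
            pkt["dst_ip"], pkt["dst_port"],
            pkt["protocol"],
        )
        sessions[key].append(pkt)
    return dict(sessions)


# Example input using documentation-reserved IP addresses.
packets = [
    {"src_ip": "192.0.2.1", "src_port": 40000,
     "dst_ip": "198.51.100.7", "dst_port": 443, "protocol": "TCP"},
    {"src_ip": "192.0.2.1", "src_port": 40000,
     "dst_ip": "198.51.100.7", "dst_port": 443, "protocol": "TCP"},
    {"src_ip": "192.0.2.1", "src_port": 40000,
     "dst_ip": "198.51.100.7", "dst_port": 443, "protocol": "UDP"},
]
sessions = group_into_sessions(packets)
```

In this sketch the key is directional, following the wording above: packets travelling in the reverse direction would fall under a separate key unless the two endpoints were first placed in a canonical order.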
It is, generally, difficult to detect EP2P sessions associated with EP2P networks. EP2P networks do not provide a static association between the IP address and port number combination of a client and the unique client identifier (ID). Users of EP2P networks are highly mobile and may use clients from various geographically dispersed locations, such as their homes, workplaces, or hotels. Consequently, it is not possible to detect EP2P sessions solely on the basis of IP address and port number combinations.
It is also difficult to detect EP2P sessions by pattern matching methods. For example, in the pattern matching method described in U.S. Pat. No. 7,646,728 to Fahmy, issued on Jan. 12, 2010, which is incorporated herein by reference, portions of P2P packets are compared to patterns associated with particular P2P networks. However, as the contents of EP2P packets are, typically, randomized through encryption, patterns cannot be matched without first decrypting EP2P packets.
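The limitation noted above may be illustrated with a minimal sketch. It is not the method of the cited patent; the signature bytes, the function names, and the XOR keystream standing in for encryption are all assumptions chosen only to show why a fixed byte pattern no longer matches once the packet contents are randomized.

```python
import hashlib

# Hypothetical signature; not taken from any real P2P protocol.
HYPOTHETICAL_SIGNATURE = b"\x13EXAMPLE-P2P"


def matches_signature(payload: bytes, signature: bytes, offset: int = 0) -> bool:
    """Return True if `signature` appears in `payload` at the given offset."""
    return payload[offset:offset + len(signature)] == signature


def toy_randomize(payload: bytes, key: bytes) -> bytes:
    """Toy stand-in for encryption (an assumption, not a secure cipher):
    XOR the payload with a keystream derived from SHA-256 of the key."""
    keystream = hashlib.sha256(key).digest()
    if len(payload) > len(keystream):
        raise ValueError("toy keystream only covers payloads up to 32 bytes")
    return bytes(p ^ k for p, k in zip(payload, keystream))


plain_payload = HYPOTHETICAL_SIGNATURE + b"handshake"
randomized_payload = toy_randomize(plain_payload, b"demo-key")

plain_hit = matches_signature(plain_payload, HYPOTHETICAL_SIGNATURE)
randomized_hit = matches_signature(randomized_payload, HYPOTHETICAL_SIGNATURE)
```

Under these assumptions the unencrypted payload matches the signature, while the randomized payload does not, and the match is recovered only by reversing the transformation with the same key.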
EP2P sessions may be detected by traffic analysis methods. For example, as described in U.S. Patent Application Publication No. 2006/0068806 to Nam, et al., published on Mar. 30, 2006, and in U.S. Patent Application Publication No. 2010/0145912 to Li, et al., published on Jun. 10, 2010, which are incorporated herein by reference, connection patterns may be analyzed to detect EP2P sessions. However, traffic analysis methods, which do not consider the contents of EP2P packets, are prone to high false-positive ratios and are, typically, unable to detect EP2P sessions associated with a particular EP2P network.