The explosive growth in the popularity of peer-to-peer (P2P) networks has created virtual communities of millions of people who communicate through instant messaging, file transfer, and voice and video communications.
P2P networks include P2P nodes that are, generally, peers. Peers make a portion of their resources, such as processing power, disk storage, or network bandwidth, directly available to other peers. In a pure P2P network, there is no need for central coordination, for example, by a central server. Peers act as both suppliers of resources, or servers, and as consumers of resources, or clients. P2P nodes may be dynamically added to or removed from P2P networks, and connections between P2P nodes are largely ad hoc. P2P networks are generally implemented as application-layer overlay networks over the physical-layer internet protocol (IP) network. Overlay networks allow indexing and peer discovery, while content is, typically, exchanged directly over the underlying IP network.
For added security, many P2P networks use encryption, and such networks are referred to as encrypted peer-to-peer (EP2P) networks. For example, the EP2P sessions carried on EP2P networks may be encrypted by randomizing portions of EP2P packets. Often the EP2P sessions carried on EP2P networks are also obfuscated, for example, by inserting padding into EP2P packets. Therefore, EP2P networks pose substantial challenges to organizations, such as governments, corporate enterprises, intelligence organizations, lawful intercept entities, and censorship organizations, that are tasked with detecting, intercepting, mapping, and blocking unauthorized communications.
With reference to FIG. 1A, a typical EP2P network 100 includes a plurality of EP2P nodes 101, 102, and 103 that are, generally, peers. Typically, the nodes 101, 102, and 103 include directory nodes 101, relay nodes 102, and general nodes 103. In some instances, the EP2P network 100 also includes a certificate authority or key server 104, which provides user authentication services. The directory nodes 101, which have listings of EP2P nodes 101, 102, and 103, route the EP2P sessions carried on the EP2P network 100, and the relay nodes 102 relay the EP2P sessions between the general nodes 103.
The EP2P sessions carried on the EP2P network 100 include EP2P packets having the same source IP address and port number combination, destination IP address and port number combination, and transport protocol. An exemplary user datagram protocol (UDP) EP2P packet 105 and an exemplary transmission control protocol (TCP) EP2P packet 106 are illustrated in FIG. 1B. In some instances, a TCP key exchange packet 107 is used in conjunction with the TCP EP2P packet 106.
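By way of illustration only, the grouping of packets into sessions described above may be sketched as follows. The sketch assumes that captured packets have already been parsed into records; the names Packet, session_key, and group_sessions are hypothetical and do not appear in the figures. The key is directional, so that packets travelling in the reverse direction between the same two endpoints fall into a separate group, consistent with the definition given above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed packet record (illustrative only)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str   # transport protocol, e.g. "UDP" or "TCP"
    payload: bytes  # encrypted payload; not inspected here


SessionKey = Tuple[str, int, str, int, str]


def session_key(pkt: Packet) -> SessionKey:
    """Return the five values that packets of one session share."""
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)


def group_sessions(packets: Iterable[Packet]) -> Dict[SessionKey, List[Packet]]:
    """Group packets that share source, destination, and transport protocol."""
    sessions: Dict[SessionKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)
```

Because the grouping relies only on address, port, and protocol fields, it does not require the payload to be decrypted.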
It is, generally, difficult to discover EP2P nodes associated with EP2P networks and, therefore, to map EP2P networks. EP2P networks do not provide a static association between the IP address and port number combination of a client and the unique client identifier (ID). Users of EP2P networks are highly mobile and may use clients from various geographically dispersed locations, such as their homes, workplaces, or hotels. The resulting changes in IP address and port number complicate the discovery of EP2P nodes.
Several solutions have been proposed for discovering P2P nodes. However, these solutions may not be readily applicable to the discovery of EP2P nodes associated with a particular EP2P network. As described in U.S. Patent Application Publication No. 2010/0064362 to Materna, et al., published on Mar. 11, 2010, which is incorporated herein by reference, P2P voice over internet protocol (VoIP) nodes may be discovered by scanning VoIP-specific ports of IP addresses and by detecting a VoIP service at the IP addresses. As described in U.S. Patent Application Publication No. 2009/0299937 to Lazovsky, et al., published on Dec. 3, 2009, which is incorporated herein by reference, P2P nodes may be discovered by searching for a file ID in their shared storage. As described in U.S. Pat. No. 7,958,250 to Sridhar, et al., issued on Jun. 7, 2011, which is incorporated herein by reference, P2P nodes may be discovered by analyzing connection configuration data for a known P2P node.