An analyst may want to monitor a set of users, or a set of online identities associated with those users, to observe their behavior online. The analyst may already be monitoring a set of known users of the communication network, previously identified based on a set of identifier characteristics. The analyst may also be interested in locating new users or new online identities of interest based on a set of desired identifier characteristics, or based on the type of project or application the analyst is working on. The analyst may be interested in understanding and studying the whole set of online behavior associated with a user or online identity associated with a person of interest. The analyst may want to study a set of communication and transaction data created and exchanged between a known user of interest of the communication network and a new user of interest. The analyst may want to collect a set of data belonging to the user of interest and/or a new user of interest to study a particular pattern in behavior. The analyst may also want to understand a set of interactions between the user of interest and a user not currently of interest or a new user of interest. Similarly, the analyst may seek content and metadata communicated between users on a wide variety of communication systems and formats. This information can be useful for determining commercial, investment, and personal information and relationships between the users or online identities and persons at large.
A network monitoring system may be required to monitor a set of activity between users and/or online identities associated with a set of persons. Some users of a communication system may be easily identifiable. However, other users of a communication system may be of interest, but may not yet be identified or provisioned easily. These users and/or online identities may be difficult to locate, and analysts may have a difficult time manually finding links between existing online identities and other potential online identities. Finding links between known users and new users or online identities related to the known users may be a time-consuming and inefficient task. In addition to being cumbersome and inefficient, it may also be financially expensive to identify new users of interest manually.
Such systems of network monitoring can be very expensive to purchase or lease due to the high development and design costs required for the sophisticated algorithms and software as well as the high-performance hardware, server infrastructure, and other system features. There may be multiple analysts, belonging to various organizations or agencies, who each want to monitor their own list of users of interest. These analysts from different agencies may want to use their own individual management protocol, judgment, and techniques for network monitoring and data gathering. However, sometimes these different agencies might be tracking the same users of interest and retrieving the same collection and transaction data without knowing it. Because each agency, organization, or analyst has to maintain confidentiality of its work, the agency may typically have to have its own monitoring system. However, it can be very expensive for each individual agency, many with a limited budget, to purchase and maintain a system by itself. Sometimes purchasing or maintaining a network monitoring system can be cost prohibitive, resulting in inferior, or severely handicapped, monitoring, collecting, and/or analyzing of data. Additionally, multiple different network carriers sometimes use the same backbone fiber routes to communicate data. If a network monitoring system has to be purchased for each carrier, then there might be duplicative resources tapping along the same route.
Identifying a user of interest on a communication system, for example by searching for the user via chat handle, user name, etc., may be a starting point for collecting data on a network. However, a subsequent task of accurate and timely collection of asymmetric data that is somehow associated with the user's communication may be more challenging. Asymmetric data may refer to many different types of related data, such as any type of data sent by a network user in addition to a given communication. Asymmetric data may refer to any communication related to any original or primary communication, but that is asymmetric in terms of: a time communicated on the network, a source providing the data onto the network, a route chosen on the network to send the data, an application type and protocol used to package and format data in a packet sent on the network, and/or any changes to any of these settings on subsequent communications between any parties of interest. Such asymmetries substantially complicate the task of gathering a communication between a requestor and a responder, a communication application, or any combination thereof, and assembling it together into a meaningful and holistic package of what was communicated to whom and when, in its entirety. Thus, for example, asymmetric data can be an email attachment sent by a network user, e.g., a target, along with her email. In reality, the attachment is sent on the network from the email provider's server farm, most likely along a different network route than the user's email to which it was attached, and at a slightly different time from when the email was sent.
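The dimensions of asymmetry listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any particular monitoring system: the `Transmission` record, its field names, the `asymmetric_attributes` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Transmission:
    """Hypothetical record of one transmission seen on a network."""
    sent_at: float   # time communicated, in seconds from an arbitrary origin
    source: str      # entity that placed the data onto the network
    route: str       # label for the network route taken
    protocol: str    # application type / protocol used to package the data


def asymmetric_attributes(primary: Transmission, related: Transmission) -> list:
    """Return the names of the attributes on which two transmissions differ."""
    return [
        f.name
        for f in fields(Transmission)
        if getattr(primary, f.name) != getattr(related, f.name)
    ]


# Illustrative values only: an email and its attachment, where the attachment
# leaves the provider's server farm slightly later and over another route.
email = Transmission(100.0, "user-device", "route-A", "SMTP")
attachment = Transmission(100.4, "provider-server-farm", "route-B", "SMTP")

print(asymmetric_attributes(email, attachment))
```

In this sketch the attachment differs from the email in time, source, and route while sharing the protocol, which mirrors the email attachment example in the text.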
Asymmetric data can also include other web and non-web services used by the target user of a network and/or her responder, such as Voice over Internet Protocol (VoIP), audio chat, video chat, file transfer protocol (FTP), photo sharing sites, collaborative remote PC screen sharing apps, online Web services, etc., in any combination and permutation, used in parallel or at different times. Asymmetric data can include any of the communications mentioned above that are sent via different networks, e.g., a public wireless fidelity network (WiFi) access point, a user's Internet Service Provider (ISP), a neighbor's unsecured wireless network, etc. Asymmetric data can also include a responder's communications back to the target, in any of the web and/or non-web services mentioned above, typically sent via a route that is different from the target/requestor's route for purposes of security, e.g., as a standard security measure by the ISP in trying to avoid the capture of the two-way conversation from a single line tap. In short, the accurate and timely collection of asymmetric data of unknown content that might exist in some indeterminate routing and timing in one or more networks that route terabytes of data for millions of users can be a daunting task.
Furthermore, with many users of a network anonymously hidden behind a proxy server, e.g., at a library, hotel, corporation, university, coffee shop, internet café, etc., it is difficult to correctly identify which data traffic belongs to a given target from the total aggregated traffic of the population of users behind a given proxy that is transmitted from the proxy IP address to a network. With the complexity and proliferation of data communicated by a person using modern communication devices, hundreds or thousands of sessions with potential data to be gathered and analyzed can occur at any given point in time. However, because tens or thousands of users may exist behind a proxy, each with different and changing four-tuples (e.g., a unique virtual combination of: destination IP, source IP, destination port number, source port number), the session information becomes indeterminate in finding and tracking communications of a given user initially or over time. Adding to this difficult problem are several challenging factors, including: the compression and/or encryption of traffic data, which removes some of the packet header information otherwise useful in determining the identity of sender(s) and recipient(s); the use of one or more firewalls and their use of Network Address Translation (NAT), which reformats an otherwise readily apparent and useful IP address into information that cannot accurately identify a given user; the use of encryption, e.g., 128-bit encryption or Secure Sockets Layer (SSL), in key communications such as authentication of a user, use of tokens, etc., that contain user-specific information that again would otherwise be useful to an intercept system trying to locate a specific user; and ever-changing network routing due to traffic variability, unpredictability, and intentional security protections.
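The loss of user-identifying information behind a shared address can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an implementation of any real NAT or proxy: the `FourTuple` type, the `ToyNat` class, the `observed_sources` helper, the port-allocation scheme, and all addresses (taken from private and documentation ranges) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyNat:
    """Hypothetical, simplified source NAT sharing a single public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._table = {}  # (private IP, private port) -> public port

    def translate(self, flow: FourTuple) -> FourTuple:
        """Rewrite the private source address and port to the public ones."""
        key = (flow.src_ip, flow.src_port)
        if key not in self._table:
            self._table[key] = self._next_port
            self._next_port += 1
        return FourTuple(self.public_ip, self._table[key],
                         flow.dst_ip, flow.dst_port)


def observed_sources(flows):
    """Group flows by source IP, as a tap at that point would see them."""
    by_src = defaultdict(list)
    for flow in flows:
        by_src[flow.src_ip].append(flow)
    return dict(by_src)


nat = ToyNat("203.0.113.7")
inside = [
    FourTuple("10.0.0.11", 51000, "198.51.100.20", 443),  # user A
    FourTuple("10.0.0.12", 51000, "198.51.100.20", 443),  # user B
    FourTuple("10.0.0.11", 51001, "198.51.100.30", 80),   # user A again
]
outside = [nat.translate(flow) for flow in inside]

print(len(observed_sources(inside)))   # distinct source IPs before translation
print(len(observed_sources(outside)))  # distinct source IPs after translation
```

Before translation the two users are distinguishable by source IP. After translation every flow carries the same source IP and differs only in a port number assigned by the translator, so the four-tuple alone no longer indicates which user originated a session.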
If a method or apparatus used to collect information of a target on a network is inaccurate or inefficient, it can result in over-collection of information, e.g., gathering and processing unwanted information beyond the communication and asymmetric data belonging to the target and the user with whom he/she is communicating. Over-collection may raise privacy issues in some jurisdictions, and in all jurisdictions it may overload the monitoring system, in an extreme case crashing it.
Tapping a line close to an actual target network user may be sufficient to capture a communication along with its asymmetric data, regardless of the source, route, timing, etc. occurring out on the network, because all the information to and from a user becomes increasingly deterministic the closer the tap is placed to the actual user, e.g., tapping an actual single line going into the residence is most accurate. However, this method can be unacceptable because it is expensive, labor-intensive, time-consuming, manual, potentially harmful with regard to loss of evanescent evidence due to delay, insufficient for mobile applications and mobile users, not easily scaled for future use or for large networks and countries, and because of its other inadequacies.