To monitor computer traffic of a specific individual, law enforcement agencies have traditionally obtained court orders authorizing them to collect or “tap” an individual's computer traffic. In this situation, the law enforcement agency would obtain authorization to monitor computer traffic at a particular user's Internet Service Provider (ISP) where traffic speeds are relatively slow (e.g. 150 megabits per second). The user's computer traffic is typically collected based on the user's Internet Protocol (IP) address.
However, in today's mobile society, people can connect to the Internet in places such as cafes, airports, and libraries without using their own ISP. In such situations, a user's IP address cannot be used to identify the user's computer traffic. Instead, collection monitors must be placed on high-speed Internet backbones where traffic of unknown origin is likely to be seen. To address these situations, other techniques are needed to identify computer traffic associated with specific users regardless of where the user connects to the Internet.
One such approach is to identify user names that are buried within the various application protocols. This approach can be effective because user names tend to remain constant regardless of where an individual connects to the Internet. For example, email traffic such as Simple Mail Transfer Protocol (SMTP) traffic contains the email addresses of the sender and the recipients. To collect traffic associated with a particular user, these email addresses can be extracted and compared to a directory of email addresses associated with particular users who are “of interest.” If an extracted address matches an identifier in the directory, that user's traffic can be collected for further review. In addition, traffic associated with application protocols such as the File Transfer Protocol (FTP) includes user login names, webmail contains both login names and email addresses, and Instant Messaging (IM) contains user “handles.”
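The directory comparison described above can be illustrated with a minimal sketch. It assumes the SMTP payload has already been reassembled into text, and that sender and recipient addresses are taken from the `MAIL FROM` and `RCPT TO` envelope commands. The names `DIRECTORY`, `smtp_addresses`, and `is_of_interest` are hypothetical and chosen only for illustration; a real collection monitor would need to handle many more cases.

```python
import re

# Hypothetical directory of identifiers "of interest" (illustrative only).
DIRECTORY = {"bad.boy@nowhere.com"}

# Matches the address inside the angle brackets of SMTP envelope commands,
# e.g. "MAIL FROM:<user@example.org>" or "RCPT TO:<user@example.org>".
ENVELOPE_RE = re.compile(
    r"^(?:MAIL FROM|RCPT TO):\s*<([^<>\s]+)>",
    re.IGNORECASE | re.MULTILINE,
)


def smtp_addresses(payload):
    """Return the lower-cased envelope addresses found in an SMTP payload."""
    return [addr.lower() for addr in ENVELOPE_RE.findall(payload)]


def is_of_interest(payload, directory=DIRECTORY):
    """True if any extracted address appears in the directory."""
    return any(addr in directory for addr in smtp_addresses(payload))


if __name__ == "__main__":
    sample = "MAIL FROM:<Bad.Boy@nowhere.com>\r\nRCPT TO:<alice@example.org>\r\n"
    print(smtp_addresses(sample))
    print(is_of_interest(sample))
```

Note that this extraction rule is specific to SMTP; FTP login names, webmail fields, and IM handles would each need their own rule.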
However, these user identification and extraction techniques suffer from certain drawbacks. The identification and extraction of user identifiers is processing intensive because computer traffic on the Internet can use a wide range of application protocols, such as email, webmail, and chat, among others, and each application protocol requires unique processing techniques to identify and extract user identifiers. In the context of the Internet, collection monitors should be capable of inspecting traffic from millions of users. However, due to the high data rates of Internet traffic, the processing element in the collection monitor can quickly become overwhelmed. As such, it has been infeasible to create a processing element capable of running at data rates of 2.5 gigabits per second and above.
In the context of traffic monitoring, user identification and extraction techniques have typically been implemented via application layer processing in collection monitoring software. The performance of collection monitoring software is not sufficient at Internet speeds, where traffic from hundreds of thousands, if not millions, of computers is present. As a result, software-based collection monitors are now mainly used on enterprise networks that run at speeds much slower than the Internet.
Collection monitors that implement hardware-based protocol decoders, such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), can theoretically perform protocol decoding much faster than software-based collection monitors. However, in the context of monitoring computer traffic on the Internet, such hardware-based protocol decoders can be difficult and expensive to implement due to the number of different application protocols. Because each application protocol typically requires a unique processing engine, a hardware-based protocol decoder would need a separate engine for every application protocol to be handled.
Tag scanning techniques have been implemented in hybrid hardware-software collection monitoring solutions. Special high-speed hardware filters can detect combinations of characters that indicate the presence of user identifiers. For example, the character string “From:” might indicate the presence of a source email address. Software can then be used to execute extraction rules for isolating an email address, which can then be compared to a user directory. Tag scanning techniques do not require processing of the protocol stack, which improves performance. However, tags tend to appear often on high-speed networks, and the extraction rules tend to be unique for each protocol. As such, the software component of these hybrid hardware-software solutions tends to limit the performance of conventional tag scanning techniques because of the high rate of tag matches and the complexity of the extraction rules.
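The two stages of tag scanning can be sketched in software as follows. This is only an illustration under stated assumptions: the hardware filter is stood in for by a plain byte search for the tag, the extraction rule is a simplified email pattern applied to the rest of the tagged line, and the names `TAG`, `EMAIL_RE`, and `scan_tags` are hypothetical.

```python
import re

# Tag that hints at a source email address (from the example above).
TAG = b"From:"

# Simplified extraction rule for an email address (an assumption; real
# extraction rules differ per protocol and are considerably more involved).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_tags(packet, directory):
    """Return directory identifiers found after each tag occurrence."""
    hits = []
    pos = packet.find(TAG)  # stage 1: tag detection (hardware in practice)
    while pos != -1:
        # Stage 2: software extraction, limited to the rest of the tagged line.
        line_end = packet.find(b"\n", pos)
        if line_end == -1:
            line_end = len(packet)
        match = EMAIL_RE.search(packet, pos + len(TAG), line_end)
        if match:
            addr = match.group().decode("ascii").lower()
            if addr in directory:
                hits.append(addr)
        pos = packet.find(TAG, pos + len(TAG))
    return hits


if __name__ == "__main__":
    pkt = b"From: Bad Boy <bad.boy@nowhere.com>\r\nTo: x@y.org\r\n"
    print(scan_tags(pkt, {"bad.boy@nowhere.com"}))
```

Every tag match triggers the software stage whether or not the address is in the directory, which is the source of the bottleneck described above.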
String searching techniques for identifying users search the packet data directly for known user identifiers, without regard to the protocol or tags. For example, a string match of “bad.boy@nowhere.com” will identify email either to or from the owner of the email address, regardless of the protocol in which it appears (e.g., SMTP, POP3, IMAP, etc.). However, string searching techniques require a high-speed string search engine capable of checking for user IDs at every byte position within the packet data. Each check requires accessing a table of user IDs, and a check must be performed with the arrival of each byte of data. Networks operating at 2.4 gigabits per second deliver 300 million bytes per second and therefore require table lookups at the rate of 300 million per second, which can only be achieved with redundant parallel processing engines. As such, string searching techniques for identifying users also tend to be performance limited.
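The lookup rate quoted above follows from one table check per arriving byte, at 8 bits per byte. The sketch below reproduces that arithmetic and shows a deliberately naive search that checks every byte position against a table of user IDs. The function names are hypothetical, and the per-position check is modeled as a single counted lookup even though this software version slices the packet once per distinct ID length.

```python
def lookups_per_second(bits_per_second):
    """One table lookup per arriving byte, at 8 bits per byte."""
    return bits_per_second / 8


def naive_id_search(packet, user_ids):
    """Check every byte position for any ID in the table.

    Returns (matches, positions_checked), where matches is a list of
    (offset, user_id) pairs.
    """
    lengths = sorted({len(uid) for uid in user_ids})
    matches = []
    positions_checked = 0
    for pos in range(len(packet)):
        positions_checked += 1  # one check per byte position
        for n in lengths:
            candidate = packet[pos:pos + n]
            if candidate in user_ids:
                matches.append((pos, candidate))
    return matches, positions_checked


if __name__ == "__main__":
    print(lookups_per_second(2.4e9))
    pkt = b"RCPT TO:<bad.boy@nowhere.com>"
    print(naive_id_search(pkt, {b"bad.boy@nowhere.com"}))
```

The number of positions checked grows with the number of bytes seen, independent of how rarely an ID actually appears, which is why the approach is limited by the table lookup rate.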
There is a need for techniques that can identify specific user IDs buried within the various application protocols, regardless of the application protocol being used, without checking for the user IDs at every byte position within the packet data. Other desirable features and characteristics of the present invention will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and the foregoing technical field and background.