The field of the disclosure relates generally to network traffic analysis and, more specifically, to apparatus, methods, and systems for use in surveying the character sets used in network traffic.
Network traffic analyzers, sometimes referred to as deep-packet inspection systems, are used to scan traffic on a computer network and capture traffic of interest. In some systems, one of the capture criteria is a keyword match. For instance, a deep-packet inspection system could be configured to capture all network traffic containing the term “dirty bomb.”
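By way of illustration only, and not limitation, a keyword capture criterion may be sketched as a byte-level substring test against each packet payload. The function name `capture_matching`, the sample payloads, and the choice of UTF-8 as the keyword encoding are assumptions made for this example and do not describe any particular deep-packet inspection system.

```python
from typing import Iterable, List


def capture_matching(payloads: Iterable[bytes], keyword: str,
                     encoding: str = "utf-8") -> List[bytes]:
    """Return the payloads that contain the keyword.

    The keyword is converted to bytes with a single, assumed encoding,
    and each payload is searched for that exact byte sequence.
    """
    needle = keyword.encode(encoding)
    return [payload for payload in payloads if needle in payload]


# Hypothetical payloads standing in for captured network traffic.
packets = [
    b"GET /index.html HTTP/1.1",
    "report on a dirty bomb threat".encode("utf-8"),
    b"weather update",
]

captured = capture_matching(packets, "dirty bomb")
print(captured)
```

In this sketch the match succeeds only when the payload carries the keyword in the same byte representation as the search term.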
However, network traffic need not, and often does not, contain only a single language. Moreover, numerous character sets are used to encode, or represent, characters in digital communication. For example, the Unicode Standard is a character coding system promulgated by the Unicode Consortium and designed to support the worldwide interchange, processing, and display of written text. There are over 250 standard character sets. Different languages may be encoded using the same or different character sets, and a single language may be encoded using more than one character set. Accordingly, a keyword entered using a first character set may not be located in a network traffic data packet encoded with a different character set, whether or not the language is the same.
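By way of illustration only, the following sketch shows how the same keyword yields different byte sequences under different character sets, so that a search using one encoding misses a payload encoded with another. The Russian keyword, the list of candidate encodings, and the helper name `find_keyword` are assumptions chosen for this example; the encoding names are standard Python codec names.

```python
from typing import Iterable, List


def find_keyword(payload: bytes, keyword: str,
                 encodings: Iterable[str]) -> List[str]:
    """Return the candidate encodings under which the keyword's bytes
    appear in the payload.

    Encodings that cannot represent the keyword are skipped.
    """
    hits = []
    for name in encodings:
        try:
            needle = keyword.encode(name)
        except UnicodeEncodeError:
            continue  # keyword has no representation in this character set
        if needle in payload:
            hits.append(name)
    return hits


keyword = "бомба"  # Russian for "bomb"; illustrative only

# Hypothetical payload: the same language, but encoded as KOI8-R, not UTF-8.
payload = ("внимание: " + keyword).encode("koi8_r")

# A search that assumes UTF-8 does not find the keyword in this payload.
utf8_found = keyword.encode("utf-8") in payload

# Trying several candidate character sets locates it.
matches = find_keyword(payload, keyword, ["utf-8", "koi8_r", "cp1251"])
print(utf8_found, matches)
```

A byte-level match of this kind can also produce coincidental hits, since an unrelated payload may happen to contain the same byte sequence; the sketch is intended only to show why a keyword search tied to one character set is insufficient.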