Packet-based data networks continue to grow in importance, and it is often desirable to process network traffic associated with these packet-based networks within packet processing devices. The payload data within network packets, however, can include sensitive data, such as personal information, or other data that needs to be detected. Identifying particular data and/or protecting personal information can be important. For example, entities can be required to protect personal information, such as identification information and medical history information, within packet processing systems that handle this personal information. Information technology (IT) professionals who monitor secure packet data networks where confidential sensitive information is communicated are typically required to follow privacy rules for handling such confidential sensitive information, such as HIPAA (Health Insurance Portability and Accountability Act) privacy rules, PCI (Payment Card Industry) privacy rules, and/or other privacy-related laws or regulations.
Various data masking, trimming, and/or other actions have been used in the past to address these data security and privacy needs. In one prior solution, sensitive data within network packets is replaced at fixed data offsets within the packets based upon an assumption that all packets will follow a specific protocol such that sensitive data will be located in the same bit positions in all packets. Data at these fixed offsets within received packets is then removed or obscured with a pre-determined code or other non-sensitive data. However, as network packet communication systems have become more complex, a variety of different packet protocols, packet formats, and/or packet sizes are often used within any given network communication system. Fixed offset solutions fail within such complex packet communication systems because sensitive data is not limited to specific fixed locations within all network packets.
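As an illustration only, the fixed offset approach and its failure mode can be sketched in a few lines of Python. The offset, length, mask character, and both sample payloads below are hypothetical values chosen for this sketch and are not taken from any particular protocol or prior system.

```python
# Hypothetical fixed layout (an assumption for this sketch): bytes 10..25 of
# the payload are presumed to hold a 16-digit account number.
SENSITIVE_OFFSET = 10
SENSITIVE_LENGTH = 16
MASK_BYTE = ord("X")


def mask_fixed_offset(payload: bytes) -> bytes:
    """Overwrite a fixed byte range with a mask character, whatever it contains."""
    buf = bytearray(payload)
    end = min(SENSITIVE_OFFSET + SENSITIVE_LENGTH, len(buf))
    for i in range(SENSITIVE_OFFSET, end):
        buf[i] = MASK_BYTE
    return bytes(buf)


# Payload that follows the assumed layout: the digits start at offset 10.
expected_layout = b"ACCT-NUM: 4111111111111111 OK"

# Payload in a different (also hypothetical) format: the digits start at offset 12.
other_layout = b"USER=7;ACCT=4111111111111111;OK"

masked_expected = mask_fixed_offset(expected_layout)
masked_other = mask_fixed_offset(other_layout)
```

With the first payload, every digit of the account number is masked. With the second payload, the same fixed range overwrites part of the field label and leaves the last two digits exposed, which is the kind of failure described above when packets do not share a single layout.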
To address this wide variety in network packet types, another prior solution is to completely parse received packets based upon the various protocols, formats, and sizes being used within the communication network. For this complete parsing solution, the system parses the entire packet to determine the particular protocol, format, and/or size being used. Once this determination is made, assumptions are further made with respect to the location of sensitive data for that particular protocol/format/size, and the sensitive data is masked at that location. In many cases, however, this complete parsing solution cannot be achieved in real time at high-speed network communication line rates (e.g., 10 Gigabits per second (Gbps) or above) due to the complexity and variety of the protocols being used within the network communication system.
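To give a rough sense of the real-time constraint, the following Python sketch computes the time available per byte and per packet at a given line rate. It assumes Ethernet framing with minimum-size 64-byte frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap); that framing is an assumption made for this sketch, and the function names are hypothetical.

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gigabits per second

# Assumed Ethernet framing (not stated in the text above):
MIN_FRAME_BYTES = 64        # minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20    # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def ns_per_byte(rate_bps: float) -> float:
    """Nanoseconds available to handle one byte at the given line rate."""
    return 8 / rate_bps * 1e9


def ns_per_min_frame(rate_bps: float) -> float:
    """Nanoseconds between back-to-back minimum-size frames on the wire."""
    bits_on_wire = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return bits_on_wire / rate_bps * 1e9


def max_frames_per_second(rate_bps: float) -> float:
    """Worst-case packet rate when every frame is minimum size."""
    bits_on_wire = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return rate_bps / bits_on_wire


byte_budget_ns = ns_per_byte(LINE_RATE_BPS)
frame_budget_ns = ns_per_min_frame(LINE_RATE_BPS)
worst_case_fps = max_frames_per_second(LINE_RATE_BPS)
```

Under these assumptions, a device has roughly 0.8 nanoseconds per byte and about 67 nanoseconds per minimum-size frame, or close to 14.9 million frames per second in the worst case. Any per-packet work that takes longer than this budget cannot keep up with the line rate.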
Rather than parse each network packet, another solution attempts to search the payload of each packet using regular expressions to find characters or digits for later masking. A regular expression, as used herein, is a sequence of characters that defines a search pattern for use in pattern matching for purposes of find-and-replace operations. This approach is flexible as it allows for different character strings to be defined and searched; however, this approach also suffers from an inability to process network packets in real time at high-speed network communication line rates (e.g., 10 Gigabits per second (Gbps) or above). For example, pattern matching of regular expressions is typically processed using state machines and/or byte order processing engines that are relatively slow as compared to high-speed network communication line rates for current packet network communication systems.
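A minimal sketch of the regular expression approach is shown below using Python's standard `re` module. The pattern, which matches any run of exactly 16 ASCII digits, and the sample payloads are hypothetical and chosen only to illustrate the find-and-replace behavior; a real deployment would define its own patterns.

```python
import re

# Hypothetical pattern (an assumption for this sketch): exactly 16 ASCII digits
# that are not immediately preceded or followed by another digit.
CARD_PATTERN = re.compile(rb"(?<!\d)\d{16}(?!\d)")


def mask_with_regex(payload: bytes) -> bytes:
    """Replace every match of CARD_PATTERN with 'X' characters of equal length."""
    return CARD_PATTERN.sub(lambda m: b"X" * len(m.group()), payload)


# The digits are found wherever they sit in the payload, so no fixed offset
# or protocol knowledge is needed.
masked_a = mask_with_regex(b"ACCT-NUM: 4111111111111111 OK")
masked_b = mask_with_regex(b"USER=7;ACCT=4111111111111111;OK")

# A 17-digit run does not match the pattern and is left unchanged.
untouched = mask_with_regex(b"SEQ=12345678901234567")
```

The search is independent of packet format, which is the flexibility noted above. The cost is that the matcher must examine the payload byte by byte, and this per-byte work is what makes the approach difficult to sustain at high-speed line rates.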