The present invention relates generally to data communication systems and more particularly relates to an apparatus for and a method of searching an input data stream for multiple strings in parallel.
Many data communications processing applications require that the input data stream be filtered so as to detect the presence of a set of strings. An example of such an application is a firewall. One of the functions of a firewall is to prevent the entry of packets that violate a set of packet filtering rules. The strings may comprise, for example, the keywords specified by a particular protocol. Different applications and protocols have different keywords or other types of strings associated with them. An application such as a firewall is typically required to recognize these strings in the input data stream. The detection of one or more particular strings may trigger certain subsequent events, such as discarding the packet if it violates a packet filtering rule, opening a session (upon the recognition of an FTP control message, for example), etc.
To meet this requirement, a plurality of strings must be searched for simultaneously. Thus, the search for all possible strings must be performed in parallel regardless of the number of possible strings.
The content search of an input data stream has many applications. One major application is in cellular telephones and other wireless devices. In recent years, the world has witnessed explosive growth in the demand for wireless communications, and this demand is predicted to increase in the future. There are already over 500 million subscribers to cellular telephone services and the number is continually increasing. In the not too distant future, the number of cellular subscribers will exceed the number of fixed line telephone installations.
Other related wireless technologies have experienced growth similar to that of cellular. Examples include cordless telephony, two way radio trunking systems, paging (one way and two way), messaging, wireless local area networks (WLANs) and wireless local loops (WLLs). In addition, new broadband communication schemes are rapidly being deployed to provide users with increased bandwidth and faster access to the Internet. Broadband services such as xDSL, short-range high-speed wireless connections and high rate satellite downlink (and the uplink in some cases) are being offered to users in more and more locations.
Many of these technologies are able to connect users to public networks such as the Internet. It is desirable to be able to filter the downstream data in order to prevent hackers or other non-authorized users from accessing the communications device (e.g., Internet enabled cellular phone, PDA, etc.). This can be achieved by putting a firewall in each device to prevent malicious access to the device. As described above, the operation of a firewall requires recognizing a set of strings in the input data stream.
Therefore, there is a need for an effective and computationally efficient mechanism of simultaneously searching an input data stream for the presence of a plurality of strings.
The present invention provides a novel and useful apparatus for and method of searching an input character stream for multiple strings in parallel. The present invention is embodied in a content filter which is suitable for use in applications where an input data stream is to be searched for the presence of one or more strings. For example, the content filter can be used in data communication systems to provide a real time search mechanism for searching the payload content of frames or packets in an input data stream for the presence of relevant strings of one or more communication protocols. The content filter is operative to simultaneously search for a given set of strings contained within the input stream. The resulting output comprises a list of the matching strings found.
The content filter is operative to search the input data stream for a plurality of strings simultaneously. The strings to be searched for are determined a priori, processed and stored in substring tables during a configuration phase of the content filter. During configuration, the strings to be searched for are divided into a plurality of two and three character substrings. The substring tables store the data structures representing these two and three character substrings. The hashes of these substrings are generated and stored in hash tables, which are used to provide an index into the substring tables.
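By way of illustration only, the configuration phase described above may be sketched as follows. The sketch is not taken from the specification: the rule used to divide a string (three character substrings, with two character substrings wherever a single character would otherwise be left over), the toy hash function `hash_chars`, the table size, and the entry fields `prev` and `last` are all assumptions chosen for the example.

```python
TABLE_SIZE = 256  # hypothetical hash table size


def hash_chars(chars):
    """Toy hash over a short substring (an assumed stand-in for the real hash)."""
    h = 0
    for c in chars:
        h = (h * 31 + ord(c)) % TABLE_SIZE
    return h


def split_string(s):
    """Divide s into consecutive substrings of three or two characters.

    Assumed rule: take three characters at a time, but take two whenever
    exactly two or four characters remain, so no single character is left.
    """
    if len(s) < 2:
        raise ValueError("strings shorter than two characters cannot be divided")
    chunks = []
    i = 0
    while i < len(s):
        remaining = len(s) - i
        size = 2 if remaining in (2, 4) else 3
        chunks.append(s[i:i + size])
        i += size
    return chunks


def build_tables(strings):
    """Build the substring tables and hash tables for the given strings."""
    sub_tables = {2: [], 3: []}   # one substring table per substring length
    hash_tables = {2: {}, 3: {}}  # hash value -> indexes into the substring table
    for string_id, s in enumerate(strings):
        chunks = split_string(s)
        prev = None  # (length, index) of the preceding substring of this string
        for n, chunk in enumerate(chunks):
            size = len(chunk)
            index = len(sub_tables[size])
            sub_tables[size].append({
                "text": chunk,
                "string_id": string_id,
                "prev": prev,
                "last": n == len(chunks) - 1,
            })
            # Several substrings may share a hash value, so keep a list per value.
            hash_tables[size].setdefault(hash_chars(chunk), []).append(index)
            prev = (size, index)
    return sub_tables, hash_tables


sub_tables, hash_tables = build_tables(["USER", "PASSIVE"])
```

In this sketch each entry records the location of the preceding substring of the same string, which is the information a later ordering check would need.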
During searching, the content filter generates the hash of the input character stream and attempts to find a matching substring stored in the table. Thus, hash functions are used both in configuring the filter and during the actual searching of the input data stream to determine whether a particular string is present in the data stream.
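The per-character lookup may be illustrated by the following minimal sketch, in which a hash is computed over the three previous characters and over the two previous characters each time a character is received. The hash function, the table layout (a mapping from hash value to the substrings stored under it) and the names `hash_chars` and `scan` are assumptions made for the example; the specification does not prescribe them.

```python
def hash_chars(chars, table_size=256):
    """Toy hash over a short substring (an assumed stand-in for the real hash)."""
    h = 0
    for c in chars:
        h = (h * 31 + ord(c)) % table_size
    return h


def scan(stream, table3, table2):
    """Return (end_position, substring) for every stored substring seen.

    table3 and table2 map a hash value to the list of three and two
    character substrings stored under that value.
    """
    hits = []
    for pos in range(len(stream)):
        for size, table in ((3, table3), (2, table2)):
            if pos + 1 < size:
                continue  # not enough characters received yet
            window = stream[pos + 1 - size:pos + 1]
            # A hash hit only nominates candidates; comparing the stored
            # text rejects false hits caused by hash collisions.
            for candidate in table.get(hash_chars(window), []):
                if candidate == window:
                    hits.append((pos, window))
    return hits


table3 = {hash_chars("GET"): ["GET"]}
table2 = {hash_chars("ST"): ["ST"]}
hits = scan("xGETxST", table3, table2)  # [(3, 'GET'), (6, 'ST')]
```

The same hash function is applied here as would be used when the tables are configured, which is what allows a hash of the input characters to index a stored substring.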
The location index and a time to live (TTL) field are stored in a temporary register for each matching substring found. Subsequent matching substrings are checked against the existing set of temporary registers to determine whether they have been received in correct consecutive order. For a substring to be considered in the correct consecutive order, a temporary register must be found whose index matches the previous index field of the newly found substring. In addition, the TTL field must be the proper value.
If the index and TTL fields verify correctly, it is then checked whether the substring is the last in the string. If it is, the string is declared as found and the index of the last substring and its location in the input stream (e.g., location in the payload of a frame) are stored in a status register. Depending on the application, the information can then be forwarded to another module for further processing.
If the substring is not the last in the string, the index and TTL field of the substring are stored in a temporary register. The TTL field is decremented at each character clock cycle and if it reaches zero, the content of the corresponding temporary register is discarded. Thus, the content search processor declares a string as found only if the substrings making up the string are received in the correct consecutive order.
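The ordering check may be illustrated by the following sketch. It rests on several assumptions that are not stated in the specification: the TTL of a substring is initialized to the length of the substring expected next, the proper TTL value at the time of a match is zero, expired registers are discarded at the end of the character cycle in which they reach zero, and substrings are looked up directly by their text rather than through hash tables in order to keep the example short. The entry layout and the name `search` are likewise hypothetical.

```python
# Hypothetical substring entries for the strings "USER" and "PASSIVE".
# prev: index of the preceding substring of the same string (None if first).
# ttl:  assumed to be the length of the substring expected next.
ENTRIES = [
    {"text": "US",  "prev": None, "ttl": 2, "last": False, "string": "USER"},
    {"text": "ER",  "prev": 0,    "ttl": 0, "last": True,  "string": "USER"},
    {"text": "PAS", "prev": None, "ttl": 2, "last": False, "string": "PASSIVE"},
    {"text": "SI",  "prev": 2,    "ttl": 2, "last": False, "string": "PASSIVE"},
    {"text": "VE",  "prev": 3,    "ttl": 0, "last": True,  "string": "PASSIVE"},
]


def search(stream, entries=ENTRIES):
    """Return (index of last substring, end position) for each string found."""
    by_text = {}
    for index, entry in enumerate(entries):
        by_text.setdefault(entry["text"], []).append(index)

    registers = []  # temporary registers, each [substring index, ttl]
    status = []     # status records for strings declared found
    for pos in range(len(stream)):
        # One character clock cycle: decrement every TTL.
        for reg in registers:
            reg[1] -= 1
        new_registers = []
        for size in (3, 2):
            if pos + 1 < size:
                continue  # not enough characters received yet
            window = stream[pos + 1 - size:pos + 1]
            for index in by_text.get(window, []):
                entry = entries[index]
                # A non-first substring needs a register holding its
                # predecessor whose TTL has just run down to zero.
                if entry["prev"] is not None and not any(
                    reg[0] == entry["prev"] and reg[1] == 0 for reg in registers
                ):
                    continue
                if entry["last"]:
                    status.append((index, pos))
                else:
                    new_registers.append([index, entry["ttl"]])
        # Discard expired registers, then keep those created in this cycle.
        registers = [reg for reg in registers if reg[1] > 0] + new_registers
    return status


found = search("xxUSERxx")  # [(1, 5)]: entry 1 ("ER") ends at position 5
```

Under these assumptions the input "US_ER" yields no match, because the register holding "US" expires one character before "ER" is received.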
The invention can be implemented in either hardware or software. In one embodiment, a computer comprising a processor, memory, etc. is operative to execute software adapted to perform the multiple simultaneous string search method of the present invention.
There is therefore provided in accordance with the present invention a method of searching for one or more strings in an input data stream, the method comprising the steps of generating a first hash value on three previous characters received in the input data stream and applying the first hash value to a first hash table, generating a second hash value on two previous characters received in the input data stream and applying the second hash value to a second hash table, retrieving a three character substring stored in a first string table in response to a hit on the first hash table, retrieving a two character substring stored in a second string table in response to a hit on the second hash table and searching for substrings making up a string and declaring a string found if all the one or more two or three character substrings of a particular string are found in correct consecutive order.
There is also provided in accordance with the present invention a method of searching for one or more strings in an input data stream, the method comprising the steps of dividing each string to be searched for into one or more substrings of three or two characters and storing the three or two character substrings in a string table in accordance with the hash function thereof, generating a first hash on three previous characters of the input data stream, generating a second hash on two previous characters of the input data stream, checking in the string table for valid three and two character substrings in accordance with the first hash and the second hash, respectively and declaring a string to be found if all two and three character substrings of a particular string are found in the correct consecutive order.
There is further provided in accordance with the present invention an apparatus for searching the content of an input character stream for the presence of one or more strings comprising a first string table for storing three character substrings of the strings to be searched, a first lookup mechanism for providing a first index to the first string table based on the hash of the previous three characters of the input stream, the first string table outputting a three character substring in accordance with the first index, a second string table for storing two character substrings of the strings to be searched, a second lookup mechanism for providing a second index to the second string table based on the hash of the previous two characters of the input stream, the second string table outputting a two character substring in accordance with the second index, a content search processor operative to declare a string found if all two and three character substrings of a particular string are found in the correct consecutive order.
There is also provided in accordance with the present invention a computer readable storage medium having a computer program embodied thereon for causing a suitably programmed system to search for a plurality of strings by performing the following steps when such program is executed on the system: generating a first hash value on three previous characters received in the input data stream and applying the first hash value to a first hash table, generating a second hash value on two previous characters received in the input data stream and applying the second hash value to a second hash table, retrieving a three character substring stored in a first string table in response to a hit on the first hash table, retrieving a two character substring stored in a second string table in response to a hit on the second hash table and searching for substrings making up a string and declaring a string found if all the one or more two or three character substrings of a particular string are found in correct consecutive order.