The present invention relates to scanning signatures in a string field.
A digital content entity (e.g., a file, a program, a web page, an email, an IP packet, or a digital image) can include one or more string fields. A string field is a string of data values that typically stand for characters or execution codes. For example, an IP packet can include URL, HOST, HTTP header, HTTP payload, email attachments, email header, and email payload fields. The size of a string field can vary from a few bytes to a few million bytes or more. A string signature is either a particular fully-specified sequence of data values or a particular expression (e.g., a particular regular expression) of data values identifying a string object (e.g., a particular computer virus or a specific genetic sequence). String signatures can be stored in a string signature database. The string signature database can include multiple string signatures. The size of a single string signature can vary from a few bytes to thousands of bytes.
Both string signatures and string fields are bit strings that can include many basic units. A basic unit is the smallest unit having a semantic meaning, and is therefore used as a scanning unit in conventional string signature scan techniques. The size of the basic unit can vary with application. For example, a basic unit of English text strings is typically 8 bits (i.e., one byte) while a basic unit of a computer virus signature is typically a byte or a half byte.
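As an illustration of how the same bit string can be read as basic units of different sizes, the following minimal Python sketch splits a byte string into 8-bit or 4-bit units. The function name `basic_units` and the choice to emit the high half byte before the low half byte are assumptions made for this example only.

```python
from typing import List


def basic_units(data: bytes, unit_bits: int) -> List[int]:
    """Split a bit string into basic units of 8 bits or 4 bits.

    Hypothetical helper: the high half byte is emitted first,
    which is an assumption, not something the text specifies.
    """
    if unit_bits == 8:
        # One basic unit per byte, as for English text.
        return list(data)
    if unit_bits == 4:
        # Two basic units (half bytes) per byte.
        units: List[int] = []
        for byte in data:
            units.append(byte >> 4)    # high half byte
            units.append(byte & 0x0F)  # low half byte
        return units
    raise ValueError("this sketch supports only 4-bit and 8-bit units")
```

Under these assumptions, the two bytes 0x4D 0x5A yield two 8-bit units or four 4-bit units, so the number of scanning positions depends on the chosen unit size.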
Each basic unit in a particular string signature can be specified as equal or unequal to a specific value, or a range of values (e.g., a numerical character or an alphabetic character can have a specific value or a range of values such as 0-9 or a-z). The basic unit can be specified to be either case-sensitive or case-insensitive. The string signature can support simple logic operations (e.g., negation). Furthermore, each string signature can include a wildcard designated by, for example, a “*” (a “variable-size” symbol) or “?” (a fixed-size symbol), where “*” indicates zero or more arbitrary basic units and “?” indicates a single arbitrary basic unit. For each variable-size symbol, a range of arbitrary basic units can be further specified. When a string signature includes the variable-size symbol, the size of the string signature is variable. If the string signature does not include a variable-size symbol, the size of the string signature is fixed.
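The signature elements described above can be sketched in Python as follows. This is a minimal illustration assuming 8-bit basic units; the names `Unit`, `Gap`, `ANY`, `is_fixed_size`, and `matches_at` are hypothetical, and the simple backtracking matcher is chosen for clarity rather than speed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class Unit:
    """One basic unit constrained to the range [lo, hi]; negate inverts the test."""
    lo: int
    hi: int
    negate: bool = False

    def accepts(self, value: int) -> bool:
        return (self.lo <= value <= self.hi) != self.negate


@dataclass(frozen=True)
class Gap:
    """Variable-size symbol ("*"), optionally bounded to a range of lengths."""
    min_len: int = 0
    max_len: Optional[int] = None  # None means unbounded


ANY = Unit(0, 255)  # fixed-size symbol ("?"): exactly one arbitrary basic unit

Token = Union[Unit, Gap]


def is_fixed_size(sig: Sequence[Token]) -> bool:
    """A signature has a fixed size when it has no variable-size symbol."""
    return not any(isinstance(tok, Gap) for tok in sig)


def matches_at(sig: Sequence[Token], data: bytes, pos: int) -> bool:
    """Return True if the signature matches data starting exactly at pos."""
    def walk(i: int, p: int) -> bool:
        if i == len(sig):
            return True
        tok = sig[i]
        if isinstance(tok, Gap):
            remaining = len(data) - p
            hi = remaining if tok.max_len is None else min(tok.max_len, remaining)
            # Try every permitted gap length (backtracking).
            return any(walk(i + 1, p + k) for k in range(tok.min_len, hi + 1))
        return p < len(data) and tok.accepts(data[p]) and walk(i + 1, p + 1)

    return walk(0, pos)
```

For example, the signature "a", "?", a digit in 0-9, a gap of zero to two units, then "z" can be written as `[Unit(97, 97), ANY, Unit(48, 57), Gap(0, 2), Unit(122, 122)]`; it matches the string `xaB7--z` starting at offset 1 but not at offset 0.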
A typical signature scan process can include comparing a string field against corresponding string signatures in a database for all possible locations within the string field. The scan speed is typically limited by signature size and complexity. In addition, scan speed can be limited by the ability to update the signatures incrementally.
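The scan process described above can be sketched in Python for the simplest case of fully-specified signatures. This is an illustrative sketch only: the function name `naive_scan` and the representation of the signature database as a dictionary from a label to a byte sequence are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def naive_scan(field: bytes, signatures: Dict[str, bytes]) -> List[Tuple[int, str]]:
    """Compare every signature against every location in the string field.

    Returns (offset, label) pairs for each match. Only fully-specified
    signatures are handled in this sketch.
    """
    hits: List[Tuple[int, str]] = []
    for pos in range(len(field)):               # every possible location
        for label, sig in signatures.items():   # every signature in the database
            if field.startswith(sig, pos):      # unit-by-unit comparison at pos
                hits.append((pos, label))
    return hits
```

In this form the work grows roughly with the product of the field size, the number of signatures, and the signature size, which shows why signature size affects scan speed.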