String matching algorithms are widely used in the areas of network security, encryption, business analytics, processing of scripting and markup languages, search engines, and as a component of software compilers and interpreters. As a result of their prevalent use, a considerable proportion of available computing power in numerous situations is devoted to performing these algorithms.
In string matching algorithms, a larger string is searched for occurrences of a smaller string. The larger string is frequently referred to as a “sequence,” while the smaller string whose occurrences are sought is frequently referred to as a “pattern.” Both strings may be made up of characters, symbols representing information such as DNA sequence elements, or any of a variety of other types of data. In essence, a string is a one-dimensional array holding one data element at each position. The set of possible values for each element is frequently referred to as an “alphabet.”
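By way of illustration only, the terminology above may be sketched as a straightforward position-by-position search. The function name `find_occurrences` and the sample DNA-like strings are assumptions introduced for this sketch and do not appear elsewhere herein.

```python
def find_occurrences(sequence, pattern):
    """Return every start position at which `pattern` occurs in `sequence`.

    Both arguments may be any indexable one-dimensional arrays of data
    elements (str, bytes, list, ...), matching the definition of a string
    given above.
    """
    n, m = len(sequence), len(pattern)
    hits = []
    # Try each position of the sequence at which the pattern could still fit.
    for start in range(n - m + 1):
        # Compare the pattern element by element against the sequence.
        if all(sequence[start + j] == pattern[j] for j in range(m)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Alphabet here is {A, C, G, T}; the pattern occurs at positions 2 and 7.
    print(find_occurrences("GATTACATTA", "TTA"))
```

Such a search may compare up to every element of the pattern at every position of the sequence, which is the cost that more elaborate variants seek to reduce.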
Over time, numerous variants of string matching algorithms have been devised. Among the more recent variants are bit-parallel string matching algorithms, which employ bit values and bitmasks to represent occurrences of particular data values at each position in the pattern and/or the sequence. Many of these bit-parallel variants achieve considerable efficiency where the length of the pattern (i.e., the number of positions in the one-dimensional array making up the pattern) is less than or equal to the number of bits in one or more registers of a processor. These bit-parallel variants may still be used where the length of the pattern is greater than the number of bits in the registers of a processor, but doing so requires creating data structures in memory to provide the equivalent of a wider processor register.
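One well-known bit-parallel variant is the Shift-And algorithm, a minimal sketch of which follows. It is offered only as an example of the class of algorithms described above, not as the particular algorithm of any embodiment. The function name `shift_and_search` and the `register_bits` parameter are assumptions of this sketch; the parameter merely models the width of a processor register, since Python integers are not themselves limited in width.

```python
def shift_and_search(sequence, pattern, register_bits=64):
    """Shift-And search; returns the start positions of all occurrences.

    `register_bits` is a hypothetical parameter modelling the register
    width: the pattern must fit in one register for this simple form.
    """
    m = len(pattern)
    if m == 0 or m > register_bits:
        raise ValueError("pattern length must be between 1 and register_bits")

    word_mask = (1 << register_bits) - 1

    # One bitmask per alphabet symbol: bit j is set when pattern[j] == symbol.
    masks = {}
    for j, symbol in enumerate(pattern):
        masks[symbol] = masks.get(symbol, 0) | (1 << j)

    accept = 1 << (m - 1)  # bit that signals a complete match
    state = 0              # bit j set: pattern[0..j] matches ending here
    hits = []
    for i, symbol in enumerate(sequence):
        # Extend every partial match by one position with a single shift,
        # start a new match at bit 0, then keep only those that agree
        # with the current symbol.
        state = (((state << 1) | 1) & masks.get(symbol, 0)) & word_mask
        if state & accept:
            hits.append(i - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("GATTACATTA", "TTA"))
```

Each element of the sequence is handled with one shift, one OR and one AND regardless of the pattern length, provided the pattern fits within the register; the shift is the operation on which the whole technique depends.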
Processors with registers of 32 or 64 bits in width have long been commonplace, and provide registers wide enough to efficiently accommodate bit-parallel string matching algorithms employed for many purposes. Further, recent advances in processor architecture have enabled the introduction of processors with 128-bit, 256-bit and 512-bit registers, thus potentially accommodating ever larger patterns with considerable efficiency. However, given that typical pieces of numerical data often require no more than 64 bits to be represented, registers of greater widths tend to be subdivided into two or more lanes of 64 bits in width or less to enable multiple data values to be held side-by-side. The instruction sets of such processors are also augmented with instructions that perform bitwise logic, arithmetic and other operations on those side-by-side values in parallel. Such registers and instructions are often referred to as “vector registers” and “vector instructions,” respectively. Further, processor architectures implementing vector registers with vector instructions are referred to as SIMD (single-instruction-multiple-data) architectures.
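The subdivision of a wide register into lanes may be modelled in software as follows. This is a sketch under stated assumptions: a 128-bit register is represented as a single Python integer holding two 64-bit lanes, and the helper names `split_lanes`, `join_lanes` and `packed_add` are hypothetical and do not correspond to instructions of any particular processor.

```python
LANE_BITS = 64                      # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1


def split_lanes(register, lanes):
    """Split a wide register value into its lanes, lowest lane first."""
    return [(register >> (i * LANE_BITS)) & LANE_MASK for i in range(lanes)]


def join_lanes(values):
    """Pack lane values (lowest lane first) into one wide register value."""
    register = 0
    for i, value in enumerate(values):
        register |= (value & LANE_MASK) << (i * LANE_BITS)
    return register


def packed_add(a, b, lanes=2):
    """Model a vector add: each lane is added independently, and any
    carry out of a lane is discarded rather than passed to its neighbor."""
    sums = [(x + y) & LANE_MASK
            for x, y in zip(split_lanes(a, lanes), split_lanes(b, lanes))]
    return join_lanes(sums)


if __name__ == "__main__":
    a = join_lanes([LANE_MASK, 5])  # low lane holds the largest 64-bit value
    b = join_lanes([1, 7])
    # The low lane wraps around to 0 without disturbing the high lane.
    print(split_lanes(packed_add(a, b), 2))
```

The side-by-side values are treated as unrelated quantities, so an overflow in one lane has no effect on the lane beside it.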
One outgrowth of the manner in which such wide registers are subdivided, and of the manner in which the supporting instruction sets are implemented, is a tendency to support only bit shift operations that act within each lane, such that a bit value shifted past one or both ends of a lane is lost rather than carried into the adjacent lane. This implementation detail presents an obstacle to using the full width of such very wide registers to support longer patterns. It is with respect to these and other considerations that the embodiments described herein are needed.
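The obstacle may be illustrated with a software model of a register divided into 64-bit lanes. The sketch below is illustrative only: the helper names are hypothetical, lanes are listed lowest first, and the carry-propagating routine shows one generic way of emulating a full-width shift, not the approach of the embodiments described herein.

```python
LANE_BITS = 64                      # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1


def lane_shift_left_1(lanes):
    """Model a per-lane shift left by one bit: the top bit of every lane
    is discarded instead of moving into the next lane."""
    return [(value << 1) & LANE_MASK for value in lanes]


def full_shift_left_1(lanes):
    """Emulate a one-bit shift across the whole register by saving the
    top bit of each lane and inserting it at the bottom of the next."""
    carries = [value >> (LANE_BITS - 1) for value in lanes]
    shifted = lane_shift_left_1(lanes)
    return [shifted[i] | (carries[i - 1] if i > 0 else 0)
            for i in range(len(lanes))]


if __name__ == "__main__":
    register = [1 << (LANE_BITS - 1), 0]  # a single bit at the top of lane 0
    print(lane_shift_left_1(register))    # the bit is lost
    print(full_shift_left_1(register))    # the bit arrives in lane 1
```

In a bit-parallel search whose state spans more than one lane, the lost bit would represent a partial match that ought to have continued into the next lane, so the extra carry-handling steps would be needed on every shift.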