Pattern matching may be used in many big-data applications, such as network security, machine learning, and genomics. One leading methodology for pattern matching is to use regular expressions, or equivalent finite state machines, to identify patterns in a large dataset. A regular expression can be represented by a deterministic finite automaton (DFA) or a non-deterministic finite automaton (NFA). The two models are equivalent in computational power, since any NFA can be converted into a DFA that recognizes the same language.
On CPUs, NFAs and DFAs may be represented by transition tables in which each state's successor state(s) are indicated for each input symbol that the state matches. DFAs may be the preferred basis for implementing automata on CPUs because they have more predictable memory bandwidth requirements. An NFA may have many active states at once and may therefore require many state lookups to process a single input symbol (which may consume significant memory bandwidth), whereas a DFA has exactly one active state and requires a single lookup per symbol. On the other hand, because the number of DFA states can grow exponentially relative to the number of states in an equivalent NFA, DFA tables may be too large to store in a processor cache.
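The trade-off above can be illustrated with a small table-driven sketch. It assumes a hypothetical textbook NFA (no epsilon moves) that accepts strings over {a, b} whose n-th symbol from the end is 'a'; that NFA has n + 1 states. The sketch counts table lookups per input symbol for NFA simulation (one per active state) versus DFA simulation (one per symbol), and uses the standard subset construction to build the equivalent DFA, whose reachable state count is 2**n for this pattern. All function names are illustrative only.

```python
from itertools import product

ALPHABET = "ab"


def nth_from_last_nfa(n):
    """Hypothetical NFA with n + 1 states accepting strings over {a, b}
    whose n-th symbol from the end is 'a' (no epsilon moves).
    Transition table: (state, symbol) -> set of successor states."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {n}


def run_nfa(nfa, text):
    """Track every active state; return (accepted, table lookups)."""
    delta, start, accept = nfa
    active, lookups = {start}, 0
    for sym in text:
        nxt = set()
        for state in active:
            lookups += 1  # one table access per active state per symbol
            nxt |= delta.get((state, sym), set())
        active = nxt
    return bool(active & accept), lookups


def to_dfa(nfa):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    delta, start, accept = nfa
    s0 = frozenset({start})
    table, seen, work = {}, {s0}, [s0]
    while work:
        cur = work.pop()
        for sym in ALPHABET:
            nxt = frozenset(t for s in cur for t in delta.get((s, sym), ()))
            table[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    dfa_accept = {d for d in seen if d & accept}
    return table, s0, dfa_accept, len(seen)


def run_dfa(dfa, text):
    """Exactly one active state; return (accepted, table lookups)."""
    table, state, accept, _ = dfa
    lookups = 0
    for sym in text:
        lookups += 1  # exactly one table access per symbol
        state = table[(state, sym)]
    return state in accept, lookups


if __name__ == "__main__":
    # State blowup: NFA grows linearly in n, DFA grows as 2**n.
    for n in (2, 4, 8, 12):
        print(f"n={n}: NFA states={n + 1}, DFA states={to_dfa(nth_from_last_nfa(n))[3]}")

    # Equivalence: both automata accept exactly the same short strings.
    nfa3, dfa3 = nth_from_last_nfa(3), None
    dfa3 = to_dfa(nfa3)
    for length in range(8):
        for chars in product(ALPHABET, repeat=length):
            s = "".join(chars)
            assert run_nfa(nfa3, s)[0] == run_dfa(dfa3, s)[0]

    # Per-symbol cost: the NFA performs more lookups than the DFA on the same input.
    nfa4 = nth_from_last_nfa(4)
    text = "abab" * 8
    print("NFA lookups:", run_nfa(nfa4, text)[1], "DFA lookups:", run_dfa(to_dfa(nfa4), text)[1])
```

In this sketch the DFA pays for its single lookup per symbol with a table whose size doubles each time n grows by one, which is the cache-capacity concern noted above.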
Graphics processing units (GPUs) may provide more parallel resources, which can help hide DRAM access latency. However, the highly random access patterns of automata processing exhibit poor memory locality and may increase branch divergence and the need for synchronization. Therefore, off-the-shelf von Neumann architectures have struggled to meet the processing requirements of many of the big-data applications described above.
“Spatial” hardware accelerators have been employed in this area as performance growth in conventional processors has slowed. Spatial hardware accelerators for automata processing, such as automata processors and field-programmable gate arrays (FPGAs), can lay out reconfigurable hardware resources on a substrate based on rules or instructions provided to the device, which may allow more patterns to be searched in parallel.
The Automata Processor (AP) developed by Micron Technology and the Cache Automata (CA) provide spatial automata acceleration using DRAM and SRAM arrays, respectively. Both of these spatial processors allow native execution of NFAs, an efficient computational model for regular expressions, and may process a new input symbol every cycle. In particular, the AP repurposes DRAM arrays for state matching and uses a deeply hierarchical routing matrix, whereas the CA repurposes the last-level cache for state matching and uses 8T SRAM cells for the interconnect. The CA uses a full-crossbar topology for the interconnect to support full connectivity in an automaton.
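The per-cycle behavior of such spatial NFA engines can be approximated in software with a bit-parallel model. The sketch below assumes a simplified homogeneous NFA in which every state matches a set of byte values and becomes active when it matches the current input byte and is enabled by an active predecessor (or is a start state). A 256-row match table, indexed by the input byte, stands in for the state-matching memory array, and per-state successor bitmasks stand in for the routing matrix or crossbar. All names and the example automaton (detecting "cat" or "car") are hypothetical and do not reflect the AP's or CA's actual programming interfaces; real hardware evaluates all states simultaneously, whereas this loop visits them sequentially.

```python
def build_match_table(state_symbols):
    """Row b has bit j set iff state j matches byte b
    (a stand-in for reading one row of the state-matching array)."""
    table = [0] * 256
    for j, symbols in enumerate(state_symbols):
        for b in symbols:
            table[b] |= 1 << j
    return table


def step(active, byte, match_table, successors, start_mask):
    """One input symbol per 'cycle'."""
    # Interconnect: every active state enables its successors (OR over the crossbar).
    enabled = start_mask
    j = 0
    while active >> j:
        if (active >> j) & 1:
            enabled |= successors[j]
        j += 1
    # State matching: enabled states that match the input byte become active.
    return enabled & match_table[byte]


def run(data, state_symbols, successors, start_mask, report_mask):
    """Return the input offsets at which any reporting state is active."""
    table = build_match_table(state_symbols)
    active, reports = 0, []
    for offset, byte in enumerate(data):
        active = step(active, byte, table, successors, start_mask)
        if active & report_mask:
            reports.append(offset)
    return reports


# Hypothetical automaton for "cat" or "car": states 0='c', 1='a', 2='t', 3='r'.
STATE_SYMBOLS = [{ord("c")}, {ord("a")}, {ord("t")}, {ord("r")}]
SUCCESSORS = [0b0010, 0b1100, 0b0000, 0b0000]  # 0 -> 1, 1 -> {2, 3}
START_MASK = 0b0001   # state 0 is enabled on every cycle
REPORT_MASK = 0b1100  # states 2 and 3 report a match

if __name__ == "__main__":
    print(run(b"the cat sat on a car", STATE_SYMBOLS, SUCCESSORS, START_MASK, REPORT_MASK))
```

Because all active states advance together on each input byte, the cost per symbol in this model is one match-table row read plus one interconnect evaluation, regardless of how many NFA states are active.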