Data deduplication, or data duplicate elimination, refers to the reduction of redundant data in a memory device in order to reduce the capacity cost of the memory device. In data deduplication, a data file is partitioned into one or more chunks, or blocks, of data. By associating multiple blocks of data that contain identical data with a single stored block of data, duplicate copies of those blocks may be reduced or eliminated within the memory, thereby decreasing the number of redundant copies of data in the memory device.
Accordingly, if duplicate copies of data can be reduced to a single copy, the overall available capacity of the memory device increases while the same amount of physical resources is used. The resulting economization of capacity allows for fewer data rewrites. In addition, write requests for duplicate blocks of data that are already stored in the memory may be discarded. For both reasons, data deduplication effectively increases write endurance and can thereby prolong the life span of the memory device.
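The block-level mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of this disclosure. The class name `DedupStore`, the fixed block size, and the use of a SHA-256 fingerprint to detect identical blocks are all hypothetical choices, and a Python dictionary stands in for the memory device. Writes of blocks that are already stored are discarded and counted, mirroring the write savings described above.

```python
import hashlib
from typing import Dict, Iterator, List


class DedupStore:
    """Hypothetical block store keeping one physical copy per unique block.

    Assumptions (illustrative only): fixed-size blocks, SHA-256 as the
    content fingerprint, and a dict standing in for the memory device.
    """

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.physical: Dict[str, bytes] = {}   # fingerprint -> single stored block
        self.refcount: Dict[str, int] = {}     # fingerprint -> logical references
        self.files: Dict[str, List[str]] = {}  # file name -> ordered fingerprints
        self.physical_writes = 0               # writes that reached the "device"
        self.discarded_writes = 0              # duplicate writes that were dropped

    def _chunks(self, data: bytes) -> Iterator[bytes]:
        # Partition the file into fixed-size blocks; the last one may be shorter.
        for i in range(0, len(data), self.block_size):
            yield data[i:i + self.block_size]

    def write_file(self, name: str, data: bytes) -> None:
        refs: List[str] = []
        for block in self._chunks(data):
            fp = hashlib.sha256(block).hexdigest()
            if fp in self.physical:
                # Identical block already stored: discard the write.
                self.discarded_writes += 1
            else:
                self.physical[fp] = block
                self.physical_writes += 1
            # Associate this logical block with the single stored copy.
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            refs.append(fp)
        self.files[name] = refs

    def read_file(self, name: str) -> bytes:
        # Reassemble the logical file from its (possibly shared) stored blocks.
        return b"".join(self.physical[fp] for fp in self.files[name])
```

For example, writing `b"ABCDABCDWXYZ"` and then `b"WXYZABCD"` with 4-byte blocks stores only two physical blocks (`ABCD` and `WXYZ`) and discards three duplicate writes. A practical design would typically also compare block contents on a fingerprint hit to guard against hash collisions. That check is omitted here for brevity.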
Conventional methods of data deduplication have most commonly been used for hard drives. However, there is interest in providing fine-grained deduplication for volatile memory, such as dynamic random-access memory (DRAM).
In the field of data searching, a regular expression (a “regex” or “regexp” for short) is a special text string that describes a search pattern, allowing certain patterns and groups of data to be found during a search. A regex operation may include substring matching and/or pattern matching. Regex operations are widely used in many modern applications and in many domains, such as network security, text analytics, bioinformatics, and finance.
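Python's standard `re` module can illustrate the difference between substring matching and pattern matching. This is a hedged sketch for illustration only. The helper names `substring_match` and `pattern_match`, the sample sequence `SEQ`, and the motif pattern are hypothetical and do not come from this disclosure.

```python
import re
from typing import List


def substring_match(text: str, needle: str) -> bool:
    # Substring matching: the literal must appear verbatim somewhere in text.
    return needle in text


def pattern_match(text: str, pattern: str) -> List[str]:
    # Pattern matching: every non-overlapping match of a regular expression.
    return re.findall(pattern, text)


# Hypothetical sample sequence used only for illustration.
SEQ = "CCGAATTCGGTATAAATGCTATATAAGG"
```

Here `substring_match(SEQ, "GAATTC")` returns `True` because the literal appears verbatim. By contrast, `pattern_match(SEQ, r"TATA[AT]A[AT]")` returns `["TATAAAT", "TATATAA"]`, because the character classes `[AT]` let a single pattern match several distinct substrings.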
Conventional software-based regex engines suffer from low performance because of their streaming methodology. Of the two common automaton technologies, nondeterministic finite automata (NFA) and deterministic finite automata (DFA), DFA is typically preferred because of its low complexity, despite its relatively low performance. A DFA restricts a state machine to a single state at any given time, whereas an NFA allows parallel state exploration, in which the state machine progresses through multiple states at the same time. Conventional hardware-based regex engine implementations, in turn, still have limited memory capacity and still need to stream data to find a regex match. They also suffer from high latency, high energy consumption, and relatively low performance.
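A small Python sketch can make the contrast between single-state DFA execution and parallel NFA state exploration concrete. The automaton below is a hypothetical textbook example that recognizes strings over {a, b} whose second-to-last symbol is `a`. It is not a regex engine from this disclosure, and all names are illustrative. The NFA simulation keeps a set of simultaneously active states. The DFA is obtained by the standard subset construction and holds exactly one state at any time, where each DFA state corresponds to a set of NFA states.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical NFA for (a|b)*a(a|b): second-to-last symbol is 'a'.
# Transition table: (state, symbol) -> set of next states.
NFA_DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPT = {2}


def nfa_step(active: Set[int], symbol: str) -> Set[int]:
    # Advance every active state at once (parallel state exploration).
    nxt: Set[int] = set()
    for s in active:
        nxt |= NFA_DELTA.get((s, symbol), set())
    return nxt


def nfa_accepts(text: str) -> Tuple[bool, int]:
    """Return (accepted, max number of simultaneously active NFA states)."""
    active = {NFA_START}
    widest = len(active)
    for ch in text:
        active = nfa_step(active, ch)
        widest = max(widest, len(active))
    return bool(active & NFA_ACCEPT), widest


def build_dfa(alphabet: str = "ab"):
    """Subset construction: each DFA state is one frozenset of NFA states."""
    start: FrozenSet[int] = frozenset({NFA_START})
    delta: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = frozenset(nfa_step(set(state), ch))
            delta[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    accepting = {s for s in seen if s & NFA_ACCEPT}
    return start, delta, accepting, seen


def dfa_accepts(text: str, dfa) -> bool:
    # Input must be over the DFA's alphabet ("ab" by default).
    start, delta, accepting, _ = dfa
    state = start  # exactly one current state at any time
    for ch in text:
        state = delta[(state, ch)]
    return state in accepting
```

For the input `aab`, three NFA states are active after the second symbol. The DFA instead sits in the single state corresponding to {0, 1, 2}. Subset construction yields four DFA states for this automaton. In general, the construction can produce up to 2^n DFA states from an n-state NFA.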
The above information disclosed in this Background section is only to enhance the understanding of the background of the invention, and therefore it may contain information that does not constitute prior art.