The invention relates to the field of machine learning.
Text analytics, or the extraction of structured information from large amounts of unstructured data, is becoming an important part of today's enterprise. A common task in text analytics is ‘dictionary matching’ (DM), which is the detection of particular sets of words and patterns in unstructured text. With the ever-growing amount of unstructured text data, such as emails, web entities, and machine data logs, performing DM in a computationally efficient way is becoming increasingly important. However, DM operators often account for a significant portion of the running time of text analytics, because these operators scan the text corpora in their entirety, whereas subsequent steps work only on the results and on parts of documents. Accordingly, reducing the computing and memory requirements of DM can be highly valuable.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.