The present invention relates to techniques for generating and using hardware-accelerated cascaded finite state transducers that may ingest a document corpus and analyze its content.
The process of extracting information from large-scale unstructured text is called text analytics and has applications in business analytics, healthcare, and security intelligence. For example, in the healthcare domain, domain-specific document processors may be used to identify, normalize, and code medical and social facts in unstructured content, such as patient records and medical journals. Analyzing unstructured text and extracting the insights hidden in it at high bandwidth and low latency are computationally challenging tasks. In particular, text analytics functions typically rely heavily on finite-state-machine-based processing tasks. Typically, much of the execution time of text analytics runtime systems is spent on the shallow parser stages of document processors, which may be built on software-based finite state transducer libraries.
Accordingly, a need arises for techniques by which the execution time of finite state transducer libraries may be reduced, to provide improved performance and reduced cost.