Entity extraction (also known as “named entity recognition” and “entity identification”) is a form of information extraction that may be performed on large sets of documents. Entity extraction may be performed to locate entity strings (strings of words) in the text of the documents and classify them into predefined categories, such as names of persons, places, organizations, and things, as well as times, quantities, monetary values, and percentages. Extracting entity strings from documents is important because it enables data analysis over unstructured data.
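To make the definition above concrete, the following is a minimal sketch of locating entity strings in text and classifying them into predefined categories. It deliberately uses a hand-written dictionary (gazetteer) plus regular expressions instead of the ML and NLP techniques discussed next. The category labels, dictionary entries, patterns, and the `extract_entities` function are hypothetical illustrations, not the method described in this document.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: predefined categories mapped to known entity strings.
# The entries are illustrative assumptions only.
GAZETTEER: Dict[str, List[str]] = {
    "PERSON": ["Ada Lovelace", "Alan Turing"],
    "LOCATION": ["London", "New York"],
    "ORGANIZATION": ["Acme Corp"],
}

# Hypothetical pattern-based categories (e.g. monetary values, percentages).
PATTERNS: Dict[str, str] = {
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d+)?",
    "PERCENT": r"\d+(?:\.\d+)?%",
}


def extract_entities(text: str) -> List[Tuple[str, str, int]]:
    """Return (entity_string, category, start_offset) tuples sorted by offset."""
    found: List[Tuple[str, str, int]] = []
    # Locate exact, word-bounded occurrences of each known entity string.
    for category, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((m.group(), category, m.start()))
    # Locate strings matching each category's regular expression.
    for category, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((m.group(), category, m.start()))
    return sorted(found, key=lambda entity: entity[2])


if __name__ == "__main__":
    sample = "Ada Lovelace moved to London and Acme Corp paid $2,000, a 15% raise."
    for entity in extract_entities(sample):
        print(entity)
```

Like the techniques described in this document, a sketch of this kind must still be run over the text of every document in a set, so its cost grows with the number of documents.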
Commercially available entity extractors exist for a variety of entity types, such as person names, product names, and locations. Current entity extraction techniques are primarily based on machine learning (ML) and natural language processing (NLP). Such techniques must process every document in the document set and can therefore be computationally expensive, particularly when thousands of documents or more are processed. Thus, more efficient ways of performing entity extraction are desired.