1. Field of the Invention
The present invention relates generally to regular expressions for natural language processing, and more specifically to optimizing generation of a regular expression, utilized for entity extraction, that can identify a word, or a phrase containing the word, within text data (i.e., one or more strings of text) even if the word is misspelled.
2. Description of the Related Art
The tremendous growth of the Internet and computer storage capabilities has enabled people to have access to massive amounts of electronically stored data, wherein the data includes text data (i.e., one or more strings of text) stored on a computer readable tangible storage device. Various computer software programs are utilized as search tools capable of searching and identifying information within the text data. Specifically, it is known to utilize search tools having a regular expression to identify one or more specific words within the text data, in order to perform entity extraction. However, if the text data is not validated prior to being stored on the computer readable tangible storage device, then the text data can contain a misspelled word. A search tool may be unable to identify a misspelled word even if the word is only slightly misspelled, which can result in inaccurate and incomplete entity extraction results.
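By way of illustration only, the limitation described above can be sketched with Python's standard `re` module. The sample text, the entity name, and the variable names are hypothetical and are not taken from any particular search tool; the sketch merely shows that an exact-match regular expression identifies the correctly spelled word while missing a slightly misspelled one.

```python
import re

# Hypothetical text data; "Californa" is a deliberate one-letter misspelling.
text = "Offices in California and Californa were audited."

# Exact-match regular expression for the entity of interest.
pattern = re.compile(r"\bCalifornia\b")

# Collect every occurrence the regular expression identifies.
matches = pattern.findall(text)

# Only the correctly spelled occurrence is identified; the misspelled
# occurrence is silently skipped, so the extraction result is incomplete.
print(matches)
```

In this sketch the text data mentions the entity twice, yet only one occurrence is extracted.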