1. Field
Embodiments provided herein generally relate to improving the functionality and efficiency of search and other natural language processing (NLP) tasks by generating a lexicon, and particularly to generating variant forms of terms and linking them to corresponding normalized forms to increase the accuracy of user-performed functions.
2. Technical Background
As electronic systems convert documents and other data into electronic form, many of the converted documents are indexed to facilitate search, retrieval, and/or other functions. For example, legal documents, such as court decisions, briefs, motions, etc., may be stored and indexed for users to access electronically. As different legal documents may include different legal points pertaining to different jurisdictions, those documents may be indexed and organized accordingly. However, problems can arise when legal points are not expressed in a standardized lexicon shared across all documents. In such situations, if a user inputs a term or phrase that differs even slightly from the one used in a particular legal document, that legal document may not be retrieved in the ensuing search.
For example, a user may perform an electronic search for the term “lack of any evidence.” While an existing electronic search system may be configured to retrieve documents that include this precise term, many electronic search systems may be unable to retrieve documents (or other data) that include semantically equivalent variants of this term, such as “lacking evidence,” “lack any evidence,” “lacks evidence,” etc. Accordingly, due to this failure to identify and match semantic equivalents, such electronic search systems may fail to retrieve many of the documents relevant to the original query terms, thus rendering the electronic search systems less effective for their intended purpose.
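The retrieval gap described above can be illustrated with a minimal Python sketch. It is not the claimed method: the lexicon entries, the normalized form "lack of evidence", the sample documents, and the function names exact_search, normalized_forms, and lexicon_search are all hypothetical and chosen only for illustration. The sketch contrasts exact phrase matching with a lookup that first maps each variant to a shared normalized form.

```python
# Hypothetical lexicon: each variant form is linked to one normalized form.
LEXICON = {
    "lack of any evidence": "lack of evidence",
    "lacking evidence": "lack of evidence",
    "lack any evidence": "lack of evidence",
    "lacks evidence": "lack of evidence",
}

# Hypothetical documents, invented for this illustration only.
DOCUMENTS = {
    "doc1": "The claim fails for lack of any evidence.",
    "doc2": "The motion was denied because the record lacks evidence.",
    "doc3": "Plaintiff, lacking evidence, withdrew the claim.",
    "doc4": "The court found sufficient proof of damages.",
}


def exact_search(query, documents):
    """Return ids of documents containing the query as a literal phrase."""
    q = query.lower()
    return sorted(
        doc_id for doc_id, text in documents.items() if q in text.lower()
    )


def normalized_forms(text, lexicon):
    """Return the set of normalized forms whose variants occur in the text."""
    lowered = text.lower()
    return {norm for variant, norm in lexicon.items() if variant in lowered}


def lexicon_search(query, documents, lexicon):
    """Return ids of documents sharing a normalized form with the query."""
    targets = normalized_forms(query, lexicon)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if targets & normalized_forms(text, lexicon)
    )


if __name__ == "__main__":
    query = "lack of any evidence"
    print(exact_search(query, DOCUMENTS))             # ['doc1']
    print(lexicon_search(query, DOCUMENTS, LEXICON))  # ['doc1', 'doc2', 'doc3']
```

In this sketch the exact search retrieves only the document that contains the precise phrase, while the lexicon-based search also retrieves the documents that use “lacks evidence” and “lacking evidence.” Simple substring matching is used here for brevity; a practical system would likely tokenize the text and match on token boundaries.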