Field
The present disclosure relates generally to the fields of natural language processing and information extraction. More particularly, the present disclosure provides systems and methods for converting complex natural language sentences into simple structures, so that information can be extracted efficiently using regular expression-like patterns.
Technical Background
In the current age of information explosion, thousands of journal articles, scientific papers, and other written documents are generated every day. A particular person may be interested in certain information contained in at least some of these documents but lack the capacity to read all of them and extract the information of interest.
Computing systems may be used to ingest these written documents and then parse them to catalog their contents so that a user can search for information of interest. However, the written documents are provided in natural language, which is difficult for a computing device to process without first converting the natural language into an acceptable computer-readable format. Such a conversion may be referred to as Natural Language Processing (NLP). Existing NLP approaches may rely on grammar to construct semantic trees. Such an approach makes it necessary to create and maintain a complete grammar of the language, which is a cumbersome task. As such, existing NLP approaches are cumbersome and time-consuming, and they require significant processing power. In addition, existing NLP approaches require large databases to store the complete grammar of the language and the semantic trees constructed from it.
Accordingly, a need exists for methods and systems that can process natural language sentences and extract information therefrom accurately and efficiently, without cumbersome, time-consuming, processing-intensive tasks and without an excessive amount of storage space.