The present invention relates to creating a data store compatible with natural language processing, and more specifically, to converting portions of text from a plurality of different data sources into objects with a shared format.
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human languages. To interact with humans, natural-language computing systems may use a data store whose contents have been parsed and annotated. For example, the computing system may use the data store to identify an answer to a question posed by a human user by correlating the question with the annotations in the data store.
Before the NLP computing system is able to interact with a user, the data store is populated with different text documents. In addition, annotators may parse the text in the data store to generate metadata about the text. Using the metadata and the stored text, the NLP computing system can interact with the user to, for example, answer a posed question, diagnose an illness based on provided symptoms, evaluate financial investments, and the like. In a sense, the data store acts like the “brain” of the natural-language computing system.
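By way of illustration only, the populate, annotate, and answer sequence described above may be sketched as follows. This is a minimal sketch under stated assumptions and does not describe any particular NLP computing system: the names `DataStore`, `Entry`, `keywords`, and `STOPWORDS` are hypothetical, the annotator is a simple keyword extractor standing in for real annotators, and the correlation of a question to the annotations is assumed to be plain keyword overlap.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stop-word list; a real annotator would use far richer analysis.
STOPWORDS = {"a", "an", "the", "is", "of", "what", "which", "to", "and", "in", "for"}


def keywords(text):
    """Toy annotator: return the set of lowercase non-stop-words in the text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


@dataclass
class Entry:
    text: str                                      # the stored text
    annotations: set = field(default_factory=set)  # metadata about the text


class DataStore:
    def __init__(self):
        self.entries = []

    def populate(self, documents):
        """Load raw text documents into the store."""
        for doc in documents:
            self.entries.append(Entry(text=doc))

    def annotate(self, annotator):
        """Run an annotator over every entry to generate metadata."""
        for entry in self.entries:
            entry.annotations |= annotator(entry.text)

    def answer(self, question):
        """Return the stored text whose annotations best overlap the question."""
        q = keywords(question)
        best = max(self.entries, key=lambda e: len(q & e.annotations), default=None)
        if best is None or not (q & best.annotations):
            return None  # nothing in the store correlates with the question
        return best.text


store = DataStore()
store.populate([
    "Influenza commonly causes fever, cough, and muscle aches.",
    "A bond is a fixed-income investment that pays periodic interest.",
])
store.annotate(keywords)
result = store.answer("Which illness causes fever and cough?")
print(result)
```

In this sketch the question shares the annotations "causes", "fever", and "cough" with the first document and none with the second, so the first document is returned. A question that matches no annotation yields `None`.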