CPC G06F 40/295 (2020.01) [G06F 40/106 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2013.01); G06T 11/60 (2013.01)] | 30 Claims |
1. A system comprising:
one or more hardware processors of a machine; and
at least one memory storing instructions that, when executed by the one or more hardware processors, cause the machine to perform operations comprising:
performing a plurality of iterations to generate a Natural Language Processing (NLP) model, each iteration comprising:
receiving a plurality of real-world documents, the plurality of real-world documents including text data, layout data, and image data;
processing, by the one or more hardware processors, the plurality of real-world documents to generate an initial prediction for data points within the plurality of real-world documents using a neural network;
validating the initial prediction by comparing extracted values with corresponding information present in the plurality of real-world documents and correcting discrepancies found based on the comparing;
evaluating a quality of the validated initial prediction; and
determining that the quality of the validated initial prediction satisfies a quality constraint; and
configuring the NLP model to process a new document to extract data points without validation.
|
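The iteration recited in claim 1 (receive, predict, validate and correct, evaluate, check a quality constraint) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the names `Document`, `validate`, and `run_iterations` are hypothetical, the neural network is abstracted into caller-supplied `predict` and `update` callables, and "quality" is taken to be the fraction of reference fields that needed no correction, a measure the claim does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Prediction = Dict[str, str]


@dataclass
class Document:
    """Hypothetical container for one real-world document."""
    text: str
    layout: dict
    image: bytes
    # Information actually present in the document, keyed by data point.
    # Assumed to be available during the validation step.
    reference: Dict[str, str]


def validate(initial: Prediction, doc: Document) -> Tuple[Prediction, int]:
    """Compare extracted values with the document and correct discrepancies.

    Returns the corrected prediction and the number of corrections made.
    """
    corrected = dict(initial)
    n_fixed = 0
    for field, true_value in doc.reference.items():
        if corrected.get(field) != true_value:
            corrected[field] = true_value
            n_fixed += 1
    return corrected, n_fixed


def run_iterations(
    batches: Iterable[List[Document]],
    predict: Callable[[Document], Prediction],
    update: Callable[[Document, Prediction], None],
    min_quality: float,
) -> Optional[int]:
    """Iterate until the assumed quality measure meets `min_quality`.

    Returns the 1-based iteration at which the constraint was satisfied,
    or None if the batches ran out first.
    """
    for iteration, docs in enumerate(batches, start=1):
        total = errors = 0
        for doc in docs:
            initial = predict(doc)                     # initial prediction
            validated, n_fixed = validate(initial, doc)
            update(doc, validated)                     # learn from corrections
            total += len(doc.reference)
            errors += n_fixed
        # Assumed quality measure: share of data points needing no correction.
        quality = 1.0 - errors / total if total else 0.0
        if quality >= min_quality:
            return iteration  # model may now be used without validation
    return None
```

Once `run_iterations` returns an iteration number, a caller would apply `predict` to new documents directly and skip `validate`, which corresponds to the final configuring step of the claim under the assumptions stated above.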