US 12,169,500 B1
Systems, methods, and apparatuses for generating, extracting, classifying, and formatting object metadata using natural language processing in an electronic network
Yile Sun, Sudbury, MA (US); and Mohammad Sarker, Jamaica, NY (US)
Assigned to BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed by BANK OF AMERICA CORPORATION, Charlotte, NC (US)
Filed on Aug. 1, 2023, as Appl. No. 18/228,868.
Int. Cl. G06F 16/25 (2019.01); G06F 40/205 (2020.01)
CPC G06F 16/254 (2019.01) [G06F 40/205 (2020.01)]
17 Claims
 
1. A system for generating, extracting, classifying, and formatting object metadata using natural language processing in an electronic network, the system comprising:
a memory device with computer-readable program code stored thereon;
at least one processing device, wherein executing the computer-readable code is configured to cause the at least one processing device to perform the following operations:
identify at least one input source, wherein the at least one input source comprises at least one input data;
parse the at least one input data;
output, by an extraction layer, at least one object metadata and a term importance score associated with the input data to a metadata storage, wherein the extraction layer comprises a natural language processing attribute extraction model, wherein the extraction layer receives at least one preprocessed dataset unit and extracts the at least one object metadata and the term importance score by:
performing text identification of the at least one preprocessed dataset unit by identifying at least one sentence and selecting at least one candidate metadata term of the at least one preprocessed dataset unit, wherein the at least one candidate metadata term consists of at least one word;
representing the at least one candidate metadata term with at least one feature;
scoring the at least one candidate metadata term based on the at least one feature;
assigning the term importance score to the at least one candidate metadata term;
creating a list of the at least one candidate metadata term in order of term importance score rankings; and
outputting the list of the at least one candidate metadata term as object metadata with its corresponding term importance score to the metadata storage;
assemble, by an assignment layer, a corpus of text data and key phrases based on the object metadata from the extraction layer, wherein the assignment layer comprises a natural language processing classification model;
classify and verify, by the assignment layer, the object metadata; and
output, by a generative layer, at least one generative metadata to the metadata storage, wherein the generative layer comprises a pretrained generative natural language processing model that is tuned by the object metadata and the term importance score from the extraction layer and the corpus of text data and key phrases from the assignment layer.
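
The extraction-layer steps recited in claim 1 (sentence identification, candidate term selection, feature representation, scoring, ranking, and output) can be illustrated with a minimal sketch. The claim does not specify which features or scoring formula are used, so everything below is an assumption made for illustration only: the function names, the stopword list, the three features (frequency, index of first sentence, term length in words), and the scoring formula are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter

# Hypothetical stopword list; the claim does not define one.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to", "in", "for", "on"}


def split_sentences(text):
    """Text identification: split the preprocessed unit into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_terms(sentence, max_len=2):
    """Select candidate metadata terms of one or more words (n-grams)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    terms = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip n-grams that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            terms.append(" ".join(gram))
    return terms


def extract_metadata(text, top_k=5):
    """Return (term, importance score) pairs, highest score first."""
    freq = Counter()
    first_sentence = {}
    for idx, sentence in enumerate(split_sentences(text)):
        for term in candidate_terms(sentence):
            freq[term] += 1
            first_sentence.setdefault(term, idx)

    scored = []
    for term, count in freq.items():
        # Represent each candidate with features (assumed, not from the claim).
        features = {
            "frequency": count,
            "first_sentence": first_sentence[term],
            "length": len(term.split()),
        }
        # Assumed scoring rule: frequent, longer, earlier terms score higher.
        score = (features["frequency"] * features["length"]
                 / (1 + features["first_sentence"]))
        scored.append((term, score))

    # Rank by score (descending), breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

Calling `extract_metadata(text)` on a short description of a data object yields a ranked list of terms with scores, which a caller could then write to whatever metadata storage is in use. A real system would presumably replace the hand-written scoring rule with a trained attribute extraction model, as the claim recites.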