US 12,170,133 B2
Automated information extraction and enrichment in pathology report using natural language processing
Vishakha Sharma, Pleasanton, CA (US); Yogesh Pandit, Pleasanton, CA (US); and Ram Balasubramanian, Pleasanton, CA (US)
Assigned to Roche Molecular Systems, Inc., Pleasanton, CA (US)
Appl. No. 17/639,441
Filed by Roche Molecular Systems, Inc., Pleasanton, CA (US)
PCT Filed Sep. 8, 2020, PCT No. PCT/US2020/049738
§ 371(c)(1), (2) Date Mar. 1, 2022,
PCT Pub. No. WO2021/046536, PCT Pub. Date Mar. 11, 2021.
Claims priority of provisional application 62/897,252, filed on Sep. 6, 2019.
Prior Publication US 2022/0301670 A1, Sep. 22, 2022
Int. Cl. G16H 15/00 (2018.01); G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06V 30/30 (2022.01); G06V 30/416 (2022.01); G16H 10/60 (2018.01); G16H 30/20 (2018.01)
CPC G16H 15/00 (2018.01) [G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06V 30/30 (2022.01); G06V 30/416 (2022.01); G16H 10/60 (2018.01); G16H 30/20 (2018.01)]
20 Claims
 
15. A system comprising:
one or more processors; and
a non-transitory computer-readable medium storing a plurality of instructions executable by the one or more processors to perform a method comprising:
receiving an image file containing a pathology report;
performing an image recognition operation on the image file to extract input text strings;
detecting, using a natural language processing (NLP) model, entities from the input text strings, each entity including a label and a value;
extracting, using the NLP model, values of the entities from the input text strings;
converting, based on a mapping table that maps entities and values to pre-determined terminologies, the values of at least some of the entities to corresponding pre-determined terminologies; and
generating a post-processed pathology report including the entities detected from the input text strings and the corresponding pre-determined terminologies,
wherein the input text strings are first input text strings; and
wherein parameters of the image recognition operation are determined based on an accuracy of recognizing entities from second input text strings by the NLP model, the second input text strings being generated by the image recognition operation using the parameters.
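
The steps recited in claim 15 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it is hypothetical: `fake_ocr` stands in for the image recognition operation, `detect_entities` is a simple "Label: value" pattern match standing in for the NLP model, the `TERM-…` codes are placeholders rather than codes from any real terminology, and `dpi` is an assumed example of an image recognition parameter. The last function shows one possible reading of the final "wherein" clause: among candidate parameter sets, keep the one whose recognized text yields the highest entity-recognition accuracy against a known reference.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

Entity = Tuple[str, str]  # (label, value)

# Hypothetical mapping table: (label, lower-cased value) -> pre-determined term.
# The TERM-... codes are placeholders, not entries of any real terminology.
MAPPING_TABLE: Dict[Entity, str] = {
    ("diagnosis", "invasive ductal carcinoma"): "TERM-0001",
    ("specimen_site", "left breast"): "TERM-0002",
}

# Known-correct entities for a reference report, used only to score accuracy.
EXPECTED: Set[Entity] = {
    ("diagnosis", "Invasive ductal carcinoma"),
    ("specimen_site", "Left breast"),
}


def fake_ocr(params: Dict[str, int]) -> List[str]:
    """Stand-in for image recognition: low 'dpi' yields garbled text."""
    if params["dpi"] >= 300:
        return ["Diagnosis: Invasive ductal carcinoma", "Specimen site: Left breast"]
    return ["Diagn0sis: lnvasive ductal carcinoma", "Specimen site: Left breast"]


def detect_entities(text_strings: List[str]) -> List[Entity]:
    """Stand-in for the NLP model: read 'Label: value' lines as entities."""
    entities: List[Entity] = []
    for line in text_strings:
        match = re.match(r"\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            label = match.group(1).strip().lower().replace(" ", "_")
            entities.append((label, match.group(2)))
    return entities


def normalize(entities: List[Entity]) -> List[Dict[str, Optional[str]]]:
    """Build post-processed report rows; unmapped values keep terminology=None."""
    return [
        {
            "label": label,
            "value": value,
            "terminology": MAPPING_TABLE.get((label, value.lower())),
        }
        for label, value in entities
    ]


def select_ocr_parameters(
    candidates: List[Dict[str, int]],
    run_ocr: Callable[[Dict[str, int]], List[str]],
    expected: Set[Entity],
) -> Dict[str, int]:
    """Return the candidate whose OCR output gives the best entity accuracy."""

    def accuracy(params: Dict[str, int]) -> float:
        found = set(detect_entities(run_ocr(params)))
        return len(found & expected) / max(len(expected), 1)

    return max(candidates, key=accuracy)


if __name__ == "__main__":
    best = select_ocr_parameters([{"dpi": 150}, {"dpi": 300}], fake_ocr, EXPECTED)
    for row in normalize(detect_entities(fake_ocr(best))):
        print(row)
```

In this sketch the accuracy score is the fraction of reference entities recovered exactly. A real system could use any other measure of recognition accuracy, and the claim does not prescribe one.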