US 12,170,079 B2
System and method for improving named entity recognition
Divya Neelagiri, Dublin, CA (US); Taeyeon Ki, Milpitas, CA (US); and Vijendra Raj Apsingekar, San Jose, CA (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Aug. 3, 2021, as Appl. No. 17/444,367.
Prior Publication US 2023/0040181 A1, Feb. 9, 2023
Int. Cl. G10L 15/06 (2013.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01)
CPC G10L 15/063 (2013.01) [G10L 15/18 (2013.01); G10L 15/26 (2013.01)]
20 Claims
 
1. A method comprising:
training, using at least one processor of an electronic device, a set of teacher models, wherein training the set of teacher models comprises:
for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset comprising multiple labels;
wherein at least some of the unlabeled audio samples contain named entity (NE) audio data; and
wherein at least some of the labels comprise transcribed NE labels corresponding to the NE audio data;
correcting, using the at least one processor, at least some of the transcribed NE labels using NEs that are selected from user-specific NE textual data, wherein the selected NEs are phonemically similar to NEs of the at least some of the transcribed NE labels; and
retraining, using the at least one processor, the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, wherein the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models;
wherein retraining the set of teacher models comprises penalizing at least one teacher model using a variable loss when the pseudo labeled dataset of the at least one teacher model includes at least one of the transcribed NE labels that is corrected.
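
The correcting and penalizing steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the claim does not specify how phonemic similarity is measured or what form the variable loss takes. The sketch assumes a crude spelling-normalization key as a stand-in for a real phoneme sequence, a string-similarity ratio with a hypothetical threshold of 0.8, and a loss that scales linearly with the number of corrected NE labels. All function names and parameters (`phonetic_key`, `correct_ne`, `variable_loss`, `select_best_teacher`, `threshold`, `penalty`) are hypothetical.

```python
from difflib import SequenceMatcher


def phonetic_key(word: str) -> str:
    """Crude stand-in for a phoneme sequence (assumption, not from the patent)."""
    w = "".join(ch for ch in word.lower() if ch.isalpha())
    # Normalize a few spellings that commonly sound alike.
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s"), ("y", "i")):
        w = w.replace(src, dst)
    # Collapse runs of the same letter.
    out = []
    for ch in w:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the phonetic keys of two strings."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def correct_ne(transcribed: str, user_nes: list[str], threshold: float = 0.8):
    """Replace a transcribed NE with the most similar user-specific NE.

    Returns (label, was_corrected). The threshold is an assumed value.
    """
    best = max(user_nes, key=lambda ne: similarity(transcribed, ne), default=None)
    if best is not None and best != transcribed and similarity(transcribed, best) >= threshold:
        return best, True
    return transcribed, False


def variable_loss(base_loss: float, num_corrected: int, penalty: float = 0.5) -> float:
    """Assumed form of a variable loss: grows with the number of corrected NE labels."""
    return base_loss * (1.0 + penalty * num_corrected)


def select_best_teacher(losses: dict[str, float]) -> str:
    """Pick the teacher whose pseudo labels incurred the lowest penalized loss."""
    return min(losses, key=losses.get)


if __name__ == "__main__":
    user_nes = ["Divya", "Taeyeon"]  # user-specific NE textual data (e.g. contacts)
    label, corrected = correct_ne("Divia", user_nes)
    print(label, corrected)
    losses = {
        "teacher_a": variable_loss(2.0, num_corrected=2),  # needed corrections
        "teacher_b": variable_loss(2.5, num_corrected=0),  # no corrections needed
    }
    print(select_best_teacher(losses))
```

In this sketch a teacher whose pseudo labels required corrections ends up with a larger loss, so it is less likely to be the model whose pseudo labeled dataset is used for retraining. A production system would presumably compare actual phoneme sequences (for example from a grapheme-to-phoneme model) rather than normalized spellings.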