US 12,169,691 B2
Filler word detection through tokenizing and labeling of transcripts
Alexandre de Brébisson, Montréal (CA); and Antoine d'Andigné, Paris (FR)
Assigned to Descript, Inc., San Francisco, CA (US)
Filed by Descript, Inc., San Francisco, CA (US)
Filed on Apr. 4, 2023, as Appl. No. 18/295,684.
Application 18/295,684 is a continuation of application No. 17/094,533, filed on Nov. 10, 2020, granted, now 11,651,157.
Claims priority of provisional application 63/058,363, filed on Jul. 29, 2020.
Prior Publication US 2023/0244870 A1, Aug. 3, 2023
Int. Cl. G06F 40/284 (2020.01); G06F 40/166 (2020.01); G06F 40/205 (2020.01); G06F 40/221 (2020.01); G06F 40/253 (2020.01); G06F 40/263 (2020.01); G10L 15/26 (2006.01)
CPC G06F 40/284 (2020.01) [G06F 40/166 (2020.01); G06F 40/205 (2020.01); G06F 40/221 (2020.01); G06F 40/253 (2020.01); G06F 40/263 (2020.01); G10L 15/26 (2013.01)]
15 Claims
 
1. A method performed by a computer program executing on a computing device, the method comprising:
obtaining multiple audio samples to be used to train a machine learning model to identify instances of a filler word,
wherein each of the multiple audio samples includes a spoken instance of the filler word;
providing the multiple audio samples to a machine learning algorithm as input, such that the machine learning algorithm
(i) derives, based on an analysis of the multiple audio samples, a rule for identifying instances of the filler word based on context, wherein the rule specifies that (a) a form of punctuation must precede or succeed the filler word, (b) another instance of the filler word cannot precede or succeed the filler word, or (c) a word preceding the filler word cannot be a certain part of speech, and
(ii) produces, as output, the machine learning model that is trained to learn the rule;
obtaining a transcript that includes a series of words arranged in sequential order as uttered in an audio file;
tokenizing each word in the transcript as a separate token so as to create a series of tokens arranged in sequential order,
wherein each token in the series of tokens is representative of a tuple that includes a corresponding word and a label indicating part of speech of the corresponding word;
applying multiple machine learning models, including the machine learning model, to the series of tokens,
wherein each of the multiple machine learning models is associated with a different one of multiple filler words, each of which is representative of one or more words, and
wherein each of the multiple machine learning models takes, as input, (i) the series of tokens and (ii) positional indexes of the series of tokens; and
in response to a determination that at least two of the multiple machine learning models produce outputs indicating that a given token corresponding to a given word is representative of the corresponding filler word,
determining which of the at least two machine learning models is trained to identify a longest string of words, and
causing only the filler word identified by the determined machine learning model to be returned as a result.
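
The inference-time steps of claim 1 (tokenizing into word and part-of-speech tuples, applying one detector per filler word over the tokens and their positional indexes, and keeping only the longest match when detectors overlap) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the patented implementation: the names `tokenize`, `make_phrase_detector`, `resolve_overlaps`, and `find_fillers` are hypothetical, the toy part-of-speech lookup stands in for a real tagger, and plain phrase matchers stand in for the trained machine learning models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech label)

# Hypothetical toy lookup; a real system would use a trained POS tagger.
TOY_POS = {"i": "PRON", "you": "PRON", "it": "PRON", "know": "VERB",
           "mean": "VERB", "was": "VERB", "like": "ADP", "great": "ADJ"}


@dataclass(frozen=True)
class Detection:
    filler: str  # the filler word or phrase that was matched
    start: int   # index of the first matched token
    end: int     # index one past the last matched token


def tokenize(transcript: str) -> List[Token]:
    """Turn each word into a (word, POS label) tuple, in spoken order."""
    words = [w.strip(",.?!").lower() for w in transcript.split()]
    return [(w, TOY_POS.get(w, "X")) for w in words if w]


Detector = Callable[[Sequence[Token], Sequence[int]], List[Detection]]


def make_phrase_detector(filler: str) -> Detector:
    """Stand-in for one trained model: exact phrase match, no learned rule."""
    phrase = filler.split()

    def detect(tokens: Sequence[Token], positions: Sequence[int]) -> List[Detection]:
        hits = []
        for i in positions:
            window = [word for word, _ in tokens[i:i + len(phrase)]]
            if window == phrase:
                hits.append(Detection(filler, i, i + len(phrase)))
        return hits

    return detect


def resolve_overlaps(detections: List[Detection]) -> List[Detection]:
    """When several detectors flag the same token, keep the longest match."""
    kept: List[Detection] = []
    claimed = set()
    # Visit longer matches first so they win over shorter overlapping ones.
    for det in sorted(detections, key=lambda d: (-(d.end - d.start), d.start)):
        span = set(range(det.start, det.end))
        if claimed.isdisjoint(span):
            kept.append(det)
            claimed |= span
    return sorted(kept, key=lambda d: d.start)


def find_fillers(transcript: str, fillers: Sequence[str]) -> List[Detection]:
    tokens = tokenize(transcript)
    positions = range(len(tokens))  # positional indexes passed to each detector
    detections: List[Detection] = []
    for filler in fillers:
        detections.extend(make_phrase_detector(filler)(tokens, positions))
    return resolve_overlaps(detections)


if __name__ == "__main__":
    text = "So you know what I mean, it was like great"
    for det in find_fillers(text, ["like", "you know", "you know what i mean"]):
        print(det)
```

In this sketch, "you know" and "you know what i mean" both flag the tokens at positions 1 and 2, and only the longer phrase is returned. Visiting matches longest-first is one simple reading of the claim's tie-break. The context rule of step (i) (punctuation, repetition, and part-of-speech constraints) is not modeled here, since the claim describes it as learned from audio samples rather than hand-written.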