The present disclosure is related to the field of automated transcription. More specifically, the present disclosure is related to diarization using acoustic labeling.
Speech transcription and speech analytics of audio data may be enhanced by a process of diarization, wherein audio data that contains multiple speakers is separated into segments of audio data, each typically attributed to a single speaker. While speaker separation in diarization facilitates later transcription and/or speech analytics, identification of or discrimination between the separated speakers can further facilitate these processes by enabling the association of additional context and information specific to an identified speaker in later transcription and speech analytics processes.
Systems and methods as disclosed herein present solutions to improve diarization using acoustic models to identify and label at least one speaker separated from the audio data. Previous attempts to create individualized acoustic voiceprint models are time intensive in that an identified speaker must record training speech into the system, or the underlying data must be manually separated to ensure that only speech from the identified speaker is used. Recorded training speech has a further limitation in that a speaker is likely to speak differently when recording training speech than when in the middle of a live interaction with another person.