Subtitling of Hypertext Transfer Protocol (HTTP) Live Streaming (HLS) generates written text, or a transcription, of spoken words that can be displayed concurrently with the streaming video with audio. Subtitling can use automatic speech recognition (ASR) algorithms to generate the written text, and the accuracy of that text is characterized by word error rates and word recognition rates.
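HLS carries subtitle renditions as WebVTT text, where each cue pairs a time range with the text to display during that range. The minimal sketch below, written under stated assumptions, shows how timed ASR output could be turned into WebVTT cues for concurrent display. The `Caption` type and the `to_webvtt` and `format_timestamp` helpers are hypothetical, and the sketch emits a single unsegmented file; it omits the `X-TIMESTAMP-MAP` header and the segmentation that a real HLS subtitle playlist would need.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical container for one timed piece of ASR output."""
    start: float  # seconds from the start of the stream
    end: float    # seconds from the start of the stream
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(captions: list[Caption]) -> str:
    """Render captions as a single WebVTT document (no HLS segmentation)."""
    lines = ["WEBVTT", ""]  # required signature line followed by a blank line
    for cap in captions:
        if cap.end <= cap.start:
            raise ValueError(f"cue must end after it starts: {cap}")
        lines.append(f"{format_timestamp(cap.start)} --> {format_timestamp(cap.end)}")
        lines.append(cap.text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([Caption(1.0, 3.5, "Hello world")]))
```

Each cue's time range is what lets a player show the recognized words in step with the video and audio.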
The word error rates indicate the frequency with which the written text incorrectly represents the spoken words of one or more speakers audible in the streaming video with audio; a word error rate is typically computed as the number of substituted, deleted, and inserted words divided by the number of spoken words. The word recognition rates indicate the frequency with which written words can be identified for the spoken words.
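To make the two rates concrete, the sketch below aligns a reference transcript with ASR output using word-level edit distance. The word error rate follows the usual definition, (substitutions + deletions + insertions) / reference length. The text above defines the word recognition rate only informally, so this sketch assumes it is the fraction of reference words that the ASR output reproduces exactly. The function names are illustrative, and ties between equally short alignments are broken in favor of more exact matches.

```python
def align(reference: list[str], hypothesis: list[str]) -> tuple[int, int]:
    """Return (errors, hits) for a minimum-error word alignment.

    Among alignments with the fewest errors, the one with the most exact
    matches is chosen, by minimizing the tuple (errors, -hits).
    """
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = (errors, -hits) for aligning reference[:i] with hypothesis[:j]
    dp = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0)  # i deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0)  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            err, neg_hits = dp[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (err, neg_hits - 1)  # exact match (hit)
            else:
                diag = (err + 1, neg_hits)  # substitution
            deletion = (dp[i - 1][j][0] + 1, dp[i - 1][j][1])
            insertion = (dp[i][j - 1][0] + 1, dp[i][j - 1][1])
            dp[i][j] = min(diag, deletion, insertion)
    errors, neg_hits = dp[n][m]
    return errors, -neg_hits


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(S + D + I) / N; can exceed 1.0 when the ASR output has many insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    errors, _ = align(ref, hyp)
    return errors / len(ref)


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: exactly recognized reference words / N."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    _, hits = align(ref, hyp)
    return hits / len(ref)


if __name__ == "__main__":
    ref, hyp = "the cat sat on the mat", "the cat sit on mat"
    print(word_error_rate(ref, hyp), word_recognition_rate(ref, hyp))
```

In the usage example, "sat" is substituted by "sit" and the second "the" is deleted, so two of six reference words are in error and four are recognized.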
A conventional approach to minimizing transcription word error rates while improving word recognition rates is to biometrically identify each speaker and then transcribe that speaker's spoken words using ASR associated with the biometrically identified speaker.
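The conventional approach can be sketched as a two-stage pipeline: identify the speaker of each audio segment, then route the segment to a recognizer tied to that speaker. The sketch below is illustrative only. It assumes that each segment already has a voice embedding computed by some speaker-embedding model, which is not shown, and that enrolled speakers have stored voiceprint vectors. Speakers are identified by cosine similarity against a hypothetical threshold, with a generic recognizer used as a fallback. All names, including `identify_speaker`, `transcribe_segments`, and the stub recognizers, are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence

Recognizer = Callable[[bytes], str]  # hypothetical: audio bytes -> text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    embedding: Sequence[float],
    voiceprints: dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if none clears the threshold."""
    best_id, best_score = None, threshold
    for speaker_id, voiceprint in voiceprints.items():
        score = cosine_similarity(embedding, voiceprint)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


def transcribe_segments(
    segments: list[tuple[Sequence[float], bytes]],  # (embedding, audio) pairs
    voiceprints: dict[str, Sequence[float]],
    recognizers: dict[str, Recognizer],
    fallback: Recognizer,
    threshold: float = 0.8,
) -> list[tuple[Optional[str], str]]:
    """Identify each segment's speaker, then transcribe with that speaker's recognizer."""
    results = []
    for embedding, audio in segments:
        speaker = identify_speaker(embedding, voiceprints, threshold)
        recognizer = recognizers.get(speaker, fallback)  # unknown speaker -> generic ASR
        results.append((speaker, recognizer(audio)))
    return results


if __name__ == "__main__":
    voiceprints = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    recognizers = {
        "alice": lambda audio: "alice:" + audio.decode(),  # stand-in for a speaker-adapted model
        "bob": lambda audio: "bob:" + audio.decode(),
    }
    segments = [([0.9, 0.1], b"hi"), ([0.1, 0.95], b"yo"), ([0.7, 0.7], b"eh")]
    print(transcribe_segments(segments, voiceprints, recognizers, lambda a: "generic:" + a.decode()))
```

The third segment's embedding is equally close to both voiceprints (similarity of about 0.71), so it falls below the threshold and is transcribed by the generic recognizer. This illustrates why such a pipeline depends on reliable biometric identification.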