Automatic speech recognition (ASR) has advanced considerably in recent years. However, fully automated transcriptions produced by ASR systems are often still not accurate enough. For this reason, hybrid transcription is often used when it is important to deliver accurate results. In hybrid transcription, one or more layers of human transcribers review the transcriptions generated by an ASR system and correct the errors they find.
Determining how much human review is needed, such as how many layers of review to use, involves a cost/benefit tradeoff. Each layer of review takes time and adds to the cost of the transcription process. Thus, adding layers of review may not be justified if the transcription is already likely to be sufficiently accurate. Therefore, in order to make better decisions about allocating resources to complete a transcription job, there needs to be a way to assess the quality of the work performed by the transcribers.