Real-time captioned news is a lifeline service for people who are deaf or hard of hearing, providing critical information about their local communities, national events, and emergencies. Captioning mandates designed to provide equal access to television have resulted in rapid growth of the so-called caption industry. A shortage of skilled real-time steno-captionists, together with downward pressure on rates by program providers, has made the low quality of live captioning of news broadcasts a growing issue.
Disability organizations have filed complaints and a formal petition with the Federal Communications Commission (FCC), reflecting frustration with chronic problems related to live captioning quality, transmission errors, and a lack of industry response to their concerns. Without a way of accurately measuring the quality of captions, the FCC, consumers, and broadcasters have no efficient method of tracking and improving the accuracy of so-called steno-captions.
One conventional way of measuring the accuracy of closed-caption text derived from a respective audio signal is to use a word error rate. As the name suggests, a conventional word error rate algorithm first identifies the total number of spoken words in the audio signal that are not properly translated into corresponding text. The algorithm then divides the number of detected word errors by the total number of words.
The word error rate can be presented as a percentage value, which indicates the degree to which words in the audio signal are properly converted into text. A relatively high word error rate percentage indicates low accuracy of converting the audible signal into respective text. Conversely, a low word error rate percentage indicates high accuracy of converting the audible signal into respective text. Of course, higher word accuracy is desirable to ensure that the closed-caption text produced for respective video is intelligible.
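The calculation described above can be illustrated with a minimal sketch. The text does not specify how word errors are detected, so this example assumes a word-level edit distance (substitutions, deletions, and insertions) between a reference transcript of the spoken words and the caption text, which is one common way of counting errors. The function name `word_error_rate` and the sample sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word errors divided by the number of reference (spoken) words.

    Assumption: errors are counted as the word-level edit distance between
    the reference transcript and the caption text.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # spoken word missing from the caption
                curr[j - 1] + 1,     # extra word in the caption
                prev[j - 1] + cost,  # match, or one word substituted
            )
        prev = curr
    return prev[-1] / len(ref)


spoken = "the storm will reach the coast tonight"
caption = "the storm will reach the cost tonight"
print(f"{word_error_rate(spoken, caption):.1%}")  # prints 14.3%
```

Here one of the seven spoken words is captioned incorrectly, so the rate is 1/7, or about 14.3 percent. Note that this measure weights every word equally, regardless of how much the error affects the meaning of the caption.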