Speech recognition, also referred to as automatic speech recognition, computer speech recognition, speech-to-text, and by other names, converts spoken words and word sequences into machine-readable data. Speech recognition can take a number of forms. One form is free speech recognition, in which spoken text is transcribed from an audio stream or file, whether uttered by one or more speakers and whether or not any of the speakers is known. Free speech recognition is used in applications such as dictation and the preparation of structured documents such as radiology reports. Another form is word spotting, in which predetermined words are searched for in audio sources such as files or streams, for applications such as voice dialing, voice activation of devices, or the like.
However, speech recognition systems achieve neither one hundred percent recall, i.e., not all words that were actually spoken are found, nor one hundred percent precision, i.e., not all words allegedly found in the audio were actually spoken.
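Stated as formulas, the two measures can be sketched as follows. The symbols are introduced here only for illustration: S denotes the word occurrences actually spoken in the audio, and F denotes the word occurrences the system reports as found.

```latex
\text{recall} = \frac{|S \cap F|}{|S|}, \qquad
\text{precision} = \frac{|S \cap F|}{|F|}
```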
The quality of the resulting text has a significant impact on its usability. In dictation applications, the higher the quality, the less manual correction is required. In automatic applications, where no manual supervision is available, the quality of the text affects the analysis performed on it and the conclusions that can be deduced from it.
Some speech recognition engines provide a certainty score for each found word, i.e., an indicator of the degree of confidence the engine assigns to the spotted or transcribed word. Yet even the certainty score does not accurately indicate the quality of the results: simply ignoring results with relatively low certainty scores may indeed remove erroneous words, but may also remove correct words, thus reducing recall.
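The trade-off described above can be illustrated with a minimal Python sketch. It assumes a hypothetical list of engine outputs, each with a certainty score and a ground-truth flag saying whether the word was actually spoken; the words, scores, and the names `Hypothesis`, `filter_by_certainty`, and `precision_recall` are invented for illustration and do not represent any particular engine or measured data.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A word reported by a hypothetical recognition engine."""
    word: str
    certainty: float  # engine-assigned certainty score in [0, 1]
    correct: bool     # ground truth: was this word actually spoken?


def filter_by_certainty(hyps, threshold):
    """Keep only hypotheses whose certainty score reaches the threshold."""
    return [h for h in hyps if h.certainty >= threshold]


def precision_recall(kept, total_spoken):
    """Precision = correct kept / kept; recall = correct kept / words spoken."""
    true_pos = sum(h.correct for h in kept)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / total_spoken if total_spoken else 1.0
    return precision, recall


# Illustrative data only: 5 words were actually spoken; the engine
# reported 6 words, 4 of which are correct.
hyps = [
    Hypothesis("invoice", 0.92, True),
    Hypothesis("payment", 0.85, True),
    Hypothesis("cancel", 0.40, True),     # correct, but low certainty
    Hypothesis("account", 0.35, True),    # correct, but low certainty
    Hypothesis("refund", 0.55, False),    # error with moderate certainty
    Hypothesis("transfer", 0.30, False),  # error with low certainty
]
TOTAL_SPOKEN = 5

for t in (0.0, 0.33, 0.5):
    p, r = precision_recall(filter_by_certainty(hyps, t), TOTAL_SPOKEN)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

With this made-up data, a threshold of 0.33 removes only the low-certainty error (precision 0.80, recall 0.80), whereas raising it to 0.50 discards two correct words while keeping the moderately scored error, so recall falls to 0.40 and precision returns to 0.67, the same as with no filtering at all.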
Thus, there is a need in the art for a method and apparatus for detecting erroneous words or phrases so that such words can be ignored. Ignoring erroneous words increases the quality of the text, as well as the quality of the deductions obtained from it by text mining.