The present invention relates to speech recognition systems and, more particularly, to methods and systems for testing speech recognition systems.
Speech recognition is an important aspect of furthering man-machine interaction. The end goal in developing speech recognition systems is to replace the keyboard interface to computers with voice input. This may make computers more user friendly and enable them to provide broader services to users. To this end, several systems have been developed. The development effort for these systems aims at reducing the transcription error rate on real speech in real-life applications. In the course of developing these systems, one needs to compare different approaches by running tests over standardized test data, which is generally recorded speech of a reference script.
The reason for this is that for fair comparisons and reproducible results, it is essential that all experiments be carried out with exactly the same speech input. Therefore, all systems will be tested by the same speakers reading the same script (text or voice commands). Since it is impossible for a speaker to utter the words twice in exactly the same way, and since the background noise would also change from utterance to utterance, the test speech data is recorded once and for all, and then reused for all the tests.
In particular, when the objective is to test the resilience of the system to dictation of widely varied texts, obtaining statistically significant results requires recording very large text corpora spoken by the test speaker(s).
Recording this large amount of text is commonly done by a human speaker (or a set of human speakers) who reads reference texts into a microphone in a controlled fashion. The main drawback of human dictation is its cost: recording a massive amount of test material in a controlled fashion is very labor-intensive.
As a consequence of the foregoing difficulties in the prior art, it is an object of the present invention to provide speech recognition systems and methods wherein the test speech material is provided independently of a human speaker.
The present invention solves the foregoing need by providing systems and associated methods for testing speech recognition systems in which the speech recognition device to be tested is directly monitored in accordance with a text-to-speech device. The collection of reference texts to be used by the speech recognition device is provided by a text-to-speech device preferably implemented within the same computer system.
In one embodiment of the invention, the method comprises the steps of:
a) generating a digital audio file from a reference text using a text-to-speech device, the digital audio file being stored within a storage area of a computer system; and
b) reading the digital audio file using a speech recognition device to generate a decoded text representative of the reference text.
It is known that the phrase “decoded text” may be used interchangeably with the phrase “recognized text.”
In a further step, alignment of the reference text and the decoded text is accomplished and an error report representative of the recognition rate of the speech recognition device is generated.
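One possible way to realize the alignment and error report is a word-level minimum-edit-distance alignment, which is a common technique for scoring recognizers. The specification does not state which alignment algorithm is used, so the sketch below is an assumption, and the names `align_words` and `error_report` are hypothetical.

```python
def align_words(reference_text, decoded_text):
    """Align two texts word by word and count each kind of difference."""
    ref = reference_text.lower().split()
    hyp = decoded_text.lower().split()

    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            match_or_sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(match_or_sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to classify each aligned position.
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            key = "correct" if ref[i - 1] == hyp[j - 1] else "substitutions"
            counts[key] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1  # reference word missing from decoded text
            i -= 1
        else:
            counts["insertions"] += 1  # extra word in decoded text
            j -= 1
    return counts


def error_report(reference_text, decoded_text):
    """Return alignment counts plus a word error rate relative to the reference."""
    report = align_words(reference_text, decoded_text)
    ref_len = report["correct"] + report["substitutions"] + report["deletions"]
    errors = report["substitutions"] + report["deletions"] + report["insertions"]
    report["word_error_rate"] = errors / ref_len if ref_len else 0.0
    return report
```

The error rate here is normalized by the number of reference words; other normalizations are possible, and the choice is not specified in the text.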
Preferably, the step of generating a digital audio file is realized by:
a1) tokenizing an initial text stored on a storage area of the computer system to generate a tokenized text;
a2) marking up the tokenized text to generate a marked text; and
a3) synthesizing the marked text to generate the digital audio file.
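Steps a1) to a3) can be illustrated with a minimal sketch. The specification does not define the token rules or the mark-up format, so both are assumptions here: tokens are words and punctuation marks, and the mark-up replaces punctuation with hypothetical pause tags such as `<pause short>`. The synthesizer is passed in as a hypothetical callable rather than tied to any real text-to-speech engine.

```python
import re

# Hypothetical mark-up: punctuation becomes a pause hint for the synthesizer.
PAUSE_TAGS = {
    ",": "<pause short>",
    ";": "<pause short>",
    ":": "<pause short>",
    ".": "<pause long>",
    "!": "<pause long>",
    "?": "<pause long>",
}


def tokenize(initial_text):
    """Step a1): split the initial text into word and punctuation tokens."""
    return re.findall(r"[A-Za-z0-9']+|[.,;:!?]", initial_text)


def mark_up(tokens):
    """Step a2): replace punctuation tokens with pause tags."""
    return " ".join(PAUSE_TAGS.get(token, token) for token in tokens)


def generate_audio(initial_text, synthesize):
    """Step a3): hand the marked text to a synthesizer callable (text -> bytes)."""
    return synthesize(mark_up(tokenize(initial_text)))
```

For example, `mark_up(tokenize("Open the file, then save it."))` yields `Open the file <pause short> then save it <pause long>`.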
In an alternate embodiment, the text-to-speech device is implemented within a first computer system while the speech recognition device is implemented within a second computer system. The method then comprises the steps of:
a) generating a synthetic speech from a reference text using the text-to-speech device; and
b) processing the synthetic speech to generate a decoded text representative of the reference text using the speech recognition device.