The present invention relates to the field of speech processing technologies and, more particularly, to stored phrase reutilization when testing speech recognition grammars.
Voice user interfaces (VUIs) and multimodal interfaces accept spoken phrases as input. These spoken phrases are recognized using an associated speech recognition grammar. In many implementations, different application states of the VUI are associated with different permissible spoken phrases. These phrases are recognized by a context dependent speech recognition grammar, where the context is based upon the VUI state. As changes are made to a VUI, the set of permitted spoken phrases and the corresponding context dependent speech recognition grammars can change. Thus, from version to version of a VUI, related speech recognition grammars can change.
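The relationship between VUI state, permitted phrases, and VUI version can be illustrated with a minimal sketch. It deliberately simplifies a speech recognition grammar to a mapping from a state name to a set of permitted phrases, which is an assumption made only for illustration. The state names, phrases, and the helpers `is_permitted` and `changed_states` are hypothetical and are not drawn from any particular VUI.

```python
from typing import Dict, FrozenSet

# Simplifying assumption: a context dependent grammar is modeled as
# a mapping from VUI state name to the phrases permitted in that state.
Grammar = Dict[str, FrozenSet[str]]

# Hypothetical grammars for two versions of the same VUI.
GRAMMAR_V1: Grammar = {
    "main_menu": frozenset({"check balance", "transfer funds"}),
    "confirm": frozenset({"yes", "no"}),
}
GRAMMAR_V2: Grammar = {
    "main_menu": frozenset({"check balance", "transfer funds", "pay bill"}),
    "confirm": frozenset({"yes", "no"}),
}


def is_permitted(grammar: Grammar, state: str, phrase: str) -> bool:
    """Return True when the phrase is allowed in the given VUI state."""
    return phrase in grammar.get(state, frozenset())


def changed_states(old: Grammar, new: Grammar) -> list:
    """List the states whose permitted phrases differ between versions."""
    return sorted(s for s in old.keys() | new.keys() if old.get(s) != new.get(s))
```

In this sketch, "pay bill" is permitted in the "main_menu" state only under the second version, so only that state's grammar would need retesting against new phrases.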
Testing VUIs can be a challenge, especially with regard to testing speech recognition accuracy and precision, which can depend upon the accuracy and precision of the underlying speech recognition grammars. Typically, each speech recognition grammar is tested using a large number of pre-recorded phrases. Each pre-recorded phrase is typically stored in a database as an audio file, which is associated with a text representation of that phrase. When a speech recognition engine using the speech recognition grammar generates a text result from the audio file that matches the stored text representation, a successful test has occurred. Ideally, the set of test phrases used to test a speech recognition grammar should cover a statistically significant portion, if not all, of the allowed phrases.
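The pass/fail test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular test tool: `StoredPhrase`, `normalize`, and `run_grammar_test` are hypothetical names, the recognizer is passed in as a plain callable standing in for a speech recognition engine loaded with the grammar under test, and the comparison ignores case and extra whitespace, which is a choice made here rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class StoredPhrase:
    audio_path: str      # hypothetical location of the pre-recorded audio file
    expected_text: str   # stored text representation of the phrase


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not fail a test."""
    return " ".join(text.lower().split())


def run_grammar_test(
    phrases: Iterable[StoredPhrase],
    recognize: Callable[[str], Optional[str]],
) -> Tuple[int, List[StoredPhrase]]:
    """Run every stored phrase through the recognizer.

    `recognize` takes an audio path and returns recognized text, or None
    when the engine produces no result. A test succeeds when the result
    matches the stored text representation.
    """
    passed = 0
    failures: List[StoredPhrase] = []
    for phrase in phrases:
        result = recognize(phrase.audio_path)
        if result is not None and normalize(result) == normalize(phrase.expected_text):
            passed += 1
        else:
            failures.append(phrase)
    return passed, failures
```

Passing the recognizer as a callable keeps the sketch independent of any specific engine; in practice it would wrap whatever engine and grammar are under test.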
Current VUI testing techniques focus upon maximizing phrase coverage and minimizing the complexity of testing. Many VUI testing techniques select a set of phrases for a given version of a VUI and store a version specific test set consisting of an audio file and a textual representation for each of the selected phrases. When a common phrase is used across more than one VUI version, multiple copies of the audio file for that common phrase are stored, one copy per version specific test set. Additionally, each test set for a VUI version can be produced through a VUI version specific recording session. These practices result in significant storage and recording costs.
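The storage overhead of version specific test sets can be made concrete with a small sketch. It assumes each test set is represented simply as a collection of phrase texts keyed by a version label, with one audio file stored per phrase per test set; the function name `count_stored_copies` and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_stored_copies(test_sets: Dict[str, Iterable[str]]) -> Tuple[int, int, int]:
    """Count audio files stored when every version keeps its own copies.

    Returns (total files stored, distinct phrases, redundant copies).
    """
    copies: Counter = Counter()
    for phrases in test_sets.values():
        # One stored audio file per distinct phrase in each version's test set.
        copies.update(set(phrases))
    total = sum(copies.values())
    unique = len(copies)
    return total, unique, total - unique


# Hypothetical test sets for two versions of a VUI.
example_sets = {
    "v1": ["check balance", "transfer funds"],
    "v2": ["check balance", "transfer funds", "pay bill"],
}
total, unique, redundant = count_stored_copies(example_sets)
print(total, unique, redundant)  # prints: 5 3 2
```

Here five audio files are stored for only three distinct phrases, so two of the files are duplicates that exist solely because each version keeps its own test set.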
Often, an attempt is made to minimize recording costs by relying upon one or more external sources of audio recordings. A large manual effort is then involved in selecting which phrases from the external sources are to be used to test each specific speech recognition grammar. The cost, time, and confusion resulting from manually selecting phrases for grammar testing are one reason many opt for the previous solution of version specific recordings, which result in version specific test sets.