1. Field
This disclosure relates to speech recognition technology, and more particularly to methods for utilizing multiple speech technologies.
2. Background
Speech recognition technology has become more prevalent in recent times. Beyond dictation machines and speech-to-text conversion, speech can be used to navigate and give commands through several different types of systems. It is particularly attractive to highly mobile users, who may want to make travel reservations, leave messages, access e-mail and perform other tasks using any available phone. For these users, the ability to navigate such systems using voice commands, and to dictate text messages for electronic mail systems, is a significant convenience.
Throughout this document, the terms ‘speech recognition technology’, or ‘recognizer’ will be used to describe the software that carries out the conversion of digital audio to text. The terms ‘speech recognition systems’ and ‘system’ will refer to systems that incorporate one or more speech recognition technology or recognizer components. In the discussions that follow, differentiation will be made between speech recognition technologies based on capabilities and performance.
Capabilities, as used here, will refer to the type of speech that the technology is capable of recognizing. This might be a very small domain, such as digits only, or a very large one, such as a large vocabulary dictation system. Performance may be measured along a number of orthogonal axes including the accuracy of conversion from speech to text, the resource requirement of the conversion process, the latency of conversion and other factors. Note that in addition to items such as computation, memory, etc., resource requirements may include items such as licenses for the recognizer used.
Current speech recognition systems typically use one type of speech technology. System designers must select a technology by weighing the capabilities and performance the system requires against the capabilities, cost and performance of each candidate technology. Inexpensive, lower capability technologies provide high accuracy in only a limited range of capabilities, but do not require large resource commitments. Midrange technologies have increased capabilities with a commensurately increased resource requirement. For speech recognition tasks that need only the lower capabilities, the system can bog down unnecessarily if higher capability recognition technologies are used, because they provide unneeded features. Higher requirement tasks, such as dictation, cannot obtain the desired accuracy with the lower capability technologies. High capability, high performance technologies would have the highest accuracy and could complete the widest range of tasks, but may be too expensive to implement system-wide, or may require too high a level of resources for some enterprises.
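The single-technology trade-off described above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not taken from this disclosure: the RecognizerProfile type, the three tier names, the capability labels, and all numeric values are assumptions chosen only to make the constraint concrete. The sketch models each technology by its capabilities and by performance figures along the axes named earlier (accuracy, latency, resource requirement), then picks the one cheapest technology that covers every task the system must support.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class RecognizerProfile:
    """Hypothetical description of one speech recognition technology."""
    name: str
    capabilities: FrozenSet[str]  # kinds of speech it can recognize
    accuracy: float               # illustrative accuracy within its capabilities
    latency_ms: int               # illustrative conversion latency
    resource_cost: int            # abstract units: computation, memory, licenses


# Illustrative values only; real figures depend on the actual technologies.
PROFILES = [
    RecognizerProfile("low", frozenset({"digits"}), 0.98, 50, 1),
    RecognizerProfile("mid", frozenset({"digits", "commands"}), 0.95, 150, 5),
    RecognizerProfile(
        "high", frozenset({"digits", "commands", "dictation"}), 0.97, 400, 20
    ),
]


def select_single_technology(
    required: Set[str], profiles: Iterable[RecognizerProfile]
) -> Optional[RecognizerProfile]:
    """Return the cheapest profile that covers every required capability.

    Returns None when no single technology covers the whole set.
    """
    candidates = [p for p in profiles if required <= p.capabilities]
    return min(candidates, key=lambda p: p.resource_cost, default=None)
```

Under these assumed numbers, a system that must handle both digits and dictation is forced to the "high" profile for every task, so a digits-only request incurs a resource cost of 20 where the "low" profile would incur 1. That gap is the inefficiency the paragraph above describes.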
Therefore, some method of using several different kinds of speech recognition technology in one system would seem helpful, as would ways to manage the utilization of these different technologies.