Speech recognition has simplified many tasks in the workplace by permitting hands-free communication with a computer as a convenient alternative to communication via conventional peripheral input/output devices. A user may enter data and commands by voice using a device having processing circuitry with speech recognition features. Commands, instructions, or other information may also be communicated to the user by speech synthesis circuitry of the processing circuitry. Generally, the synthesized speech is provided by a text-to-speech (TTS) engine in the processing circuitry. Speech recognition finds particular application in mobile computing environments in which interaction with the computer by conventional peripheral input/output devices is restrictive or otherwise inconvenient.
For example, wearable or otherwise portable computer devices can provide a user who performs a variety of work-related tasks with desirable computing and data-processing functions, while offering the user enhanced mobility within the workplace. One particular area in which users rely heavily on such speech-based devices is inventory management. Inventory-driven industries rely on computerized inventory management systems for performing diverse tasks, such as food and retail product distribution, manufacturing, and quality control. An overall integrated management system typically includes a combination of a central computer system for tracking and management, and the people who use and interface with the computer system in the form of order fillers and other users. In one scenario, the users handle the manual aspects of the integrated management system under the command and control of information transmitted from the central computer system to a wireless mobile device and then to the user through a speech-driven interface.
As the users process their orders and complete their assigned tasks, a bi-directional dialog or communication stream of information is provided over a wireless network between the users wearing wireless devices and the central computer system that is directing multiple users and verifying completion of their tasks. To direct the user's actions, information received by each mobile device from the central computer system is translated into speech or voice instructions for the corresponding user. Typically, to receive the voice instructions, the user wears a headset coupled with the mobile device.
The headset includes one or more microphones for spoken data entry, and one or more speakers for playing audio. Speech from the user is captured by the headset and is converted using speech recognition functionalities into data used by the central computer system. Similarly, instructions from the central computer or mobile device are delivered to the user as speech generated by the TTS engine and played through the headset speaker. Using such mobile devices, users may perform assigned tasks virtually hands-free so that the tasks are performed more accurately and efficiently.
However, a system's ability to accurately recognize and process the user's speech depends on the quality of the speech audio that is captured from the user. That quality depends in part upon the user's ability or desire to use the equipment in a way that facilitates the capture of quality speech audio. More generally, the audio quality depends on a variety of parameters, some of which are controllable by a user and others that are not. For example, captured speech quality may depend on the quality of the microphones, the orientation of the microphone with respect to the mouth of the user, the background noise that is captured with the user's speech, and other factors. While the headset manufacturer can address some issues and parameters, such as microphone quality, the manufacturer cannot control other parameters, such as the user's operation of the device.
If the microphone is not positioned properly with respect to the user's mouth, for example, the ratio of user speech to background noise (the signal-to-noise ratio, or SNR) decreases. As a result, the voice recognition system may not receive a quality speech input, and may misinterpret the user's spoken audio. This degrades the speech recognition process and increases processing error rates. It also requires repetition of previously spoken dialog, instructions, or commands. Some users are particularly prone to this problem because they do not want the microphone in front of their face, and choose to orient the microphone in a position that does not facilitate accurate capture of the user's voice. For example, moving the speech microphone so that it is adjacent to the user's forehead, below the chin, or otherwise out of the way often produces unacceptable voice quality and a poor SNR.
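By way of illustration only, the effect of microphone placement on SNR can be sketched numerically. The following minimal Python sketch is not part of any described system; the function names `rms` and `snr_db`, the sample amplitudes, and the use of simple square-wave test signals are all assumptions chosen for clarity. It computes SNR in decibels as 20·log10 of the ratio of the root-mean-square (RMS) speech amplitude to the RMS noise amplitude, and compares a hypothetical well-positioned microphone with a poorly positioned one that captures the same background noise but a weaker speech signal.

```python
import math


def rms(samples):
    """Return the root-mean-square amplitude of a sequence of samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_samples, noise_samples):
    """Return the signal-to-noise ratio in decibels.

    Computed as 20 * log10(rms(speech) / rms(noise)), the standard
    amplitude-ratio form of SNR.
    """
    noise = rms(noise_samples)
    speech = rms(speech_samples)
    if noise == 0:
        return math.inf   # no measurable noise
    if speech == 0:
        return -math.inf  # no measurable speech
    return 20.0 * math.log10(speech / noise)


def square_wave(amplitude, n=100):
    """Hypothetical test signal: alternating +amplitude / -amplitude."""
    return [amplitude if i % 2 == 0 else -amplitude for i in range(n)]


# Assumed, illustrative amplitudes (not measured values).
background_noise = square_wave(0.01)
speech_mic_near_mouth = square_wave(0.5)    # microphone properly positioned
speech_mic_at_forehead = square_wave(0.05)  # microphone moved out of the way

good_snr = snr_db(speech_mic_near_mouth, background_noise)
poor_snr = snr_db(speech_mic_at_forehead, background_noise)

print(f"SNR, microphone near mouth:  {good_snr:.1f} dB")
print(f"SNR, microphone at forehead: {poor_snr:.1f} dB")
```

Under these assumed amplitudes, a tenfold drop in captured speech amplitude against unchanged background noise lowers the SNR by 20 dB, which is the kind of degradation the preceding paragraph describes.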
Therefore, there is a need to ensure suitable speech quality and subsequent speech recognition.