Speech recognition has simplified many tasks in the workplace by permitting hands-free communication with a computer as a convenient alternative to communication via conventional peripheral input/output devices. A user may enter data and commands by voice using a device having a speech recognizer. Commands, instructions, or other information may also be communicated to the user by a speech synthesizer. Generally, the synthesized speech is provided by a text-to-speech (TTS) engine. Speech recognition finds particular application in mobile computing environments in which interaction with the computer by conventional peripheral input/output devices is restricted or otherwise inconvenient.
For example, wireless wearable, portable, or otherwise mobile computer devices can provide a user performing work-related tasks with desirable computing and data-processing functions while offering the user enhanced mobility within the workplace. One area in which users rely heavily on such speech-based devices is inventory management. Inventory-driven industries rely on computerized inventory management systems for performing diverse tasks, such as food and retail product distribution, manufacturing, and quality control. An overall integrated management system typically includes a central computer system for tracking and management, together with the people who use and interface with the computer system, such as order fillers and other users. In one scenario, the users handle the manual aspects of the integrated management system under the command and control of information transmitted from the central computer system to the wireless mobile device and then to the user through a speech-driven interface.
As the users process their orders and complete their assigned tasks, a bi-directional communication stream of information is exchanged over a wireless network between users wearing wireless devices and the central computer system. The central computer system thereby directs multiple users and verifies completion of their tasks. To direct the user's actions, information received by each mobile device from the central computer system is translated into speech or voice instructions for the corresponding user. Typically, to receive the voice instructions, the user wears a headset coupled with the mobile device.
The headset includes a microphone for spoken data entry and an ear speaker for audio data feedback. Speech from the user is captured by the headset and converted using speech recognition into data used by the central computer system. Similarly, instructions from the central computer or mobile device in the form of text are delivered to the user as voice prompts generated by the TTS engine and played through the headset speaker. Using such mobile devices, users may perform assigned tasks virtually hands-free so that the tasks are performed more accurately and efficiently.
An illustrative example of a set of user tasks in a speech-directed work environment may involve filling an order, such as filling a load for a particular truck scheduled to depart from a warehouse. The user may be directed to the different warehouse areas (e.g., a freezer) in which they will be working to fill the order. Using the TTS engine of the mobile device, the system vocally directs the user to particular aisles, bins, or slots in the work area to pick particular quantities of various items. The user may then vocally confirm each location and the number of items picked, after which the user may receive the next task or order to be picked.
The speech synthesizer or TTS engine operating in the system or on the device translates the system messages into speech, and typically provides the user with adjustable operational parameters or settings such as audio volume, speed, and pitch. Generally, the TTS engine operational settings are set when the user or worker logs into the system, such as at the beginning of a shift. The user may walk through a number of different menus or selections to control how the TTS engine will operate during their shift. In addition to speed, pitch, and volume, the user will also generally select a TTS engine for their native language, such as English or Spanish.
As users become more experienced with the operation of the inventory management system, they will typically increase the speech rate and/or pitch of the TTS engine. Increased speech parameters, such as increased speed, allow the user to hear prompts and perform tasks more quickly as they gain familiarity with the prompts spoken by the application. However, workers often encounter situations that reduce the intelligibility of speech from the TTS engine at their selected settings.
For example, the user may receive an unfamiliar prompt or enter an area of a voice or task application with which they are not familiar. Alternatively, the user may enter a work area with a high ambient noise level or other audible distractions. Each of these factors can degrade the user's ability to understand the speech generated by the TTS engine. This degradation may leave the user unable to understand a prompt, with a corresponding increase in work errors, in user frustration, and in the time necessary to complete the task.
With existing systems, it is time-consuming and frustrating to constantly navigate through the necessary menus to change the TTS engine settings in order to address such factors and changes in the work environment. Moreover, since many of the factors affecting speech intelligibility are temporary, it becomes particularly time-consuming and frustrating to constantly return to and navigate through those menus to restore the TTS engine's previous settings once the temporary environmental condition has passed.
Accordingly, there is a need for systems and methods that improve user cognition of synthesized speech in speech-directed environments by adapting to the user environment. These issues and other needs in the prior art are met by the invention as described and claimed below.