Speech recognition has simplified many tasks in the workplace by permitting hands-free communication with a computer as a convenient alternative to communication via conventional peripheral input/output devices. A worker may enter data by voice using a speech recognizer and commands or instructions may be communicated to the worker by a speech synthesizer having a text-to-speech (TTS) functionality. Speech recognition finds particular application in mobile computing devices or mobile computers in which interaction with the computer by conventional peripheral input/output devices is restricted or inconvenient.
One particular work area where mobile computing devices and voice-directed work using such mobile devices have become well-established is the field of inventory management. However, other areas of work have benefited from such technology as well, such as healthcare services. Voice-assisted/directed work systems rely on computerized management systems for performing various diverse tasks, such as product distribution, manufacturing, quality control, and patient care. An overall integrated system involves a combination of a central computer system for tracking and management, and the people or workers who use and interface with the central computer system in the form of order fillers, pickers, care providers, and other workers. The workers handle the manual aspects of the integrated system under the command and control of information transmitted from the central computer system to the mobile computer devices carried by the workers.
As the workers complete their assigned tasks, they are provided with instructions and information via speech prompts, and then answer the prompts or otherwise provide data using speech. The central computer system collects a variety of types of information based on the specific assigned task and data or input from the worker, such as through speech or some other data capture. For example, when a worker is filling orders by picking inventory from the storage racks, the central computer system will request information on product identification and quantity so that the central computer system can properly notify inventory managers when supplies are low on a given item of inventory. In another example, when a worker is investigating damaged inventory for quality control purposes, the central computer system will request information on product identification, purchase order identification, and damage descriptions for the items affected. The use of speech as a type of input mechanism finds advantageous application in these and other situations because workers may be more efficient if both hands are free for doing work.
While speech is useful as an input and output mechanism for assisting a user to complete their work tasks, other input/output modes might also be utilized in directing a worker to perform a particular task and in capturing data associated with the performance of that task. Such multiple mode or multimodal applications have been developed to coordinate the various input components or devices and output components or devices associated with a multimodal system. These so-called multimodal systems coordinate a plurality of input and output components provided with the mobile computer device, including, for example, microphones, speakers, radio-frequency identification (RFID) readers or scanners, barcode scanners, display screens, touch screens, printers, and keypads. One example of such a multimodal software application is described in U.S. Patent Publication No. 2005/0010892, co-owned by the assignee of the present application, the disclosure of which is hereby incorporated by reference in its entirety. These multimodal applications and systems enable the smooth entry of data in various different modes or forms, such as keyboard entry, barcode or RFID scanning, voice, and others. The applications coordinate the inputs and outputs in the various modes of the multimodal system. However, as with any such system that assists workers in the performance of their jobs, the ability of the workers to use the system efficiently is paramount.
One particular advantage in a voice-directed/assisted system is the ability of a user to speak ahead or talk ahead of the speech prompts that they may receive from the system. For example, in the collection of data associated with a particular task, a speech-based system might provide spoken prompts to a user, such as to ask a question. The user, in reply, would then speak a particular utterance associated with that prompt, such as an answer to the question posed by the prompt, or would otherwise address the prompt. In that way, data is gathered. In some speech systems, such as the Vocollect Voice™ Product used with the Talkman available from Vocollect, Inc. of Pittsburgh, Pa., a user might be allowed to speak multiple utterances before or ahead of the voice prompts, without waiting for the audible prompting. This “speak-ahead” feature generally requires that the user have knowledge of the upcoming prompts. When such a speak-ahead feature is utilized, the multiple utterances are captured as responses to specific upcoming prompts, and efficiency is enhanced because the prompts are answered in order and the system moves ahead without having to provide those prompts.
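The speak-ahead behavior described above can be illustrated with a minimal sketch. It is not taken from any actual product; the function name `collect_responses`, the `listen` callback, and the example fields are hypothetical, and speech recognition is assumed to have already turned each utterance into text. Utterances spoken ahead are held in a queue and matched, in order, to the upcoming prompts; a prompt is played only when no buffered utterance remains to answer it.

```python
from collections import deque


def collect_responses(prompts, spoken_ahead, listen):
    """Pair each prompt with a response, consuming spoken-ahead utterances first.

    prompts      -- ordered list of (field_name, prompt_text) pairs
    spoken_ahead -- utterances the user spoke before any prompt was played
    listen       -- hypothetical callback that plays nothing itself; it is
                    called after a prompt is played and returns the reply
    Returns (answers, played), where played lists the prompts actually spoken.
    """
    pending = deque(spoken_ahead)  # buffered utterances, oldest first
    answers = {}
    played = []
    for field_name, prompt_text in prompts:
        if pending:
            # A buffered utterance answers this prompt, so the prompt is skipped.
            answers[field_name] = pending.popleft()
        else:
            # Nothing buffered: play the prompt and wait for the reply.
            played.append(prompt_text)
            answers[field_name] = listen(prompt_text)
    return answers, played


if __name__ == "__main__":
    prompts = [("aisle", "Aisle?"), ("slot", "Slot?"), ("quantity", "Quantity?")]
    answers, played = collect_responses(prompts, ["12", "4"], lambda text: "7")
    print(answers)
    print(played)
```

In this sketch the user speaks two utterances ahead, so only the third prompt is played. The sketch also makes the stated requirement visible: the buffered utterances are assigned strictly by position, so a user who does not know the prompt order would have answers stored under the wrong fields.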
Such a speak-ahead optimization is particularly useful in voice-directed work where the workflows are relatively consistent and do not vary significantly, so that a user can gain knowledge regarding the upcoming prompts. However, when voice is utilized more to assist the work of a user, rather than specifically direct that work, such a speak-ahead feature is less advantageous. In such an area, the workflows would generally be less predictable and less repeated, and thus it is more difficult for a worker to obtain the knowledge or memory of the upcoming prompts. Furthermore, in speech-assisted work environments, the workflows are generally performed on a part-time basis by the workers in the system, and those workers are not able to memorize the prompt order as readily. Therefore, it is desirable to address such drawbacks in a speech-assisted system, and to allow the speak-ahead optimization or features thereof to be implemented in such a system to improve the efficiency of a worker. Furthermore, even in those systems where the workflow is more consistent and voice-directed, there could still be a problem with worker turnover. Such turnover leads to situations where certain workers are inexperienced with the system and do not anticipate what data needs to be entered or what utterances need to be spoken to answer or address the prompts for any given task. Therefore, there is a further need to improve the ability of inexperienced workers to use a speak-ahead feature within a work system utilizing speech, such as a multimodal system wherein speech is one of the input and output modes.
While multimodal systems, such as multimodal systems utilizing speech, can provide great flexibility with respect to assisting or directing a user in their work tasks, such flexibility can also create confusion and inefficiency in the work environment due to user inexperience in using the system. For example, when multiple input mechanisms in a multimodal system are available to a user for entering data or other information, answering prompts, or filling in data fields, the user might choose a particular method of input based upon their preference, the availability of the input mode, or simply their knowledge of what modes are available. However, with the number of modes available in a multimodal system, inefficiencies can be created as a user tries unsuccessfully to produce a valid response. For example, a user might be trying to select a particular input from a list using voice, and may try to speak a particular response that makes sense to them, but which is not a valid selection. This can lead to a user miscategorizing the input, or simply not completing the necessary input or documentation to the system. Accordingly, it is further desirable to improve efficiencies within a multimodal system, such as a multimodal system using voice, so that a user can provide the necessary input data or otherwise address or answer a prompt.
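The invalid-selection problem described above can be shown with a short sketch. It is an illustration only, under stated assumptions: the function name `match_selection` and the example list entries are hypothetical, the utterance is assumed to have already been recognized as text, and matching is assumed to be a simple case-insensitive comparison against the list of valid selections.

```python
def match_selection(utterance, valid_selections):
    """Return the valid selection matching the utterance, or None if none does.

    Matching here is a hypothetical, deliberately strict rule: the utterance
    must equal a list entry after lowercasing and collapsing whitespace.
    """
    normalized = " ".join(utterance.lower().split())
    for option in valid_selections:
        if normalized == option.lower():
            return option
    return None  # the utterance made sense to the user but is not in the list


if __name__ == "__main__":
    damage_types = ["Damaged box", "Missing item", "Wrong label"]
    print(match_selection("damaged box", damage_types))
    print(match_selection("crushed carton", damage_types))
```

The second call returns `None`: a response such as "crushed carton" may be reasonable to the user, yet it is rejected because the user has no way of knowing which entries the list actually contains.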
The drawbacks in the prior art are addressed by the invention, and other advantages are provided by the invention, as set forth herein.