In modern production environments, it is increasingly desirable for human operators to be able to record data and to control electronic devices in a “hands-free” mode, typically via speech control. This generally entails the use of portable electronic voice-processing devices which can detect human speech, interpret the speech, and process the speech to recognize words, to record data, and/or to control nearby electronic systems.
Voice-driven systems typically include at least one microphone and at least one processor-based device (e.g., computer system) which is operated in response to human voice or spoken input, for instance spoken commands and/or spoken information.
There are numerous applications in which voice-driven systems may be employed. For instance, there are many applications where it is advantageous for a user to have their hands free to perform tasks other than operating a keyboard, keypad, mouse, trackball or other user input device. An example of one such application is a warehouse, where a user may need to handle items such as boxes while concurrently interacting with a processor-based device. Another example application is a courier or delivery person, who may be handling parcels or driving a vehicle while concurrently interacting with a processor-based device. An example of a further such application is a medical care provider, who may be using their hands during the performance of therapeutic or diagnostic medical services, while concurrently interacting with a processor-based device. There are of course numerous other examples of applications.
In many of these exemplary applications it is also advantageous or even necessary for the user to be mobile. For applications in which mobility is desirable, the user may wear a headset and a portable processor-based device (referred to below in this document as the speech recognition device 106, 300, or SRD). The headset typically includes at least one loudspeaker and/or microphone. The portable processor-based device typically takes the form of a wearable computer system. The headset is communicatively coupled to the portable processor-based device, for instance via a coiled wire or a wireless connection, for example, a Bluetooth connection. In some embodiments, the portable processor-based device may be incorporated directly into the headset.
In some applications, the portable processor-based device may in turn be communicatively coupled to a host or backend computer system (e.g., server computer). In many applications, two or more portable processor-based devices (clients) may be communicatively coupled to the host or backend computer system/server.
The server may function as a centralized computer system providing computing and data-processing functions to various users via respective portable processor-based devices and headsets. Such an arrangement may, for example, be advantageously employed in an inventory management system in which a central or server computer system performs tracking and management, and a plurality of users, each wearing a respective portable computer system and headset, interface with the central or server computer system.
This client (headset)/server approach allows the user(s) to receive audible instructions and/or information from the server of the voice-driven system. For instance, the user may: receive voice instructions from the server; ask questions of the server; provide to the server reports on the progress of their assigned tasks; report working conditions, such as inventory shortages or damaged goods or parcels; and/or receive directions, such as location information specifying locations for picking up or delivering goods.
Background Sounds
Voice-driven systems are often utilized in noisy environments where various extraneous sounds interfere with voice or spoken input. For example, in a warehouse or logistics center environment, extraneous sounds are often prevalent, including for instance: public address announcements; conversations of persons other than the user of the voice-driven system, which are not intended as input; the movement of boxes or pallets; and/or noise from the operation of lift vehicles (e.g., forklifts), motors, compressors, and other nearby machinery. To be effective, voice-driven systems need to distinguish between voice or speech as intended input and extraneous background sounds, including unwanted voices, which may otherwise be erroneously interpreted as desired speech from a headset-wearing user.
Sounds or noise associated with public address (PA) systems are particularly difficult to address. Public address systems are intentionally loud, so that announcements can be heard above other extraneous noise in the ambient environment. Therefore, it is very likely that a headset microphone will pick up such sounds. Additionally, public address system announcements are not unintelligible noise, but rather typically consist of human voice or speech, and therefore have many of the same aural qualities as voice or spoken input.
Therefore, there exists a need for a system and method for addressing extraneous sounds, including background speech and PA system speech, in order to prevent those extraneous sounds from interfering with the desired operation of voice-driven systems.