In modern production environments, it is increasingly desirable for human operators to be able to record data and to control electronic devices in a “hands-free” mode, typically via speech control. This generally entails the use of portable electronic voice-processing devices which can detect human speech, interpret the speech, and process the speech to recognize words, to record data, and/or to control nearby electronic systems.
Voice-driven systems typically include at least one microphone and at least one processor-based device (e.g., computer system) which is operated in response to human voice or spoken input, for instance spoken commands and/or spoken information.
There are numerous applications in which voice-driven systems may be employed. For instance, there are many applications where it is advantageous for a user to have their hands free to perform tasks other than operating a keyboard, keypad, mouse, trackball or other user input device. An example of one such application is a warehouse, where a user may need to handle items such as boxes while concurrently interacting with a processor-based device. Another example application is a courier or delivery person, who may be handling parcels or driving a vehicle while concurrently interacting with a processor-based device. Yet another example application is a medical care provider, who may be using their hands during the performance of therapeutic or diagnostic medical services, while concurrently interacting with a processor-based device. There are of course numerous other examples of applications.
In many of these exemplary applications it is also advantageous or even necessary for the user to be mobile. For applications in which mobility is desirable, the user may wear a headset and a portable processor-based device. The headset typically includes at least one loudspeaker and/or microphone. The portable processor-based device typically takes the form of a wearable computer system. The headset is communicatively coupled to the portable processor-based device, for instance via a coiled wire or a wireless connection (for example, a Bluetooth connection).
In some applications, the portable processor-based device may in turn be communicatively coupled to a host or backend computer system (e.g., server computer). In many applications, two or more portable processor-based devices (clients) may be communicatively coupled to the host or backend computer system/server.
The server may function as a centralized computer system providing computing and data-processing functions to various users via respective portable processor-based devices and headsets. Such an arrangement may, for example, be advantageously employed in an inventory management system in which a central/server computer system performs tracking and management, and a plurality of users, each wearing a respective portable computer system and headset, interface with the central or server computer system.
This client (headset)/server approach allows the user(s) to receive audible instructions and/or information from the server of the voice-driven system. For instance, the user may: receive voice instructions from the server; ask questions of the server; provide to the server reports on progress of their assigned tasks; report working conditions, such as inventory shortages or damaged goods or parcels; and/or receive directions, such as location information which specifies factory (or warehouse) locations for picking up or delivering goods.
Background Sounds:
Voice-driven systems are often utilized in noisy environments where various extraneous sounds interfere with voice or spoken input. For example, in a warehouse or logistics center environment, extraneous sounds are often prevalent, including for instance: public address announcements; conversations among persons that are not intended as input; sounds from the movement of boxes or pallets; noise from the operation of lift vehicles (e.g., forklifts); impulse sounds, i.e., relatively sharp, sudden sounds as may arise from dropped objects, slammed doors, and other brief-but-loud sound events; and noises from the operations of other machines, including electric motor noises, compressor sounds, and the like.
To be effective, voice-driven systems need to distinguish between voice or speech that is intended as input and extraneous background sounds (including but not limited to unwanted voices) which may otherwise be erroneously interpreted as desired speech from a headset-wearing user.
In the past, there have been two primary methods for rejecting background noise before it reaches the speech detector. In a first method, a noise-cancelling microphone was used which would reject sound directionally. A second method would employ multiple microphones, typically with all the microphones mounted on the user's headset or person (i.e., body microphones).
For example, Honeywell's existing Vocollect SoundSense SRX2 product provides a multi-microphone input to the speech detector, which allows better rejection of ambient noise and of impulse sounds that would otherwise cause insertion errors. Unfortunately, the SoundSense SRX2 can only be run on specialized hardware. Further, the SRX2 and similar technologies are typically limited to microphones that are on the person of the user, rather than employing microphones that are distributed throughout the work environment.
Therefore, there exists a need for an improved system and method for addressing extraneous environmental sounds, in order to prevent those extraneous sounds from interfering with the desired operation of voice-driven systems.