1. Technical Field
This application relates generally to voice-driven systems, and more specifically to analysis of sounds in detecting and/or recognizing speech for use with or in voice-driven systems.
2. Description of the Related Art
Voice-driven systems typically include at least one microphone and at least one processor-based device (e.g., computer system) which is operated in response to human voice or spoken input, for instance spoken commands and/or information.
There are numerous applications in which voice-driven systems may be employed. For instance, in many applications it is advantageous for a user to have their hands free to perform tasks other than operating a keyboard, keypad, mouse, trackball, joystick or other user input device. One example of such an application is warehouse work, where a user may need to handle items such as boxes while concurrently interacting with a processor-based device. Another example is courier or delivery work, where a courier or delivery person may be handling parcels or driving a vehicle while concurrently interacting with a processor-based device. A further example is medical care, where a medical care provider may be using their hands to perform therapeutic or diagnostic medical services while concurrently interacting with a processor-based device. There are, of course, numerous other applications.
In many of these exemplary applications, as well as other applications, it is also advantageous or even necessary for the user to be mobile. For applications in which mobility is desirable, the user may wear a headset and a portable processor-based device. The headset typically includes at least one speaker and/or microphone. The portable processor-based device typically takes the form of a wearable computer system. The headset is communicatively coupled to the portable processor-based device, for instance via a coiled wire.
In some applications, the portable processor-based device may in turn be communicatively coupled to a host or backend computer system (e.g., server computer). In many applications, two or more portable processor-based devices may be communicatively coupled to the host or backend computer system, which may function as a centralized computer system or server providing computing and data-processing functions to various users via respective portable processor-based devices and headsets. Such an arrangement may, for example, be advantageously employed in an inventory management system in which a central or server computer system performs tracking and management, and in which a plurality of users, each wearing a respective portable computer system and headset, interface with the central or server computer system. This approach allows the user(s) to provide spoken or voice input to the voice-driven system, including commands and/or information. This approach also allows the user(s) to receive audible instructions and/or information from the voice-driven system. For instance, the user may receive voice instructions, ask questions, provide reports on the progress of their assigned tasks, report working conditions such as inventory shortages or damaged goods or parcels, and/or receive directions such as location information specifying locations for picking up or delivering goods.
Voice-driven systems are often utilized in noisy environments where various extraneous sounds interfere with voice or spoken input. For example, in a warehouse or logistics center environment, extraneous sounds are often prevalent, including, for instance, the movement of boxes or pallets, noise from the operation of lift vehicles (e.g., forklifts), public address announcements, and/or conversations which are not intended as input. To be effective, voice-driven systems need to distinguish between voice or speech intended as input and extraneous sounds which might otherwise be interpreted as actual speech from a headset-wearing user. Sounds associated with public address systems are particularly difficult to address. Public address systems are intentionally loud, so that announcements can be heard above other extraneous noise in the ambient environment. Therefore, it is very likely that a headset microphone will pick up such sounds. Additionally, public address system announcements are not unintelligible noise, but rather typically consist of human speech, and thus share many of the same aural qualities as voice or spoken input.
There is a particular need to address extraneous sounds, such as noise, in environments where voice-driven systems are used, so as to prevent those extraneous sounds from interfering with the desired operation of the voice-driven systems. The approaches described herein may adequately address these extraneous sounds.