Headsets are a popular way for a user to listen to music or audio privately, or to make a hands-free phone call, or to deliver voice commands to a voice recognition system. A wide range of headset form factors, i.e. types of headsets, are available, including earbuds, on-ear (supraaural), over-ear (circumaural), neckband, pendant, and the like. Several headset connectivity solutions also exist including wired analog, USB, Bluetooth, and the like. For the consumer it is desirable to have a wide range of choice of such form factors, however there are numerous audio processing algorithms which depend heavily on the geometry of the device as defined by the form factor of the headset and the precise location of microphones upon the headset, whereby performance of the algorithm would be markedly degraded if the headset form factor differs from the expected geometry for which the algorithm has been configured.
The voice capture use case refers to the situation where the headset user's voice is captured and any surrounding noise is minimised. Common scenarios for this use case are when the user is making a voice call, or interacting with a speech recognition system. Both of these scenarios place stringent requirements on the underlying algorithms. For voice calls, telephony standards and user requirements demand that high levels of noise reduction are achieved with excellent sound quality. Similarly, speech recognition systems typically require the audio signal to have minimal modification, while removing as much noise as possible. Numerous signal processing algorithms exist in which it is important for operation of the algorithm to change in response to whether or not the user is speaking. Voice activity detection, being the processing of an input signal to determine the presence or absence of speech in the signal, is thus an important aspect of voice capture and other such signal processing algorithms. However voice capture is particularly difficult to effect with a generic algorithm architecture.
There are many algorithms that exist to capture a headset user's voice, however such algorithms are invariably designed and optimised specifically for a particular configuration of microphones upon the headset concerned, and for a specific headset form factor. Even for a given form factor, headsets have a very wide range of possible microphone positions (microphones on each ear, whether internal or external to the ear canal, multiple microphones on each ear, microphones hanging around neck, and so on). FIG. 1 shows some examples of the many possible microphone positions that may each require a voice capture function. In FIG. 1 the black dots signify microphones that are present in a particular design, while the open circles indicate unused microphone locations. As can be seen, with such a proliferation of form factors and available microphone positions, the number of voice capture solutions that need to be developed and tested can quickly become difficult to manage. Likewise, tuning can become very difficult, as each solution may need to be tuned in a different way and require highly skilled engineer time, increasing costs.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
In this specification, a statement that an element may be “at least one of” a list of options is to be understood that the element may be any one of the listed options, or may be any combination of two or more of the listed options.