Audio functionality is becoming increasingly prevalent in portable devices. Such functionality is present not only in devices such as phones that are reliant on audio technology, but also in other wearable equipment or devices that may be controlled by voice, for instance voice-responsive toys such as listening-talking teddy bears. Such devices, including phones, will spend little of their time actually transmitting speech, yet one or possibly more microphones may be permanently enabled, listening out for some voice command. Even a wearable accessory may be continuously on, awaiting a voice command, and will have little space for a battery, or may rely on some solar or mechanical energy harvesting, and so has severe power consumption requirements in a continuous standby mode as well as in a low-duty-cycle operating mode.
Microphone transducer and amplifier technology has improved, but generally a microphone package needs to drive its output signal some distance. Digital transmission offers advantages including noise immunity, but the usual formats for transmission of digital data from microphones are not particularly efficient in terms of signal line activity and the consequent power consumed in charging parasitic capacitances through a supply voltage at every logic level transition.
In a portable device such as a phone or tablet, containing one or more digital microphones, the digital microphone signal may have some distance to go from the microphone to a centralised smart codec chip or similar, along a ribbon cable, or flex, or even across a densely populated printed circuit board. Even worse are applications where the microphone may be in a headset or earbuds or some acoustically desirable position on the user's clothing, distant from the handset or the main module of a distributed device.
However, even when the device is otherwise largely inactive, there may be sophisticated signal processing to be performed, for example speaker recognition during voice-triggered wake-up, so solutions such as grossly degrading the resolution of the microphone's ADC may lead to unacceptable downstream processing results.
There is thus a requirement to reduce the power consumed in sending digital microphone data across a wired digital transmission link, while still conveying enough useful information in the transmitted signal to allow downstream functions such as speech recognition.
FIG. 1 illustrates a conventional digital microphone 10 communicating with a smart codec 22 in a host device 20, for example a phone, and FIG. 2 illustrates the operating waveforms in a conventional digital microphone interface. A host device 20 transmits a clock CLK, typically at a frequency such as 3 MHz, to the microphone 10, which uses this to clock an ADC 12 and to clock out from digital buffer interface Dout 14 a 1-bit oversampled delta-sigma stream DAT representing the acoustic signal input Px to the microphone transducer 16 providing the ADC input. Power is consumed in the system by the host 20 transmitting this clock signal CLK, and in particular by the microphone in sending a data stream DAT with an average 1.5 MHz transition rate.
Power may be reduced by operating at a lower clock rate, say 768 kHz, but this greatly increases the in-band quantisation noise or, conversely, restricts the usable bandwidth for a particular noise level. Even this only reduces the power by a factor of about 4, so the power consumption is still significant, particularly in larger form factor devices or long cable runs.
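The scaling of link power with clock rate can be illustrated with a rough estimate of the power drawn in charging line capacitance. The following is a minimal sketch only: the 20 pF line capacitance and 1.8 V supply are hypothetical values chosen for illustration, not figures from this description, and the model assumes each line draws C·V² from the supply on every rising edge (an average of 0.5·C·V² per transition), with CLK making two transitions per clock period and DAT averaging one transition per two clock periods.

```python
def line_power_watts(transitions_per_second, capacitance_farads, supply_volts):
    """Average supply power for one signal line.

    A rising edge draws C*V^2 from the supply and a falling edge draws
    nothing, so the average energy per transition is 0.5*C*V^2.
    """
    return 0.5 * capacitance_farads * supply_volts ** 2 * transitions_per_second


def interface_power_watts(clock_hz, capacitance_farads=20e-12, supply_volts=1.8):
    """Estimated power for the CLK and DAT lines together.

    capacitance_farads and supply_volts are hypothetical example values.
    """
    # CLK toggles twice per clock period.
    clk = line_power_watts(2.0 * clock_hz, capacitance_farads, supply_volts)
    # A 1-bit delta-sigma stream changes state, on average, every other bit.
    dat = line_power_watts(0.5 * clock_hz, capacitance_farads, supply_volts)
    return clk + dat


p_fast = interface_power_watts(3.0e6)   # conventional 3 MHz clock
p_slow = interface_power_watts(768e3)   # reduced 768 kHz clock
ratio = p_fast / p_slow

print(f"3 MHz:   {p_fast * 1e6:.1f} uW")
print(f"768 kHz: {p_slow * 1e6:.1f} uW")
print(f"ratio:   {ratio:.2f}")
```

Under this model the power is simply proportional to clock frequency, so the ratio depends only on the two clock rates and not on the assumed capacitance or supply voltage.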
Transmitting a delta-sigma stream is notably less efficient in terms of data bit rate and transition rate than transmitting a serial multi-bit pulse-code-modulated stream, but the latter generally requires an additional clock wire to transmit clocks to mark the start of each multi-bit word.
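The difference in raw bit rate between the two formats can be illustrated numerically. This is a sketch under stated assumptions: the PCM format of 16-bit words at a 48 kHz sample rate is a hypothetical example, not a format specified in this description, and the transition-rate estimate treats the bits as random (roughly one transition per two bits), which is only approximate for real PCM words.

```python
def bit_rate(sample_rate_hz, bits_per_sample):
    """Raw serial bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def approx_transition_rate(bits_per_second):
    """Approximate average transitions per second, assuming random bits."""
    return bits_per_second / 2.0


# 1-bit delta-sigma stream clocked at 3 MHz, as in the conventional interface.
delta_sigma_bps = bit_rate(3_000_000, 1)

# Hypothetical PCM alternative: 16-bit words at 48 kHz.
pcm_bps = bit_rate(48_000, 16)

bit_rate_ratio = delta_sigma_bps / pcm_bps

print(f"delta-sigma: {delta_sigma_bps} bit/s, "
      f"~{approx_transition_rate(delta_sigma_bps):.0f} transitions/s")
print(f"PCM:         {pcm_bps} bit/s, "
      f"~{approx_transition_rate(pcm_bps):.0f} transitions/s")
print(f"bit-rate ratio: {bit_rate_ratio:.2f}")
```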
Secondly, we note that an unfortunate side effect of reducing the delta-sigma sample clock rate may be to limit the bandwidth usable in terms of background quantisation noise to, say, 8 kHz rather than, say, 20 kHz. This may increase the word error rate (WER) for Voice Key Word detection (VKD). This may in turn lead to a higher incidence of false positives, so that the system may spend more time in its awake mode, thus significantly affecting the average power consumption of the complete system.
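The effect of false positives on average system power can be sketched with a simple duty-cycle calculation. All numbers below are hypothetical and purely illustrative (a 0.5 mW standby power, a 200 mW awake power, and 5 seconds awake per wake-up); none are taken from this description.

```python
def average_power_watts(standby_w, awake_w, wakeups_per_hour, awake_seconds_per_wakeup):
    """Average power of a system that is awake for a fraction of each hour."""
    awake_fraction = min(1.0, wakeups_per_hour * awake_seconds_per_wakeup / 3600.0)
    return standby_w + awake_fraction * (awake_w - standby_w)


# Hypothetical figures: 0.5 mW standby, 200 mW awake, 5 s awake per wake-up.
few_false_wakeups = average_power_watts(0.5e-3, 200e-3, wakeups_per_hour=2,
                                        awake_seconds_per_wakeup=5)
many_false_wakeups = average_power_watts(0.5e-3, 200e-3, wakeups_per_hour=20,
                                         awake_seconds_per_wakeup=5)

print(f"2 wake-ups/hour:  {few_false_wakeups * 1e3:.2f} mW")
print(f"20 wake-ups/hour: {many_false_wakeups * 1e3:.2f} mW")
```

With these assumed figures, even a small awake fraction dominates the average, because the awake power is far larger than the standby power.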
There is also a prevalent requirement for functions requiring even more accurate input audio data streams, such as speaker identification, as part of a voice-triggered wake-up function. It is known that using a wider bandwidth for speaker identification captures more speech signal components and thus relaxes the need for a high signal-to-noise ratio (SNR) (e.g. relaxes the need for low acoustic background noise, or carefully optimised microphone placement) to get high enough accuracy for biometric purposes. Even in a high-SNR environment a relatively wide signal bandwidth may improve speaker verification accuracy. This is at odds with the concept of reducing the frequency of the digital microphone clock to reduce the power consumption.