This specification relates generally to robots, and more particularly to audio processing for consumer robots.
A robot is a physical machine configured to perform physical actions autonomously or semi-autonomously. Robots have one or more integrated control subsystems that effectuate the physical movement of one or more robotic components in response to particular inputs. Robots can also have one or more integrated sensors that allow them to detect particular characteristics of the robot's environment. Modern-day robots are typically controlled electronically by dedicated electronic circuitry, programmable special-purpose or general-purpose processors, or some combination of these. Robots can also have integrated networking hardware that allows the robot to communicate over one or more communications networks, e.g., over Bluetooth, NFC, or Wi-Fi.
Many devices rely on microphones to detect nearby sounds and need to localize the source of a sound so that they can focus on and process that sound rather than other ambient noise, e.g., because the sound is a voice command from a user seeking to interact with the device. In many cases, these devices have no indication of the direction from which to expect an audio input, and the input can come from any location, or from multiple locations, in the environment. Thus, these devices often initially listen in all directions, for example, by using an array of omnidirectional microphones. Doing so allows a device to determine the direction of a sound source so that the device can, for example, best isolate it. One method of determining the direction of an emitter is to compare the arrival times of a signal across a microphone array. If a microphone A detects a sound wave before a microphone B, it can be inferred that the emitter of the sound wave is physically closer to microphone A than to microphone B.
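The arrival-time comparison described above can be sketched for a pair of microphones as follows. This is a minimal illustration, not necessarily the method of this specification: it assumes a far-field source, two microphones a known distance apart, and a speed of sound of roughly 343 m/s, and it uses plain cross-correlation (one common technique among several) to find the lag between channels. The function names `estimate_delay` and `bearing_from_delay` are hypothetical.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) that best aligns sig_b with sig_a.

    Hypothetical helper. A positive result means sig_b is a delayed copy
    of sig_a, i.e., the sound reached microphone A first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlate sig_a[t] with sig_b[t + lag] over overlapping samples.
        score = 0.0
        for t in range(len(sig_a)):
            u = t + lag
            if 0 <= u < len(sig_b):
                score += sig_a[t] * sig_b[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag, fs, mic_spacing, speed_of_sound=343.0):
    """Convert a sample lag into a far-field bearing in degrees.

    Hypothetical helper. 0 degrees is broadside (perpendicular to the
    A-B axis); positive angles point toward microphone A. Assumes the
    far-field relation tau = mic_spacing * sin(theta) / speed_of_sound.
    """
    tau = lag / fs  # arrival-time difference in seconds
    s = speed_of_sound * tau / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp: noise can push |s| slightly past 1
    return math.degrees(math.asin(s))
```

For a 10 cm spacing sampled at 48 kHz, the largest physically possible lag is about 0.1 × 48000 / 343 ≈ 14 samples, so `max_lag` can be bounded accordingly rather than searching every offset.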
After determining a likely direction of the emitter, some devices proceed to focus their microphones in that direction to reduce the effects of ambient noise on the sound signal. Spatial filtering refers generally to signal processing techniques for this task, and can be performed on a system that includes a processor and a microphone array. Each microphone in the array receives a version of the emitted signal that differs from the versions received by its neighbors, due to each microphone's unique position relative to the emitter. A device can then apply a spatial filter by computing weighted, time-shifted sums of the different versions of the signal captured by the microphones. This allows the device to strengthen signals arriving from the direction of the emitter, e.g., through constructive interference. Similarly, the device can reduce the effects of noise arriving from other directions, e.g., through destructive interference.
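A weighted, time-shifted summation of this kind is commonly called a delay-and-sum beamformer. The sketch below is a minimal illustration under stated assumptions, not the specification's method: the per-microphone steering delays are assumed to be known integer sample counts (e.g., derived from an estimated direction), and the weights default to a uniform 1/M. The function name `delay_and_sum` is hypothetical.

```python
def delay_and_sum(signals, delays, weights=None):
    """Spatially filter multichannel audio by shifting and weighting channels.

    Hypothetical helper.
    signals: list of equal-length sample lists, one per microphone.
    delays:  per-microphone arrival delay (integer samples) for the steering
             direction; channel m is advanced by delays[m] to line it up.
    weights: per-microphone gains; defaults to uniform 1/M.
    """
    m = len(signals)
    if weights is None:
        weights = [1.0 / m] * m
    n = len(signals[0])
    out = [0.0] * n
    for x, d, w in zip(signals, delays, weights):
        for t in range(n):
            u = t + d  # index of this channel's sample that aligns with time t
            if 0 <= u < n:
                out[t] += w * x[u]
    return out
```

Once the channels are aligned, the signal from the steered direction adds coherently (constructive interference), while independent noise at each microphone partially cancels. With M microphones and uncorrelated, equal-power noise, averaging reduces the residual noise power by roughly a factor of M.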
Devices can also use acoustic transfer functions (hereafter, “transfer functions”) to improve the signal quality of received audio signals. A transfer function represents how an audio signal is transformed between two locations in a particular environment, e.g., due to the acoustic properties of the medium. A device receiving an audio signal can apply an inverse transfer function to recover the original audio signal, e.g., by removing distortion and noise.
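Applying an inverse transfer function can be sketched in the frequency domain, where propagation multiplies the spectrum of the emitted signal by H and inversion divides it back out. This is a minimal illustration under assumptions not stated in the text: the transfer function is taken to be known as a short impulse response `h`, propagation is modeled as circular convolution, and a regularized inverse conj(H) / (|H|² + reg) (Tikhonov regularization, a swapped-in technique) replaces a bare 1/H so that frequencies where H is near zero are not amplified without bound. All function names are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2); fine for short examples)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def apply_transfer_function(x, h):
    """Simulate propagation: circularly convolve x with impulse response h.

    Assumes len(h) <= len(x); h is zero-padded to the signal length.
    """
    n = len(x)
    H = dft(list(h) + [0.0] * (n - len(h)))
    X = dft(x)
    return [v.real for v in idft([a * b for a, b in zip(X, H)])]


def inverse_filter(y, h, reg=1e-6):
    """Estimate the emitted signal from received signal y, given h.

    Uses the regularized inverse conj(H) / (|H|^2 + reg) at each frequency.
    """
    n = len(y)
    H = dft(list(h) + [0.0] * (n - len(h)))
    Y = dft(y)
    X_hat = [yv * hv.conjugate() / (abs(hv) ** 2 + reg) for yv, hv in zip(Y, H)]
    return [v.real for v in idft(X_hat)]
```

The regularization constant trades accuracy for stability: a larger `reg` suppresses noise amplification at frequencies the transfer function attenuates strongly, at the cost of a less exact recovery elsewhere.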
Computing spatial filters is most practical for devices that do not move (such as smart speakers) or that move slowly. But for mobile robots that are capable of rapid movements, even spatial filters computed in real time are often ineffective. In other words, by the time a robot has performed the computations to generate a spatial filter, the location of the emitter relative to the robot can have already changed dramatically based on the movement of the robot itself (even assuming the emitter is stationary).