The capture or recording of audio or audio-video content is well known. Many handheld devices are equipped with both cameras and microphones configured to capture or record audio and/or audio-video signals for storage or transmission. Furthermore, such devices are increasingly being equipped with spatial audio capture technology. Spatial audio capture technology uses an array of two or more microphones for recording or capturing the audio environment. The captured audio signals are analysed to extract the spatial co-ordinates or positions of any relevant or dominant sources in the captured audio environment. The spatial co-ordinates can then be defined with reference to the orientation of the capturing device, and typically to the orientation of the centre of the camera viewpoint. Typically the spatial co-ordinates of the audio sources relative to the orientation of the electronic device are in the form of an angle with respect to a device centre axis or axes defined by the camera orientation. These co-ordinates are then used in some situations to synthesize a stereo audio signal at a listening device. The synthesis involves imparting frequency- and angle-dependent interaural time and level difference cues through a head related transfer function (HRTF) or head related impulse response (HRIR). When replayed on a stereo headphone set, signals processed with these HRTF/HRIR values represent an audio sound field which is perceptually similar to the recorded audio environment. Alternatively, synthesis based on custom panning rules can be applied for replay on a multi-channel loudspeaker set-up.
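As an illustration of how an angle relative to the device centre axis might be derived from a microphone array, the following minimal sketch assumes the simplest case: two microphones, a far-field source, and an inter-microphone time delay that has already been measured. The function name `direction_from_delay`, its parameters, and the speed-of-sound constant are assumptions introduced here for illustration only; practical spatial analysis may use a different method.

```python
import math

# Assumed approximate speed of sound in air at room temperature.
SPEED_OF_SOUND_M_S = 343.0


def direction_from_delay(delay_s, mic_spacing_m):
    """Estimate a source angle in degrees relative to the device centre axis.

    Hypothetical helper: assumes a far-field source and two microphones
    placed symmetrically about the centre axis. 0 degrees is straight
    ahead; the sign of the angle follows the sign of the delay.
    """
    # Far-field model: delay = spacing * sin(angle) / speed_of_sound.
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the ratio outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.15  # metres between the two microphones (assumed value)
    delay = 0.5 * spacing / SPEED_OF_SOUND_M_S
    print(direction_from_delay(delay, spacing))
```

Because the angle is expressed relative to the device axis, any change in the orientation of the device changes the estimated angle even when the source itself has not moved.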
Often the presentation of spatial audio signals is performed in conjunction with a replay of a video feed captured by a camera on the device.
Such portable devices, unlike tripod-mounted apparatus, are prone to translational and rotational motion while they are recording audio and video. These motions can be the result of motion of the person holding the device (such as device ‘shake’, or movement such as walking, running, and changing hands), or of the motion of a vehicle on which the device is mounted while recording. The motion is unintentional and unavoidable but can result in an unpleasant video playback experience. Video stabilization to overcome such translational and rotational motion is available in many commercially available video recorders, as this motion is typically constrained to relatively small translational and rotational values.
The recorded audio signal is similarly affected by motion. Specifically, any motion could generate an incorrect estimate of the positions of the sources in the audio environment. Furthermore, the estimation could assign an incorrect motion to an audio source. For example, motion of the device can lead to variations in the estimated positions of stationary sources at a given co-ordinate, which causes the replayed sound sources to ‘oscillate’ or ‘wobble’ around their position whilst the video image is maintained stationary.
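The ‘wobble’ effect can be illustrated with a minimal sketch, assuming a purely rotational, sinusoidal device shake about a single (yaw) axis and a source angle measured relative to the device centre axis. The function names, the sinusoidal shake model, and the sign convention are assumptions made for illustration and are not taken from any particular device.

```python
import math


def wrap_degrees(angle_deg):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def apparent_source_angle(world_angle_deg, device_yaw_deg):
    """Angle of a source as seen from the device centre axis.

    Assumed convention: rotating the device by +yaw shifts every
    source by -yaw in device-relative co-ordinates.
    """
    return wrap_degrees(world_angle_deg - device_yaw_deg)


def simulate_wobble(world_angle_deg, shake_amplitude_deg, shake_hz,
                    frame_rate_hz, n_frames):
    """Return the per-frame estimated angle of a stationary source
    while the device yaw oscillates sinusoidally (hypothetical model)."""
    angles = []
    for n in range(n_frames):
        t = n / frame_rate_hz
        yaw = shake_amplitude_deg * math.sin(2.0 * math.pi * shake_hz * t)
        angles.append(apparent_source_angle(world_angle_deg, yaw))
    return angles


if __name__ == "__main__":
    # Stationary source at 30 degrees; 5 degree shake at 2 Hz, 50 frames/s.
    estimates = simulate_wobble(30.0, 5.0, 2.0, 50.0, 50)
    print(min(estimates), max(estimates))
```

In this sketch the source never moves, yet its estimated angle swings by roughly the shake amplitude on either side of its true position, which is the oscillation a listener would hear if the estimates were used directly for synthesis.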