Audio and audio-video recording on electronic apparatus is now common. Devices ranging from professional video capture equipment, consumer-grade camcorders and digital cameras to mobile phones and even simple devices such as webcams can be used for electronic acquisition of motion video images. Recording video and the audio associated with video has become a standard feature on many mobile devices, and the technical quality of such equipment has rapidly improved. Recording personal experiences is quickly becoming an increasingly important use for mobile devices such as mobile phones and other user equipment. Combining this with the emergence of social media and new ways to efficiently share content underlines the importance of these developments and the new opportunities offered for the electronic device industry.
In conventional situations the environment comprises sound fields with audio sources spread in all three spatial dimensions. The human hearing system, controlled by the brain, has evolved the innate ability to localize, isolate and comprehend these sources in the three-dimensional sound field. For example, the brain attempts to localize audio sources by decoding the cues that are embedded in the audio wavefronts from the audio source when the wavefronts reach the two ears. The two most important cues responsible for spatial perception are the interaural time difference (ITD) and the interaural level difference (ILD). For example, the wavefront from an audio source located to the left and front of the listener takes more time to reach the right ear than the left ear. This difference in time is called the ITD. Similarly, because of head shadowing, the wavefront reaching the right ear is attenuated more than the wavefront reaching the left ear, leading to the ILD. In addition, transformation of the wavefront due to the pinna structure and shoulder reflections can also play an important role in how we localize the sources in the 3D sound field. These cues are therefore dependent on the person/listener, the frequency, the location of the audio source in the 3D sound field and the environment the listener is in (for example whether the listener is located in an anechoic chamber, an auditorium or a living room).
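By way of illustration only, the two cues described above can be approximated numerically. The following minimal sketch is not taken from any described apparatus. It assumes a rigid spherical head and a far-field source, and uses the well-known Woodworth approximation for the ITD, which is a substituted textbook model and not a method stated in this document. The function names, the head radius of 0.0875 m and the speed of sound of 343 m/s are all assumptions chosen for the example. The ILD is simply computed as the ratio of the RMS levels of the two ear signals, expressed in decibels.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875      # assumption: typical adult head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds using the Woodworth spherical-head model.

    azimuth_deg: 0 is straight ahead, positive values are toward the left,
    valid range is [-90, 90]. A positive result means the wavefront reaches
    the left ear first (hypothetical sign convention for this sketch).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must lie in [-90, 90]")
    theta = math.radians(abs(azimuth_deg))
    # Extra path length around the sphere: a * (theta + sin(theta)).
    itd = (head_radius / c) * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ild_db(left, right):
    """ILD in dB; positive when the left-ear signal is the louder one."""
    left_rms, right_rms = rms(left), rms(right)
    if left_rms == 0.0 or right_rms == 0.0:
        raise ValueError("ILD is undefined for a silent channel")
    return 20.0 * math.log10(left_rms / right_rms)


if __name__ == "__main__":
    # Source to the left and front of the listener, as in the example above.
    print(f"ITD at 45 degrees left: {woodworth_itd(45.0) * 1e3:.2f} ms")
    # Right ear shadowed by the head: half the amplitude of the left ear.
    print(f"ILD: {ild_db([0.5, -0.5, 0.5], [0.25, -0.25, 0.25]):.1f} dB")
```

Under these assumptions, a source 45 degrees to the left gives an ITD of roughly 0.38 ms, and a source directly to one side gives roughly 0.66 ms. A real listener's cues also depend on frequency, pinna shape and the surrounding environment, none of which this simplified model captures.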
Audio-video recordings are well known in implementation. Often recording or capture is carried out in environmentally noisy situations where background noise causes difficulty in understanding detail that has been recorded. This typically results in requests to repeat the recording to determine the detail. This is particularly acute in recording conversation, where it can be difficult to follow the discussion due to local noise causing severe distraction. Even where the surrounding or environmental noise does not prevent the user from understanding the detail in the recording, it can still be very distracting and annoying, and can require extra effort in listening.