Virtual reality (VR) is a form of immersive multimedia in which a virtual world is displayed within a user device, typically a headset worn by the user. The headset has two screens (one for each eye), each displaying a part of the virtual world that depends on the position and orientation of the user as detected by sensors. The headset also provides headphones, whose audio adds to the virtual experience. The virtual world may be computer-generated from a real-world space captured using a suitable camera and microphone system comprising an array of camera sensors and microphones oriented in respective directions. Nokia's OZO® device is one such capture device, providing both spatial video and audio signals for processing and rendering using suitable VR software on a computer system.
Spatial audio refers to playable audio data that exploits sound localisation. In a real-world space there may be multiple audio sources, and the location and movement of those sources are parameters of the captured audio. In rendering the audio as spatial audio for playback, such parameters are incorporated in the data using processing algorithms so that the listener is provided with an immersive and spatially oriented experience. Nokia's Spatial Audio Capture (SPAC) is an example technology for processing audio captured via a microphone array into spatial audio; that is, audio with a spatial percept. Alternatively, or additionally, object-based audio can be created using signals from a plurality of close-up microphones, each of which is associated with a respective audio source in the real-world space, the position of which can be determined. In both cases, the intention is to capture audio so that, when it is rendered to a user, the user experiences the sound field as if they were present at the location of the capture device.
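As an illustration only of how the position of an audio source can be used in object-based rendering, the following minimal sketch computes the direction of a source relative to a listener and derives simple stereo gains from it. It is not SPAC or any Nokia method; the names AudioObject, relative_azimuth and stereo_gains, the two-dimensional coordinate convention, the constant-power panning law and the inverse-distance attenuation are all assumptions chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio source with a known 2D position (metres)."""
    name: str
    x: float
    y: float


def relative_azimuth(obj, listener_x, listener_y, listener_yaw):
    """Angle of the source relative to the listener's facing direction.

    Assumed convention: yaw is measured counter-clockwise from the +x axis,
    so a positive result means the source is to the listener's left.
    The result is wrapped to the range [-pi, pi].
    """
    angle = math.atan2(obj.y - listener_y, obj.x - listener_x) - listener_yaw
    return math.atan2(math.sin(angle), math.cos(angle))


def stereo_gains(azimuth):
    """Constant-power pan returning (left_gain, right_gain).

    Azimuths beyond +/-90 degrees are clamped, so sources behind the
    listener are treated as fully left or fully right (a simplification).
    """
    clamped = max(-math.pi / 2, min(math.pi / 2, azimuth))
    pan = (math.pi / 2 - clamped) / 2  # 0 = full left, pi/2 = full right
    return math.cos(pan), math.sin(pan)


def distance_gain(obj, listener_x, listener_y, min_distance=1.0):
    """Inverse-distance attenuation, capped inside min_distance (assumed model)."""
    distance = math.hypot(obj.x - listener_x, obj.y - listener_y)
    return 1.0 / max(distance, min_distance)


if __name__ == "__main__":
    # Listener at the origin, facing along +x.
    sources = [AudioObject("singer", 2.0, 0.0), AudioObject("guitar", 0.0, 3.0)]
    for source in sources:
        azimuth = relative_azimuth(source, 0.0, 0.0, 0.0)
        left, right = stereo_gains(azimuth)
        gain = distance_gain(source, 0.0, 0.0)
        print(source.name, round(math.degrees(azimuth), 1), left * gain, right * gain)
```

A practical renderer would typically use head-related transfer functions rather than plain stereo panning; the sketch only shows that source position and listener orientation together determine what each ear receives.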
A mixing phase of VR is when a creator, e.g. a director, makes certain changes to the captured video and/or audio data to create a desired user experience. A rendering phase of VR is when the captured and mixed data is made available in a form ready for consumption and interaction. A consumption phase of VR is when the user is viewing and/or listening to the virtual world content, e.g. when wearing a VR headset.
In the consumption phase of VR, or indeed any virtual space in which the audio has a spatial percept, the presence of multiple audio sources may overwhelm the user and/or may make it difficult to understand the immersive experience that the director intended to convey.