While virtual reality (VR) opens up opportunities for content creators and for sports, entertainment, and game broadcasters, it also brings new challenges when attempting to deliver immersive experiences to a broad base of users.
One of the most difficult challenges facing the current VR industry is latency. For instance, a video latency of more than 50 ms between a head movement and the resulting change in the displayed image can lead to a detached gaming experience and can contribute to motion sickness and dizziness. To avoid these issues, a VR system should ideally have a visual delay of less than 15 milliseconds (ms). Audio latency can likewise disrupt and break a user's sense of immersion. To ensure that a user feels connected to another person in real time in VR, the audio delay between the speaker and the listener should be minimized; studies of sensitivity to audio delay suggest that, for a user to converse comfortably with another person in a VR environment, one-way latency should be below 50 ms.
Human listeners can detect the difference between two sound sources that are placed as little as three degrees (3°) apart, about the width of a person at 10 meters. The ear on the far side of the head hears a sound slightly later than the near ear due to its greater distance from the source. Based on a typical head size (about 22 cm) and the speed of sound (about 340 m/s), an angular discrimination of 3° corresponds to a timing precision of roughly 30 microseconds (µs).
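The microsecond figure above can be checked with a simple path-length-difference model: the far ear receives the wavefront later by roughly (d · sin θ) / c, where d is the inter-ear distance and c the speed of sound. This is a minimal sketch using the values given in the text; the function name is illustrative.

```python
import math

HEAD_WIDTH_M = 0.22      # typical distance between the ears (from the text)
SPEED_OF_SOUND = 340.0   # speed of sound in air, m/s (from the text)

def interaural_time_difference(azimuth_deg):
    """Approximate interaural time difference (seconds) for a source
    offset from straight ahead by azimuth_deg, using the simple
    path-length-difference model (d * sin(theta)) / c."""
    return HEAD_WIDTH_M * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

# A 3-degree offset yields a timing difference of roughly 34 microseconds,
# consistent with the ~30 µs precision cited above.
itd_us = interaural_time_difference(3.0) * 1e6
```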
Geometric Acoustic (GA) modeling is the simulation of sound ray propagation in a particular spatial setting (e.g., a virtual scene), which can be executed by a GA processing pipeline, for example. Based on geometric information about the setting, GA processing can determine how sound waves travel and bounce around the environment and reach a character or an object (e.g., one controlled by a player in real time), thus providing 3D spatialized audio data.
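One common GA building block for the "bouncing" described above is the image-source method: a specular reflection off a wall is modeled by mirroring the source across the wall plane and treating the mirrored point as a virtual source. The sketch below is illustrative, not the method of any particular pipeline; the geometry, function names, and 1/r attenuation model are assumptions.

```python
import math

def reflect_point_across_plane(p, plane_point, plane_normal):
    """Mirror a source position across a wall plane (image-source method).
    plane_normal is assumed to be unit length."""
    d = sum((pi - qi) * ni for pi, qi, ni in zip(p, plane_point, plane_normal))
    return tuple(pi - 2.0 * d * ni for pi, ni in zip(p, plane_normal))

def path_delay_and_attenuation(src, listener, c=340.0):
    """Propagation delay (s) and simple inverse-distance (1/r) attenuation
    for the straight-line path from src to listener."""
    dist = math.dist(src, listener)
    return dist / c, 1.0 / max(dist, 1e-6)

# Direct path from a source to a listener 3 m away.
src, listener = (1.0, 1.0, 1.5), (4.0, 1.0, 1.5)
direct_delay, direct_atten = path_delay_and_attenuation(src, listener)

# First-order reflection off the floor (plane z = 0, normal +z):
# the reflected path is longer, so it arrives later and weaker.
image = reflect_point_across_plane(src, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
refl_delay, refl_atten = path_delay_and_attenuation(image, listener)
```

Each such (delay, attenuation) pair contributes one tap to the impulse response discussed next.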
Typically, a geometric acoustic pipeline processes the geometry of a virtual scene, together with the locations of the sound sources and the receiver, using a ray tracing algorithm and an audio processing algorithm. The ray tracing algorithm computes a spatial acoustic model and generates impulse responses (IRs) that encode the delays and attenuation of sound waves traveling from a sound source to a sound receiver along different propagation paths representing transmission, reflection, and diffraction; the resulting IR represents the decay of audio energy over time at the listener's position. Whenever the sound source, the receiver, or an object in the scene moves, these propagation paths need to be recomputed, sometimes periodically. The audio processing algorithm then generates output audio signals by convolving the input audio signals with the IRs. In a virtual environment such as a game, where the scene geometry and the positions of the sound sources and the listener are known, Geometric Acoustics is applied to generate spatialized audio data.
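The convolution step can be sketched in a few lines of NumPy. The IR below is a toy stand-in for what the ray tracer would produce (one direct arrival plus one attenuated reflection); the sample rate, tap delays, and gains are assumptions for illustration.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)

# Toy impulse response: a direct arrival plus one delayed, attenuated
# reflection, as a ray tracer might produce for a simple scene.
ir = np.zeros(FS // 10)          # 100 ms impulse response
ir[int(0.005 * FS)] = 1.0        # direct path arrives after 5 ms
ir[int(0.030 * FS)] = 0.3        # reflection arrives after 30 ms, attenuated

# Dry input signal: a short 440 Hz tone.
t = np.arange(FS // 4) / FS
dry = np.sin(2 * np.pi * 440.0 * t)

# Spatialized output = dry input convolved with the impulse response.
wet = np.convolve(dry, ir)
```

In a real-time pipeline this full convolution would typically be replaced by block-wise (partitioned) FFT convolution so output can stream before the whole input is available.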
In a typical client-server gaming system, a remote server executes the game and renders on behalf of clients that simply send input and display the output frames. To provide GA audio to the client, whenever the listening player's head moves, the player's new location and head orientation are sent to the server, which computes the IRs for the new listener position, convolves them with the audio data, and streams the resulting audio frames back to the client. However, ray tracing typically requires intensive, real-time computation; rendering audio frames in this manner can therefore push the end-to-end audio latency well beyond comfortable levels.
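To see why the server-side approach strains the budget, it helps to sum the stages on the head-motion-to-ear path. All of the figures below are hypothetical assumptions chosen only to illustrate the arithmetic; actual values depend on the network and hardware.

```python
# Hypothetical one-way audio latency budget for server-side GA rendering.
# Every figure here is an illustrative assumption, not a measurement.
budget_ms = {
    "client -> server (head pose upload)": 20.0,
    "ray tracing / IR update on server":   15.0,
    "convolution + encode":                 5.0,
    "server -> client (audio stream)":     20.0,
    "client decode + playout buffer":      10.0,
}

total_ms = sum(budget_ms.values())
# With these assumed figures the total is 70 ms, already above the
# ~50 ms one-way comfort threshold cited earlier.
```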