Immersive sound fields (i.e., 3D audio) can be represented in B-format audio (i.e., ambisonics) or in an object-audio format (e.g., rendered using vector base amplitude panning (VBAP)). An immersive sound field can be created either by “panning” a mono audio source in 3D space using two angles (theta and phi) or by capturing a sound field with microphones designed for that purpose. Ambisonics uses at least four audio channels (B-format audio) to encode an entire 360° sound sphere. Object audio uses mono audio “objects”, each with associated metadata that conveys its position to a renderer, which may be proprietary (e.g., Dolby Atmos).
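Panning a mono source with two angles can be illustrated by encoding each mono sample into the four first-order B-format channels. The sketch below is minimal and makes several assumptions. It uses the traditional FuMa-style convention, in which W is scaled by 1/√2, theta is the azimuth measured counterclockwise from the front, and phi is the elevation. Other conventions, such as AmbiX (ACN/SN3D), order and scale the channels differently. The function names are hypothetical.

```python
import math

def encode_first_order_bformat(sample, theta, phi):
    """Pan one mono sample to first-order B-format (W, X, Y, Z).

    Assumed convention (FuMa-style, hypothetical helper):
      theta: azimuth in radians, counterclockwise from the front
      phi:   elevation in radians, positive upward
    """
    w = sample / math.sqrt(2.0)                  # omnidirectional component
    x = sample * math.cos(theta) * math.cos(phi)  # front/back figure-of-eight
    y = sample * math.sin(theta) * math.cos(phi)  # left/right figure-of-eight
    z = sample * math.sin(phi)                    # up/down figure-of-eight
    return (w, x, y, z)

def encode_signal(samples, theta, phi):
    """Encode a whole mono signal placed at a fixed direction."""
    return [encode_first_order_bformat(s, theta, phi) for s in samples]
```

With a unit sample straight ahead (theta = phi = 0), X is 1 and Y and Z are 0. Setting theta to π/2 moves all of the horizontal directional energy into Y.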
A spherical video (or immersive video) can be represented in various formats, for example as a 2D equirectangular projection, as a cubic projection, or in another projection, and it can be viewed through a head-mounted display (e.g., an Oculus Rift or HTC Vive). A projection maps a point of the spherical video (defined by X/Y/Z coordinates or by longitude and latitude angles) to a 2D point (X, Y) in the projected view. Conversely, each point in a 2D projected view (e.g., an equirectangular or cubic view) corresponds directly to a 3D point on the sphere.
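The correspondence between a 3D point on the sphere and a 2D point in an equirectangular view can be sketched as a pair of mappings. This is a minimal sketch that assumes a specific layout: longitude runs from -π at the left edge to +π at the right edge, and latitude runs from +π/2 at the top row to -π/2 at the bottom row. Real players and file formats may flip the image or offset the longitude. The function names are hypothetical.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D direction (need not be unit length) to pixel coords (u, v).

    Assumed layout: longitude -pi..pi left to right, latitude +pi/2..-pi/2 top to bottom.
    """
    lon = math.atan2(y, x)                  # longitude in [-pi, pi]
    lat = math.atan2(z, math.hypot(x, y))   # latitude in [-pi/2, pi/2]
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v

def equirect_to_direction(u, v, width, height):
    """Inverse mapping: pixel coords (u, v) back to a unit direction (x, y, z)."""
    lon = u / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - v / height * math.pi
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))
```

In a 4096 × 2048 frame, the forward direction (1, 0, 0) lands at the image center (2048, 1024), and straight up lands on the top row.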
When recording an immersive video and a sound field, the video and audio acquisition devices are sometimes separate (i.e., the microphones are not integrated into the spherical camera). In that case, the devices can be placed manually in the environment to capture a scene. It is generally good practice to place the audio and video acquisition devices close to each other. However, when the devices are separate, the coordinate system axes of the sound field and of the immersive video are not necessarily aligned.
When the sound field and the immersive video are not aligned, what the viewer hears can fail to match what the viewer sees. When a sound does not appear to come from the direction of its visible source, the viewer no longer has an immersive experience. Conventional methods available to content creators for rotating sound fields are not intuitive.
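The seen/heard mismatch can be quantified by comparing two directions for the same source: where it appears in the video and where it is localized in the sound field. The sketch below is hypothetical and assumes both directions are already expressed as azimuth/elevation in radians. It computes the great-circle angle between them. It also computes the yaw offset that would rotate the audio azimuth onto the video azimuth, assuming the misalignment is a rotation about the vertical axis only. The function names are illustrative, not from the source.

```python
import math

def angular_mismatch(az_a, el_a, az_b, el_b):
    """Great-circle angle (radians) between two azimuth/elevation directions."""
    cos_d = (math.sin(el_a) * math.sin(el_b)
             + math.cos(el_a) * math.cos(el_b) * math.cos(az_a - az_b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, cos_d)))

def yaw_offset(video_az, audio_az):
    """Yaw (radians, wrapped to [-pi, pi]) that moves audio_az onto video_az.

    Assumes the sound field and video differ only by a rotation about the vertical axis.
    """
    d = video_az - audio_az
    return math.atan2(math.sin(d), math.cos(d))
```

For example, a source seen at 170° but heard at -170° needs a -20° yaw, not +340°, because the offset is wrapped to the shortest rotation.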
Audio software plugins are available that allow a content creator to rotate a sound field through a user interface showing a 2D orthographic projection of the sound sphere. Interacting with the sound sphere in this way is not intuitive because the audio is presented completely separately from the video. Without visual feedback, aligning the sound field to the immersive video is cumbersome: the content creator must rotate the sound field, play back the video with the immersive sound, and carefully judge by ear whether the audio and video are aligned.
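Rotating a first-order sound field, as such plugins do, amounts to applying a 3D rotation matrix to the X, Y and Z channels while leaving the omnidirectional W channel unchanged. The sketch below is minimal and assumes the same FuMa-style channel layout as the encoding example. It builds only a yaw (vertical-axis) matrix. Pitch and roll would use their own 3 × 3 matrices in the same function. Higher-order ambisonics requires larger rotation matrices and is not covered. All names are hypothetical.

```python
import math

def yaw_matrix(alpha):
    """3x3 rotation about the vertical (Z) axis by alpha radians, counterclockwise from above."""
    c, s = math.cos(alpha), math.sin(alpha)
    return ((c, -s, 0.0),
            (s, c, 0.0),
            (0.0, 0.0, 1.0))

def rotate_bformat(sample, r):
    """Rotate one first-order B-format sample (W, X, Y, Z) by 3x3 matrix r.

    W is omnidirectional, so it is unaffected by rotation.
    """
    w, x, y, z = sample
    xr = r[0][0] * x + r[0][1] * y + r[0][2] * z
    yr = r[1][0] * x + r[1][1] * y + r[1][2] * z
    zr = r[2][0] * x + r[2][1] * y + r[2][2] * z
    return (w, xr, yr, zr)

def rotate_signal(samples, r):
    """Apply the same rotation to every sample of a B-format signal."""
    return [rotate_bformat(s, r) for s in samples]
```

Under this convention, a source encoded at azimuth θ ends up at θ + alpha after `rotate_bformat(sample, yaw_matrix(alpha))`. A yaw offset estimated from a seen/heard direction pair could therefore be applied directly as alpha.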
It would be desirable to implement a method for aligning an immersive video and an immersive sound field.