The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Ever since the introduction of sound with film, there has been a steady evolution of technology used to capture the creator's artistic intent for the motion picture sound track and to accurately reproduce it in a cinema environment. A fundamental role of cinema sound is to support the story being shown on screen. Typical cinema sound tracks comprise many different sound elements corresponding to elements and images on the screen: dialog, noises, and sound effects that emanate from different on-screen elements and combine with background music and ambient effects to create the overall audience experience. The artistic intent of the creators and producers represents their desire to have these sounds reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement, and other similar parameters.
Traditional channel-based audio systems, such as stereo and 5.1 systems, send audio content in the form of speaker feeds to individual speakers in a playback environment. The introduction of digital cinema has created new standards for sound on film, such as the incorporation of up to 16 channels of audio to allow for greater creativity for content creators, and a more enveloping and realistic auditory experience for audiences. The introduction of 7.1 surround systems has provided a new format that increases the number of surround channels by splitting the existing left and right surround channels into four zones, thus increasing the scope for sound designers and mixers to control positioning of audio elements in the theatre.
Expanding beyond traditional speaker feeds and channel-based audio as a means for distributing spatial audio is critical. There has been considerable interest in a model-based audio description, which holds the promise of allowing the listener/exhibitor the freedom to select a playback configuration that suits their individual needs or budget, with the audio rendered specifically for their chosen configuration.
To further improve the listener experience, playback of sound in virtual three-dimensional environments has become an area of increased research and development. The spatial presentation of sound utilizes audio objects, which are audio signals with associated parametric source descriptions of apparent source position (e.g., 3D coordinates), apparent source width, and other parameters. Object-based audio is increasingly being used for many current multimedia applications, such as digital movies, video games, simulators, and 3D video and is of particular importance in a home environment where the number of reproduction speakers and their placement is generally limited or constrained.
A next generation spatial audio format may consist of a mixture of audio objects and more traditional channel-based speaker feeds along with positional metadata for the audio objects. In a next generation spatial audio decoder, the channels are sent directly to their associated speakers if the appropriate speakers exist. If the full set of specified speakers does not exist, then the channels may be down-mixed to the existing speaker set. This is similar to existing legacy channel-based decoders. Audio objects are rendered by the decoder in a more flexible manner. The parametric source description associated with each object, such as a positional trajectory in 3D space, is taken as input along with the number and position of speakers connected to the decoder. The renderer then utilizes one or more algorithms, such as a panning law, to distribute the audio associated with each object across the attached set of speakers. This way, the authored spatial intent of each object is optimally presented over the specific speaker configuration.
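As an illustration of the panning law mentioned above, the following sketch shows pairwise constant-power panning between two adjacent speakers. This is one common panning law among many; the function name, angle convention, and speaker positions are illustrative assumptions, not part of any particular decoder, and a real renderer would first select the speaker pair from the full attached layout.

```python
import math

def pan_gains(source_deg, left_deg, right_deg):
    """Constant-power pairwise panning between two adjacent speakers.

    Maps the source angle onto [0, 1] between the speaker pair, then
    applies sine/cosine gains so that the total reproduced power is
    constant regardless of source position. Illustrative sketch only.
    """
    frac = (source_deg - left_deg) / (right_deg - left_deg)  # 0 at left, 1 at right
    frac = min(max(frac, 0.0), 1.0)                          # clamp to the pair
    theta = frac * math.pi / 2
    return math.cos(theta), math.sin(theta)                  # (left gain, right gain)

# An object midway between speakers at -30 and +30 degrees gets equal gains.
gl, gr = pan_gains(0.0, -30.0, 30.0)
```

Because the gains satisfy gl² + gr² = 1, the perceived loudness of the object stays roughly constant as it moves between the two speakers.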
When content is authored in a next generation spatial audio format, it may still be desirable to send this content in an existing legacy channel-based format so that it may be played on legacy audio systems. This involves downmixing the next generation audio format to the appropriate channel-based format (e.g., 5.1, 7.1, etc.). When generating channel-based downmixes from three-dimensional content, one of the main challenges is to preserve spatial coherence between the original mix and the downmix.
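A channel-based downmix of the kind described above can be sketched as follows for the 7.1-to-5.1 case, where the back surround channels are folded into the side surrounds. The -3 dB (0.7071) mixing coefficient is a common choice rather than a mandated value, and the channel names and function are illustrative assumptions.

```python
def downmix_71_to_51(frame):
    """Fold one 7.1 sample frame into 5.1 by mixing the back surrounds
    (Lb, Rb) into the side surrounds (Ls, Rs).

    The -3 dB gain (0.7071) is a commonly used coefficient, not a
    standardized requirement; illustrative sketch only.
    """
    g = 0.7071
    return {
        "L": frame["L"], "R": frame["R"],
        "C": frame["C"], "LFE": frame["LFE"],
        "Ls": frame["Ls"] + g * frame["Lb"],
        "Rs": frame["Rs"] + g * frame["Rb"],
    }

# A signal present only in the left back channel ends up in Ls.
out = downmix_71_to_51({"L": 0.0, "R": 0.0, "C": 0.0, "LFE": 0.0,
                        "Ls": 0.5, "Rs": 0.0, "Lb": 1.0, "Rb": 0.0})
```

Note that once the back channels are summed into the sides, the distinction between side and back content cannot be recovered from the 5.1 feed alone, which is the spatial-coherence challenge referred to above.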
In order to support already deployed audio systems, it is desirable to render a next generation spatial audio format into a legacy channel-based format. However, when rendering spatial audio content into a legacy format, a portion of the original spatial information may be lost. For example, a 7.1 legacy format may contain only a stereo pair of front height channels in the height plane. Since this stereo pair can only convey motion to the left and right, all forward or backward motion of audio objects in the height plane is lost. In addition, any height objects positioned within the room are collapsed to the front, thus resulting in the loss of important creative content. When playing the original spatial audio content in a channel-based system, this loss of information is generally acceptable because of the limitations of the legacy surround sound environment. If, however, the down-mixed spatial audio content is to be played back through a spatial audio system, this lost information will likely cause a degradation of the playback experience.
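The loss of forward/backward information in the height plane can be made concrete with a small sketch. A front-height stereo pair spans only the left/right axis, so any rendering function that maps a 3D object position onto that pair necessarily ignores the front/back coordinate; the coordinate convention and function below are illustrative assumptions.

```python
import math

def front_height_gains(x, y):
    """Constant-power gains for a front-height stereo pair (Lh, Rh).

    x is the object's left/right position in [-1, 1]; y is its
    front/back position in [-1, 1]. Because the pair spans only the
    left/right axis, y cannot influence the speaker feeds; this is
    exactly the spatial information the downmix discards.
    Illustrative sketch only.
    """
    frac = (x + 1.0) / 2.0           # 0 at full left, 1 at full right
    theta = frac * math.pi / 2
    return math.cos(theta), math.sin(theta)   # (Lh gain, Rh gain)

# Two objects at the same left/right position but opposite depths
# produce identical height-channel feeds:
front_obj = front_height_gains(0.2, 1.0)    # object at the front
back_obj = front_height_gains(0.2, -1.0)    # object at the back
```

Since the two feeds are identical, no downstream spatial audio renderer can distinguish the two object positions from the legacy channels alone, which motivates recovering the lost metadata by other means.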
What is needed, therefore, is a means to recover this lost spatial information when reproducing spatial audio converted to a legacy channel-based format for playback in a spatial audio environment.