The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Ever since the introduction of sound with film, there has been a steady evolution of technology used to capture the creator's artistic intent for the motion picture sound track and to accurately reproduce it in a cinema environment. A fundamental role of cinema sound is to support the story being shown on screen. Typical cinema sound tracks comprise many different sound elements, such as dialog, noises, and sound effects, that correspond to elements and images on the screen, emanate from different on-screen elements, and combine with background music and ambient effects to create the overall audience experience. The artistic intent of the creators and producers represents their desire to have these sounds reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement and other similar parameters.
Current cinema authoring, distribution and playback suffer from limitations that constrain the creation of truly immersive and lifelike audio. Traditional channel-based audio systems, such as stereo and 5.1 systems, send audio content in the form of speaker feeds to individual speakers in a playback environment. The introduction of digital cinema has created new standards for sound on film, such as the incorporation of up to 16 channels of audio, allowing greater creativity for content creators and a more enveloping and realistic auditory experience for audiences. The introduction of 7.1 surround systems has provided a new format that increases the number of surround channels by splitting the existing left and right surround channels into four zones, thus increasing the scope for sound designers and mixers to control positioning of audio elements in the theatre.
To further improve the listener experience, playback of sound in virtual three-dimensional environments has become an area of increased research and development. The spatial presentation of sound utilizes audio objects, which are audio signals with associated parametric source descriptions of apparent source position (e.g., 3D coordinates), apparent source width, and other parameters. Object-based audio is increasingly being used for many current multimedia applications, such as digital movies, video games, simulators, and 3D video.
Expanding beyond traditional speaker feeds and channel-based audio as a means for distributing spatial audio is critical, and there has been considerable interest in a model-based audio description which holds the promise of allowing the listener/exhibitor the freedom to select a playback configuration that suits their individual needs or budget, with the audio rendered specifically for their chosen configuration. At a high level, there are four main spatial audio description formats at present: speaker feed, in which the audio is described as signals intended for speakers at nominal speaker positions; microphone feed, in which the audio is described as signals captured by virtual or actual microphones in a predefined array; model-based description, in which the audio is described in terms of a sequence of audio events at described positions; and binaural, in which the audio is described by the signals that arrive at the listener's ears. These four description formats are often associated with one or more rendering technologies that convert the audio signals to speaker feeds. Current rendering technologies include panning, in which the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered prior to distribution); Ambisonics, in which the microphone signals are converted to feeds for a scalable array of speakers (typically rendered after distribution); WFS (wave field synthesis), in which sound events are converted to the appropriate speaker signals to synthesize the sound field (typically rendered after distribution); and binaural, in which the L/R (left/right) binaural signals are delivered to the L/R ear, typically using headphones, but also by using speakers and crosstalk cancellation (rendered before or after distribution). Of these formats, the speaker-feed format is the most common because it is simple and effective.
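Panning is the simplest of these rendering technologies to illustrate. The following is a minimal sketch, assuming a two-speaker (stereo) layout and the widely used sine/cosine constant-power pan law; the function name `constant_power_pan` and the [-1, 1] pan range are illustrative assumptions, not part of any system described here. It shows how a single audio signal plus a position parameter becomes a set of speaker feeds.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split one mono sample into left/right speaker feeds (hypothetical helper).

    Assumes a stereo speaker pair and the sine/cosine constant-power pan law.
    pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right).
    For every pan position, left**2 + right**2 == sample**2, so perceived
    loudness stays roughly constant as a sound is moved between speakers.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1.0, 1.0]")
    # Map the pan range [-1, 1] onto an angle in [0, pi/2].
    theta = (pan + 1.0) * math.pi / 4.0
    left = sample * math.cos(theta)
    right = sample * math.sin(theta)
    return left, right


if __name__ == "__main__":
    for p in (-1.0, 0.0, 1.0):
        print(p, constant_power_pan(1.0, p))
```

At the center position both feeds receive about 0.707 of the input (roughly -3 dB each) rather than 0.5; this is the usual reason a constant-power law is preferred over a linear one, which would leave a perceived loudness dip when a source passes between speakers.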
The best sonic results (most accurate, most reliable) are achieved by mixing/monitoring and distributing to the speaker feeds directly, since there is no processing between the content creator and listener. If the playback system is known in advance, a speaker feed description generally provides the highest fidelity. However, in many practical applications, the playback system is not known. The model-based description is considered the most adaptable because it makes no assumptions about the rendering technology and is therefore most easily applied to any rendering technology. Though the model-based description efficiently captures spatial information, it becomes very inefficient as the number of audio sources increases.
For many years, cinema systems have featured discrete screen channels in the form of left, center, right and occasionally ‘inner left’ and ‘inner right’ channels. These discrete sources generally have sufficient frequency response and power handling to allow sounds to be accurately placed in different areas of the screen, and to permit timbre matching as sounds are moved or panned between locations. Recent developments in improving the listener experience attempt to accurately reproduce the location of the sounds relative to the listener. In a 5.1 setup, the surround ‘zones’ comprise an array of speakers, all of which carry the same audio information within each left surround or right surround zone. Such arrays may be effective with ‘ambient’ or diffuse surround effects; however, in everyday life many sound effects originate from randomly placed point sources. For example, in a restaurant, ambient music may be played from apparently all around, while subtle but discrete sounds originate from specific points: a person chatting from one point, the clatter of a knife on a plate from another. Being able to place such sounds discretely around the auditorium can add a heightened sense of reality without being noticeably obvious. Overhead sounds are also an important component of surround definition. In the real world, sounds originate from all directions, and not always from a single horizontal plane. An added sense of realism can be achieved if sound can be heard from overhead, in other words from the ‘upper hemisphere.’ Present systems, however, do not offer truly accurate reproduction of sound for different audio types in a variety of different playback environments. Existing systems require a great deal of processing, knowledge, and configuration of actual playback environments to attempt accurate representation of location-specific sounds, thus rendering current systems impractical for most applications.
What is needed is a system that supports multiple screen channels, resulting in increased definition and improved audio-visual coherence for on-screen sounds or dialog, and the ability to precisely position sources anywhere in the surround zones to improve the audio-visual transition from screen to room. For example, if a character on screen looks inside the room towards a sound source, the sound engineer (“mixer”) should have the ability to precisely position the sound so that it matches the character's line of sight and the effect will be consistent throughout the audience. In a traditional 5.1 or 7.1 surround sound mix, however, the effect is highly dependent on the seating position of the listener, which is disadvantageous for most large-scale listening environments. Increased surround resolution creates new opportunities to use sound in a room-centric way as opposed to the traditional approach, where content is created assuming a single listener at the “sweet spot.”
Aside from the spatial issues, current state-of-the-art multi-channel systems suffer with regard to timbre. For example, the timbral quality of some sounds, such as steam hissing out of a broken pipe, can suffer from being reproduced by an array of speakers. The ability to direct specific sounds to a single speaker gives the mixer the opportunity to eliminate the artifacts of array reproduction and deliver a more realistic experience to the audience. Traditionally, surround speakers have not supported the same full range of audio frequency and level that the large screen channels support. Historically, this has created issues for mixers, reducing their ability to freely move full-range sounds from screen to room. As a result, theatre owners have not felt compelled to upgrade their surround channel configuration, preventing the widespread adoption of higher quality installations.