This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Multiple microphones can be used to efficiently capture audio events. However, it is often difficult to convert the captured signals into a form that lets the listener experience the event as if present in the situation in which the signal was recorded. In particular, the spatial representation tends to be lacking, i.e., the listener does not sense the directions of the sound sources, or the ambience around the listener, as he or she would at the original event.
One way to improve the spatial representation is by processing the multiple microphone signals into binaural signals. By using stereo headphones, the listener can (almost) authentically experience the original event upon playback of binaural recordings. Another way to improve the spatial representation is by processing the multiple microphone signals into multi-channel signals, such as 5.1 channels. Usually, processing is possible into either binaural signals or multi-channel signals, but not both. Recently, however, it has become possible to process multiple microphone signals into either binaural signals or multi-channel signals, depending on user preference. Thus, a user has more control over how microphone signals should be processed.
Multi-channel outputs were originally created directly from the audio signals of the multiple microphones. For instance, sound engineers mixed audio signals to create 5.1 channels (where the “.1” represents a sixth channel for low frequency effects), and those channels corresponded directly to the 5.1 multi-channel outputs. Thus, if binaural sound was desired, those 5.1 channels had to be processed into binaural channel outputs. Recently, however, there has been a trend toward creating more flexible audio formats. The term “flexible audio format” is used herein to express that a sound format can be rendered with any number of loudspeakers or with headphones. An example of these flexible audio formats is presented in Wiggins, B., “An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields”, PhD thesis, University of Derby, Derby, UK (2004), which defines a “hierarchical” sound format as a format from which channels can be ignored, resulting in less localization accuracy, or added, resulting in higher localization accuracy. Another example is Dolby Atmos, which is a new flexible audio format that creates flexibility with sound objects. More objects mean a more complete sound scene; fewer objects mean a less complete sound scene. Although exact details of Dolby Atmos have not been released, the company has released some information. In particular, according to the “Dolby Atmos Next-Generation Audio for Cinema” white paper:
“Audio objects can be considered as groups of sound elements that share the same physical location in the auditorium. Objects can be static or they can move. They are controlled by metadata that, among other things, details the position of the sound at a given point in time. When objects are monitored or played back in a theater, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a physical channel.”
According to the white paper, up to 128 tracks (e.g., each track corresponding to one or more microphone signals) can be processed into channel information (referred to as “beds”) and into the previously described audio objects and corresponding positional metadata. The “beds” channel information may be added to the information from the audio objects. According to the white paper, one use of the “beds” channel information is for ambient effects or reverberations.
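The idea of rendering an object "using the speakers that are present" can be illustrated with a minimal sketch. The sketch is not the Dolby Atmos renderer, whose details are not public. It assumes, purely for illustration, that the positional metadata is a single azimuth angle and that the loudspeakers sit on a horizontal ring at distinct angles, and it uses ordinary constant-power panning between the two loudspeakers adjacent to the object. The names `AudioObject`, `panning_gains`, and `render` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A mono signal plus (assumed) positional metadata."""
    samples: list          # mono audio samples
    azimuth_deg: float     # hypothetical metadata: 0 = front, positive = left


def panning_gains(azimuth_deg, speaker_azimuths_deg):
    """Constant-power gains for one object over whatever speakers exist.

    Assumes the speakers are at distinct azimuths on a horizontal ring.
    Only the two speakers that bracket the object receive signal.
    """
    n = len(speaker_azimuths_deg)
    if n == 1:
        return [1.0]
    # Walk around the ring in order of increasing angle.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360)
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360
        offset = (azimuth_deg - start) % 360
        if offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            return gains
    return gains


def render(obj, speaker_azimuths_deg):
    """Return one output channel per available speaker."""
    gains = panning_gains(obj.azimuth_deg, speaker_azimuths_deg)
    return [[g * s for s in obj.samples] for g in gains]


# The same object rendered to two different (assumed) speaker setups.
obj = AudioObject(samples=[1.0, 0.5, -0.25], azimuth_deg=0.0)
stereo_out = render(obj, [30.0, -30.0])
surround_out = render(obj, [0.0, 30.0, -30.0, 110.0, -110.0])
```

The point of the sketch is that the stored content (signal plus position) stays the same while the per-speaker gains are computed at playback time for the layout at hand, which is what distinguishes an object-based format from a fixed 5.1 mix.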
In the mobile world, audio is often played back over many different kinds of speaker setups: mobile device integrated speakers, headphones, home speakers through a docking station, and the like. Therefore, a flexible audio format has great benefits in the mobile world. Unfortunately, flexible audio formats usually require more bits to store and to transmit, and in the mobile world there is less bandwidth and storage space available than in home or commercial locations. In particular, Dolby Atmos will consume a large amount of bandwidth. Therefore, solutions that reduce the necessary bandwidth for flexible audio formats are very beneficial.
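To give a rough sense of scale, the following back-of-the-envelope sketch compares raw data rates. It assumes uncompressed PCM at 48 kHz and 24 bits per sample, which are illustrative values and not figures from the white paper, and it ignores metadata and any compression an actual format would apply. The function name `pcm_bitrate_bps` is hypothetical.

```python
def pcm_bitrate_bps(channels, sample_rate_hz=48_000, bits_per_sample=24):
    """Raw PCM bit rate in bits per second (assumed, uncompressed)."""
    return channels * sample_rate_hz * bits_per_sample


# 128 tracks (the upper limit mentioned above) versus the six channels of 5.1.
tracks_128_bps = pcm_bitrate_bps(128)
surround_51_bps = pcm_bitrate_bps(6)

print(f"128 tracks: {tracks_128_bps / 1e6:.3f} Mbit/s")
print(f"5.1 mix:    {surround_51_bps / 1e6:.3f} Mbit/s")
```

Under these assumptions, 128 tracks need roughly 147 Mbit/s against about 7 Mbit/s for a 5.1 mix, a factor of about 21.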