3D Audio may be realised using a sound field description by a technique called Higher Order Ambisonics (HOA) as described below. Storing HOA data requires some conventions and stipulations how this data must be used by a special decoder to be able to create loudspeaker signals for replay at a given reproduction speaker setup. No existing storage format defines all of these stipulations for HOA. The B-Format (based on the extensible ‘Riff/way’ structure) with its *.amb file format realisation as described as of 30 Mar. 2009 for example in Martin Leese, “File Format for B-Format”, http://www.ambisonia.com/Members/etienne/Members/mleese/flle-format-for-b-format, is the most sophisticated format available today. The .amb file format was presented in 2000 by R. W. Dobson, “Developments in Audio File Formats”, at ICMC Berlin 2000.
As of 16 Jul. 2010, an overview of existing file formats is disclosed on the Ambisonics Xchange Site: “Existing formats”, http://ambisonics.iem.at/xchange/format/existing-formats, and a proposal for an Ambisonics exchange format is also disclosed on that site: “A first proposal to specify, define and determine the parameters for an Ambisonics exchange format”, http://ambisonics.iem.at/xchange/format/a-first-proposal-for-the-format.
Invention
Regarding HOA signals, for 3D a collection of M=(N±1)2 ((2N+1) for 2D) different Audio objects from different sound sources, all at the same frequency, can be recorded (encoded) and reproduced as different sound objects provided they are spatially even distributed. This means that a 1st order Ambisonics signal can carry four 3D or three 2D Audio objects and these objects need to be separated uniformly around a sphere for 3D or around a circle in 2D. Spatial overlapping and more then M signals in the recording will result blur—only the loudest signals can be reproduced as coherent objects, the other diffuse signals will somehow degenerate the coherent signals depending on the overlap in space, frequency and loudness similarity.
Regarding the acoustic situation in a cinema, high spatial sound localisation accuracy is required for the frontal screen area in order to match the visual scene. Perception of the surrounding sound objects is less critical (reverb, sound objects with no connection to the visual scene). Here the density of speakers can be smaller compared to the frontal area.
The HOA order of the HOA data, relevant for frontal area, needs to be large to enable holophonic replay at choice. A typical order is N=10. This requires (N+1)2=121 HOA coefficients. In theory we could encode also M=121 audio objects, if this audio objects would be evenly spatially distributed. But in our scenario they are constricted to the frontal area (because only here we need such high orders). In fact we can only code about M=60 Audio objects without blur (the frontal area is at most half a sphere of directions, thus M/2).
Regarding the above-mentioned B-Format, it enables a description only up to an Ambisonics order of 3, and the file size is restricted to 4 GB. Other special information items are missing, like the wave type or the reference decoding radius which are vital for modern decoders. It is not possible to use different sample formats (word widths) and bandwidths for the different Ambisonics components (channels). There is also no standardisation for storing side information and metadata for Ambisonics.
In the known art, recording Ambisonics signals using a microphone array is restricted to orders of one. This might change in the future if experimental prototypes of HOA microphones will be developed. For the creation of 3D content a description of the ambience sound field could be recorded using a microphone array in first order Ambisonics, whereby the directional sources are captured using close-up mono microphones or highly directional microphones together with directional information (i.e. the position of the source). The directional signals can then be encoded into a HOA description, or this might be performed by a sophisticated decoder. Anyhow, a new Ambisonics file format needs to be able to store more than one sound field description at once, but it appears that no existing format can encapsulate more than one Ambisonics description.
A problem to be solved by the invention is to provide an Ambisonics file format that is capable of storing two or more sound field descriptions at once, wherein the Ambisonics order can be greater than 3. This problem is solved by the data structure disclosed in claim 1 and the method disclosed in claim 12.
For recreating realistic 3D Audio, next-generation Ambisonics decoders will require either a lot of conventions and stipulations together with stored data to be processed, or a single file format where all related parameters and data elements can be coherently stored.
The inventive file format for spatial sound content can store one or more HOA signals and/or directional mono signals together with directional information, wherein Ambisonics orders greater than 3 and files >4 GB are feasible. Furthermore, the inventive file format provides additional elements which existing formats do not offer:    1) Vital information required for next-generation HOA decoders is stored within the file format:            Ambisonics wave information (plane, spherical, mixture types), region of interest (sources outside the listening area or within), and reference radius (for decoding of spherical waves)        Related directional mono signals can be stored. Position information of these directional signals can be described either using angle and distance information or an encoding vector of Ambisonics coefficients.            2) All parameters defining the Ambisonics data are contained within the side information, to ensure clarity about the recording:            Ambisonics scaling and normalisation (SN3D, N3D, Furse Malham, B Format, . . . , user defined), mixed order information.            3) The storage format of Ambisonics data is extended to allow for a flexible and economical storage of data:            The inventive format allows storing data related to the Ambisonics order (Ambisonics channels) with different PCM-word size resolution as well as using restricted bandwidth.            4) Meta fields allow storing accompanying information about the file like recording information for microphone signals:            Recording reference coordinate system, microphone, source and virtual listener positions, microphone directional characteristics, room and source information.        
This file format for 2D and 3D audio content covers the storage of both Higher Order Ambisonics descriptions (HOA) as well as single sources with fixed or time-varying positions, and contains all information enabling next-generation audio decoders to provide realistic 3D Audio.
Using appropriate settings, the inventive file format is also suited for streaming of audio content. Thus, content-dependent side info (header data) can be sent at time instances as selected by the creator of the file. The inventive file format serves also as scene description where tracks of an audio scene can start and end at any time.
In principle, the inventive data structure is suited for Higher Order Ambisonics HOA audio data, which data structure includes 2D and/or 3D spatial audio content data for one or more different HOA audio data stream descriptions, and which data structure is also suited for HOA audio data that have on order of greater than ‘3’, and which data structure in addition can include single audio signal source data and/or microphone array audio data from fixed or time-varying spatial positions.
In principle, the inventive method is suited for audio presentation, wherein an HOA audio data stream containing at least two different HOA audio data signals is received and at least a first one of them is used for presentation with a dense loudspeaker arrangement located at a distinct area of a presentation site, and at least a second and different one of them is used for presentation with a less dense loudspeaker arrangement surrounding said presentation site.
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.