This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Soundfield signals such as Ambisonics carry a representation of a sound field. The Ambisonics format is based on a spherical harmonics decomposition of the soundfield. While the basic Ambisonics format, or B-format, uses spherical harmonics of order zero and one, so-called Higher Order Ambisonics (HOA) additionally uses spherical harmonics of at least 2nd order. That is, an HOA signal comprises different partial signals of different order N, such as a signal of order zero (W-channel, N=0), one or more signals of order one (N=1), one or more signals of order two (N=2) etc. A decoding process is required to obtain the individual loudspeaker signals. In order to synthesize audio scenes, panning functions that refer to the spatial loudspeaker arrangement are required for obtaining a spatial localization of the given sound source.
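As a rough illustration of how the number of partial signals grows with the order, the following sketch counts the components of a full three-dimensional HOA representation. It assumes the usual 2n+1 spherical harmonics per order n, which gives (N+1)^2 coefficient channels up to order N. The function names are hypothetical and are not taken from any of the cited formats.

```python
def components_per_order(n: int) -> int:
    """Number of spherical harmonics of exactly order n (assumed full 3D case)."""
    if n < 0:
        raise ValueError("order must be non-negative")
    return 2 * n + 1


def total_components(max_order: int) -> int:
    """Total number of coefficient channels up to and including max_order."""
    return sum(components_per_order(n) for n in range(max_order + 1))


if __name__ == "__main__":
    # Print order, components of that order, and cumulative channel count.
    for order in range(4):
        print(order, components_per_order(order), total_components(order))
```

Under this assumption, a first-order (B-format) signal has four channels, and each additional order adds two more components than the previous one.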
One task to be performed at the decoder side is setting up a replay level. As described in the prior art [1] and shown in FIG. 1, the amplifier gain Gl of each loudspeaker feed is set such that a digital full band pink noise input with −18 dBFSrms results in a Sound Pressure Level (SPL) of 78+/−5 dBA. In FIG. 1, a pink noise test signal is used to adjust the sound pressure level of each loudspeaker 13 by adjusting the speaker amplification Gl in an amplifier 12, for each loudspeaker individually. The digital pink noise test signal is converted to an analog signal in a D/A converter 11. SPL adjustment in mixing and presentation venues, together with loudness level adjustment of content in the mixing room, enables constant perceived loudness when switching between programs or items.
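The level arithmetic behind this calibration can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure of [1]: the helper names are hypothetical, a sine wave stands in for the pink noise test signal, the RMS level is taken relative to a full-scale amplitude of 1.0, and the measured SPL is assumed to come from an external sound level meter.

```python
import math

TARGET_SPL_DBA = 78.0    # target replay level stated in the text
TOLERANCE_DB = 5.0       # permitted deviation (+/- 5 dB)
TEST_LEVEL_DBFS = -18.0  # RMS level of the digital test signal


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0 (assumed convention)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("signal is silent, level is undefined")
    return 10.0 * math.log10(mean_square)


def scale_to_rms(samples, target_dbfs=TEST_LEVEL_DBFS):
    """Scale a signal so that its RMS level equals target_dbfs."""
    factor = 10.0 ** ((target_dbfs - rms_dbfs(samples)) / 20.0)
    return [x * factor for x in samples]


def amplifier_trim_db(measured_spl_dba, target=TARGET_SPL_DBA):
    """Gain change in dB that would move the measured SPL onto the target."""
    return target - measured_spl_dba


def within_tolerance(measured_spl_dba, target=TARGET_SPL_DBA, tol=TOLERANCE_DB):
    """True if the measured SPL lies inside the permitted window."""
    return abs(measured_spl_dba - target) <= tol


if __name__ == "__main__":
    # Stand-in test signal: 1 kHz sine at 48 kHz, scaled to -18 dBFS RMS.
    sine = [math.sin(2.0 * math.pi * k / 48.0) for k in range(4800)]
    test_signal = scale_to_rms(sine)
    print(round(rms_dbfs(test_signal), 3))
    # Hypothetical meter reading for one loudspeaker.
    print(amplifier_trim_db(74.5), within_tolerance(74.5))
```

The trim would be computed and applied separately for each loudspeaker, matching the per-loudspeaker adjustment of FIG. 1.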
Content Loudness Level Calibration
If the replay levels of the mixing facility and the presentation venues are set up in this manner, switching between items or programs should be possible without further level adjustments. For channel based content, this is simply achieved if the content is tuned to a pleasant loudness level at the mixing site. The reference for the pleasant listening level can either be the loudness of the whole item itself or an anchor signal.
Using the whole item itself as reference is useful for ‘short form content’, if the content is stored as a file. Besides adjustment by listening, a measurement of the loudness in Loudness Units Full Scale (LUFS) according to EBU R128 [2] can be used for loudness adjustment of the content. An alternative name for LUFS is ‘Loudness, K-weighted, relative to Full Scale’ from ITU-R BS.1770 [3] (1 LUFS = 1 LKFS). Unfortunately, the solution in [2] only supports content for setups up to 5-channel surround. Loudness measures of 22-channel files, where all 22 channels are factored by equal channel weights of one, may correlate with perceived loudness, but there is no evidence or proof by thorough listening tests yet.
When using an anchor signal such as a dialog as a reference, the level is selected in relation to this signal. This is useful for ‘long form content’ such as film sound, live recordings and broadcasts. Here, an additional requirement beyond a pleasant listening level is the intelligibility of the spoken word.
Again, besides an adjustment by listening, the content may be normalized in relation to a loudness measure, such as defined in ATSC A/85 [4]. First, parts of the content are identified as anchor parts. Then a measure as defined in [3] is computed for these signals, and a gain factor to reach the target loudness is determined. The gain factor is used to scale the complete item. Unfortunately, the maximum number of channels supported is again restricted to five.
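The gain computation described here amounts to a level difference converted to a linear factor. A minimal sketch, assuming the anchor loudness has already been measured according to [3]; the function names and the example values are illustrative only:

```python
def normalization_gain(measured_lkfs, target_lkfs):
    """Linear gain factor that moves the measured loudness onto the target."""
    gain_db = target_lkfs - measured_lkfs
    return 10.0 ** (gain_db / 20.0)


def scale_item(channels, gain):
    """Apply one gain factor to every sample of every channel of the item."""
    return [[gain * x for x in channel] for channel in channels]


if __name__ == "__main__":
    # Hypothetical anchor loudness of -20 LKFS, target of -24 LKFS.
    gain = normalization_gain(-20.0, -24.0)
    print(gain)
    print(scale_item([[1.0, -1.0], [0.5, 0.25]], 0.5))
```

Note that the gain is derived from the anchor parts only but is applied to the complete item, so passages outside the anchor keep their level relative to it.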
FIG. 2, taken from ITU-R BS.1770 [3], shows a loudness measure as used in EBU R128 [2] and ATSC A/85 [4]. [2] proposes to gain adjust the measured loudness of the whole content item to −23 dBLKFS. In [4], only the anchor signal loudness is measured, and the content is gain adjusted so that the anchor parts reach a target loudness of −24 dBLKFS. The input signals L, R, C, Ls, Rs are filtered in K-Filters 21, the power of each channel is averaged in power averagers 22, each channel is weighted 23, and the weighted signals are added up 24 to obtain a measured loudness value 25.
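The signal chain of FIG. 2 can be sketched in a few lines. The following is a simplified illustration and not a compliant meter: it assumes a 48 kHz sampling rate, uses the K-filter biquad coefficients and channel weights tabulated in ITU-R BS.1770 for that rate, averages over the whole signal, and omits the gating that later revisions of the Recommendation add. The function names are hypothetical.

```python
import math

# Biquad coefficients (b0, b1, b2, a1, a2) for 48 kHz, as tabulated in ITU-R BS.1770.
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)

# Channel weights: front channels 1.0, surround channels 1.41.
WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def biquad(samples, coeffs):
    """Second-order IIR filter in direct form I."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def k_filter(samples):
    """K-weighting (block 21): shelving filter followed by high-pass filter."""
    return biquad(biquad(samples, SHELF), HIGHPASS)


def mean_square(samples):
    """Average power of one channel (block 22)."""
    return sum(v * v for v in samples) / len(samples)


def loudness_lkfs(channels, weights=WEIGHTS):
    """Weight (23) and sum (24) the channel powers, then convert to LKFS (25)."""
    total = sum(weights[name] * mean_square(k_filter(signal))
                for name, signal in channels.items())
    if total <= 0.0:
        return float("-inf")
    return -0.691 + 10.0 * math.log10(total)


if __name__ == "__main__":
    fs = 48000
    # Full-scale 997 Hz sine on the left channel; the Recommendation
    # specifies a reading of -3.01 LKFS for this input.
    tone = [math.sin(2.0 * math.pi * 997.0 * n / fs) for n in range(fs)]
    print(round(loudness_lkfs({"L": tone}), 2))
```

Because the weights are passed in as a dictionary, the same sketch could be run on other channel layouts, for example with equal weights of one for all channels, although the text notes that such a measure has not been validated by listening tests.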
Out of artistic considerations, content has to be adjusted at the mixing studio. This is done by individual listening. Automatic loudness measures can be used as a support and for showing that a specified loudness is not exceeded.
For HOA and Audio Object (AO) based content, but also for channel based content that has to be remixed to a different number or different positions of loudspeakers, rendering has to be taken into account. A renderer has to fulfill special characteristics, and such a renderer has to be used at the mixing studio as well as at the presentation venue of the consumer.