It is common to embed audio metadata in a digital audio stream, e.g. in digital broadcast environments. Such metadata is “data about data”, i.e. data about the digital audio in the stream. The metadata can provide information to an audio decoder about how to reproduce the audio. One type of metadata is dynamic range control information, which represents a time-varying gain envelope. Such dynamic range control metadata can serve multiple purposes:
(1) Control the dynamic range of reproduced audio: Digital transmission allows for a high dynamic range, but listening conditions do not always permit taking advantage of it. Although high dynamic range is desirable in quiet living room conditions, it may not be appropriate for other conditions, e.g. for a car radio, because of the high background noise level. To accommodate a wide variety of listening conditions, metadata instructing a receiver how to reduce the dynamic range of the reproduced audio can be inserted in the digital audio stream, instead of reducing the dynamic range of the audio prior to transmission. The latter approach is not preferable because it makes it impossible for a receiver to reproduce the audio with full dynamic range. The former approach is preferred because it allows the listener to decide, depending on the listening environment, whether dynamic range control is applied. Such dynamic range control metadata makes high-quality artistic dynamic range compression of a decoded signal available to listeners at their discretion.
(2) Prevent clipping in case of a downmix operation: When a multichannel signal is downmixed, the number of channels is reduced, typically to two channels. When a multichannel audio signal comprising more than two channels (e.g. a 5.1-channel audio signal having 5 main channels and 1 low frequency effect channel) is reproduced via stereo speakers, a receiver-side downmix operation is typically performed, in which the multichannel signal is mixed into two channels. The mixing operation can be described by a downmix matrix, e.g. a 2×5 matrix having two rows and five columns in case of downmixing a 5-channel signal into a 2-channel (stereo) signal (the low frequency effect channel is typically not considered during downmix). Different downmix schemes for mixing the 5 main channels of a 5.1-channel signal into two channels are known, e.g. Lo/Ro (left only, right only) or Lt/Rt (left total, right total).
The downmix step bears the risk of occasional overload of the digital stereo signal, thereby generating undesired clipping artifacts. Such clipping may occur when the amplitude of a downmixed digital signal would exceed the maximum (or fall below the minimum) representable value and is therefore limited to that value. For example, in a simple unsigned fixed-point binary representation, clipping occurs when the computed downmixed amplitude is limited to the maximum-value word in which all bits are 1. In a signed 16-bit representation, the maximum value may e.g. correspond to the word “01111111 11111111”.
As the downmix matrices for the various downmixing schemes are known at the headend, sender or content generation side, dynamic range control metadata can be added to the audio stream for signals that may result in clipping when downmixed. This metadata instructs a receiver to attenuate the signals to be downmixed prior to mixing, and thereby dynamically prevents clipping.
(3) Prevent clipping in case of boosted output: For retransmission over channels with very limited dynamic range (e.g. from a set-top box via an analog RF link to the RF input of a TV), the signal is boosted, typically by 11 dB, to achieve a better signal-to-noise ratio on this path. In such applications, for signals that may result in clipping when amplified by 11 dB, dynamic range control metadata that instructs a receiver to attenuate the signals prior to applying the 11 dB amplification can be added to the audio stream to dynamically prevent clipping.
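The downmix clipping risk under point (2) can be illustrated with a minimal Python sketch. The matrix coefficients (1.0 and 0.7071), the channel order L, R, C, Ls, Rs, and the names `LORO`, `downmix_frame` and `gain_db` are assumptions chosen for illustration only; they are not taken from this text or from any particular codec. The sketch mixes one frame of five 16-bit samples into two channels, limits the result to the signed 16-bit range, and shows how a gain applied before mixing avoids the limiting.

```python
INT16_MAX = 32767   # word "01111111 11111111"
INT16_MIN = -32768

# Hypothetical Lo/Ro-style 2x5 downmix matrix (rows: left, right output;
# columns: L, R, C, Ls, Rs). Coefficient values are illustrative assumptions.
LORO = [
    [1.0, 0.0, 0.7071, 0.7071, 0.0],
    [0.0, 1.0, 0.7071, 0.0, 0.7071],
]

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def clip16(value):
    """Limit a value to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, int(round(value))))

def downmix_frame(frame, matrix=LORO, gain_db=0.0):
    """Attenuate the input samples by gain_db, then mix them with the matrix.

    Returns (output_samples, clipped), where clipped is True if any
    output sample had to be limited to the 16-bit range.
    """
    gain = db_to_linear(gain_db)
    output = []
    clipped = False
    for row in matrix:
        mixed = sum(coeff * gain * sample for coeff, sample in zip(row, frame))
        limited = clip16(mixed)
        if limited != int(round(mixed)):
            clipped = True
        output.append(limited)
    return output, clipped

# A loud frame: every main channel is close to full scale.
loud_frame = [30000, 30000, 30000, 30000, 30000]
print(downmix_frame(loud_frame))                # clips without attenuation
print(downmix_frame(loud_frame, gain_db=-8.0))  # attenuated before mixing
```

In this sketch the gain is applied to the input channels before the matrix multiplication, which corresponds to attenuating the signals prior to mixing as described above.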
From the perspective of the device receiving the audio stream, it is not clear whether the incoming dynamic range control metadata serves the purpose under point (1), i.e. control of the dynamic range, the purpose under point (2), i.e. downmix clipping protection, or both. Often, the metadata accomplishes both tasks, but this is not always the case, so in some cases the metadata may not include downmix clipping protection. In addition, if the metadata is associated with the RF mode under point (3) (typically, a different gain parameter is used for RF mode), it may be used to prevent clipping caused by the extra amplification, both with and without downmixing.
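The RF mode case under point (3) can be sketched in the same spirit. The function name `rf_mode_output`, the parameter `drc_gain_db` and the sample values are hypothetical; only the 11 dB boost is taken from the text. The sketch applies an optional attenuation, then the boost, then limits the result to the signed 16-bit range.

```python
def rf_mode_output(sample, drc_gain_db=0.0, boost_db=11.0):
    """Apply a (hypothetical) DRC gain, then the RF-mode boost, then limit.

    Returns (output_sample, clipped).
    """
    attenuated = sample * 10.0 ** (drc_gain_db / 20.0)
    boosted = attenuated * 10.0 ** (boost_db / 20.0)
    rounded = int(round(boosted))
    limited = max(-32768, min(32767, rounded))
    return limited, limited != rounded

# 11 dB is a linear factor of roughly 3.55, so a sample of 12000
# exceeds the 16-bit range once boosted.
print(rf_mode_output(12000))                    # clipped
print(rf_mode_output(12000, drc_gain_db=-3.0))  # attenuated first, not clipped
```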
Moreover, the incoming audio stream may not include dynamic range control metadata at all, due to the fact that for some audio encoding formats the metadata is optional.
If the dynamic range control metadata is not included with the compressed audio stream, or is included but does not provide downmix clipping protection, undesirable clipping artifacts may be present in the decoded signal when a multichannel signal is downmixed into fewer channels.
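Why an unprotected downmix can clip follows from simple arithmetic: if every input channel is at full scale with the same sign as its coefficient, the output peak equals the sum of the absolute coefficients in a matrix row. The short sketch below computes that worst-case overshoot in dB. The function name and the example coefficients are illustrative assumptions, not values specified in this text.

```python
import math

def worst_case_overshoot_db(row):
    """Worst-case level above full scale (in dB) for one downmix matrix row.

    Assumes all input channels can reach full scale simultaneously.
    A result above 0 dB means the downmix can clip without attenuation.
    """
    peak = sum(abs(coeff) for coeff in row)
    return 20.0 * math.log10(peak)

# Hypothetical left-output row of a Lo/Ro-style matrix (L, R, C, Ls, Rs).
left_row = [1.0, 0.0, 0.7071, 0.7071, 0.0]
print(round(worst_case_overshoot_db(left_row), 2))
```

For these example coefficients the worst case lies several dB above full scale, so some attenuation before mixing is needed whenever the input channels are loud at the same time.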