Modern audio coding systems do more than efficiently transmit audio content in a loudspeaker channel-based representation that is simply played back at the decoder side. They additionally include more advanced features that allow users to interact with the content and, thus, to influence how the audio is reproduced and rendered at the decoder. This allows for new types of user experiences compared to legacy audio coding systems.
An example of an advanced audio coding system is the MPEG-H 3D Audio standard (J. Herre et al., “MPEG-H Audio—The New Standard for Universal Spatial/3D Audio Coding”, 137th AES Convention, 2014, Los Angeles). It allows transmission of immersive audio content in three different formats: channel-based, object-based, and scene-based using higher order ambisonics (HOA). It has been designed to offer new capabilities such as user interaction for personalization and adaptation of the audio to different use scenarios.
The three categories of content formats can be described as follows:
Channel-based: Traditionally, spatial audio content (starting from simple two-channel stereo) has been delivered as a set of channel signals which are designated to be reproduced by loudspeakers at precisely defined, fixed target locations relative to the listener.
Object-based: Audio objects are signals that are to be reproduced so as to originate from a specific target location that is specified by associated side information provided as metadata along with the audio. In contrast to channel signals, the actual placement of audio objects can vary over time and is not necessarily pre-defined during the sound production process; rather, it is determined by rendering the objects to the target loudspeaker setup at the time of reproduction. This may also include user interactivity on the location or the level of an object or groups of objects.
Scene-based: Higher Order Ambisonics (HOA) is an alternative approach to capture a 3D sound field by transmitting a number of ‘coefficient signals’ that have no direct relationship to channels or objects. The actual audio signals for reproduction are generated at the decoder, taking into account the given loudspeaker configuration.
A method for loudness compensation in object-based audio coding systems including user interaction has been presented in EP 2 879 131 A1. A decoder receives an audio input signal comprising audio object signals and generates an audio output signal. A signal processor determines a loudness compensation value for the audio output signal based on loudness information associated with the audio input signal and based on rendering information. The rendering information indicates whether one or more of the audio object signals shall be amplified or attenuated, and can be adjusted according to the user's wishes.
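The principle of such a loudness compensation can be illustrated by the following minimal sketch. It is not the method of EP 2 879 131 A1 itself. It assumes, purely for illustration, that the loudness information consists of one loudness value in dB per audio object, that the rendering information consists of one user gain in dB per audio object, and that the object signals are mutually uncorrelated so that their powers add. The function names energy_sum_db and loudness_compensation_db are hypothetical.

```python
import math


def energy_sum_db(levels_db):
    """Combine per-object levels (dB) by summing their powers.

    Assumption: the objects are mutually uncorrelated, so powers add.
    """
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def loudness_compensation_db(object_loudness_db, user_gains_db):
    """Return a gain (dB) that restores the overall loudness after user interaction.

    object_loudness_db: assumed loudness information, one value per object (dB).
    user_gains_db: assumed rendering information, one gain per object (dB);
                   positive values amplify, negative values attenuate.
    """
    if len(object_loudness_db) != len(user_gains_db):
        raise ValueError("one user gain is required per audio object")

    # Estimated overall loudness of the unmodified mix.
    original = energy_sum_db(object_loudness_db)

    # Estimated overall loudness after the user gains are applied.
    modified = energy_sum_db(
        [level + gain for level, gain in zip(object_loudness_db, user_gains_db)]
    )

    # Applying this gain to the output brings it back to the original loudness.
    return original - modified
```

Under these assumptions, amplifying every object by 6 dB yields a compensation value of -6 dB, while leaving all gains at 0 dB yields a compensation value of 0 dB. Amplifying only a single object yields a negative compensation value whose magnitude depends on how much that object contributes to the overall loudness.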