Recently, the Spatial Audio Object Coding (SAOC) scheme has been used to compress multi-object audio signals. Generally, when the SAOC scheme is used, a plurality of input object signals may be compressed using only spatial parameters of the input audio object signals for each frequency band, and a sound scene may be generated from them. Accordingly, a sound scene in which the volume of each object signal is controlled may be generated even at an extremely low bit rate. However, since the multi-object audio signal is compressed and restored using only a limited number of bits, the sound quality of the object signals is inevitably degraded during encoding and decoding. In particular, in an environment where a specific object signal, such as a vocal signal, is completely removed or is played back independently, the sound quality may be seriously degraded. Accordingly, in the SAOC scheme, the range over which object signals can be controlled is generally limited.
For example, among a plurality of input object signals, the object signals that are to be controlled to an extreme level are hereinafter referred to as ForeGround Objects (FGOs). When the SAOC scheme is used to encode and decode the FGOs, and the FGOs are then controlled to such an extreme level, the sound quality may be rapidly degraded. Here, the FGOs may include vocal signals, and thus a karaoke service may be implemented using the vocal signals.
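As a rough illustration of the parametric approach described above, the following Python sketch computes a relative power per frequency band for each object, forms a mono downmix, and derives the per-band gain that a decoder could apply to the downmix for a requested set of object volumes. It is a simplified sketch under stated assumptions, not the SAOC scheme itself: the function names, the band layout, the toy spectra, and the assumption that the objects are mutually uncorrelated are all hypothetical.

```python
def band_powers(spectrum, band_edges):
    """Sum |X[k]|^2 over each band [lo, hi) of one object's spectrum."""
    return [
        sum(abs(x) ** 2 for x in spectrum[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def object_level_differences(object_spectra, band_edges):
    """Per-band power of each object relative to the strongest object.

    Returns olds[b][i] for band b and object i (hypothetical layout).
    """
    powers = [band_powers(s, band_edges) for s in object_spectra]
    olds = []
    for b in range(len(band_edges) - 1):
        ref = max(p[b] for p in powers)
        olds.append([p[b] / ref if ref > 0 else 0.0 for p in powers])
    return olds


def downmix(object_spectra):
    """Mono downmix: bin-wise sum of all object spectra."""
    return [sum(bins) for bins in zip(*object_spectra)]


def render_gains(olds, object_gains):
    """Per-band gain for the downmix that approximates the requested
    per-object volumes, assuming uncorrelated objects (an assumption)."""
    gains = []
    for band in olds:
        total = sum(band)
        if total == 0:
            gains.append(0.0)
            continue
        target = sum(g * g * o for g, o in zip(object_gains, band))
        gains.append((target / total) ** 0.5)
    return gains


# Toy data (hypothetical): a vocal and an accompaniment, 4 bins, 2 bands.
vocal = [2.0, 2.0, 0.0, 0.0]
music = [1.0, 1.0, 2.0, 2.0]
edges = [0, 2, 4]

olds = object_level_differences([vocal, music], edges)
mix = downmix([vocal, music])
# Request complete removal of the vocal (gain 0) and unchanged music (gain 1).
gains = render_gains(olds, [0.0, 1.0])
```

With this toy data, the band in which the vocal and the accompaniment overlap can only be scaled as a whole (a gain of roughly 0.45), so the vocal is attenuated rather than removed and the accompaniment is attenuated with it. This is the kind of degradation that arises when an object is controlled to an extreme level using only per-band parameters.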
Accordingly, there is a desire for an audio signal encoding technology that can control the volume of each object signal while preventing degradation of sound quality even in an extremely controlled environment, thereby providing listeners with satisfactory sound quality.