In a typical method of encoding a multi-channel audio signal, the multi-channel audio signal is downmixed into a mono or stereo signal, and the mono or stereo signal is encoded instead of each individual channel of the multi-channel audio signal. In this method, the multi-channel audio signal is encoded together with spatial information indicating its spatial cues.
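As an illustration only, the downmixing described above can be sketched as follows. The sketch assumes a plain per-sample average as the downmix and uses an inter-channel level difference as one example of a spatial cue; neither choice is specified by the method described here, and the function names `downmix_to_mono` and `level_difference_db` are hypothetical.

```python
import math

def downmix_to_mono(channels):
    """Average equal-length channels sample by sample into one mono channel.

    A plain average is an assumption; an actual encoder may weight channels.
    """
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]

def channel_energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)

def level_difference_db(a, b):
    """Energy ratio of two channels in dB, used here as an assumed spatial cue.

    Both channels are assumed to be non-silent (non-zero energy).
    """
    return 10.0 * math.log10(channel_energy(a) / channel_energy(b))

# Two-channel toy input: the left channel is twice as loud as the right.
left = [0.5, -0.5, 0.5, -0.5]
right = [0.25, -0.25, 0.25, -0.25]

mono = downmix_to_mono([left, right])   # the signal that would be encoded
cue = level_difference_db(left, right)  # side information sent with it
```

In this sketch only `mono` would be passed to the audio encoder, while `cue` stands in for the spatial information that accompanies it.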
FIG. 1 is a diagram illustrating a bitstream of a multi-channel audio signal generated using a typical method of encoding a multi-channel audio signal. Referring to FIG. 1, the bitstream of the multi-channel audio signal is divided into one or more frames (i.e., frames 1 through 3), and is thus transmitted or decoded in units of frames. A header is placed ahead of frame 1. The header includes Spatial Audio Coding (SAC) configuration information, and each of frames 1 through 3 includes spatial information of the corresponding frame. The SAC configuration information comprises information that can be commonly applied to frames 1 through 3, namely sampling frequency information, frame length information, and tree configuration information specifying a downmix combination of a multi-channel signal.
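The bitstream layout of FIG. 1 may be modeled, purely for illustration, by the data structures below. The class and field names, the field types, and the sample values (including the placeholder label "tree-A") are assumptions and do not reflect an actual bitstream syntax.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SacConfig:
    """Configuration shared by all frames, carried once in the header."""
    sampling_frequency: int  # in Hz (unit assumed)
    frame_length: int        # in samples per frame (unit assumed)
    tree_config: str         # placeholder label for the downmix combination

@dataclass
class Frame:
    """One frame, carrying only its own spatial information."""
    spatial_info: bytes

@dataclass
class Bitstream:
    header: SacConfig        # placed ahead of frame 1
    frames: List[Frame]      # frames 1 through 3 in FIG. 1

stream = Bitstream(
    header=SacConfig(sampling_frequency=44100, frame_length=1024,
                     tree_config="tree-A"),
    frames=[Frame(b"\x01"), Frame(b"\x02"), Frame(b"\x03")],
)
```

Note that no `Frame` holds a copy of the `SacConfig`; the frames depend entirely on the single header for the commonly applied information.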
Conventionally, SAC configuration information is included only in the header of a bitstream. Thus, when the header of a bitstream of a multi-channel audio signal is not received, as may occur in a streaming service, the information needed to decode the bitstream cannot be obtained.
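The consequence of a missing header can be shown with a toy decoder loop. This is a minimal sketch under assumed conventions: the bitstream is represented as a list of hypothetical `(kind, payload)` units, and `decode` and `try_decode` are illustrative names rather than parts of any real decoder.

```python
def decode(units):
    """Decode frames in order, using the configuration from the header.

    Returns a list of (frame index, tree configuration) pairs as a stand-in
    for decoded audio.
    """
    config = None
    decoded = []
    for kind, payload in units:
        if kind == "header":
            config = payload
        elif kind == "frame":
            if config is None:
                # The frame carries spatial information only, so nothing
                # can be decoded without the header.
                raise RuntimeError("SAC configuration not received")
            decoded.append((payload["index"], config["tree_config"]))
    return decoded

def try_decode(units):
    """Return the decoded frames, or None when decoding is impossible."""
    try:
        return decode(units)
    except RuntimeError:
        return None

full_stream = [
    ("header", {"sampling_frequency": 44100, "frame_length": 1024,
                "tree_config": "tree-A"}),
    ("frame", {"index": 1}),
    ("frame", {"index": 2}),
    ("frame", {"index": 3}),
]

from_start = try_decode(full_stream)       # header received
joined_late = try_decode(full_stream[2:])  # reception starts at frame 2
```

Here `from_start` contains all three frames, each decoded with the same tree configuration, whereas `joined_late` is `None` because the receiver never saw the header.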
In addition, since tree configuration information is included only in the SAC configuration information, the same downmix combination must be used throughout an entire multi-channel audio signal. Accordingly, decoding cannot be performed such that the downmix combination varies from one frame to another of the multi-channel audio signal obtained by the decoding. Nor can encoding or decoding be performed such that each frame of a multi-channel audio signal is encoded or decoded with optimum efficiency.