Generally, when an encoder encodes a multi-channel audio signal, it downmixes the signal into one or two channels to generate a downmix audio signal, and it extracts spatial information from the multi-channel audio signal. The spatial information is information usable in upmixing the downmix audio signal into the multi-channel audio signal. The encoder downmixes the multi-channel audio signal according to a predetermined tree configuration. Here, the predetermined tree configuration can be one of the structures agreed upon between an audio signal decoder and an audio signal encoder. In particular, if identification information indicating the type of one of the predetermined tree configurations is present, the decoder is able to know the structure of the upmixed audio signal, e.g., the number of channels, the position of each channel, etc.
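The downmixing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the tree layout `TREE`, the channel names, the use of plain sample summation for each two-to-one step, and the choice of a level difference in dB as the spatial parameter. The text itself does not specify any of these.

```python
import math

# Hypothetical tree configuration: nested pairs of channel names, where each
# pair is one two-to-one downmix step. Layout and names are assumptions.
TREE = ((("L", "R"), ("Ls", "Rs")), ("C", "LFE"))


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def downmix(tree, channels, spatial):
    """Downmix `tree` to one channel, recording one parameter per tree node."""
    if isinstance(tree, str):          # leaf: an input channel
        return channels[tree]
    left = downmix(tree[0], channels, spatial)
    right = downmix(tree[1], channels, spatial)
    eps = 1e-12                        # avoids log of zero for silent branches
    # Assumed parameter: level difference between the two branches, in dB.
    spatial[tree] = 10.0 * math.log10((energy(left) + eps) / (energy(right) + eps))
    return [a + b for a, b in zip(left, right)]


def encode(tree, channels):
    """Return (mono downmix, spatial information keyed by tree node)."""
    spatial = {}
    mono = downmix(tree, channels, spatial)
    return mono, spatial
```

In this sketch the spatial information is keyed by the nodes of the tree, so six input channels yield five parameters, one per two-to-one step. This makes it visible that the extracted parameters are tied to the tree configuration used for downmixing.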
Thus, if an encoder downmixes a multi-channel audio signal according to a predetermined tree configuration, the spatial information extracted in this process depends on that structure as well. So, when a decoder upmixes the downmix audio signal using the structure-dependent spatial information, a multi-channel audio signal according to that structure is generated. Namely, if the decoder uses the spatial information generated by the encoder as it is, upmixing is performed only according to the structure agreed upon between the encoder and the decoder. The decoder is therefore unable to generate an output-channel audio signal that does not follow the agreed structure. For instance, it is unable to upmix the downmix signal into an audio signal whose number of channels differs from (is smaller or greater than) the number of channels determined by the agreed structure.
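The limitation can be illustrated from the decoder side with a minimal sketch. The tree `AGREED_TREE`, the channel names, the per-node level-difference parameter in dB, and the gain formula are all assumptions made for illustration and are not taken from the text.

```python
import math

# Hypothetical tree configuration agreed between encoder and decoder:
# nested pairs of channel names, one one-to-two upmix step per pair.
AGREED_TREE = ((("L", "R"), ("Ls", "Rs")), ("C", "LFE"))


def inner_nodes(tree):
    """List every non-leaf node of the tree (one per upmix step)."""
    if isinstance(tree, str):
        return []
    return [tree] + inner_nodes(tree[0]) + inner_nodes(tree[1])


def upmix(tree, signal, spatial, out):
    """Split `signal` down the tree using the parameter stored for each node."""
    if isinstance(tree, str):          # leaf: an output channel
        out[tree] = signal
        return
    # Assumed parameter: level difference in dB between the two branches.
    ratio = 10.0 ** (spatial[tree] / 10.0)
    g_left = math.sqrt(ratio / (1.0 + ratio))
    g_right = math.sqrt(1.0 / (1.0 + ratio))
    upmix(tree[0], [g_left * s for s in signal], spatial, out)
    upmix(tree[1], [g_right * s for s in signal], spatial, out)


def decode(tree, mono, spatial):
    """Return the output channels produced by upmixing `mono` along `tree`."""
    out = {}
    upmix(tree, mono, spatial, out)
    return out
```

With parameters supplied for the agreed tree, `decode(AGREED_TREE, mono, spatial)` always yields the six channels of that tree. In this sketch a different tree, such as `("L", ("C", "R"))`, finds no matching parameters and fails with a `KeyError`, which mirrors the point that spatial information used as it is only supports the agreed structure.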