As an encoding technique for achieving reproduction with a higher sense of realism than the 5.1-channel surround reproduction in the related art, or for transmitting a plurality of audio elements (objects), the 3D audio standard is generally used (for example, refer to NPL 1 to 3).
In the 3D audio standard, the minimum size of the buffer that stores the input bit stream supplied to a decoder is defined as the minimum decoder input buffer size. For example, section 4.5.3.1 of NPL 3 defines the minimum decoder input buffer size as 6144*NCC (bits).
Here, NCC is an abbreviation of Number of Considered Channels, and indicates the sum of twice the number of channel pair elements (CPEs) and the number of single channel elements (SCEs) among all the audio elements included in the input bit stream.
Further, an SCE is an audio element in which an audio signal of one channel is stored, and a CPE is an audio element in which audio signals of two channels set as a pair are stored. For example, when the input bit stream includes 5 SCEs and 3 CPEs, NCC=5+2*3=11.
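The calculation described above can be illustrated with a minimal sketch. It assumes only the formula 6144*NCC (bits) and the definition of NCC given in the text. The function and constant names are hypothetical, chosen for illustration, and are not taken from the standard.

```python
# Illustrative sketch only: names below are hypothetical, not from the standard.

# Per-channel factor from the formula 6144*NCC (bits) quoted in the text.
BITS_PER_CONSIDERED_CHANNEL = 6144


def considered_channels(num_sce: int, num_cpe: int) -> int:
    """Return NCC: the number of SCEs plus twice the number of CPEs."""
    if num_sce < 0 or num_cpe < 0:
        raise ValueError("element counts must be non-negative")
    return num_sce + 2 * num_cpe


def min_decoder_input_buffer_bits(num_sce: int, num_cpe: int) -> int:
    """Return the minimum decoder input buffer size in bits (6144*NCC)."""
    return BITS_PER_CONSIDERED_CHANNEL * considered_channels(num_sce, num_cpe)


if __name__ == "__main__":
    # The example from the text: 5 SCEs and 3 CPEs.
    ncc = considered_channels(5, 3)
    bits = min_decoder_input_buffer_bits(5, 3)
    print(f"NCC = {ncc}")                         # NCC = 11
    print(f"minimum buffer = {bits} bits")        # minimum buffer = 67584 bits
    print(f"minimum buffer = {bits // 8} bytes")  # minimum buffer = 8448 bytes
```

For the example of 5 SCEs and 3 CPEs, this gives NCC=11 and a minimum decoder input buffer size of 6144*11=67584 bits (8448 bytes).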
As described above, in the 3D audio standard, a decoder that is to decode the input bit stream needs to secure a buffer of at least the defined minimum size.