It is well known that a music signal is efficiently encoded in the frequency domain, whereas a speech signal is efficiently encoded in the time domain. Accordingly, various techniques have been proposed for classifying whether an audio signal in which music and speech are mixed corresponds to a music signal or a speech signal, and for determining a coding mode according to the result of the classification.
However, frequent switching between coding modes introduces delay and degrades the quality of the restored sound. Moreover, no technique for correcting an initial classification result has been proposed. As a result, when an error occurs in the initial signal classification, the quality of the restored sound deteriorates.