With advances in computer technology and digital signal processing, and with the demands of high-definition television sound systems and home audio-visual systems, stereo technology has developed greatly. This in turn places higher requirements on stereo technology, especially on stereo encoding and decoding.
A common stereo coding method is parametric stereo coding. In parametric stereo coding, the signals of the left and right sound channels are usually not coded directly. Instead, they are downmixed to obtain a downmix signal, and the downmix signal is coded together with some extra sideband information. At the decoding end, the stereo signal is restored from the downmix signal and the sideband information. The quality of the restored stereo signal depends to a great extent on the quality of the downmix signal: at the coding end, the more closely the signals of the left and right sound channels are synchronized, the less information is lost in the downmixing process. In general, however, a sound-producing object may move relative to, or lie at different distances from, the two microphones that record the left and right sound channels. As a result, the signals of the left and right sound channels cannot be completely synchronous, that is, a certain delay may exist between them. To keep the signals of the left and right sound channels synchronous, methods for estimating this delay have been put forward, so as to improve the quality of the synthesized stereo signal.
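The effect of an inter-channel delay on the downmix can be illustrated with a minimal sketch. It assumes a simple averaging downmix, (left + right) / 2, and a pure tone as the test signal; neither is specified in the text above, and the function names `downmix`, `energy`, and `delayed` are hypothetical.

```python
import math

def downmix(left, right):
    """Average the two channels sample by sample (assumed downmix rule)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)

def delayed(signal, delay):
    """Delay a signal by `delay` samples (delay >= 0), padding with zeros."""
    return [0.0] * delay + signal[:len(signal) - delay]

# A tone with a period of 16 samples, used as the left channel.
PERIOD = 16
left = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(160)]

# Right channel perfectly synchronous with the left channel.
aligned_energy = energy(downmix(left, left))

# Right channel delayed by half a period: the channels largely cancel.
misaligned_energy = energy(downmix(left, delayed(left, PERIOD // 2)))

print("downmix energy, synchronous channels:", aligned_energy)
print("downmix energy, delayed right channel:", misaligned_energy)
```

In this contrived case almost all of the signal energy disappears from the downmix when the channels are half a period apart, which is why the delay is estimated and compensated before downmixing.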
Currently, the method for estimating delay in the prior art includes: before the signals of the left and right sound channels are downmixed, obtaining a cumulative cross-correlation function of the signals of the left and right sound channels; taking the time corresponding to the maximum value of the cumulative cross-correlation function as the delay between the signals of the left and right sound channels; coding the delay; and sending the coded delay to a decoding end, so that the decoding end performs signal synthesis according to the delay, thereby maintaining the stability of the sound field of the signals of the left and right sound channels. In actual applications, the cumulative cross-correlation function is usually taken as the decision basis in order to keep the delay between the left and right sound channels stable. For convenience, the delay is defined as positive when the left sound channel leads the right sound channel, and as negative otherwise.
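The prior-art estimation described above can be sketched as follows. The text does not state how the cross-correlation is accumulated over time, so this sketch assumes a recursive (exponentially weighted) average with a hypothetical smoothing factor `alpha`; all function names are illustrative. The sign convention matches the one above: a positive delay means the left channel leads the right channel.

```python
import random

def cross_correlation(left, right, max_lag):
    """Return {lag: sum over n of left[n] * right[n + lag]} for |lag| <= max_lag."""
    n = len(left)
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += left[i] * right[j]
        corr[lag] = total
    return corr

def accumulate(cumulative, frame_corr, alpha=0.9):
    """Assumed accumulation rule: alpha * previous + (1 - alpha) * current frame."""
    return {lag: alpha * cumulative.get(lag, 0.0) + (1.0 - alpha) * value
            for lag, value in frame_corr.items()}

def estimate_delay(cumulative):
    """Lag at which the cumulative cross-correlation is largest."""
    return max(cumulative, key=cumulative.get)

def make_frame(delay, length, rng):
    """Test helper: white-noise left channel, right[n] = left[n - delay]."""
    left = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    right = [left[i - delay] if 0 <= i - delay < length else 0.0
             for i in range(length)]
    return left, right

def estimate_over_frames(delay, frames=5, length=256, max_lag=10, seed=0):
    """Run the estimator over several frames that share one true delay."""
    rng = random.Random(seed)
    cumulative = {}
    for _ in range(frames):
        left, right = make_frame(delay, length, rng)
        cumulative = accumulate(cumulative, cross_correlation(left, right, max_lag))
    return estimate_delay(cumulative)
```

For example, `estimate_over_frames(5)` simulates a left channel that leads the right channel by five samples, and `estimate_over_frames(-3)` simulates a right channel that leads by three samples.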
However, in the above method, when the sound field of the signals of the left and right sound channels changes, for example, when the sound source moves from one direction to another, the sign of the estimated delay changes, and the prior art cannot track such a change of the sound field well. That is, when the sound field changes, the cumulative cross-correlation function cannot sense the change, so a wrong delay may be estimated. When the decoding end then performs signal synthesis according to the wrong delay, the sound field of the synthesized signal may become unstable.
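One way this tracking problem can arise is sketched below. The sketch assumes, as an illustration only, that the cumulative cross-correlation is a recursive average with smoothing factor `alpha`; the text does not specify the accumulation rule, and the names `frame_correlation` and `simulate` are hypothetical. The history accumulated at the old delay keeps dominating for some frames after the sign of the true delay flips.

```python
import random

def frame_correlation(left, right, max_lag):
    """Return {lag: sum over n of left[n] * right[n + lag]} for |lag| <= max_lag."""
    n = len(left)
    return {
        lag: sum(left[i] * right[i + lag]
                 for i in range(n) if 0 <= i + lag < n)
        for lag in range(-max_lag, max_lag + 1)
    }

def simulate(delays, alpha, max_lag=8, frame_len=256, seed=0):
    """Return the per-frame delay estimates for a sequence of true delays.

    alpha = 0.0 means no accumulation (each frame is judged on its own);
    values close to 1.0 give the accumulated history more weight.
    """
    rng = random.Random(seed)
    cumulative = {lag: 0.0 for lag in range(-max_lag, max_lag + 1)}
    estimates = []
    for delay in delays:
        # White-noise left channel; right[n] = left[n - delay].
        left = [rng.uniform(-1.0, 1.0) for _ in range(frame_len)]
        right = [left[i - delay] if 0 <= i - delay < frame_len else 0.0
                 for i in range(frame_len)]
        corr = frame_correlation(left, right, max_lag)
        for lag in cumulative:
            cumulative[lag] = alpha * cumulative[lag] + (1.0 - alpha) * corr[lag]
        estimates.append(max(cumulative, key=cumulative.get))
    return estimates

# The source moves from one side to the other after 20 frames.
true_delays = [5] * 20 + [-5] * 30
smoothed = simulate(true_delays, alpha=0.9)
per_frame = simulate(true_delays, alpha=0.0)
print("first frame after the change, accumulated estimate:", smoothed[20])
print("first frame after the change, per-frame estimate:  ", per_frame[20])
```

In this simulation the accumulated estimate still reports the old positive delay on the first frame after the change, while the unaccumulated per-frame estimate already reports the new negative delay. The accumulated estimate does eventually follow, but a decoder that synthesizes with the stale delay in the meantime would place the sound field on the wrong side.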
In view of the above, during research on and practice of the prior art, the inventors of the present invention found that, in the existing implementations, a change of the sound field of the signals of the left and right sound channels cannot be tracked well. Consequently, the delay between the left and right sound channels cannot be estimated correctly, which makes the synthesized stereo signal unstable, reduces the stereo coding quality, and degrades the sound effect.