In stereophonic encoding, the left and right channel signals are not encoded directly. Instead, they are first downmixed, and the downmixed signal is encoded together with additional sideband information. At the decoding end, the stereophonic signal is restored from the downmixed signal and the sideband information. In general, the distances from a sound generator to the two microphones that record the left channel and the right channel differ. The left channel signal is therefore not fully synchronous with the right channel signal; that is, there is a certain delay between them. This delay must be estimated accurately and restored at the decoding end to preserve the sound intensity of the synthesized signal.
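To illustrate why the delay matters, here is a toy scheme, assuming the downmix is a plain average of the time-aligned channels and that the only sideband parameter is an integer delay `d` already known to the encoder. The functions `shift`, `encode`, and `decode` are hypothetical; a real codec would work frame by frame and transmit further parameters.

```python
import numpy as np

def shift(x, d):
    """Delay x by d samples (advance it if d < 0), zero-filling the vacated end."""
    y = np.zeros_like(x)
    if d >= 0:
        y[d:] = x[:len(x) - d]
    else:
        y[:d] = x[-d:]
    return y

def encode(left, right, d):
    """Hypothetical encoder: remove delay d from the right channel, then downmix."""
    mono = 0.5 * (left + shift(right, -d))  # average of the time-aligned channels
    return mono, d                           # d is sent as sideband information

def decode(mono, d):
    """Hypothetical decoder: rebuild both channels from the downmix and the delay."""
    return mono, shift(mono, d)

rng = np.random.default_rng(0)
src = rng.standard_normal(1000)
left, right = src, shift(src, 3)             # right channel lags left by 3 samples

mono, d = encode(left, right, 3)
naive = 0.5 * (left + right)                 # downmix without delay compensation
print(np.mean(mono**2), np.mean(naive**2))   # roughly 1.0 vs 0.5 for white noise
left_hat, right_hat = decode(mono, d)
print(np.allclose(right_hat, right))         # True
```

For white noise, averaging the misaligned channels roughly halves the power, whereas the aligned downmix keeps it, which is the loss of sound intensity the delay estimate is meant to prevent.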
Currently, to estimate the interchannel delay, a weighted cross-correlation function between the left channel and the right channel is calculated, and the delay at which this function reaches its maximum is taken as the delay between the left channel and the right channel. With a single sound generator, there is only one left channel signal and one right channel signal, and the position of the sound generator relative to the two microphones recording them is fixed, so this method can estimate the interchannel delay relatively accurately.
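The maximum search described above can be sketched with NumPy. The text does not name the weighting, so this sketch assumes the common PHAT (phase transform) weighting, which keeps only the phase of the cross-spectrum; the function name `gcc_phat_delay` and the `max_lag` search limit are hypothetical. A positive result means the right channel lags the left.

```python
import numpy as np

def gcc_phat_delay(left, right, max_lag):
    """Return the lag (in samples) that maximizes the PHAT-weighted
    cross-correlation; positive means the right channel lags the left."""
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = len(left) + len(right)                  # zero-pad so the correlation is linear, not circular
    spec_l = np.fft.rfft(left, n=n)
    spec_r = np.fft.rfft(right, n=n)
    cross = spec_r * np.conj(spec_l)            # cross-spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)               # weighted cross-correlation, lag 0 at index 0
    # Gather lags -max_lag..+max_lag into one contiguous array.
    window = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(window)) - max_lag

rng = np.random.default_rng(0)
src = rng.standard_normal(2048)
left = src
right = np.concatenate((np.zeros(5), src[:-5]))   # right channel delayed by 5 samples
print(gcc_phat_delay(left, right, max_lag=40))      # 5
```

Zero-padding to `len(left) + len(right)` keeps negative lags from wrapping around onto positive ones, and limiting the search to `max_lag` reflects that physically plausible microphone delays are short.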
With multiple sound generators (for example, several people talking at once, referred to here as crosstalk), there are multiple left channel and right channel components, and the sound field swings to the left or to the right from time to time: a sound generator on the right may swing to the left while one on the left swings to the right. As a result, it is difficult to determine which left channel and right channel components come from the same sound generator. If the interchannel delay under crosstalk is estimated with the above method, the estimate is inaccurate, which makes the estimated sound field unstable.
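The instability can be reproduced in a toy simulation, assuming two talkers whose interchannel delays are +4 and -6 samples and whose loudness alternates from frame to frame (each frame is an independent white-noise burst for simplicity). The helpers `shift` and `gcc_phat_delay` are hypothetical and keep the PHAT weighting assumption; because the maximum follows whichever talker dominates, the per-frame estimate jumps between the two delays.

```python
import numpy as np

def shift(x, d):
    """Delay x by d samples (advance it if d < 0), zero-filling the vacated end."""
    y = np.zeros_like(x)
    if d >= 0:
        y[d:] = x[:len(x) - d]
    else:
        y[:d] = x[-d:]
    return y

def gcc_phat_delay(left, right, max_lag):
    """Lag maximizing the PHAT-weighted cross-correlation (right relative to left)."""
    n = len(left) + len(right)
    cross = np.fft.rfft(right, n=n) * np.conj(np.fft.rfft(left, n=n))
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    window = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(window)) - max_lag

rng = np.random.default_rng(1)
frame = 1024
estimates = []
for k in range(6):
    a = rng.standard_normal(frame)                 # talker A: right lags left by 4
    b = rng.standard_normal(frame)                 # talker B: right leads left by 6
    gain_a, gain_b = (1.0, 0.2) if k % 2 == 0 else (0.2, 1.0)
    left = gain_a * a + gain_b * b
    right = gain_a * shift(a, 4) + gain_b * shift(b, -6)
    estimates.append(gcc_phat_delay(left, right, max_lag=20))
print(estimates)  # expected to alternate between 4 and -6
```

A single delay per frame cannot describe both talkers at once, so the estimate tracks the louder one and the reconstructed sound image jumps accordingly.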