As living standards improve, demand for high-quality audio keeps growing. Compared with a mono signal, a stereo signal conveys the direction and spatial distribution of acoustic sources, which improves the clarity, intelligibility, and immersiveness of sound. Stereo is therefore widely favored.
Stereo processing technologies mainly include mid/side (MS) encoding, intensity stereo (IS) encoding, and parametric stereo (PS) encoding.
In the MS encoding, MS conversion is performed on the two channel signals based on inter-channel coherence (IC), so that the energy of the channels is concentrated mainly in a mid channel and inter-channel redundancy is eliminated. In the MS encoding technology, the reduction in bit rate depends on the coherence between the input signals. When coherence between a left-channel signal and a right-channel signal is poor, the left-channel signal and the right-channel signal need to be transmitted separately.
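As a minimal sketch of the MS conversion described above, the following Python snippet assumes the common sum/difference convention, mid = (L + R) / 2 and side = (L - R) / 2. The scaling factor and the names `ms_encode`, `ms_decode`, and `energy` are illustrative assumptions and are not taken from any particular codec.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid/side (assumed 1/2 scaling)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


# Two strongly coherent channels (hypothetical sample values).
left = [1.0, 0.5, -0.25, -1.0]
right = [0.75, 0.5, -0.5, -0.75]

mid, side = ms_encode(left, right)
print("mid energy:", energy(mid), "side energy:", energy(side))
```

For strongly coherent inputs such as these, most of the energy ends up in the mid channel, so the side channel is cheap to encode. For weakly coherent inputs the side channel carries comparable energy and the saving largely disappears.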
In the IS encoding, high-frequency components of a left-channel signal and a right-channel signal are simplified based on the fact that the human auditory system is insensitive to a phase difference between high-frequency components (for example, components above 2 kilohertz (kHz)) of the channels. However, the IS encoding technology is effective only for high-frequency components. If the IS encoding technology is extended to low frequencies, severe audible artifacts are caused.
The PS encoding is an encoding scheme based on a binaural auditory model. As shown in FIG. 1 (in FIG. 1, xL is a left-channel time-domain signal, and xR is a right-channel time-domain signal), in a PS encoding process, an encoder side converts a stereo signal into a mono signal and a small number of spatial parameters (or spatial perception parameters) that describe a spatial sound field. As shown in FIG. 2, after obtaining the mono signal and the spatial parameters, a decoder side restores a stereo signal with reference to the spatial parameters. Compared with the MS encoding, the PS encoding has a higher compression ratio. Therefore, in the PS encoding, a higher encoding gain can be obtained while relatively good sound quality is maintained. In addition, the PS encoding may be performed over the full audio bandwidth, and can restore the spatial perception effect of stereo well.
In the PS encoding, the spatial parameters include IC, an inter-channel level difference (ILD), an inter-channel time difference (ITD), and an inter-channel phase difference (IPD). The IC describes inter-channel cross correlation or coherence. This parameter determines the perceived extent of a sound field, and can improve the sense of space and the sound stability of an audio signal. The ILD is used to distinguish a horizontal azimuth angle of a stereo acoustic source, and describes an inter-channel energy difference. This parameter affects frequency components of the entire spectrum. The ITD and the IPD are spatial parameters representing the horizontal azimuth of an acoustic source, and describe inter-channel time and phase differences. The ILD, the ITD, and the IPD determine how a human ear perceives the location of an acoustic source, can be used to effectively determine a sound field location, and play an important role in restoration of a stereo signal.
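To make these parameters concrete, the following Python sketch estimates an ILD, an ITD, and an IC for one frame of time-domain samples. It is an illustration under stated assumptions rather than the method of any specific PS encoder: the ILD is taken as the energy ratio in decibels, the ITD as the lag (in samples) that maximizes the cross correlation, and the IC as that maximum normalized by the channel energies. The function name `spatial_params` and the parameter `max_lag` are hypothetical. The IPD is omitted because it is usually estimated per frequency band.

```python
import math


def spatial_params(left, right, max_lag):
    """Estimate (ILD in dB, ITD in samples, IC) for one frame.

    Assumes both channels have nonzero energy. A positive ITD means
    the right channel lags the left channel by that many samples.
    """
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)

    # ILD: inter-channel energy ratio expressed in decibels.
    ild_db = 10.0 * math.log10(e_left / e_right)

    # ITD: lag d that maximizes c(d) = sum_n left[n] * right[n + d].
    best_lag, best_corr = 0, -math.inf
    for d in range(-max_lag, max_lag + 1):
        corr = sum(
            left[n] * right[n + d]
            for n in range(len(left))
            if 0 <= n + d < len(right)
        )
        if corr > best_corr:
            best_lag, best_corr = d, corr

    # IC: peak cross correlation normalized by the channel energies.
    ic = best_corr / math.sqrt(e_left * e_right)
    return ild_db, best_lag, ic


# Hypothetical frame: the right channel is the left channel at half
# amplitude, delayed by 3 samples.
left = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]

ild_db, itd, ic = spatial_params(left, right, max_lag=5)
print(f"ILD = {ild_db:.2f} dB, ITD = {itd} samples, IC = {ic:.2f}")
```

Because the ITD here is the location of a single correlation peak, noise or reverberation that raises a competing peak can move the estimate by many samples from one frame to the next.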
In a stereo recording process, due to the impact of factors such as background noise, reverberation, and multi-party speech, an ITD calculated according to an existing PS encoding scheme is often unstable (the ITD value changes abruptly from frame to frame). A downmixed signal calculated based on such an ITD is discontinuous. As a result, the quality of the stereo obtained on the decoder side is poor. For example, the acoustic image of the stereo played on the decoder side jitters frequently, and auditory freezing may even occur.