Parametric Stereo (PS) is one of the major advances in audio coding of the last couple of years. The basics of Parametric Stereo are explained in J. Breebaart, S. van de Par, A. Kohlrausch and E. Schuijers, “Parametric Coding of Stereo Audio”, in EURASIP J. Appl. Signal Process., vol 9, pp. 1305-1322 (2004). In contrast to traditional, so-called discrete coding of audio signals, the PS encoder as depicted in FIG. 1 transforms a stereo signal pair (l, r) 101, 102 into a single mono downmix signal 104 plus a small number of parameters 103 describing the spatial image. These parameters comprise Interchannel Intensity Differences (iids), Interchannel Phase (or Time) Differences (ipds/itds) and Interchannel Coherence/Correlation (iccs). In the PS encoder 100, the spatial image of the stereo input signal (l, r) is analyzed, resulting in iid, ipd and icc parameters. Preferably, the parameters are time and frequency dependent, so that the iid, ipd and icc parameters are determined for each time/frequency tile. These parameters are quantized and encoded 140, resulting in the PS bit-stream. Furthermore, the parameters are typically also used to control how the downmix of the stereo input signal is generated. The resulting mono sum signal (s) 104 is subsequently encoded using a legacy mono audio encoder 120. Finally, the resulting mono bit-stream and PS bit-stream are merged to construct the overall stereo bit-stream 107.
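The per-tile analysis described above can be sketched as follows. This is a minimal illustration only, not the encoder of FIG. 1: the function name `estimate_ps_parameters`, the use of complex subband samples, the sign convention of the ipd, and the particular estimators (power ratio in dB, angle and normalized magnitude of the cross-correlation) are assumptions in the spirit of the cited Breebaart et al. reference.

```python
import cmath
import math


def estimate_ps_parameters(l_tile, r_tile, eps=1e-12):
    """Estimate (iid_db, ipd_rad, icc) for one time/frequency tile.

    l_tile, r_tile: equal-length sequences of complex subband samples
    belonging to the same tile (hypothetical input format).
    eps: small constant that avoids division by zero for silent tiles.
    """
    # Channel powers and complex cross-correlation within the tile.
    p_l = sum(abs(x) ** 2 for x in l_tile)
    p_r = sum(abs(x) ** 2 for x in r_tile)
    cross = sum(x * y.conjugate() for x, y in zip(l_tile, r_tile))

    # iid: level difference between the channels, in dB.
    iid_db = 10.0 * math.log10((p_l + eps) / (p_r + eps))
    # ipd: angle of the cross-correlation, wrapped to (-pi, pi].
    ipd_rad = cmath.phase(cross)
    # icc: magnitude of the normalized cross-correlation, in [0, 1].
    icc = abs(cross) / math.sqrt(p_l * p_r + eps)
    return iid_db, ipd_rad, icc


# Usage: the right channel is a copy of the left, attenuated by a factor
# of two and delayed in phase by 45 degrees.
l_tile = [cmath.exp(1j * 0.3 * n) for n in range(8)]
r_tile = [0.5 * x * cmath.exp(-1j * math.pi / 4) for x in l_tile]
iid_db, ipd_rad, icc = estimate_ps_parameters(l_tile, r_tile)
print(iid_db, ipd_rad, icc)
```

Because the ipd is obtained as the angle of a complex number, it is inherently wrapped, which is the origin of the instability discussed further on.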
In the PS decoder 200, the stereo bit-stream is split into a mono bit-stream 202 and a PS bit-stream 203. The mono audio signal is decoded, resulting in a reconstruction of the mono downmix signal 204. The mono downmix signal is fed to the PS upmix 230 together with the decoded spatial image parameters 205. The PS upmix then generates the output stereo signal pair (l, r) 206, 207. In order to synthesize the icc cues, the PS upmix employs a so-called decorrelated signal (s_d), i.e., a signal generated from the mono audio signal that has roughly the same spectral and temporal envelope but a correlation of substantially zero with regard to the mono input signal. Then, based on the spatial image parameters, a 2×2 matrix is determined and applied within the PS upmix for each time/frequency tile:
$$\begin{bmatrix} l \\ r \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} \begin{bmatrix} s \\ s_d \end{bmatrix},$$
where $H_{ij}$ represents the (i, j) entry of the upmix matrix H. The H matrix entries are functions of the PS parameters iid, icc and optionally ipd/opd. In the state-of-the-art PS system, in case ipd/opd parameters are employed, the upmix matrix H can be decomposed as:
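$$\begin{bmatrix} l \\ r \end{bmatrix} = \begin{bmatrix} e^{j\varphi_1} & 0 \\ 0 & e^{j\varphi_2} \end{bmatrix} \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} s \\ s_d \end{bmatrix},$$
where the left 2×2 matrix represents the phase rotations, a function of the ipd and opd parameters, and the right 2×2 matrix represents the part that reinstates the iid and icc parameters.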
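To make the decomposition concrete, the following sketch applies the two factors to one sample of one tile and checks that the result matches a single combined matrix whose entries are the h entries of row i multiplied by the phase factor of row i. The function names `upmix_tile` and `combined_matrix`, the numeric values of h, and the angles are purely hypothetical; how h is derived from iid and icc, and how the angles are derived from ipd and opd, is not specified here and is not modelled.

```python
import cmath


def upmix_tile(s, s_d, h, phi_1, phi_2):
    """Apply [l, r]^T = diag(e^{j*phi_1}, e^{j*phi_2}) * h * [s, s_d]^T.

    s, s_d: mono downmix sample and decorrelated sample (complex).
    h: 2x2 nested list, the part that reinstates iid and icc.
    phi_1, phi_2: phase rotations in radians for the left and right output.
    """
    # Right-hand matrix: mix the downmix and the decorrelated signal.
    l_mix = h[0][0] * s + h[0][1] * s_d
    r_mix = h[1][0] * s + h[1][1] * s_d
    # Left-hand (diagonal) matrix: rotate each output channel.
    l = cmath.exp(1j * phi_1) * l_mix
    r = cmath.exp(1j * phi_2) * r_mix
    return l, r


def combined_matrix(h, phi_1, phi_2):
    """Return H with H[i][j] = e^{j*phi_i} * h[i][j] (row-wise rotation)."""
    rot = (cmath.exp(1j * phi_1), cmath.exp(1j * phi_2))
    return [[rot[i] * h[i][j] for j in range(2)] for i in range(2)]


# Usage with placeholder values (not taken from any real PS bit-stream).
h = [[0.8, 0.3], [0.6, -0.4]]
s, s_d = 1.0 + 0.5j, -0.2 + 0.1j
l, r = upmix_tile(s, s_d, h, 0.4, -0.7)
H = combined_matrix(h, 0.4, -0.7)
print(abs(l - (H[0][0] * s + H[0][1] * s_d)))  # difference between both forms
print(abs(r - (H[1][0] * s + H[1][1] * s_d)))
```

Since the rotation matrix is diagonal, each output channel receives exactly one phase offset, so any jump in an angle from one tile to the next rotates the whole channel.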
In WO2003090206 A1 it is proposed to distribute the ipd equally over the left and right channels in the decoder. Furthermore, it is proposed to generate a downmix signal by rotating the left and right signals towards each other, each by half the measured ipd, to obtain alignment. In practice, for nearly out-of-phase signals, the ipd used for both the downmix generated in the encoder and the upmix generated in the decoder varies slightly around 180 degrees over time, which due to wrapping may yield a sequence of angles such as 179, 178, −179, 177, −179, . . . . As a result of these jumps, subsequent time/frequency tiles in the downmix exhibit phase discontinuities, or in other words phase instability. Due to the inherent overlap-add synthesis structure, this results in audible artefacts.
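The effect of wrapping on the half-ipd rotation can be illustrated with a few lines of code. The unwrapped input values below are hypothetical and were chosen only so that wrapping reproduces the sequence of angles quoted above; the helper names are likewise illustrative.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def half_ipd_rotations(unwrapped_ipd_deg):
    """Return (wrapped ipd values, per-channel rotations of ipd/2)."""
    wrapped = [wrap_degrees(a) for a in unwrapped_ipd_deg]
    halves = [w / 2.0 for w in wrapped]
    return wrapped, halves


# Hypothetical ipd track that moves by only a few degrees per tile.
unwrapped = [179, 178, 181, 177, 181]
wrapped, halves = half_ipd_rotations(unwrapped)

# Largest tile-to-tile change of the underlying ipd and of the rotation.
ipd_step = max(abs(b - a) for a, b in zip(unwrapped, unwrapped[1:]))
rotation_step = max(abs(b - a) for a, b in zip(halves, halves[1:]))
print(wrapped)
print(halves)
print(ipd_step, rotation_step)
```

Although the underlying phase difference changes by only a few degrees between tiles, the rotation applied to each channel flips between roughly +90 and −90 degrees, i.e. a jump of nearly 180 degrees.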
As an example, consider a downmix where in one time/frequency tile the downmix is generated as:
$$s = l\,e^{j(\pi/2-\varepsilon)} + r\,e^{j(-\pi/2+\varepsilon)},$$
where ε is some arbitrary small angle, meaning that the measured ipd was close to 180 degrees, whereas for the next time/frequency tile the downmix is generated as:
$$s = l\,e^{j(-\pi/2+\varepsilon)} + r\,e^{j(\pi/2-\varepsilon)},$$
meaning that the measured ipd was close to −180 degrees. Using typical overlap-add synthesis, a phase cancellation will occur between the midpoints of the subsequent time/frequency tiles, yielding artefacts.
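The cancellation can be checked numerically with a minimal sketch. It assumes a stationary left-channel component l, a right-channel component whose phase relative to l is just below +180 degrees in the first tile and just above −180 degrees in the second, and an overlap-add window that weights both tiles equally (0.5 each) halfway between their midpoints. All names, the value of ε and the window weights are assumptions made for illustration.

```python
import cmath
import math


def downmix(l, r, rot_l, rot_r):
    """Downmix s = l*e^{j*rot_l} + r*e^{j*rot_r} for one tile."""
    return l * cmath.exp(1j * rot_l) + r * cmath.exp(1j * rot_r)


def crossfade_example(eps=0.01):
    """Return (s1, s2, s_mid) for two consecutive tiles near 180 degrees."""
    l = 1.0 + 0.0j
    # Tile 1: r is chosen so that the rotations below align it with l.
    r1 = l * cmath.exp(1j * (math.pi - 2 * eps))
    # Tile 2: nearly the same signal, but on the other side of the wrap.
    r2 = l * cmath.exp(-1j * (math.pi - 2 * eps))

    s1 = downmix(l, r1, math.pi / 2 - eps, -math.pi / 2 + eps)
    s2 = downmix(l, r2, -math.pi / 2 + eps, math.pi / 2 - eps)
    # Overlap-add halfway between the tile midpoints, equal weights assumed.
    s_mid = 0.5 * s1 + 0.5 * s2
    return s1, s2, s_mid


s1, s2, s_mid = crossfade_example()
print(abs(s1), abs(s2), abs(s_mid))
```

Within each tile the two rotated channels add coherently, so both downmixes have full amplitude, but the two tiles have almost opposite phase. Under the stated assumptions the cross-faded value has magnitude 2·sin(ε) times that of l, so the downmix nearly vanishes in the transition region.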
A major disadvantage of parametric stereo coding as discussed above is the instability of the synthesis of the Interchannel Phase Difference (ipd) cues in the PS decoder, which are used in generating the output stereo pair. This instability has its source in the phase modifications performed in the PS encoder in order to generate the downmix, and in the PS decoder in order to generate the output signal. As a result of this instability, a lower audio quality of the output stereo pair is experienced.
In order to deal with this phase instability problem, the ipd synthesis is often discarded in practice. However, this results in a reduced (spatial) audio quality of the reconstructed stereo signal.
Another way of dealing with this instability problem when ipd parameters are used is to incorporate so-called Overall Phase Differences (opds) in the bit-stream in order to provide the decoder with a phase reference. In this way, the continuity over time/frequency tiles can be increased by allowing for a common phase rotation. This, however, comes at the expense of an increased bitrate, and thus results in a deterioration of the overall system performance.