All contemporary perceptual audio codecs, including MP3, Opus (CELT), the HE-AAC family, and the new MPEG-H 3D Audio and 3GPP Enhanced Voice Services (EVS) codecs, employ the modified discrete cosine transform (MDCT) for spectral-domain quantization and coding of one or more channel waveforms. The synthesis version of this lapped transform, applied to a length-M spectrum spec[ ], is given by
$$x_{i,n} = C \sum_{k=0}^{M-1} \mathrm{spec}[i][k] \, \cos\!\left( \frac{2\pi}{N} \, (n + n_0) \left( k + \frac{1}{2} \right) \right) \tag{1}$$

with $M = N/2$ and $N$ being the time-window length.
After windowing, the time output $x_{i,n}$ is combined with the previous time output $x_{i-1,n}$ by way of an overlap-and-add (OLA) process. $C$ is a constant parameter with $0 < C \le 1$, e.g. $C = 2/N$.
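The synthesis step of equation (1), followed by windowing and overlap-and-add, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a codec implementation. Only C = 2/N is taken from the text. The following are all assumptions made for the example: the sine window, the phase offset n0 = (N/2 + 1)/2, the analysis transform with its factor of 2, and the hypothetical helper names `mdct_basis`, `mdct`, `imdct` and `analysis_synthesis`. With these choices, overlap-adding consecutive windowed outputs cancels the time-domain aliasing and returns the input signal.

```python
import numpy as np


def mdct_basis(N, n0):
    """Kernel cos(2*pi/N * (n + n0) * (k + 1/2)) of shape (N, M), with M = N/2."""
    n = np.arange(N)
    k = np.arange(N // 2)
    return np.cos(2.0 * np.pi / N * np.outer(n + n0, k + 0.5))


def imdct(spec_i, n0, C=None):
    """Synthesis per equation (1) for one frame: returns x_{i,n} for n = 0..N-1."""
    N = 2 * len(spec_i)
    if C is None:
        C = 2.0 / N  # example value of C given in the text
    return C * (mdct_basis(N, n0) @ spec_i)


def mdct(frame, n0):
    """Assumed analysis counterpart: spec[k] = 2 * sum_n frame[n] * kernel[n, k].
    The factor 2 is an assumption chosen so that C = 2/N gives unit overall gain."""
    return 2.0 * (frame @ mdct_basis(len(frame), n0))


def analysis_synthesis(signal, N):
    """Window, transform, inverse-transform, window again and overlap-add.
    Assumes len(signal) is a multiple of the hop size M = N/2."""
    M = N // 2
    n0 = (N / 2 + 1) / 2  # assumed phase offset (a common MDCT choice)
    window = np.sin(np.pi * (np.arange(N) + 0.5) / N)  # sine window
    # Pad M zeros on both sides so every signal sample is covered by two frames.
    padded = np.concatenate([np.zeros(M), signal, np.zeros(M)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N + 1, M):
        spec_i = mdct(window * padded[start:start + N], n0)
        x_i = window * imdct(spec_i, n0)   # synthesis windowing
        out[start:start + N] += x_i        # OLA with the previous frame's output
    return out[M:M + len(signal)]


rng = np.random.default_rng(0)
sig = rng.standard_normal(8 * 32)
rec = analysis_synthesis(sig, N=64)
print(np.max(np.abs(rec - sig)))  # reconstruction error at floating-point level
```

The sine window is used because it satisfies the Princen-Bradley condition w[n]² + w[n+M]² = 1. Together with the chosen scaling, this condition makes the aliasing terms of overlapping frames cancel in the OLA.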
While the MDCT of (1) works well for high-quality audio coding of arbitrarily many channels at various bitrates, there are two cases in which the coding quality may fall short:

- Highly harmonic signals whose fundamental frequency is such that, after MDCT sampling, each harmonic is represented by more than one MDCT bin. This leads to suboptimal energy compaction in the spectral domain, i.e. low coding gain.
- Stereo signals with roughly 90 degrees of phase shift between the channels' MDCT bins. This shift cannot be exploited by traditional M/S-stereo-based joint channel coding. More sophisticated stereo coding involving coding of the inter-channel phase difference (IPD) can be achieved, e.g., with HE-AAC's Parametric Stereo or MPEG Surround. However, such tools operate in a separate filter-bank domain, which increases complexity.
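The stereo case can be illustrated numerically with a minimal Python/NumPy sketch. The sketch assumes a sine-windowed MDCT with phase offset n0 = (N/2 + 1)/2, an analysis scaling factor of 2, and a pure tone in both channels. The helper names `mdct_frames` and `side_to_mid_energy` are hypothetical. It compares the energy of the side spectrum S = (L − R)/2 with that of the mid spectrum M = (L + R)/2. For in-phase channels, all energy goes to M and S vanishes. For channels in quadrature (90 degrees apart), M and S carry roughly equal energy, so the M/S rotation yields no coding gain.

```python
import numpy as np


def mdct_frames(signal, N):
    """Sine-windowed MDCT of all half-overlapping frames (hop M = N/2).
    Sketch assumptions: n0 = (N/2 + 1)/2, analysis scaling factor 2,
    len(signal) a multiple of M, and M zeros of padding at both ends."""
    M = N // 2
    n0 = (N / 2 + 1) / 2
    n = np.arange(N)
    basis = np.cos(2.0 * np.pi / N * np.outer(n + n0, np.arange(M) + 0.5))
    window = np.sin(np.pi * (n + 0.5) / N)
    padded = np.concatenate([np.zeros(M), signal, np.zeros(M)])
    starts = range(0, len(padded) - N + 1, M)
    return np.array([2.0 * (window * padded[s:s + N]) @ basis for s in starts])


def side_to_mid_energy(left, right, N=64):
    """Energy of the side spectrum (L - R)/2 relative to the mid spectrum (L + R)/2."""
    L, R = mdct_frames(left, N), mdct_frames(right, N)
    mid, side = (L + R) / 2, (L - R) / 2
    return np.sum(side ** 2) / np.sum(mid ** 2)


n = np.arange(4096)
omega = 2 * np.pi * 0.05  # arbitrary tone frequency in radians per sample
in_phase = side_to_mid_energy(np.cos(omega * n), np.cos(omega * n))
quadrature = side_to_mid_energy(np.cos(omega * n), np.sin(omega * n))
print(in_phase, quadrature)  # in_phase is 0; quadrature is close to 1
```

Because the windowed MDCT used here is an orthogonal lapped transform up to a constant factor, the ratio computed in the MDCT domain matches the corresponding time-domain energy ratio.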
Several scientific papers and articles mention MDCT- or MDST-like (modified discrete sine transform) operations, sometimes under different names such as “lapped orthogonal transform (LOT)”, “extended lapped transform (ELT)” or “modulated lapped transform (MLT)”. Only [4] mentions several different lapped transforms at the same time, but it does not overcome the aforementioned drawbacks of the MDCT.
Therefore, there is a need for an improved approach.