In many applications, it is desirable to minimize the amount of information needed to represent signals or files. By minimizing the amount of information, bandwidth needed to transmit the signal and/or storage space needed to store the file can be conserved. This can be particularly useful for devices or systems having limited resources, such as mobile communication devices.
One type of signal that is typically compressed using an encoder is an audio signal. Audio encoders can be used to compress a time domain audio signal such that the bit rate needed to represent the signal is significantly reduced. Ideally, the bit rate of the encoded signal is reduced such that it fits the constraints of the transmission channel used to transmit the signal. This can be particularly useful for real-time communication and streaming service applications. The size of a file representing the encoded audio signal can also be reduced using compression. This can be particularly useful for downloading and/or storing high quality audio content. Typically, an audio encoder aims to minimize the perceptual distortion at any given bit rate or compressed file size. However, the lower the bit rate or the more compression applied to a file, the more challenging it is for the encoder to satisfy both conditions, namely low perceptual distortion and a low bit rate or small file size. Typically, it is the encoding performance with the worst-case signals (signals that are difficult to encode) that ultimately defines the overall performance of any encoding system. Another factor in defining the overall performance of any encoding system is the encoding speed and the resources needed to encode the signal.
Many encoding techniques and encoders currently exist; however, one problem with existing techniques and encoders is that they are slow. Another problem often encountered with existing techniques is that they require an extraordinary amount of resources, such as memory. While this may not be a problem in research conditions, for commercial use and especially for mobile use, encoding speed and resource requirements can become important considerations.
Advanced Audio Coding (AAC) is one example of an audio encoding system. AAC, the successor to MP3, is a wideband audio coding algorithm that can be used for generating high quality audio files. AAC exploits two coding strategies to reduce the amount of data needed to convey high-quality digital audio: signal components that cannot be perceived are removed, and redundancies in the encoded signal are eliminated. AAC generally supports two frequency resolutions, a 128-point and a 1024-point modified discrete cosine transform (MDCT). The former can be used for efficient handling of transient signal segments, and the latter can be used when (quasi-)stationary signal segments are present to achieve high energy compaction.
AAC offers an extensive set of encoding tools which can be used to attempt to maximize the subjective audio quality under various encoding conditions. AAC operates using profiles which can define a subset of tools that can be used for encoding a signal.
One such profile, AAC Long-Term Prediction (LTP), can be used for modeling tonal signal segments and can provide a significant quality improvement in encoding worst-case signal segments. However, similar to other existing encoding techniques, AAC LTP encoders can suffer from very slow encoding speeds. One reason may be that estimating the LTP lag information can require a significant amount of computation.
An AAC LTP encoder can be configured so that LTP models long-term correlations by repeating past reconstructed signal segments. One sample transfer function used for LTP can be:

$$B(z) = b_{\mathrm{LTP}} \cdot z^{-M} \tag{1}$$

where $b_{\mathrm{LTP}}$ is the LTP predictor coefficient, and $M$ is the predictor delay, usually referred to as the pitch lag. The predictor parameters (LTP coefficient and lag) can be determined by minimizing the mean squared error function. One way of defining the mean squared error function can be:
$$E = \sum_{i=0}^{N-1} \left[ x(i) - b_{\mathrm{LTP}} \cdot \tilde{x}(i-M) \right]^{2} \tag{2}$$

where $N$ is the frame size (in the time domain), $x$ is the input signal segment and $\tilde{x}$ is the past reconstructed signal.
A preferred, optimum LTP predictor coefficient may be calculated as:

$$b_{\mathrm{LTP}} = \frac{r}{a} \tag{3}$$

where

$$a = \sum_{i=0}^{N-1} \tilde{x}(i-M) \cdot \tilde{x}(i-M), \qquad r = \sum_{i=0}^{N-1} x(i) \cdot \tilde{x}(i-M) \tag{4}$$
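The relationship between the error function and the predictor coefficient can be illustrated with a minimal Python sketch. It assumes that `a` is the energy of the delayed segment $\tilde{x}(i-M)$, which is what setting the derivative of $E$ with respect to $b_{\mathrm{LTP}}$ to zero yields. The function names, the layout of the `history` buffer, and the zero fill for samples that are not yet reconstructed are assumptions made for illustration and are not taken from AAC.

```python
import math


def delayed_segment(history, n, lag):
    """Return [x~(i - lag) for i in 0..n-1].

    Assumed layout: history[-1] is x~(-1), history[-2] is x~(-2), and so on.
    Samples that are not available (index >= 0, or older than the buffer)
    are taken as 0.0, which is a simplification of this sketch.
    """
    segment = []
    for i in range(n):
        k = i - lag  # index into x~; negative values lie in the past
        segment.append(history[k] if -len(history) <= k < 0 else 0.0)
    return segment


def ltp_error(x, history, lag, gain):
    """Squared prediction error E summed over the frame."""
    segment = delayed_segment(history, len(x), lag)
    return sum((xi - gain * si) ** 2 for xi, si in zip(x, segment))


def optimum_gain(x, history, lag):
    """Predictor coefficient b_LTP = r / a for a fixed lag."""
    segment = delayed_segment(history, len(x), lag)
    r = sum(xi * si for xi, si in zip(x, segment))
    a = sum(si * si for si in segment)
    return r / a if a > 0.0 else 0.0  # guard against an all-zero segment


# Hypothetical data: a tone with a 20-sample period whose level halves
# in the current frame, so the best gain at lag 40 is close to 0.5.
history = [math.sin(2 * math.pi * k / 20) for k in range(-64, 0)]
frame = [0.5 * math.sin(2 * math.pi * i / 20) for i in range(32)]
gain = optimum_gain(frame, history, lag=40)
```

For a fixed lag, moving `gain` away from the value returned by `optimum_gain` can only increase `ltp_error`, because $E$ is a quadratic function of the coefficient.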
The LTP lag can be determined by maximizing the normalized cross-correlation between $x$ and $\tilde{x}$ over the specified lag range as follows:

$$M = \arg\max_{\tau} \{ C(\tau) \}, \quad 0 \le \tau < N-1, \qquad C(\tau) = \frac{\sum_{i=0}^{N-1} x(i) \cdot \tilde{x}(i-\tau)}{\sqrt{\sum_{i=0}^{N-1} \tilde{x}(i-\tau)^{2}}} \tag{5}$$
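A brute-force version of this lag search might look like the following Python sketch. It assumes the cross-correlation is normalized by the square root of the energy of the delayed segment, and it treats samples of $\tilde{x}$ that are not yet reconstructed as zero. Both choices, along with the function names and the `history` buffer layout, are assumptions of the sketch rather than details confirmed by the text.

```python
import math


def delayed_segment(history, n, lag):
    """Return [x~(i - lag) for i in 0..n-1].

    Assumed layout: history[-1] is x~(-1), history[-2] is x~(-2), and so on.
    Unavailable samples (index >= 0, or older than the buffer) are taken
    as 0.0, which is a simplification of this sketch.
    """
    segment = []
    for i in range(n):
        k = i - lag  # index into x~; negative values lie in the past
        segment.append(history[k] if -len(history) <= k < 0 else 0.0)
    return segment


def normalized_xcorr(x, history, lag):
    """C(tau): cross-correlation divided by sqrt of the segment energy."""
    segment = delayed_segment(history, len(x), lag)
    r = sum(xi * si for xi, si in zip(x, segment))
    energy = sum(si * si for si in segment)
    return r / math.sqrt(energy) if energy > 0.0 else 0.0


def find_ltp_lag(x, history):
    """Exhaustively search 0 <= tau < N - 1 for the lag maximizing C(tau)."""
    n = len(x)
    return max(range(n - 1), key=lambda tau: normalized_xcorr(x, history, tau))


# Hypothetical data: a steady tone with a 20-sample period.
history = [math.sin(2 * math.pi * k / 20) for k in range(-64, 0)]
frame = [math.sin(2 * math.pi * i / 20) for i in range(32)]
lag = find_ltp_lag(frame, history)
```

Each candidate lag costs on the order of $N$ multiply-accumulate operations in this form, so the full search grows roughly with $N^2$, which is consistent with lag estimation being a significant part of the encoding time.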
After the LTP lag has been determined, the predicted time domain signal can be calculated using the sample transfer function. Then, the predicted time domain signal can be converted to a frequency domain representation for the residual signal computation. In AAC, this time-to-frequency (t/f) transformation is normally a 1024-point modified discrete cosine transform (MDCT). In order to maximize the prediction gain, the difference signal can be obtained on a frequency band basis. If predictable components are present within the band, the difference signal can be used; otherwise that band can be left unmodified. This control can be implemented as a set of flags, which are transmitted in the bitstream along with the other predictor parameters.
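The band-wise control described above can be sketched in Python as follows. The sketch assumes both spectra have already been produced by the t/f transformation, and it uses a simple energy comparison to decide whether a band is predictable. That decision rule, the function name, and the `band_offsets` layout are hypothetical and are not specified by the text.

```python
def apply_band_prediction(spectrum, predicted, band_offsets):
    """Choose, per frequency band, between the original and the difference.

    spectrum:     frequency-domain coefficients of the input frame
    predicted:    frequency-domain coefficients of the LTP prediction
    band_offsets: band boundaries, e.g. [0, 4, 8] describes two bands
    Returns (flags, output), where flags[b] is True when band b carries
    the difference signal and False when it is left unmodified.
    """
    flags = []
    output = list(spectrum)
    for start, end in zip(band_offsets[:-1], band_offsets[1:]):
        original = spectrum[start:end]
        difference = [c - p for c, p in zip(original, predicted[start:end])]
        original_energy = sum(c * c for c in original)
        difference_energy = sum(d * d for d in difference)
        # Assumed rule: prediction helps when it lowers the band energy.
        use_prediction = difference_energy < original_energy
        flags.append(use_prediction)
        if use_prediction:
            output[start:end] = difference
    return flags, output


# Hypothetical coefficients: the first band is predicted well,
# the second band is not.
flags, residual = apply_band_prediction(
    spectrum=[4.0, 4.0, 1.0, 1.0],
    predicted=[3.5, 4.5, -2.0, 3.0],
    band_offsets=[0, 2, 4],
)
```

In this example only the first band is replaced by its difference signal, so one flag per band is enough for a decoder to know where the prediction must be added back.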
As described above, existing encoding methods tend to be slow or to require an impractical amount of resources. This can be a particular problem in certain applications, such as mobile communication devices, where encoding speed and resource requirements can be particularly important issues. As such, there is a need for improved systems, methods, devices, and computer code products for encoding an audio signal which can reduce the encoding time and resources while still maintaining a high quality audio signal.