1. Field of the Invention
The present invention relates generally to music detection. More particularly, the present invention relates to low-complexity pitch correlation calculation for use in music detection.
2. Background Art
In various speech coding systems it is useful to be able to detect the presence or absence of music, in addition to detecting voice and background noise. For example, a music signal can be coded in a manner different from voice or background noise signals.
Speech coding schemes of the past and present often operate on data transmission media having limited available bandwidth. These conventional systems commonly seek to minimize data transmission while simultaneously maintaining a high perceptual quality of speech signals. Conventional speech coding methods do not address the problems associated with efficiently achieving high perceptual quality for input signals having substantially music-like content. In other words, existing music detection algorithms are typically either overly complex, consuming an undesirable amount of processing power, or poor in their ability to accurately classify music signals.
Further, conventional speech coding systems often employ voice activity detectors (“VADs”) that examine a speech signal and differentiate between voice and background noise. However, conventional VADs often cannot differentiate music from background noise. As is known in the art, background noise signals are typically fairly stable as compared to voice signals. The frequency spectrum of voice signals (whether voiced or unvoiced) changes rapidly. In contrast to voice signals, background noise signals exhibit the same or similar frequency content for a relatively long period of time, and therefore exhibit heightened stability. Therefore, in conventional approaches, differentiating between voice signals and background noise signals is fairly simple and is based on signal stability. Unfortunately, music signals are also typically relatively stable for a number of frames (e.g., several hundred frames). For this reason, conventional VADs often fail to differentiate between background noise signals and music signals, and exhibit rapidly fluctuating outputs for music signals.
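The stability-based discrimination described above can be illustrated with a minimal sketch (not the claimed invention; the frame length, FFT-based spectrum, and spectral-flux measure are assumptions chosen for illustration). A stable spectrum across frames suggests background noise or sustained music, while a rapidly changing spectrum suggests voice:

```python
import numpy as np

def frame_stability(signal, frame_len=160):
    """Mean frame-to-frame spectral flux of a signal.

    Values near 0 indicate a stable spectrum (typical of background
    noise or sustained music); larger values indicate a rapidly
    changing spectrum (typical of voice). Frame length of 160 samples
    (20 ms at 8 kHz) is an illustrative assumption.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    # Normalize each frame's spectrum so flux reflects spectral shape,
    # not overall signal level.
    spectra /= np.maximum(spectra.sum(axis=1, keepdims=True), 1e-12)
    # Spectral flux: total magnitude change between consecutive frames.
    flux = np.abs(np.diff(spectra, axis=0)).sum(axis=1)
    return float(flux.mean())
```

Note that a steady tone and stationary noise both score low on such a measure, which is precisely why a stability criterion alone cannot separate music from background noise.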
If a conventional VAD considers a speech signal not to represent voice, the conventional system will often simply classify the speech signal as background noise and employ low bit rate encoding. However, the speech signal may in fact comprise music and not background noise. Employing low bit rate encoding to encode a music signal can result in a low perceptual quality of the speech signal, or in this case, poor quality music.
Although previous attempts have been made to detect music and differentiate music from voice and background noise, these attempts have often proven to be inefficient, requiring complex algorithms and consuming a vast amount of processing resources and time.
Furthermore, although some music detection systems have reduced complexity and processing bandwidth by utilizing certain parameters that have already been calculated by the speech coding components, such as pitch gain, pitch correlation, energy, and LPC gain, in standalone music detection systems such parameters are not available. Therefore, standalone music detection systems must perform complex and time-consuming operations to derive such parameters in order to distinguish music from background noise.
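One such parameter, pitch correlation, can be sketched as a normalized autocorrelation searched over a range of candidate pitch lags (a generic textbook formulation, not the low-complexity method of the present invention; the lag range of 20 to 147 samples, corresponding to roughly 54 to 400 Hz at an assumed 8 kHz sampling rate, is illustrative):

```python
import numpy as np

def pitch_correlation(frame, lag_min=20, lag_max=147):
    """Normalized pitch correlation over a candidate lag range.

    Returns (best_lag, best_corr), where best_corr lies in [-1, 1].
    Strongly periodic frames (voiced speech, tonal music) score near 1;
    noise-like frames score near 0. The exhaustive lag search shown
    here is exactly the kind of costly per-lag computation a standalone
    detector must perform when coder parameters are unavailable.
    """
    best_lag, best_corr = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        x = frame[lag:]          # frame delayed by the candidate lag
        y = frame[:-lag]         # original frame, same length as x
        denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
        if denom <= 0.0:
            continue
        corr = np.dot(x, y) / denom
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

Each candidate lag requires on the order of the frame length in multiply-accumulate operations, so the full search scales with the product of frame length and lag range, motivating lower-complexity approximations.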
Thus, it is seen that there is a need in the art for an improved algorithm and system that differentiates music from background noise with high accuracy yet relatively low complexity, performing music detection with minimal processing time and resources.