1. Field
The present invention relates to a voice/music determining apparatus and method for quantitatively determining proportions of a voice signal and a musical signal that are contained in an audio (audible frequency) signal to be played back.
2. Description of Related Art
As is well known, equipment such as a broadcast receiver for TV broadcasts, or information playback equipment that plays back information recorded on an information recording medium, often applies sound quality correction processing to improve sound quality when reproducing an audio signal, such as a received broadcast signal or a signal read from an information recording medium.
In this case, the content of the sound quality correction processing performed on the audio signal differs depending on whether the audio signal is a voice signal of a human voice or a musical (non-voice) signal, such as a musical piece. More specifically, for a voice signal, as in a talk scene or a sports commentary, the sound quality correction processing should emphasize and clarify center-located components. For a musical signal, the sound quality correction processing should emphasize the stereophonic sense and provide the necessary sense of spaciousness.
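The two correction goals described above can be illustrated with a mid/side decomposition of a stereo signal. This is only a minimal sketch under stated assumptions: the mid/side technique, the function name `apply_correction`, and the gain parameters `mid_gain` and `side_gain` are hypothetical and are not taken from any equipment described here. Raising the mid gain emphasizes center-located components (as desired for voice), and raising the side gain widens the stereo image (as desired for music).

```python
def apply_correction(left, right, mid_gain, side_gain):
    """Scale the center (mid) and stereo-difference (side) components.

    left, right: sequences of samples of equal length (hypothetical input).
    mid_gain:  > 1.0 emphasizes center-located components (voice-like content).
    side_gain: > 1.0 emphasizes the stereophonic sense (music-like content).
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)    # component common to both channels
        side = 0.5 * (l - r)   # component that differs between channels
        mid *= mid_gain
        side *= side_gain
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right
```

With both gains set to 1.0 the signal passes through unchanged, so the two gains can be read as independent controls for the voice-oriented and music-oriented corrections.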
To this end, current equipment determines whether an acquired audio signal is a voice signal or a musical signal, and performs a suitable sound quality correction according to the determination result. However, an actual audio signal in many cases contains a mixture of a voice signal and a musical signal, and it is difficult to discriminate between them. At present, therefore, it does not appear that proper sound quality correction processing is always performed on audio signals.
JP-A-7-13586 discloses a configuration in which an input acoustic signal is determined to be voice if its consonant nature, voicelessness, and power variation are higher than given threshold values. The input acoustic signal is determined to be music if its voicelessness and power variation are lower than the given threshold values, and is determined to be indefinite in all other cases.
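The three-way decision summarized above can be sketched as follows. This is a minimal illustration only, not the implementation of the cited publication: the names `Features`, `Thresholds`, and `classify` are hypothetical, one threshold per feature is assumed, and how the feature values are computed from the signal is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    # Hypothetical per-segment feature values; their computation is not shown.
    consonant_nature: float
    voicelessness: float
    power_variation: float


@dataclass(frozen=True)
class Thresholds:
    # Assumption: one given threshold value per feature.
    consonant_nature: float
    voicelessness: float
    power_variation: float


def classify(f: Features, t: Thresholds) -> str:
    """Return "voice", "music", or "indefinite" for one set of features."""
    # Voice: all three features exceed their thresholds.
    if (f.consonant_nature > t.consonant_nature
            and f.voicelessness > t.voicelessness
            and f.power_variation > t.power_variation):
        return "voice"
    # Music: voicelessness and power variation are both below their thresholds.
    if (f.voicelessness < t.voicelessness
            and f.power_variation < t.power_variation):
        return "music"
    # Everything else is left undecided.
    return "indefinite"
```

A hard decision of this kind yields only one of three labels per segment, so it cannot express how much voice and how much music a mixed signal contains.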