1. Field
One embodiment of the invention relates to a voice/music determining apparatus, a voice/music determination method, and a voice/music determination program that quantitatively determine the ratio of a voice signal to a music signal included in an audio (audio frequency) signal to be reproduced.
2. Description of the Related Art
As is well known, apparatuses such as broadcast receivers that receive television broadcasts and information reproducers that reproduce information recorded on an information recording medium perform a sound quality correction process on an audio signal reproduced from the received broadcast signal or from the signal read from the medium, in order to achieve higher sound quality.
In this case, the content of the sound quality correction process differs depending on whether the audio signal is a voice signal, such as a person's speaking voice, or a music (non-voice) signal, such as a musical piece. Specifically, a voice signal, as in a talk scene or sports commentary, calls for a sound quality correction process that improves clarity by emphasizing the center-localized component, whereas a music signal calls for a sound quality correction process that expands and emphasizes the stereo sound.
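The two kinds of correction described above can be illustrated with a mid/side decomposition of a stereo signal. This is a minimal sketch under stated assumptions, not the correction process of any particular apparatus: the function name `adjust_center` and the gain parameters `mid_gain` and `side_gain` are hypothetical, and mid/side processing is only one common way to emphasize a center-localized component or to widen the stereo image.

```python
def adjust_center(left, right, mid_gain, side_gain):
    """Scale the mid (center) and side (stereo difference) components.

    Hypothetical helper: mid_gain > 1 emphasizes center-localized sound
    (voice-oriented), side_gain > 1 widens the stereo image (music-oriented).
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r) * mid_gain    # component common to both channels
        side = 0.5 * (l - r) * side_gain  # component that differs between channels
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# Voice-oriented setting: boost the center component only.
voice_left, voice_right = adjust_center([1.0, 0.5], [0.0, 0.5], 2.0, 1.0)

# Music-oriented setting: boost the side component only.
music_left, music_right = adjust_center([1.0, 0.5], [0.0, 0.5], 1.0, 2.0)
```

With both gains set to 1.0 the function returns the input unchanged, which makes it easy to check that the decomposition itself introduces no coloration.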
Hence, current apparatuses determine whether an obtained audio signal is a voice signal or a music signal and perform an appropriate sound quality correction process on the audio signal according to the determination result. However, an actual audio signal often contains both a voice signal and a music signal, which makes the determination difficult. Accordingly, at present it cannot be said that an appropriate sound quality correction process is performed on the audio signal.
Jpn. Pat. Appln. KOKAI Publication No. 7-13586 discloses a configuration that classifies an input acoustic signal as follows: the signal is determined to be voice when its “consonance”, “silence”, and “power fluctuations” are higher than their respective predetermined threshold values; it is determined to be music when its “silence” and “power fluctuations” are lower than their respective predetermined threshold values; and it is determined to be indeterminate otherwise.
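The three-way decision rule described in that publication can be sketched as follows. This is an illustration only, under stated assumptions: the three features are taken to be precomputed scalar values for one signal segment, the names `Thresholds` and `classify` are hypothetical, and the threshold values are placeholders, since neither the feature definitions nor any numeric thresholds are given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical container; the source states no numeric threshold values.
    consonance: float
    silence: float
    power_fluctuation: float


def classify(consonance, silence, power_fluctuation, th):
    """Return "voice", "music", or "indeterminate" for one signal segment."""
    # Voice: all three features exceed their respective thresholds.
    if (consonance > th.consonance
            and silence > th.silence
            and power_fluctuation > th.power_fluctuation):
        return "voice"
    # Music: silence and power fluctuation are both below their thresholds.
    if silence < th.silence and power_fluctuation < th.power_fluctuation:
        return "music"
    # Every other combination is left undecided.
    return "indeterminate"


# Placeholder thresholds, chosen only to exercise the three branches.
th = Thresholds(consonance=0.5, silence=0.5, power_fluctuation=0.5)
print(classify(0.8, 0.7, 0.9, th))  # voice
print(classify(0.2, 0.1, 0.3, th))  # music
print(classify(0.2, 0.7, 0.9, th))  # indeterminate
```

The rule returns one discrete label per segment, so a segment in which voice and music overlap can only be labeled as one of the two or as indeterminate; it does not express how much of each is present.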