1. Field of the Invention
The present invention relates to a method and apparatus for detecting a voice presence/absence state, and to a method and apparatus for encoding a voice signal which include the method and apparatus for detecting a voice presence/absence state, respectively. The method and apparatus for encoding a voice signal are used in, for example, a portable telephone and an automobile telephone.
2. Description of the Prior Art
A background noise generating system has been disclosed in, for example, JPA 7-336290, titled “VOX Controlled Communication Apparatus” (translated title). Next, with reference to FIGS. 1 and 2, the related art reference will be described in brief.
FIG. 1 is a block diagram showing the structure of the apparatus according to the related art reference. FIG. 2 is a flow chart showing the operation of the apparatus according to the related art reference.
As shown in FIG. 1, the apparatus according to the related art reference comprises a voice signal input terminal 610, a frame dividing portion 620, a voice presence state detecting portion 630, a controlling portion 640, a highly efficient voice encoding portion 650, a switch 660, and an encoded signal output terminal 670. The voice presence state detecting portion 630 comprises a frame energy calculating portion 631 and a voice presence/absence state determining portion 632.
Next, the overall operation of the apparatus according to the related art reference will be described in brief.
The frame dividing portion 620 receives a voice signal from the voice signal input terminal 610 (at step B1). The frame dividing portion 620 divides the voice signal into frames (with a period of 20 msec each). The frames are supplied to the voice presence state detecting portion 630 and the highly efficient voice encoding portion 650 (at step B2).
The frame energy calculating portion 631 calculates the intensity of energy of each frame of the voice signal and supplies the calculated data to the voice presence/absence state determining portion 632 (at step B3).
The voice presence/absence state determining portion 632 determines whether or not the intensity of energy of each frame received from the frame energy calculating portion 631 is larger than a predetermined threshold value. When the intensity of energy of the current frame is larger than the predetermined threshold value, the voice presence/absence state determining portion 632 determines that the current frame is a voice frame. When the intensity of energy of the current frame is not larger than the predetermined threshold value, the voice presence/absence state determining portion 632 determines that the current frame is a non-voice frame. The voice presence/absence state determining portion 632 supplies the determined result to the controlling portion 640 (at step B4).
The controlling portion 640 controls the highly efficient voice encoding portion 650 and the switch 660 corresponding to the determined result received from the voice presence/absence state determining portion 632 (at step B5).
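The per-frame energy decision of steps B1 to B4 can be sketched as follows. This is a minimal illustration only, not the implementation of the related art reference: the 8 kHz sampling rate, the function names, and the threshold value passed by the caller are assumptions that do not appear in the reference. Only the 20 msec frame period and the "larger than a predetermined threshold value" rule are taken from the description above.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000  # assumption: the sampling rate is not stated in the text
FRAME_MS = 20          # frame period given in the text
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per frame


def split_frames(samples: Sequence[float],
                 frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Steps B1-B2: divide the input into consecutive full frames.

    A trailing partial frame is dropped in this sketch.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame: Sequence[float]) -> float:
    """Step B3: energy of one frame, taken here as the sum of squared samples."""
    return sum(x * x for x in frame)


def is_voice_frame(frame: Sequence[float], threshold: float) -> bool:
    """Step B4: a frame is a voice frame only if its energy exceeds the threshold."""
    return frame_energy(frame) > threshold


def classify(samples: Sequence[float], threshold: float) -> List[bool]:
    """Return one voice/non-voice decision per frame."""
    return [is_voice_frame(f, threshold) for f in split_frames(samples)]
```

In this sketch a frame whose energy equals the threshold is treated as a non-voice frame, matching the "not larger than" wording of step B4.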
Another related art reference, JPA 9-152894, titled “Voice presence/absence state determining apparatus” (translated title), discloses an apparatus that accurately determines whether or not each frame is a voice frame, including a frame that contains the beginning portion of a phonation. In the apparatus according to this related art reference, a sub-frame power calculating portion calculates the power of each of four sub-frames into which each frame is divided. A frame maximum power generating portion calculates the average value of the power of each sub-frame and the moving average of the power between two adjoining sub-frames, compares the moving average values of the sub-frames in the same frame, and selects the maximum moving average as the maximum power of the frame. Thus, even if a phonation starts from a later portion of a frame, the frame maximum power is prevented from being underestimated. Consequently, a voice presence state determining portion can reliably determine that the current frame is a voice frame.
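The frame maximum power described above can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the moving average is taken only over pairs of adjoining sub-frames inside the same frame, and the frame length is assumed to be a multiple of the number of sub-frames. The reference may handle frame boundaries differently.

```python
from typing import List, Sequence


def subframe_powers(frame: Sequence[float], n_sub: int = 4) -> List[float]:
    """Average power (mean squared sample) of each of n_sub equal sub-frames."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size]) / size
            for i in range(n_sub)]


def frame_max_power(frame: Sequence[float], n_sub: int = 4) -> float:
    """Maximum of the two-sub-frame moving averages within one frame."""
    p = subframe_powers(frame, n_sub)
    moving = [(p[i] + p[i + 1]) / 2 for i in range(len(p) - 1)]
    return max(moving)


# A phonation that starts in the last quarter of a 160-sample frame.
late_onset = [0.0] * 120 + [10.0] * 40
whole_frame_power = sum(x * x for x in late_onset) / len(late_onset)
max_power = frame_max_power(late_onset)
```

For this late-onset frame the whole-frame average power is diluted by the silent portion, whereas the maximum moving average is larger, which is the effect the reference relies on.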
However, the related art references have the following disadvantages.
As a first disadvantage, if the voice presence/absence state changes in the middle of each frame, the frame cannot be accurately determined as a voice frame.
This is because the intensity of energy of the voice signal, which is the determination factor for the voice presence/absence state, is calculated over each whole frame, the frame being the unit of the voice process.
As a second disadvantage, a frame that partly contains pulse noise may be determined as a voice frame.
This is because when the intensity of energy of the pulse noise is too large, the intensity of energy of the entire frame becomes larger than the voice presence/absence determination threshold value. Thus, the frame is determined as a voice frame.
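Both disadvantages can be illustrated with a small numeric example. The sample values, the 160-sample frame length, and the threshold below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def frame_energy(frame):
    """Energy of a whole frame, taken as the sum of squared samples."""
    return sum(x * x for x in frame)


THRESHOLD = 1_000_000  # hypothetical determination threshold

# First disadvantage: voice begins in the last 20 samples of the frame.
late_onset = [0] * 140 + [200] * 20
# Second disadvantage: a single-sample pulse in an otherwise silent frame.
pulse_noise = [0] * 159 + [2000]

late_onset_is_voice = frame_energy(late_onset) > THRESHOLD    # 800,000: missed
pulse_is_voice = frame_energy(pulse_noise) > THRESHOLD        # 4,000,000: false alarm
```

With a whole-frame energy measure, the frame in which the voice starts late falls below the threshold, while the frame containing only a pulse exceeds it.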
The present invention has been made in order to overcome the aforementioned disadvantages, and accordingly has an object to provide a method and apparatus for accurately determining whether or not each frame is a voice frame even if the voice presence/absence state changes in the middle of the frame and even if the frame partly contains pulse noise.
According to a first aspect of the present invention, there is provided a method for detecting a voice presence/absence state of a frame which is obtained by dividing a voice signal into frames, comprising steps of: dividing the frame into sub-frames; calculating a physical amount of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of a degree of variation of the physical amount among the sub-frames.
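One possible reading of the first aspect is sketched below. The text at this point does not specify the physical amount, the measure of variation, or the decision rule, so everything in the sketch is an assumption made for illustration: sub-frame energy is used as the physical amount, the share of the frame energy held by the strongest sub-frame is used as the degree of variation, and the parameter names and default values are hypothetical.

```python
from typing import List, Sequence


def subframe_energies(frame: Sequence[float], n_sub: int) -> List[float]:
    """Energy (sum of squares) of each of n_sub equal sub-frames."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def variation_degree(energies: Sequence[float]) -> float:
    """Hypothetical measure: share of total energy in the strongest sub-frame."""
    total = sum(energies)
    return max(energies) / total if total > 0 else 0.0


def detect_by_variation(frame: Sequence[float],
                        n_sub: int = 8,
                        energy_threshold: float = 100_000.0,
                        pulse_ratio: float = 0.95) -> bool:
    """Return True for a voice presence state, False for a voice absence state."""
    e = subframe_energies(frame, n_sub)
    if max(e) <= energy_threshold:
        return False  # no sub-frame is loud enough
    # Some sub-frame is loud. Treat the frame as pulse noise (non-voice) when
    # almost all of the energy sits in a single sub-frame.
    return variation_degree(e) < pulse_ratio
```

With these assumed parameters, a frame whose last two sub-frames contain voice is detected as a voice frame, while a frame containing only a one-sample pulse is not. A phonation confined to a single sub-frame would also be rejected by this particular rule, so the sub-frame length would need to be chosen with that in mind.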
According to a second aspect of the present invention, there is provided a method for detecting a voice presence/absence state of a frame which is obtained by dividing a voice signal into frames, comprising steps of: dividing the frame into sub-frames; calculating a periodicity of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of the periodicity of the voice signal in each sub-frame.
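A sub-frame periodicity measure of the kind named in the second aspect could be sketched as below. The text at this point does not say how the periodicity is calculated or how the per-sub-frame values are combined, so the sketch makes assumptions throughout: the peak of the normalized autocorrelation over a range of lags is used as the periodicity, a frame is taken to be in the voice presence state if any sub-frame exceeds a threshold, and all names and default values are hypothetical.

```python
import math
from typing import Sequence


def periodicity(sub: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Peak normalized autocorrelation over lags min_lag..max_lag.

    max_lag must be smaller than len(sub). Returns 0.0 for a silent sub-frame.
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = sub[:-lag], sub[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, num / den)
    return best


def detect_by_periodicity(frame: Sequence[float],
                          n_sub: int = 2,
                          min_lag: int = 10,
                          max_lag: int = 40,
                          threshold: float = 0.7) -> bool:
    """Hypothetical rule: voice if any sub-frame is sufficiently periodic."""
    size = len(frame) // n_sub
    subs = [frame[i * size:(i + 1) * size] for i in range(n_sub)]
    return any(periodicity(s, min_lag, max_lag) > threshold for s in subs)


# A tone with a 20-sample period that starts halfway through a 160-sample frame.
tone = [math.sin(2 * math.pi * i / 20) for i in range(80)]
late_tone_frame = [0.0] * 80 + tone
pulse_frame = [0.0] * 120 + [1000.0] + [0.0] * 39
```

An isolated pulse has no correlation with a shifted copy of itself, so its periodicity under this measure is zero regardless of its energy, whereas a periodic signal that occupies only the later sub-frame is still detected.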
According to a third aspect of the present invention, there is provided a method for encoding a voice signal, comprising steps of: dividing a voice signal into frames; detecting a voice presence/absence state of each frame; encoding the voice signal for each frame; and determining whether to output the encoded voice signal for each frame; wherein the steps of encoding and determination are controlled by a result of the step of detection; and wherein the step of detection comprises steps of: dividing the frame into sub-frames; calculating a physical amount of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of a degree of variation of the physical amount among the sub-frames.
According to a fourth aspect of the present invention, there is provided a method for encoding a voice signal, comprising steps of: dividing a voice signal into frames; detecting a voice presence/absence state of each frame; encoding the voice signal for each frame; and determining whether to output the encoded voice signal for each frame; wherein the steps of encoding and determination are controlled by a result of the step of detection; and wherein the step of detection comprises steps of: dividing the frame into sub-frames; calculating a periodicity of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of the periodicity of the voice signal in each sub-frame.
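The control flow shared by the third and fourth aspects can be sketched as follows. This is only an outline under assumptions: the detector and the encoder are passed in as hypothetical callables, a suppressed frame is represented by None, and a trailing partial frame is dropped. The actual encoder and the handling of suppressed frames are not specified at this point in the text.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def encode_stream(samples: Sequence[float],
                  frame_len: int,
                  detect: Callable[[Sequence[float]], bool],
                  encode: Callable[[Sequence[float]], T]) -> List[Optional[T]]:
    """Divide into frames, detect voice per frame, and gate encoding and output."""
    out: List[Optional[T]] = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if detect(frame):
            out.append(encode(frame))  # voice presence: encode and output
        else:
            out.append(None)           # voice absence: no encoded output
    return out


# Usage with stand-in callables (both are placeholders, not real components).
result = encode_stream([0, 0, 1, 1, 0, 0], 2,
                       detect=lambda f: any(f),
                       encode=lambda f: sum(f))
```

Either of the sub-frame based detectors described in the first and second aspects could be supplied as the `detect` argument.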
These and other objects, features and advantages of the present invention will become more apparent in light of the following detailed description of a best mode embodiment thereof, as illustrated in the accompanying drawings.