1. Field of the Invention
The present general inventive concept relates to an audio processing system and, more particularly, to a robust method and apparatus to detect voice activity based on the power of an audio frame.
2. Description of the Related Art
Conventionally, voice activity extraction in voice coding uses voice activity detection (VAD) or end point detection (EPD).
A conventional voice activity detection method detects voice activity or start and end points of voice using the energy of each frame and the zero-crossing rate of the frame. For example, a period with speech (an active voice period) and a period without speech (a non-active voice period) are determined for each frame according to the zero-crossing rate of the frame.
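By way of illustration only, the two per-frame features named above may be sketched as follows. The function names, the threshold values, the 8 kHz sampling rate, and the decision rule (high energy combined with a low zero-crossing rate) are assumptions chosen for this sketch and are not taken from any particular conventional detector.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of the samples in one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_active(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Assumed rule: active if energy is high and zero-crossing rate is low."""
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

# Illustrative 20 ms frames at an assumed 8 kHz sampling rate (160 samples).
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
silence = [0.0] * 160

print(is_active(tone), is_active(silence))
```

In this sketch each frame is classified independently, so any noise frame whose energy and zero-crossing rate happen to fall on the speech side of both hypothetical thresholds would also be marked active.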
When the active voice period and the non-active voice period are determined using the zero-crossing rate, noise may exist in the non-active voice period, and thus the zero-crossing rate does not reliably separate the active voice period from the non-active voice period.
In other words, active/non-active voice period determination using the zero-crossing rate may classify noise having a zero-crossing rate similar to that of speech, as well as the speech itself, as the active voice period. As a result, conventional active/non-active voice period determination using the zero-crossing rate may produce errors, because a speech-like zero-crossing rate may also occur in the non-active voice period.
Moreover, active/non-active voice period determination using the energy of a frame with a fixed threshold has difficulty distinguishing the active voice period from the non-active voice period when signals of different levels are input.
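The difficulty with a fixed energy threshold may be illustrated with a small sketch. The threshold value, the synthetic frames, and the gain factors below are hypothetical and chosen only to make the effect visible; the same noise/speech/noise sequence is presented at three input levels.

```python
def frame_energy(frame):
    """Mean squared amplitude of the samples in one frame."""
    return sum(x * x for x in frame) / len(frame)

def detect(frames, threshold):
    """Mark each frame active when its energy exceeds a fixed threshold."""
    return [frame_energy(f) > threshold for f in frames]

def scale(frames, gain):
    """Apply the same gain to every sample, modeling a different input level."""
    return [[gain * x for x in f] for f in frames]

FIXED_THRESHOLD = 0.01  # hypothetical value tuned for the nominal level

# Synthetic frames: low-amplitude "noise", high-amplitude "speech", "noise".
noise = [0.02, -0.02] * 80
speech = [0.5, -0.5] * 80
frames = [noise, speech, noise]

nominal = detect(frames, FIXED_THRESHOLD)            # speech frame detected
quiet = detect(scale(frames, 0.1), FIXED_THRESHOLD)  # speech frame missed
loud = detect(scale(frames, 10), FIXED_THRESHOLD)    # noise frames also marked active

print(nominal, quiet, loud)
```

With the threshold held fixed, the quieter input yields no active frames at all, while the louder input causes every frame, including the noise frames, to be marked active.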