The ability to correctly identify voiced and unvoiced speech is critical to many speech applications, including speech recognition, speaker verification, and noise suppression. In a typical acoustic application, speech from a human speaker is captured and transmitted to a receiver in a different location. In the speaker's environment there may exist one or more noise sources that pollute the speech signal, the signal of interest, with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the speaker's speech. Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of single-microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic with the proliferation of portable communication devices such as mobile telephones.
There are methods known in the art for suppressing the noise present in the speech signals, but these generally require a robust method of determining when speech is being produced.