The ability to correctly identify voiced and unvoiced speech is critical to many speech applications, including speech recognition, speaker verification, and noise suppression. In a typical acoustic application, speech from a human speaker is captured and transmitted to a receiver in a different location. The speaker's environment may contain one or more noise sources that pollute the speech signal (the signal of interest) with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the speaker's speech.
Typical methods for classifying voiced and unvoiced speech have relied mainly on the acoustic content of single-microphone data, which is plagued by problems with noise and the corresponding uncertainties in signal content. This is especially problematic with the proliferation of portable communication devices such as mobile telephones. There are methods known in the art for suppressing the noise present in speech signals, but these normally require a robust method of determining when speech is being produced. Non-acoustic methods have been employed successfully in commercial products such as the Jawbone headset produced by Aliphcom, Inc., San Francisco, Calif. (Aliph), but an acoustic-only solution is desired in some cases (e.g., to reduce cost or to supplement the non-acoustic sensor).