There exists a need to differentiate between normally phonated and whispered speech. To that end, literature searches have uncovered several articles on whispered speech detection. However, very little research has been conducted to classify or quantify whispered speech. Only two sources of work in this area are known: Jovicic [1] and Wilson [2]. Both observed that normally phonated and whispered speech differ in their formant characteristics. These studies, which used Serbian and English vowels, show an increase in the first formant frequency F1 for whispered speech for both male and female speakers. They also revealed a general expansion of formant bandwidths for whispered vowels as compared to voiced vowels. The results by Jovicic [1], computed using digitized speech data from five male and five female native Serbian speakers, show formant bandwidth increases over voiced vowels for all five whispered vowels. However, the results by Wilson [2], computed using speech data from five male and five female native American English speakers, show that formant bandwidths are not consistently larger for whispered vowels. Therefore, a recognition process that relies solely on formant bandwidth would not appear to provide good results. In addition, Wilson [2] showed that the amplitude of the first formant F1 was consistently lower for whispered speech.
Although the results of this prior work clearly point out some differences between normally phonated and whispered speech, there has been no attempt to automatically distinguish between normally phonated and whispered speech.