The typical structure of speech is Vowel-Consonant-Vowel (VCV) or Consonant-Vowel-Consonant (CVC). All vowels are voiced sounds, whereas many consonants are unvoiced, or voiceless (VL). The energy peaks of voiced sounds lie predominantly at lower frequencies, below 3 kHz, while those of voiceless sounds lie predominantly at higher frequencies, above 3 kHz. Voiced sounds also typically carry more energy than voiceless sounds.
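The energy distinction described above can be illustrated with a short sketch. It is a minimal illustration, not the method of this description. It assumes NumPy, a single frame of samples at a known sample rate, and the 3 kHz split mentioned above. The function names low_band_energy_ratio and frame_energy, and the synthetic test tones, are hypothetical.

```python
import numpy as np


def low_band_energy_ratio(frame, sample_rate, split_hz=3000.0):
    """Return the fraction of a frame's spectral energy below split_hz."""
    frame = np.asarray(frame, dtype=float)
    # A Hann window limits spectral leakage across the split frequency.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent frame: no energy to split
    return float(power[freqs < split_hz].sum() / total)


def frame_energy(frame):
    """Return the sum of squared samples in a frame."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2))


# Synthetic stand-ins (assumptions): a strong 500 Hz tone for a voiced
# frame and a weaker 5 kHz tone for a voiceless frame, 32 ms at 16 kHz.
fs = 16000
t = np.arange(512) / fs
voiced_like = np.sin(2 * np.pi * 500 * t)
voiceless_like = 0.3 * np.sin(2 * np.pi * 5000 * t)

for name, frame in (("voiced-like", voiced_like), ("voiceless-like", voiceless_like)):
    print(name, low_band_energy_ratio(frame, fs), frame_energy(frame))
```

A frame with a low-band ratio close to 1 and comparatively high energy would be a candidate voiced frame. Any decision thresholds on these two quantities would have to be chosen for the application at hand.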
One known method of discriminating voiced from voiceless sounds is to analyze the zero-crossing frequency of the speech signal. However, this method alone cannot provide reliable detection in noisy environments. It also performs poorly for female and child speakers, whose voices are higher pitched.
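For reference, here is a minimal sketch of the zero-crossing approach referred to above. It assumes NumPy and framed samples. The names zero_crossing_rate and is_voiceless_by_zcr, and the 3 kHz default threshold, are illustrative assumptions. The sketch relies on the fact that a pure tone at f Hz changes sign about 2f times per second, so half the crossing rate serves as a crude "zero-crossing frequency".

```python
import numpy as np


def zero_crossing_rate(frame, sample_rate):
    """Return the number of sign changes per second in a frame."""
    frame = np.asarray(frame, dtype=float)
    negative = np.signbit(frame)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * sample_rate / len(frame)


def is_voiceless_by_zcr(frame, sample_rate, threshold_hz=3000.0):
    """Label a frame voiceless if its zero-crossing frequency exceeds threshold_hz.

    A pure tone at f Hz changes sign about 2*f times per second, so half the
    crossing rate is used as the frame's "zero-crossing frequency".
    """
    return zero_crossing_rate(frame, sample_rate) / 2.0 > threshold_hz


fs = 16000
t = np.arange(1600) / fs  # one 100 ms frame
low_tone = np.sin(2 * np.pi * 500 * t)    # stand-in for a voiced frame
high_tone = np.sin(2 * np.pi * 5000 * t)  # stand-in for a voiceless frame
print(is_voiceless_by_zcr(low_tone, fs), is_voiceless_by_zcr(high_tone, fs))
```

The decision rests on a single sign-change count. Anything that adds or removes crossings, such as noise or strong high formants in higher-pitched voices, therefore shifts the result, which is the weakness noted above.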
For example, some vowels, such as /i/, /ea/ and /e/, have energy peaks at relatively high frequencies (the second and third formants) and may therefore generate high zero-crossing frequencies. Table 1 shows the average first and second formant frequencies of American English vowels of this kind for male, female and child voices:
TABLE 1 (average formant frequencies, Hz)

                        Vowel
Formant       Speaker   heat   hit    when   pay
1st formant   Male       270    390    530    660
              Female     310    430    610    860
              Child      370    530    690   1010
2nd formant   Male      2290   1990   1840   1720
              Female    2790   2480   2330   2050
              Child     3200   2730   2610   2320
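The trend in Table 1 can be made concrete by holding the table as data. This sketch lists the second formants that exceed a cutoff and averages them per speaker group. The dictionary layout, the function names and the 3 kHz cutoff are illustrative assumptions; the cutoff simply reuses the voiced/voiceless boundary given earlier.

```python
# Average formant frequencies in Hz, transcribed from Table 1:
# speaker -> vowel -> (1st formant, 2nd formant).
FORMANTS = {
    "Male": {"heat": (270, 2290), "hit": (390, 1990), "when": (530, 1840), "pay": (660, 1720)},
    "Female": {"heat": (310, 2790), "hit": (430, 2480), "when": (610, 2330), "pay": (860, 2050)},
    "Child": {"heat": (370, 3200), "hit": (530, 2730), "when": (690, 2610), "pay": (1010, 2320)},
}


def second_formants_above(cutoff_hz):
    """Return (speaker, vowel) pairs whose 2nd formant exceeds cutoff_hz."""
    return [
        (speaker, vowel)
        for speaker, vowels in FORMANTS.items()
        for vowel, (_f1, f2) in vowels.items()
        if f2 > cutoff_hz
    ]


def mean_second_formant(speaker):
    """Return the mean 2nd formant (Hz) over the four vowels for a speaker group."""
    f2_values = [f2 for _f1, f2 in FORMANTS[speaker].values()]
    return sum(f2_values) / len(f2_values)


print(second_formants_above(3000.0))
for speaker in FORMANTS:
    print(speaker, mean_second_formant(speaker))
```

Only the child "heat" vowel has a second formant above 3 kHz, and the mean second formant rises from male to female to child voices. If a strong second formant dominated a frame's zero crossings, the zero-crossing frequency would approach that formant frequency. This suggests how female and child vowels can resemble voiceless sounds to a zero-crossing detector.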
In the presence of noise, which is typically concentrated at lower frequencies, the zero-crossing frequency of voiceless consonants may be "pulled" down toward lower frequencies.
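The effect can be reproduced with synthetic signals. In this sketch, which assumes NumPy, a weak 5 kHz tone stands in for a voiceless consonant and a stronger 100 Hz tone stands in for low-frequency background noise. Both signals and their amplitudes are arbitrary choices for illustration, not measurements. The helper zero_crossing_frequency is a hypothetical name, defined here so the block stands alone.

```python
import numpy as np


def zero_crossing_frequency(frame, sample_rate):
    """Return half the sign-change rate of a frame, in Hz."""
    frame = np.asarray(frame, dtype=float)
    negative = np.signbit(frame)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * sample_rate / (2.0 * len(frame))


fs = 16000
t = np.arange(1600) / fs  # one 100 ms frame
# Assumed stand-ins: a weak 5 kHz tone for a voiceless consonant and a
# stronger 100 Hz hum for low-frequency background noise.
consonant = 0.1 * np.sin(2 * np.pi * 5000 * t)
hum = 0.5 * np.sin(2 * np.pi * 100 * t + 0.3)

clean_zcf = zero_crossing_frequency(consonant, fs)
noisy_zcf = zero_crossing_frequency(consonant + hum, fs)
print(f"clean: {clean_zcf:.0f} Hz, with hum: {noisy_zcf:.0f} Hz")
```

The consonant component can only produce a sign change where the hum is smaller than it in magnitude, so most of its crossings disappear. The measured zero-crossing frequency then falls well below 3 kHz, and a simple 3 kHz zero-crossing threshold would label the noisy frame as voiced.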