1. Technical Field
The present invention relates in general to a method and apparatus for speech recognition, and in particular to a method and apparatus for sibilant classification of speech. Still more particularly, the present invention relates to a method and apparatus for sibilant classification of speech in a speech recognition system that is speaker independent.
2. Description of the Related Art
Human speech sounds originate in one of two ways: as sonorant sounds or as fricatives. Sonorant or "voiced" sounds are generated by the vocal cords as harmonic-rich periodic pressure waves. These pressure waves are then filtered by a number of resonant cavities in the upper respiratory tract. A speaker uses muscles in the throat and mouth to alter the resonant frequencies of these cavities and thereby form various vowel sounds. Fricatives, also called sibilants, are the brief hissing sounds associated with pronouncing "S", "SH", "F", and "H" sounds. Sibilant sounds result from turbulent flow that occurs when the speaker's breath is passed through a constriction. For example, the "H" sound is caused by a constriction between the tongue and palate. These aperiodic noises are filtered by small resonant cavities formed by the tongue, palate, teeth and lips. The filtering by the small resonant cavities enhances certain bands of frequencies within the noise to impart a noticeable coloration. Variations on this effect allow for differentiation of sibilant sounds.
Distinguishing between these different sibilant sounds has been a challenge for electronic speech recognition systems. Doing so is important not only for distinguishing "S", "SH", "F", and "H", but also for distinguishing the more abrupt derivatives of these sounds, such as "CH", "K" and "T". Some existing speech recognition systems lump sibilants together with the voiced aspects of the sound to derive a collective summary vector for further processing. Such systems may be considered to be spectrum aware. In contrast, other speech recognition systems employ a filter to extract the higher frequencies, which may haphazardly include harmonics of the voiced signal, and assess the short-term amplitude envelope of the high frequencies without much regard for the spectral content. In telephone applications, both types of systems suffer from poor sibilant recognition because of the limited bandwidth of the telephone channel. In full-bandwidth applications, such as direct microphone input, the latter technique, which ignores high-frequency formants, is at a distinct disadvantage. Furthermore, systems of both types have had difficulty in classifying sibilant sounds in a speaker-independent manner. Therefore, it would be advantageous to have a method and system for sibilant sound classification in a speech recognition system that is speaker independent.
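By way of illustration only, the second class of existing system described above, in which the higher frequencies are extracted and only their short-term amplitude envelope is assessed, may be sketched roughly as follows. The sketch is not taken from any particular system. The one-pole high-pass filter, the coefficient alpha, the frame length of 160 samples, the 16 kHz sample rate, and the function names high_pass and short_term_envelope are all assumptions chosen for the example.

```python
import math
import random


def high_pass(samples, alpha=0.5):
    """One-pole high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    alpha is an assumed value; at a 16 kHz sample rate, alpha = 0.5
    places the cutoff at roughly 2.5 kHz.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def short_term_envelope(samples, frame_len=160):
    """RMS amplitude of each non-overlapping frame (assumed 10 ms at 16 kHz)."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return env


if __name__ == "__main__":
    sample_rate = 16000  # assumed full-bandwidth microphone input
    n = sample_rate // 10  # 100 ms of signal

    # Stand-in for a voiced sound: a 200 Hz periodic tone.
    voiced = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(n)]

    # Stand-in for a sibilant: aperiodic noise.
    rng = random.Random(0)
    hiss = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    print("voiced envelope:", short_term_envelope(high_pass(voiced)))
    print("hiss envelope:  ", short_term_envelope(high_pass(hiss)))
```

The envelope rises for the noise-like input and stays low for the low-frequency tone, so a detector of this kind can flag that a sibilant is present. Each frame is reduced to a single amplitude value, however, so two sibilants with different spectral coloration but similar high-frequency energy would yield similar envelopes, which is the limitation noted above.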