1. Field of the Invention
The present invention relates to speech recognition technology, and particularly to a Chinese speech recognition system and method.
2. Description of the Related Art
Prosody-aided speech recognition has been an important research subject in recent years. Prosody refers to the suprasegmental features of continuous speech, including accent, tone, pause, intonation, and rhythm. Physically, prosody is expressed by the pitch contour, the energy intensity, the duration of speech segments, and the pauses in speech. Prosody correlates closely with linguistic parameters at various levels, including the phone, syllable, word, phrase, and sentence levels, and even higher levels. Therefore, prosody can be used to improve speech recognition accuracy.
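To make the four physical correlates above concrete, the following is a minimal sketch, not taken from any specific prior art, that computes syllable-level prosodic-acoustic parameters from frame-level pitch and energy tracks. It assumes that syllable boundaries are already known as frame indices and that unvoiced frames carry a pitch value of 0; the names `syllable_prosody` and `SyllableProsody` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SyllableProsody:
    mean_log_f0: float  # mean log pitch over voiced frames (nan if none are voiced)
    mean_energy: float  # mean frame energy (nan for an empty segment)
    duration: int       # syllable duration in frames
    pause_after: int    # silent frames before the next syllable (0 for the last one)


def syllable_prosody(f0, energy, segments):
    """Hypothetical feature extractor.

    f0, energy: per-frame lists of equal length; f0 == 0 marks an unvoiced frame.
    segments: sorted list of (start, end) frame indices, end exclusive.
    """
    features = []
    for i, (start, end) in enumerate(segments):
        # Pitch: average log F0 over voiced frames only.
        voiced = [math.log(v) for v in f0[start:end] if v > 0]
        mean_log_f0 = sum(voiced) / len(voiced) if voiced else float("nan")
        # Energy: plain average over the syllable's frames.
        frames = energy[start:end]
        mean_energy = sum(frames) / len(frames) if frames else float("nan")
        # Pause: gap between this syllable's end and the next syllable's start.
        next_start = segments[i + 1][0] if i + 1 < len(segments) else end
        features.append(
            SyllableProsody(mean_log_f0, mean_energy, end - start, max(0, next_start - end))
        )
    return features
```

For example, with `f0 = [100, 100, 0, 200, 0, 200]`, `energy = [1.0, 3.0, 0.0, 2.0, 2.0, 2.0]`, and `segments = [(0, 2), (3, 6)]`, the first syllable has a duration of 2 frames followed by a 1-frame pause, and the second has a duration of 3 frames with a mean log pitch of log 200. Real systems typically also normalize these values per speaker and per tone, which this sketch omits.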
Refer to FIG. 1, a block diagram schematically showing a prosodic model generator summarized from prior-art prosody-aided speech recognition technologies. The prosodic model generator includes a prosodic model trainer 10, a parameter extractor 12, and an artificially-labeled prosodic corpus 14. The artificially-labeled prosodic corpus 14 receives speech data, whose prosody is then labeled by specialists. From the artificially-labeled prosodic corpus 14, the parameter extractor 12 extracts spectral parameters, linguistic parameters of various levels, and prosodic-acoustic parameters. Based on the parameters output by the parameter extractor 12 and on the prosodic clues and events labeled in the artificially-labeled prosodic corpus 14 (such as pitch accents and intonational phrase boundaries), the prosodic model trainer 10 establishes a prosody-dependent acoustic model, a prosody-dependent linguistic model, and a prosodic model that describes the relationships between the prosodic clues associated with linguistic parameters of different levels and the corresponding prosodic-acoustic parameters.
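The data flow of FIG. 1 (labeled corpus 14, then parameter extractor 12, then trainer 10) can be illustrated with a deliberately simplified sketch. It is not the prior-art trainer: the corpus format, the context names, the two-label break inventory, and both functions are hypothetical, and the "prosodic model" is swapped for a plain add-alpha smoothed count estimate of the probability of a manually labeled break given a linguistic context. The acoustic and linguistic models are not reproduced.

```python
from collections import Counter, defaultdict

# Hypothetical label inventory: "B" = labeled prosodic break, "N" = no break.
LABELS = ("B", "N")


def extract_parameters(labeled_corpus):
    """Stand-in for parameter extractor 12.

    Each utterance is a list of (syllable, linguistic_context, break_label)
    triples (hypothetical format); yields (context, label) pairs.
    """
    for utterance in labeled_corpus:
        for _syllable, context, label in utterance:
            yield context, label


def train_prosodic_model(pairs, alpha=1.0):
    """Stand-in for prosodic model trainer 10: add-alpha smoothed P(label | context)."""
    counts = defaultdict(Counter)
    for context, label in pairs:
        counts[context][label] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values()) + alpha * len(LABELS)
        model[context] = {lab: (c[lab] + alpha) / total for lab in LABELS}
    return model


corpus = [
    [("ni", "word_internal", "N"), ("hao", "punctuation", "B")],
    [("wo", "word_boundary", "N"), ("men", "word_internal", "N"), ("hao", "punctuation", "B")],
]
model = train_prosodic_model(extract_parameters(corpus))
```

In this toy, a linguistic context that never occurs in the labeled corpus receives no estimate at all, and rare contexts are dominated by the smoothing term. This illustrates how strongly such an approach depends on the quantity and variety of the manually labeled prosodic tags.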
The abovementioned prior art can utilize only a few obvious prosodic clues because it lacks a large-scale corpus with abundant, reliable, and diverse prosodic tags. Therefore, conventional technologies can improve speech recognition performance only to a very limited extent.
Accordingly, the present invention proposes a Chinese speech recognition system and method to overcome the abovementioned problems.