1. Field of the Invention
This invention relates to continuous speech recognition technology, and more particularly to an apparatus and method for automatically generating punctuation marks in continuous speech recognition.
2. Related Art
A general speech recognition system is shown in FIG. 1. The system generally contains an acoustic model 7 and a language model 8. The acoustic model 7 includes the pronunciations of commonly used words in the recognized language. Each word pronunciation is derived by a statistical method from the way most people pronounce the word when reading it, and represents the general pronunciation characteristics of the word. The language model 8 describes how the commonly used words in the recognized language are used.
The continuous speech recognition system shown in FIG. 1 operates as follows. The voice detection means 1 collects the user's speech, for example by representing it as speech samples, and sends the speech samples to the pronunciation probability calculation means 2. For every pronunciation in the acoustic model 7, the pronunciation probability calculation means 2 gives a probability estimate of whether that pronunciation matches the speech sample. The word probability calculation means 5, according to language rules summarized from a large body of language material, gives for each word in the language model 8 a probability estimate of whether it is the word that should occur in the current context. The word matching means 3 calculates a joint probability (representing the likelihood that the speech sample is recognized as this word) by combining the probability value calculated by the pronunciation probability calculation means 2 with the probability value calculated by the word probability calculation means 5, and takes the word with the greatest joint probability as the speech recognition result. The context generating means 4 updates the current context using this recognition result, for use in recognizing the next speech sample. The word output means 6 outputs the recognized word.
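The matching step described above can be illustrated with a minimal sketch. It assumes that the two probability values are combined by multiplication (computed here as a sum of logarithms) and that the context is simply the previously recognized word; the source states only that the values are combined. All names (`match_word`, `recognize`, the toy probability tables) are hypothetical and chosen for illustration.

```python
import math


def match_word(acoustic_probs, language_probs):
    """Return the word with the greatest joint probability, or None.

    Assumption: joint probability = acoustic probability * language
    probability, evaluated in the log domain.
    """
    best_word, best_score = None, -math.inf
    for word, p_acoustic in acoustic_probs.items():
        p_language = language_probs.get(word, 0.0)
        if p_acoustic <= 0.0 or p_language <= 0.0:
            continue  # a zero factor gives a zero joint probability
        score = math.log(p_acoustic) + math.log(p_language)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def recognize(samples, language_model, start="<s>"):
    """Recognize one word per speech sample, updating the context each time.

    `samples` is a list of dicts mapping candidate words to acoustic
    probabilities; `language_model` maps a context (the previous word)
    to a dict of next-word probabilities. Both are toy stand-ins.
    """
    context = start
    output = []
    for acoustic_probs in samples:
        language_probs = language_model.get(context, {})
        word = match_word(acoustic_probs, language_probs)
        if word is None:
            # Fallback (an assumption): use the acoustic score alone.
            word = max(acoustic_probs, key=acoustic_probs.get)
        output.append(word)
        context = word  # the recognized word becomes the new context
    return output


if __name__ == "__main__":
    toy_samples = [
        {"hello": 0.6, "yellow": 0.4},
        {"word": 0.55, "world": 0.45},
    ]
    toy_language_model = {
        "<s>": {"hello": 0.7, "yellow": 0.3},
        "hello": {"world": 0.8, "word": 0.2},
    }
    print(recognize(toy_samples, toy_language_model))  # ['hello', 'world']
```

In the second sample the acoustic score slightly favors "word", but the language probability in the context "hello" makes "world" the candidate with the greatest joint probability.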
The above continuous recognition procedure can be performed in units of a character, a word, or a phrase. Therefore, hereinafter "a word" refers to a character, a word, or a phrase.
To mark the recognized result with punctuation, current continuous speech recognition systems require punctuation marks to be spoken during dictation, and then recognize them. For example, to recognize "Hello! World." completely, the speaker must say, "Hello exclamation point world period". That is, current speech recognition systems require the speaker to convert punctuation marks into speech (i.e., to speak the punctuation marks aloud), after which the system recognizes them as the corresponding punctuation marks. The language model must therefore include punctuation marks; i.e., the language model 8 must be able to give, for every punctuation mark, a probability estimate of whether it is the punctuation mark that should occur in the current context.
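The spoken-punctuation convention can be illustrated with a simplified sketch. It is not the mechanism described above, in which punctuation marks are entries scored by the language model; it only shows the mapping from spoken punctuation names to marks as a post-processing step. The table `SPOKEN_PUNCTUATION` and the function `render_dictation` are hypothetical, and capitalization after a sentence-ending mark is not handled.

```python
# Hypothetical table of spoken punctuation names and their marks.
SPOKEN_PUNCTUATION = {
    ("exclamation", "point"): "!",
    ("question", "mark"): "?",
    ("period",): ".",
    ("comma",): ",",
}


def render_dictation(words):
    """Replace spoken punctuation names in a recognized word list with marks."""
    out = []
    i = 0
    while i < len(words):
        for phrase, mark in SPOKEN_PUNCTUATION.items():
            n = len(phrase)
            if tuple(w.lower() for w in words[i:i + n]) == phrase:
                if out:
                    out[-1] += mark  # attach the mark to the preceding word
                else:
                    out.append(mark)
                i += n
                break
        else:
            # No punctuation name starts at position i: keep the word.
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    spoken = "Hello exclamation point world period".split()
    print(render_dictation(spoken))  # Hello! world.
```

A context-free table like this would also convert an ordinary use of the word "period", which is one reason a recognizer relies on context-dependent probability estimates instead.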
However, people cannot be expected to say punctuation marks when the above-mentioned speech recognition system is used to transcribe natural speech (e.g., in a conference, radio broadcast, or TV program). Furthermore, it is highly unnatural to speak punctuation marks aloud during dictation. Even when asked to do so, people often forget to say punctuation marks while speaking or reading articles. Moreover, in spontaneous dictation, where every sentence comes directly from the mind, it is very difficult for most people to decide correctly which punctuation marks should be used and to say every punctuation mark correctly without losing fluency. This may be because punctuation marks are seldom, if ever, used in daily spoken language.
Therefore, in continuous speech recognition there is an urgent need for an apparatus and method for automatically generating punctuation marks that is easy to use, does not require punctuation marks to be spoken, and hence does not affect the user's normal speech.