In the fields of communications and Internet applications, certain scenarios require adding punctuation to files that lack it, for example voice files. Adding punctuation to voice files has many applications in speech-to-text conversion, in which a speaker's dictation is rendered accurately as a written transcript. Examples include transcribing courtroom testimony; recording a teacher's lecture in the classroom and transcribing it into text for study; helping people with hearing impairments read what has been spoken; and using voice commands for online transactions, to name a few.
For adding punctuation to voice files, an existing scheme is based on parsing out each word or character within a sentence (word parsing) and then determining the location of each word.
More specifically, word parsing is performed on known sentences stored in advance in a text library (such as a database) to determine each word's relative location within its sentence (i.e., at the beginning, middle, or end of the sentence). It is also determined whether a punctuation mark follows the word in the sentence and, if so, what the punctuation mode is (i.e., which type of punctuation is used). A linguistic model may then be built from the relative location of each word within a sentence and the punctuation mode (if any) that follows the word.
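The word-level model described above can be sketched as a table of counts keyed by (word, relative position), recording which punctuation mode followed each occurrence in the known sentences. This is a minimal illustration under stated assumptions, not the prior-art implementation: the regex tokenizer, the set of marks in `PUNCTUATION`, the empty string used for "no punctuation", and the helper names `build_model` and `most_likely_mark` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical punctuation inventory; "" stands for "no punctuation after the word".
PUNCTUATION = {",", ".", "?", "!", ";", ":"}
NO_PUNCT = ""


def relative_position(index: int, length: int) -> str:
    """Classify a word's relative location in its sentence as begin, middle, or end."""
    if index == 0:
        return "begin"
    if index == length - 1:
        return "end"
    return "middle"


def build_model(sentences):
    """Count punctuation modes per (word, relative position) over known sentences."""
    model = defaultdict(Counter)
    for sentence in sentences:
        # Assumed tokenizer: keep word tokens and the marks of interest, drop the rest.
        tokens = re.findall(r"[\w']+|[,.?!;:]", sentence.lower())
        words = []  # each entry is [word, punctuation mode following it]
        for tok in tokens:
            if tok in PUNCTUATION:
                if words:
                    words[-1][1] = tok  # attach the mark to the preceding word
            else:
                words.append([tok, NO_PUNCT])
        n = len(words)
        for i, (word, mark) in enumerate(words):
            model[(word, relative_position(i, n))][mark] += 1
    return model


def most_likely_mark(model, word, position):
    """Return the most frequent punctuation mode seen for this word at this position."""
    counts = model.get((word.lower(), position))
    if not counts:
        return NO_PUNCT  # unseen word/position pair: predict no punctuation
    return counts.most_common(1)[0][0]


model = build_model(["Hello, how are you?", "Hello world."])
print(most_likely_mark(model, "you", "end"))
```

Note that every entry in the model depends only on one word and its coarse position; nothing records how neighbouring words or larger units of the sentence relate to the punctuation.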
Therefore, the prior-art method requires parsing every word or character out of the whole voice file, matching each word's location against known sentences, and then adding punctuation according to the linguistic model, based on individual word locations in known sentences and the correct arrangement of words within a sentence. This process may be time-consuming and unintuitive, and the punctuated result may not accurately reflect the true meaning of the sentence.
In addition, a linguistic model built from the location of an individual word or character in a sentence, or from the presence or absence of punctuation after that single word or character, may be of limited relevance and may fail to capture the actual relationship between the information contained in the sentence and the punctuation mode. Moreover, simply processing a voice file as a whole when adding punctuation fails to consider the internal structural features of that file. Consequently, the accuracy of adding punctuation to voice files remains quite low at present.