When speech uttered by a speaker at a lecture, meeting, or the like is transcribed and saved, in Japanese it is necessary to make sentence boundaries clear by inserting a period at a suitable position. It is likewise necessary to make boundaries within a sentence, such as clause boundaries, clear by inserting punctuation marks such as commas at suitable positions. However, symbols such as periods and commas are not explicitly uttered by the speaker. A technique is therefore required for detecting the positions in a transcribed word sequence at which a symbol should be inserted. The present invention relates to such a symbol insertion technique.
An example of a symbol insertion technique relevant to the present invention is disclosed in section 3.2 of Non-Patent Document 1. In that technique, sentence boundaries are detected using the length of each pause taken by the speaker and information on the words that appear before and after the pause. Specifically, a string X that contains pause information but no periods and a string Y that contains periods are regarded as different languages, and the task is formulated, in the manner of statistical machine translation, as the problem of finding the string Y that maximizes P(Y|X), as shown in the following formula.
$$\max_{Y} P(Y \mid X) = \max_{Y} P(Y)\, P(X \mid Y) \qquad (1)$$
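The step behind formula (1) can be made explicit. The following is a short sketch of the standard Bayes-rule argument, not text taken from Non-Patent Document 1; it is written with arg max because it is the maximizing string Y, rather than the maximum value itself, that is unchanged when the denominator P(X) is dropped.

$$\hat{Y} = \arg\max_{Y} P(Y \mid X) = \arg\max_{Y} \frac{P(Y)\, P(X \mid Y)}{P(X)} = \arg\max_{Y} P(Y)\, P(X \mid Y)$$

The last equality holds because P(X) does not depend on Y. Here P(Y) plays the role of the language model and P(X|Y) the role of the conversion model.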
More specifically, for every position at which a pause can be converted into a period (P(X|Y)=1), the language model likelihood P(Y) obtained when a period is inserted is compared with the language model likelihood P(Y) obtained when no period is inserted, and whether to insert a period is decided on that basis. Here, the conversion model P(X|Y) is a model that depends on the expressions before and after the pause and on the pause length. The language model likelihood P(Y) is calculated with a word 3-gram model trained on transcriptions from the CSJ (Corpus of Spontaneous Japanese) to which sentence boundaries have been added manually.
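The comparison described above can be illustrated with a minimal sketch. It is not the implementation of Non-Patent Document 1: the tiny English corpus, the add-one smoothing, the greedy left-to-right decision order, and the names `train_trigram`, `log_prob`, and `insert_periods` are all assumptions made for illustration. Every pause is treated as a candidate position (P(X|Y)=1), so the dependence of the conversion model on pause length and surrounding expressions is not modeled here.

```python
import math
from collections import Counter

PAUSE = "<pause>"   # hypothetical marker for a pause in the transcribed string X
PERIOD = "."        # symbol that may be inserted to form the string Y


def train_trigram(transcripts):
    """Count word trigrams and their two-word contexts in period-annotated transcripts."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for tokens in transcripts:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)


def log_prob(tokens, model):
    """Log of the language model likelihood P(Y), with add-one smoothing (an assumption)."""
    tri, ctx, vocab_size = model
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        count = tri[context + (padded[i],)]
        total += math.log((count + 1) / (ctx[context] + vocab_size))
    return total


def insert_periods(tokens, model):
    """Greedily turn each pause into a period when doing so raises P(Y); otherwise drop it."""
    out = []
    for idx, tok in enumerate(tokens):
        if tok != PAUSE:
            out.append(tok)
            continue
        # Later pauses are provisionally treated as "no period" while scoring this one.
        rest = [t for t in tokens[idx + 1:] if t != PAUSE]
        with_period = out + [PERIOD] + rest
        without_period = out + rest
        if log_prob(with_period, model) > log_prob(without_period, model):
            out.append(PERIOD)
    return out


# Toy stand-in for transcriptions with manually added sentence boundaries.
corpus = [
    "today we talk about speech . next we talk about text .".split(),
    "we talk about speech . next we stop .".split(),
]
model = train_trigram(corpus)

x = ["we", "talk", PAUSE, "about", "speech", PAUSE, "next", "we", "stop"]
result = insert_periods(x, model)
print(" ".join(result))  # we talk about speech . next we stop
```

In this toy run the first pause is dropped, because "talk about speech" is well supported by the trigram counts, while the second pause becomes a period, because "speech . next" was seen in the training transcripts. A fuller treatment would search over all insertion patterns jointly and weight each candidate by a conversion model P(X|Y) that takes the pause length into account.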
[Non-Patent Document 1]
    Shimooka et al., “Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese”, Journal of Natural Language Processing, 12(3), 2005