Sentence boundary detection is a technology for dividing the text of a document into sentence units. Related-art technologies in this field are introduced below.
Most sentence boundary detection methods studied or published so far detect a sentence boundary using punctuation marks, blanks, or n-grams that appear at the beginning and the end of a sentence. Some of these methods are language-dependent because they use the results of language analysis. The related art therefore has two problems: sentence boundary detection performance may deteriorate significantly on user documents, such as web documents, that contain many word-spacing errors and no punctuation marks; and the methods are linguistically dependent, so they cannot be applied to other languages.
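For illustration only, the kind of punctuation-and-blank rule described above might be sketched as follows. This is a minimal sketch and not any specific published method: the function name split_sentences and the exact rule (a period, exclamation mark, or question mark, followed by a blank and then an uppercase letter or digit) are assumptions chosen for the example.

```python
import re

# Hypothetical related-art style rule (an assumption for illustration):
# a boundary is '.', '!' or '?' followed by whitespace, where the next
# character is an uppercase ASCII letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Split text into sentence units at punctuation-plus-blank boundaries."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


# Well-punctuated text is divided into three sentence units.
print(split_sentences("It works. Does it? Yes!"))

# Text without punctuation marks comes back as one undivided unit.
print(split_sentences("it works does it yes"))
```

The second call shows the weakness noted above: with no punctuation marks the rule finds no boundary at all. The uppercase check also ties the rule to scripts that have letter case, which is one simple form of the linguistic dependency described.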