In linguistics, a morpheme is the smallest unit that carries meaning, and a morphological analyzer analyzes a text into the morpheme units most suitable for the context. Generally, morphological analyzers may be divided into those based on rules and dictionaries and those based on machine learning.
In relevant technology, “probabilistic segmentation and tagging of unknown words (Bogyum Kim, Jae Sung Lee, 2016)” has proposed a method of segmenting and tagging unknown Korean words, such as coined words, within a 3-step probabilistic morphological analysis. To guess an unknown word, the method uses the rich set of suffixes that attach to open-class words, such as general nouns and proper nouns. The authors proposed learning suffix patterns from a morpheme-tagged corpus and calculating their probabilities, which are then used to segment and tag unknown open-class words in the probabilistic morphological analysis model. In such a method, coined-word patterns are learned and combined with a conventional morpheme tagging model to increase tagging performance for coined words; however, the combination has an adverse effect on general documents, reducing performance there.
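The suffix-based guessing described above may be illustrated by the following minimal sketch. It is not the implementation of the cited paper: the toy corpus, the function names `learn_suffix_model` and `guess_unknown`, the longest-suffix-first matching, and the relative-frequency estimate are all assumptions made for illustration. The tags NNG (general noun) and NNP (proper noun) are used as Sejong-style labels.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (stem, stem_tag, suffix) triples as they might be
# extracted from a morpheme-tagged corpus. NNG = general noun, NNP = proper noun.
TAGGED = [
    ("학교", "NNG", "에서"),
    ("회사", "NNG", "에서"),
    ("서울", "NNP", "에서"),
    ("사람", "NNG", "은"),
    ("철수", "NNP", "는"),
    ("영희", "NNP", "는"),
    ("나무", "NNG", "는"),
]


def learn_suffix_model(triples):
    """Estimate P(stem_tag | suffix) by relative frequency."""
    counts = defaultdict(Counter)
    for _stem, tag, suffix in triples:
        counts[suffix][tag] += 1
    model = {}
    for suffix, tag_counts in counts.items():
        total = sum(tag_counts.values())
        model[suffix] = {tag: n / total for tag, n in tag_counts.items()}
    return model


def guess_unknown(word, model):
    """Split an unknown word into stem + known suffix and guess the stem tag.

    Returns (stem, tag, suffix, probability), or None if no learned suffix
    matches. Longer suffixes are tried first (an assumption of this sketch),
    and at least one character is always left for the stem.
    """
    for start in range(1, len(word)):
        suffix = word[start:]
        if suffix in model:
            tag, prob = max(model[suffix].items(), key=lambda kv: kv[1])
            return word[:start], tag, suffix, prob
    return None


if __name__ == "__main__":
    model = learn_suffix_model(TAGGED)
    print(guess_unknown("메타버스에서", model))
```

In this sketch, a word whose ending matches no learned suffix is simply left unanalyzed; a full analyzer would combine such guesses with its regular tagging model.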
In another relevant technology, U.S. Pat. No. 8,275,607 (Title of the Invention: semi-supervised part-of-speech tagging) has proposed a method that allocates a part of speech to each word based on dictionaries, calculates a Bayesian probability value for words not listed in the dictionaries by using surrounding context information as attributes, and allocates the most suitable part of speech. However, the method requires a dictionary and a learning set that are built manually, and for this reason, performance is reduced when the field changes.
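The general idea of dictionary lookup with a Bayesian fallback for unlisted words may be sketched as follows. This is not the method claimed in the cited patent: the sketch assumes a naive Bayes classifier with add-one smoothing over two context attributes (the previous and the next word), and the dictionary, the training examples, and the names `train` and `tag_word` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, manually built resources (the kind of manual effort the
# text above identifies as a drawback).
DICTIONARY = {"the": "DET", "a": "DET", "dog": "NOUN", "runs": "VERB"}
TRAINING = [  # (previous_word, next_word, tag_of_the_middle_word)
    ("the", "runs", "NOUN"),
    ("a", "runs", "NOUN"),
    ("the", "sleeps", "NOUN"),
    ("dog", "the", "VERB"),
    ("dog", "a", "VERB"),
]


def train(examples):
    """Count tags and context attributes for a naive Bayes model."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)  # tag -> counts of context attributes
    vocab = set()
    for prev, nxt, tag in examples:
        tag_counts[tag] += 1
        for feat in (("prev", prev), ("next", nxt)):
            feat_counts[tag][feat] += 1
            vocab.add(feat)
    return tag_counts, feat_counts, vocab


def tag_word(word, prev, nxt, model):
    """Use the dictionary if the word is listed; otherwise pick the tag
    with the highest smoothed naive Bayes score given the context."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    tag_counts, feat_counts, vocab = model
    total = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n in tag_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(feat_counts[tag].values()) + len(vocab)
        for feat in (("prev", prev), ("next", nxt)):
            # Add-one smoothing keeps unseen attributes from zeroing the score.
            score += math.log((feat_counts[tag][feat] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


if __name__ == "__main__":
    model = train(TRAINING)
    print(tag_word("blorp", "the", "runs", model))
```

Because both the dictionary and the labeled examples are tied to one field, moving to a different field would require rebuilding them, which is the limitation noted above.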