(a) Field of the Invention
The present invention relates to a device and a method for generating an aligned corpus based on unsupervised-learning alignment, and to a device and a method for analyzing morphemes of destructive expressions using the aligned corpus.
(b) Description of the Related Art
Recently, blogs, social networking services such as Facebook and Twitter, and mobile message services such as Kakaotalk have come into daily use on smartphones as well as computers, and their use is increasing day by day.
However, the messages exchanged through these services circulate a huge number of destructive expressions that are incorrect in terms of orthography. Here, a destructive expression is an expression whose orthography is wrong or which is not normalized or standardized, and a sentence including such a destructive expression is referred to as a destructive sentence. The destructive sentence represents a new paradigm of language use brought about by the spread of the Internet and of smartphones.
Although a destructive sentence includes a destructive expression rather than a normal expression, it conveys the meaning of the sentence without difficulty.
Morpheme analysis used in natural language information processing, such as machine translation, retrieval, or data mining, targets normal sentences that contain no destructive expressions. That is, existing morpheme analysis relies on a morpheme dictionary that stores the morpheme knowledge or morpheme information used for the analysis. Because of their characteristics, the destroyed morphemes included in the above-noted destructive sentences cannot all be contained in a normal morpheme dictionary, and simply adding destroyed morphemes to the morpheme dictionary has its limits. It is therefore difficult to analyze the morphemes in a destructive sentence that includes destructive expressions, which is a problem.
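The limitation of dictionary-based analysis can be illustrated with a minimal sketch. The dictionary contents, the tag names, the function name `analyze`, and the English example sentences below are all hypothetical and chosen only for illustration; they are not taken from any actual morpheme analyzer.

```python
# Hypothetical toy dictionary: surface form -> part-of-speech tag.
# A real morpheme dictionary is far larger; this is only an illustration.
MORPHEME_DICT = {"see": "VERB", "you": "PRON", "later": "ADV"}


def analyze(sentence):
    """Tag each whitespace-separated token by plain dictionary lookup.

    Tokens missing from the dictionary receive None as their tag.
    """
    return [(token, MORPHEME_DICT.get(token)) for token in sentence.lower().split()]


# A normal sentence: every token is found in the dictionary.
normal = analyze("see you later")

# A destructive sentence with the same meaning: no token is found,
# so a pure dictionary lookup yields no analysis at all.
noisy = analyze("c u l8r")
```

In this sketch every token of the normal sentence receives a tag, while every token of the destructive sentence is left untagged. Adding "c", "u", and "l8r" to the dictionary would repair only this one sentence, since new variants of the same words can be coined without limit.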