It is difficult to manually mark up an entire large-scale, unstructured text data item such as an electronic book. Markup processing can be automated using a machine learning technique, but it is difficult to execute automatic markup processing without any errors. In particular, the tags used in text-to-speech control (prosody, emotions, speakers, and the like) normally differ from user to user, and there is no single correct answer. Because judgments vary with the subjective views and preferences of users, the load on markup processing becomes heavier.