The present invention relates generally to information processing and, more particularly, to information processing using primary and secondary keyword groups.
Many social media services provide summary fields such as profiles to introduce the content of each account and encourage subscription. The keywords included in these summary fields are collected and used for marketing purposes. However, many accounts do not provide much information in their summary fields, and this impedes the comprehensive collection and analysis of user information using the summary field.
In order to solve this problem, a method has been disclosed in which the co-occurrence relationship between keywords in the content and keywords in the summary is learned, and the keywords in the summary are estimated based on keywords in the content (see, for example, David Blei and Michael Jordan, “Modeling Annotated Data,” Proc. of ACM SIGIR, 2003).
However, this method assumes that the user's description in the summary field corresponds to the content, whereas keywords in the summary often do not correspond to the content. For example, a user may mention interests such as reading and sports in the summary field, but discuss only reading in the content and almost never mention sports. In this case, the summary field includes both a related keyword (reading) that actually corresponds to the content and a mixed keyword (sports) that does not correspond to the content, and the presence of the mixed keyword impedes the learning of co-occurrence relationships.
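The effect of a mixed keyword on co-occurrence learning can be illustrated with a minimal sketch. The sketch uses plain co-occurrence counting as a stand-in; it is not the probabilistic model of the cited reference, and the function names `learn_cooccurrence` and `estimate_summary_keywords`, as well as the sample accounts, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def learn_cooccurrence(accounts):
    """Count how often each content keyword appears together with each summary keyword.

    `accounts` is a list of (content_keywords, summary_keywords) pairs,
    one pair per account (an assumed input format).
    """
    counts = defaultdict(Counter)
    for content_keywords, summary_keywords in accounts:
        for c in set(content_keywords):
            for s in set(summary_keywords):
                counts[c][s] += 1
    return counts


def estimate_summary_keywords(counts, content_keywords, top_n=2):
    """Rank summary keywords by their summed co-occurrence with the given content keywords."""
    scores = Counter()
    for c in set(content_keywords):
        scores.update(counts.get(c, {}))
    return [keyword for keyword, _ in scores.most_common(top_n)]


# Hypothetical data: the first account lists "sports" in its summary
# but writes only about reading, so "sports" is a mixed keyword there.
accounts = [
    (["novel", "author", "library"], ["reading", "sports"]),
    (["novel", "bookstore"], ["reading"]),
    (["match", "goal", "stadium"], ["sports"]),
]

counts = learn_cooccurrence(accounts)

# "novel" is now linked to "sports" even though no content about sports
# ever contained the word "novel".
print(dict(counts["novel"]))
print(estimate_summary_keywords(counts, ["novel"]))
```

In this sketch, the content keyword "novel" acquires a nonzero co-occurrence count with "sports" solely because of the first account's summary field, so "sports" appears among the keywords estimated for content about novels.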
A method has been disclosed to address this problem, in which a topic model associating keywords with topics is introduced, and a special mixed keyword topic is assigned to mixed keywords in order to exclude mixed keywords from the related keywords (see, for example, Tomoharu Iwata, Takeshi Yamada, Naonori Ueda, “Modeling Noisy Annotated Data with Application to Social Annotation,” IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 7, pp. 1601-1613, 2013).
With this method, keywords that users commonly apply but that are not directly related to content, such as “favorite” and “read later”, are excluded as mixed keywords. However, the method treats a keyword such as “sports”, which is a related keyword for some users, as a mixed keyword for all users. As a result, once a keyword has been determined to be a mixed keyword for one user, it cannot be used to accurately characterize another user for whom it is a related keyword, and the learning accuracy for co-occurrence relationships between keywords in content and keywords in summary fields cannot be sufficiently improved.