Both topic detection and sentiment detection within a body of text (which may comprise one or more text documents in virtually any form) are known in the art. Typically, certain algorithms, such as clustering, are used to analyze the body of text in order to detect topics within the body of text. Thereafter, as a separate step, sentiment analysis may be performed on those portions of the body of text (i.e., certain documents) deemed to correspond to each of the detected topics. In this manner, such techniques have been employed to determine sentiment about certain topics encompassed by the body of text.
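The two-step approach described above can be sketched as follows. This is a minimal illustration only: the keyword-overlap topic assignment stands in for a real clustering algorithm, and the lexicons, function names, and sample documents are all hypothetical and not drawn from any particular system.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumptions for illustration). A real system
# would discover topics by clustering and use a trained sentiment model.
TOPIC_KEYWORDS = {
    "battery": {"battery", "charge", "charging"},
    "screen": {"screen", "display", "resolution"},
}
POSITIVE = {"good", "great", "like", "love"}
NEGATIVE = {"bad", "poor", "dislike", "hate"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [t.strip(".,!?;:\"'").lower() for t in text.split()]


def detect_topic(tokens):
    """Step 1: pick the topic with the most keyword hits (stand-in for clustering)."""
    scores = {
        topic: sum(t in keywords for t in tokens)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def score_sentiment(tokens):
    """Step 2: positive-word count minus negative-word count."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def sentiment_by_topic(documents):
    """Run topic detection first, then average sentiment per detected topic."""
    grouped = defaultdict(list)
    for doc in documents:
        tokens = tokenize(doc)
        topic = detect_topic(tokens)
        if topic is not None:
            grouped[topic].append(score_sentiment(tokens))
    return {topic: sum(scores) / len(scores) for topic, scores in grouped.items()}


docs = [
    "The battery is great, I love it.",
    "Battery charging is bad.",
    "The screen is good.",
]
print(sentiment_by_topic(docs))
```

Note that the sentiment lexicon in this sketch is fixed and applied identically to every topic, so the two steps share no information with one another.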
While such an approach may be useful, in reality, topics and sentiment in a body of text are usually highly correlated. Sentiment polarities (e.g., “like”/“dislike”, “good”/“bad”, “for”/“against”, etc.) are dependent on topics or domains, meaning that the same word may have different sentiment polarities in different domains. To address this, techniques that permit the joint determination of topics and sentiments have been developed including, for example, extensions of methods for standard topic modeling such as the well-known probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Simply put, methods such as PLSA and LDA view an item of text as a mixture of global topic models. An example of an extension of PLSA may be found in Mei et al., “Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs”, Proceedings of the 16th International Conference on World Wide Web (WWW 2007), p. 171-189 (“the Mei Paper”), which describes usage of an extra background component and two additional sentiment subtopics to jointly model a mixture of topics and sentiment predictions for each item (document) in a body of text. In particular, the Mei Paper, the teachings of which are incorporated herein by this reference, teaches a generative model whereby sentiment models (i.e., either positive or negative) are determined independently of, but simultaneously with, one or more topic models based on training text. Thereafter, the resulting sentiment and topic models may be applied to targeted text to detect the existence of further topics, and their respective sentiments, within the targeted text. Another extension, in this case of the LDA technique, is described by C. Lin and Y. He, “Joint sentiment/topic model for sentiment analysis”, Proceedings of the ACM international conference on Information and knowledge management (CIKM) 2009 (“the Lin Paper”), the teachings of which are incorporated herein by this reference, in which it is proposed to add a sentiment layer.
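By way of illustration, the mixture view underlying these methods may be written out explicitly. The notation below is assumed for this sketch and does not appear in the text above: the first expression is the standard PLSA word distribution over K topics, and the second is a hedged rendering of the kind of extension attributed to the Mei Paper, with a background model B, topic models θ_j, and positive and negative sentiment models θ_P and θ_N.

```latex
% Standard PLSA: each word w in document d is drawn from a mixture of K topic models.
p(w \mid d) = \sum_{j=1}^{K} p(z_j \mid d)\, p(w \mid z_j)

% Sketch of a topic-sentiment mixture (symbols are assumptions):
% \lambda_B is the background weight, \pi_{dj} the coverage of topic j in d, and
% \delta_{j,d,F} + \delta_{j,d,P} + \delta_{j,d,N} = 1 splits topic j in d into
% neutral (facet), positive, and negative parts.
p(w \mid d) = \lambda_B\, p(w \mid B)
  + (1 - \lambda_B) \sum_{j=1}^{K} \pi_{dj}
    \left[ \delta_{j,d,F}\, p(w \mid \theta_j)
         + \delta_{j,d,P}\, p(w \mid \theta_P)
         + \delta_{j,d,N}\, p(w \mid \theta_N) \right]
```

In the second expression, the two sentiment models are shared across all topics, which is consistent with the description of sentiment models being determined independently of, but simultaneously with, the topic models.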
While such joint topic and sentiment modeling techniques are useful, the training text required to develop reliable topic and sentiment models with acceptable performance is often relatively scarce or non-trivial to obtain.