There is a wealth of information embedded in the text of user posts within threaded online discussions, such as forums and bulletin boards. A challenge, however, is that the information is scattered across pages, users, and even sites. Furthermore, the information is unstructured and often extremely difficult to follow. Moreover, its quality is often unreliable: a statement made within a post may be incorrect or off-topic. Because of these issues, traditional automated analysis of discussion threads works with the metadata and structural information rather than the text itself; such analysis can be used for discovering topic heat maps, finding contentious discussions based on thread length, or determining power users within a particular forum.
A great number of discussion forums focus on providing expertise and help to a community of interest. Discussion threads on these forums originate with the posting of a question or problem to be solved, and replies to the original post take the form of answers to the question. The syntactic structure and information content of individual sentences that occur within a dialogue differ from those of monologue sentences that appear within a narrative. This difference makes it difficult to analyze threaded discussion posts using parsing and other natural language processing (NLP) techniques developed for written monologue.