1. Field of the Invention
The present invention generally relates to information processing in an interactive messaging environment, and more specifically to a topic tracking method and apparatus in an interactive messaging environment.
2. Description of Related Art
In recent years, web-based micro social application platforms such as microblogs have emerged as a completely new interactive messaging environment and have developed surprisingly rapidly thanks to their convenience, grassroots support, simplicity, and ease of use. According to statistics, by 2010 the number of users registered with Twitter worldwide had reached 75,000,000.
FIG. 10 presents an example of interactions on a microblog. A user may act as a microblogger and freely post messages on any topic on his or her own microblog, and may also act as a fan and comment on messages posted by other users on their microblogs.
Unlike traditional Web 2.0 applications, a microblog limits the length of each message; Twitter, for example, allows at most 140 characters. As a result, people use concise messages to describe only the core of their idea, and much context information is omitted. To better understand the meaning of a message of interest, fans have to go through the previous messages one by one.
However, it is not easy to find all the wanted messages among hundreds of messages because, on the one hand, topics have some continuity over time and, on the other hand, people's behavior on the web is discontinuous. Because the messages that provide the context of a topic are scattered across the microblog, it is difficult for fans to track the history of a certain topic. There is therefore a need for an efficient topic tracking method to solve this problem.
Most traditional topic tracking methods depend heavily on content similarity: they identify the topics of messages by directly comparing the contents of the messages. However, such methods are poorly suited to an interactive messaging environment such as a microblog, where the length of messages is limited. As shown in FIG. 10A, a microblogger named “wakenheart” first posts a microblog message A: “Australia is very beautiful and is as good as heaven”, and, after a period of time, posts another message B: “Today I hold a Koala in my arms, it is quiet, and how lovely it is”. When the contents of the two messages are compared directly according to the traditional methods, the messages share almost no words and show poor similarity, so it is hard to associate the two messages with each other.
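The limitation described above can be illustrated with a minimal sketch of one common content-similarity measure, bag-of-words cosine similarity. This is an illustrative assumption, not a method named in this document: the function names, the tokenizer, and the small stop-word list are all hypothetical. Under these assumptions, messages A and B share no content words, so their similarity score is zero even though both concern the same trip.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list chosen for this example only.
STOP_WORDS = {"a", "and", "as", "how", "i", "in", "is", "it", "my", "very"}


def content_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two messages."""
    vec_a = Counter(content_words(text_a))
    vec_b = Counter(content_words(text_b))
    # Only words present in both messages contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


message_a = "Australia is very beautiful and is as good as heaven"
message_b = "Today I hold a Koala in my arms, it is quiet, and how lovely it is"

# No content word appears in both messages, so the score is 0.0.
print(cosine_similarity(message_a, message_b))
```

Without the stop-word filter, the only overlap would be function words such as "is" and "and", which say nothing about the topic. In either case a purely content-based comparison gives no useful signal that the two short messages belong together.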