Micro-blogging services, such as Twitter (http://twitter.com/), allow users to post short messages (“tweets”) to share information on a broad range of subjects, including personal updates, fast-breaking news, politics, entertainment, or just about anything else that people might discuss in everyday conversation. At least 50M tweets are posted each day. Because Twitter operates in real time, there is great interest in tracking trending (“hot”) topics on it. However, given the massive number of tweets posted per day and their unique characteristics, sophisticated information filtering algorithms are needed to better understand the topics underlying this huge volume of tweets.
The major challenge in understanding tweets is that the number of characters in each tweet is strictly limited, and users often adopt abbreviated syntax for convenience. Another challenge is the lack of reliable training labels. Fortunately, there are many related information sources, such as Wikipedia (http://www.wikipedia.org/) and social tagging, which contain long documents and may include tags or labels that carry additional insight about each document. A natural question is whether a model learned in this enriched source domain can help in understanding the abbreviated documents (tweets) in the target domain.
This problem can be generalized as one of learning from partial observations.
For example, text classification often has to deal with partial observations, where a significant number of word features are missing from each document. This can result from enforced limits on document length, as on Twitter, or from privacy or confidentiality concerns, which might limit the availability of email content. Learning from partial observations remains an extremely challenging task.
Traditional transfer learning approaches often learn a classification model in the source domain using bag-of-words features and then ‘transfer’ this model to the target domain. Such an approach is more likely to be effective when documents in the target domain are of comparable length and information content to those in the source domain.
Existing domain adaptation methods are primarily motivated by the distribution difference between a source domain and a target domain, where this difference is induced by, e.g., a change in location (as in a Wi-Fi application) or a change in subject (as in sentiment classification). However, none of these methods can be generalized to handle a distribution difference caused by missing word features, since they all assume that there are essentially no missing features in the target-domain documents. One example implementation of a self-taught learning technique, as taught in R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, entitled “Self-taught Learning: Transfer Learning from Unlabeled Data,” in Proceedings of the 24th International Conference on Machine Learning, page 766, ACM, 2007, incorporated by reference herein, is deficient in that, when building a set of bases from the source-domain documents, self-taught learning cannot use the label information to generate more meaningful bases. Further, although self-taught learning also transfers knowledge between two domains through a set of bases, it cannot be used to solve the partial observation problem either.
[Pan et al. AAAI 2008] S. J. Pan, D. Shen, Q.
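As a rough illustration of the self-taught learning deficiency discussed above, the following minimal Python sketch computes self-taught-learning-style features: given a fixed set of bases, it finds sparse activations by minimizing ||x − Ba||² + β||a||₁ with iterative shrinkage-thresholding (ISTA). Learning the bases themselves is not shown, and the function names, β, the iteration count, and the step-size rule are illustrative assumptions, not taken from the cited paper. Two points are visible in the code: no label enters the objective, and the residual runs over every word feature, so a missing feature would simply be treated as a zero count.

```python
from typing import List

Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (the proximal operator of t*|v|)."""
    return max(0.0, abs(v) - t) * (1.0 if v > 0 else -1.0)


def sparse_activations(bases: List[Vector], x: Vector,
                       beta: float = 0.1, iters: int = 500) -> Vector:
    """Hypothetical ISTA solver for min_a ||x - sum_p a_p * bases[p]||^2 + beta*||a||_1.

    `bases` holds k basis vectors, each of length d (the vocabulary size).
    No label information is used anywhere, and every entry of x is treated as
    observed, so a missing word feature counts as a zero.
    """
    k, d = len(bases), len(x)
    # ||B||_F^2 upper-bounds the largest eigenvalue of B^T B, so this step is safe.
    frob = sum(v * v for b in bases for v in b)
    step = 1.0 / (2.0 * frob)
    a = [0.0] * k
    for _ in range(iters):
        recon = [sum(a[p] * bases[p][i] for p in range(k)) for i in range(d)]
        resid = [recon[i] - x[i] for i in range(d)]
        # Gradient of the squared-error term with respect to each activation.
        grad = [2.0 * sum(bases[p][i] * resid[i] for i in range(d)) for p in range(k)]
        a = [soft_threshold(a[p] - step * grad[p], step * beta) for p in range(k)]
    return a
```

For example, with two orthonormal bases, `sparse_activations([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [2.0, 0.01, 0.0])` returns approximately `[1.95, 0.0]`: the L1 penalty shrinks the large weight slightly and drives the small one to exactly zero.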
It would be highly desirable to be able to reconstruct such partially observed documents by mapping them onto a set of bases learned from relevant labeled documents in other sources.
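One way to picture the desired reconstruction, under assumptions the text above does not fix: suppose k bases have already been learned from labeled source documents (how they are learned is not shown). A partially observed document can then be fitted using only its observed word features, here by ridge-regularized least squares, and the fitted combination of bases fills in the missing features. The least-squares objective, the ridge term, the use of `None` to mark a missing feature, and all function names below are hypothetical choices for illustration, not necessarily the approach this document goes on to describe.

```python
from typing import List, Optional

Vector = List[float]


def solve_linear(a: List[List[float]], b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reconstruct(bases: List[Vector], doc: List[Optional[float]],
                ridge: float = 1e-6) -> Vector:
    """Hypothetical reconstruction: fit basis weights on observed entries only,
    then evaluate the weighted bases at every word feature.

    `doc[i] is None` marks an unobserved word feature.
    """
    k = len(bases)
    observed = [i for i, v in enumerate(doc) if v is not None]
    # Normal equations (B_o^T B_o + ridge * I) w = B_o^T x_o over observed rows only.
    gram = [[sum(bases[p][i] * bases[q][i] for i in observed) + (ridge if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    rhs = [sum(bases[p][i] * doc[i] for i in observed) for p in range(k)]
    w = solve_linear(gram, rhs)
    return [sum(w[p] * bases[p][i] for p in range(k)) for i in range(len(doc))]
```

With bases `[1, 1, 0, 0]` and `[0, 0, 1, 1]` and a document observed only at its first and last word features, `[2, None, None, 3]`, the sketch returns approximately `[2, 2, 3, 3]`: each missing count is inferred from the basis that explains its observed neighbor. The small ridge term keeps the system solvable when the observed entries do not pin down every basis weight.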