Much of today's information is created and shared in an electronic format. In one example, a user may upload and share a photo through a photo sharing website. In another example, users may share articles, music, web pages, and/or ideas through social networking and/or microblogging services. Unfortunately, computing devices may lack the human intellect that may be useful for understanding aspects of human-generated content, such as its semantic meaning. Accordingly, topic models, inference algorithms, and/or other machine learning techniques have been developed to provide a mechanism for computing devices to “learn” how to understand human-generated content. For example, topic models may be used to discover topic structure (e.g., content focusing on cars, sports, a specific natural disaster, political debate, etc.) and to determine probabilities that documents (e.g., a text document, an online text-based article, a blog, etc.) may relate to particular topics. However, current inference algorithms for topic models may require multiple passes over a corpus of documents, thus making such algorithms ill-suited for large-scale document repositories, such as web content, which may also change rapidly. Additionally, inference algorithms that consider merely the words contained in a document may be unable to extract meaningful topics from large corpora of short and/or semantically diverse documents. Because many online document corpora also comprise a considerable amount of additional metadata, such as the identity of the author(s), time stamps, etc., such metadata may be helpful in determining the topic structure of a corpus.