Semantic analysis of documents in a corpus, such as web pages available over the Internet, can be used to better understand the content of the documents and the context of that content. However, a major hurdle to the development of systems that perform semantic analysis, especially for large corpora, is that training and performance evaluation require a large number of annotated documents. Annotating documents can be tedious, time-consuming, and error-prone.