Conventionally, applications such as search engines and advertisement syndication systems utilize information retrieval (IR) processing techniques to represent content in compact ways. These applications also utilize the conventional IR processing techniques to organize, search, access, and present information located on the web. Because of the exponential rate at which the web is growing, IR processing techniques have become indispensable for managing and accessing information.
Some conventional applications build complex language models and utilize trained and tested models, such as neural networks, to manage and access information included in web documents. However, training neural networks typically requires human-labeled data, which is limited and very costly.
Other conventional applications manage and access information by calculating term frequency-inverse document frequency (TFIDF), where the frequency of a word or phrase within a document is counted and normalized by the frequency of the word or phrase within the rest of the web. The TFIDF is utilized by conventional applications to organize information and to process user requests. The conventional applications calculate the TFIDF for web documents and extract and store the highest-scoring terms as keywords for the web documents. In turn, the extracted terms are categorized, utilized to summarize the web documents, and/or utilized to respond to search requests. The conventional applications that calculate TFIDF extract terms from web documents that are statistically relevant, but the terms may not be semantically meaningful.
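As an illustration of the TFIDF calculation described above, the following is a minimal sketch and not the method of any particular conventional application. It assumes one common textbook variant, in which term frequency is the term count divided by the document length and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The function names `tfidf_scores` and `top_keywords`, the whitespace tokenization, and the small in-memory corpus standing in for "the rest of the web" are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    # Naive tokenization (lowercase, split on whitespace); an assumption.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(documents, k=3):
    """Extract the k highest-scoring terms per document as keywords.

    Ties are broken alphabetically so the result is deterministic.
    """
    return [
        [term for term, _ in
         sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for s in tfidf_scores(documents)
    ]
```

Under this variant, a term that appears in every document receives a score of zero regardless of how often it occurs, while a term confined to a single document scores highest. The ranking is driven purely by counts, which is why the extracted keywords can be statistically relevant without being semantically meaningful.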