There is an enormous amount of content available on the Internet, as well as from other sources such as private enterprise networks. Given this volume of content, information retrieval technology is extremely valuable in locating a relevant document, or a relatively small number of documents from which a user may select.
One of the ways that information retrieval technology locates relevant documents is by extracting key phrases from the documents, where, in general, the key phrases of a document represent its main topic and principal information. Once extracted, key phrases may be used to match documents to online search queries, for example.
As can be readily appreciated, end users and machines benefit from correctly extracted key phrases. For example, businesses, educational institutions, the scientific community and so forth require that key phrases be extracted correctly, to a high degree of confidence, and with acceptable performance.
Nevertheless, contemporary key phrase extraction technology is far from perfect. For example, one problem is that known solutions return a considerable number of incorrect “noise” phrases among the key phrases, even when only a small number of extracted key phrases per document is considered. Any improvement in extracting more relevant key phrases from documents is thus valuable in information retrieval.