Numerous sources of information are readily available to individuals. Documents are typically rich in information, and it can be tedious for an individual to read through lengthy documents to extract the information of interest. An individual can go through a document and identify its salient points, which can be represented as key phrases. However, manual identification of key phrases can be both subjective and time-consuming.
Various information retrieval techniques have been used to identify key phrases in documents. For example, some systems utilize external knowledge in the form of sources such as Wikipedia, web crawls, and concept databases to search for and extract key phrases for a document. However, using such resources can be expensive and time-consuming. Certain systems use simpler data sources such as word lists and concept dictionaries. However, such data sources can be limiting because they contain relatively little information, and relying on them may result in poor-quality key phrase extraction.
Some systems use deep learning techniques for key phrase extraction from documents. However, such techniques require large pre-labelled datasets, which are difficult to create because of resource constraints and the subjective nature of the labelling tasks involved.