Information retrieval systems can be used to identify electronic documents (e.g., books, papers, and web pages in electronic form) that may be relevant to users. For example, a user can submit a query to an information retrieval system, and electronic documents determined to be relevant to the query can be provided to the user. In some systems, electronic documents can also be summarized. In this manner, a summary (e.g., an abstract) of an electronic document can be provided to a user, such that the user can determine whether the electronic document is of interest.
Such systems can identify and/or summarize electronic documents based on words present in the documents. For example, a query to an information retrieval system can include one or more words, and the information retrieval system returns electronic documents that include the one or more words. In some instances, electronic documents can be ranked, or otherwise selected, for return to the user. In some examples, the relative importance of a word within an electronic document, referred to as word salience, can be used to discern between electronic documents. For example, a word can have a first word salience with respect to a first electronic document, and a second word salience with respect to a second electronic document, the first word salience being greater than the second word salience. Consequently, the first electronic document may be determined to be more relevant to the word, and can be ranked higher in results than the second electronic document.
Traditionally, word salience is based on a single value per word for a respective electronic document. For example, word salience of a word can be determined based on term frequency-inverse document frequency (TFIDF), which provides a word salience value that increases proportionally to the number of times the word appears in the document, and is offset by the frequency of the word across a collection of documents. Such techniques, however, fail to account for the relative importance of words in different sentences of the document. For example, a word may be less important in one sentence than in another sentence within the same document.
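By way of illustration only, the following is a minimal sketch of one common TFIDF variant, in which term frequency within a document is multiplied by the logarithm of the inverse fraction of documents in a collection that contain the word. The function name `tf_idf`, the toy corpus, and the whitespace tokenization are assumptions made for this example and are not taken from any particular system.

```python
import math
from collections import Counter


def tf_idf(word, doc_tokens, corpus):
    """Return one TFIDF salience value for `word` in a single document.

    `doc_tokens` is a list of tokens for the document; `corpus` is a list
    of such token lists. This is one common variant (hypothetical helper).
    """
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's tokens that are `word`.
    tf = Counter(doc_tokens)[word] / len(doc_tokens)
    # Document frequency: number of documents in the corpus containing `word`.
    df = sum(1 for tokens in corpus if word in tokens)
    if df == 0:
        return 0.0
    # Inverse document frequency: words found in every document score zero.
    idf = math.log(len(corpus) / df)
    return tf * idf


# Toy corpus, tokenized by whitespace (illustrative data only).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell".split(),
]

for i, doc in enumerate(docs):
    print(i, {w: round(tf_idf(w, doc, docs), 4) for w in ("the", "cat", "market")})
```

In this sketch, each word receives exactly one value per document regardless of which sentence it appears in, which is the limitation noted above: the computation has no notion of a word being more important in one sentence than in another.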