The present disclosure relates to automatically generating headlines.
To generate headlines for news articles, some current approaches include manually generating the headlines or automatically identifying and selecting a sentence from an article as the title. However, these approaches are often not scalable to cover news crawled from the web. This can sometimes be due to the large amount of manual intervention required, or because the approaches are based on a model set of articles with consistent content and formatting, whereas articles crawled from the web often have varying content and formatting.
Some existing solutions attempt to use a main passage of an article as the headline for that article. However, these solutions are often not practical because important information may be distributed across several sentences in the article, or the selected sentence may be longer than a desired or allowable headline size. To reduce the size of the sentence, some solutions have attempted to reorder the words of the sentence. However, the reordering techniques used by these solutions have yielded headlines that are prone to grammatical errors. Other approaches, which select one or more sentences and then reduce them to a target headline size, rely on manual supervision and/or annotations. These approaches are thus generally not scalable, and are generally applicable only to a single document rather than to a collection of two or more news articles.
In addition, keeping knowledge databases updated with the latest headlines has often been difficult because of the level of human effort required. For instance, in some existing systems, if a notable event occurs, the knowledge databases have to be manually updated with information about the event.