Many web pages on the Internet today present news stories on topics ranging from current events to celebrity news. These web pages often follow a similar format that may be found across a number of different topics, languages, and content producers. A media web page presenting a news story or other informational content frequently contains one or more centerpiece images accompanied by text that may relate to the images and to the persons and other content depicted in them.
The images included in media web pages are typically embedded in the web page and arranged in relation to the accompanying text. The web page may include image caption text that is particularly related to and describes the images. For example, if an image on a web page depicts a particular person, caption text placed immediately below the image may identify the person in the image along with other contextual information. Other text, often including at least an article title and the main content of the article, may be placed in other locations surrounding the images on the web page.
The images and text found on a media web page often include elements that are presented as hyperlinks to other content. For example, one or more words in the text of a story may be presented on the web page as a hyperlink that references other content related to the hyperlinked text. Similarly, the images found on a web page may be presented as hyperlinks to other content related to the subject matter depicted in the images. An entire image may serve as a hyperlink or, alternatively, an image map may specify bounded areas of the image that serve as hyperlinks. Hyperlinking elements of media web pages to additional content increases user interactivity with the web page and creates opportunities to increase user engagement and page views, leading to increased monetization opportunities for providers hosting the media web pages. For example, a portion of an image on a media web page depicting a particular person may be hyperlinked to another web page that displays news stories or other information about that person, increasing the amount of content with which a user may interact when viewing the media web page.
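By way of illustration only, the image map mentioned above may be sketched as follows. The sketch assumes rectangular bounded areas expressed with the standard HTML `map` and `area` elements; the `LinkedRegion` type, the `build_image_map` function, and the example file name and URLs are hypothetical and are not drawn from any particular web page.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class LinkedRegion:
    """A rectangular area of an image that acts as a hyperlink (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int
    href: str
    label: str


def build_image_map(image_src: str, map_name: str, regions: List[LinkedRegion]) -> str:
    """Return HTML for an image whose bounded areas link to other content."""
    lines = [
        f'<img src="{escape(image_src, quote=True)}" '
        f'usemap="#{escape(map_name, quote=True)}" alt="">',
        f'<map name="{escape(map_name, quote=True)}">',
    ]
    for region in regions:
        # For shape="rect", coords are left, top, right, bottom in pixels.
        coords = f"{region.left},{region.top},{region.right},{region.bottom}"
        lines.append(
            f'  <area shape="rect" coords="{coords}" '
            f'href="{escape(region.href, quote=True)}" '
            f'alt="{escape(region.label, quote=True)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


# Example values are placeholders chosen for illustration.
example_regions = [
    LinkedRegion(10, 20, 110, 140, "https://example.com/people/person-a", "Person A"),
    LinkedRegion(200, 30, 290, 150, "https://example.com/people/person-b", "Person B"),
]
example_html = build_image_map("story.jpg", "story-faces", example_regions)
```

In this sketch each bounded area links to a page about one depicted person, while the rest of the image remains unlinked.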
However, manually determining the identity of persons whose faces are depicted in the images found on media web pages, and annotating the images with hyperlinks or other metadata for each depicted person, is a time-consuming task for web page developers. Automation would reduce this burden. One approach for programmatically identifying faces in images uses pre-established, pre-learned databases containing sample images of persons who might be found in the images in question. However, this approach works well only when the set of persons to be detected in the images is known beforehand, so that a relevant set of sample images may be collected and stored in the database. In addition, significant time and resources are required to collect and maintain a database of sample images for use in identifying the many different individuals that may be found in the images.
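The limitation of the database approach may be illustrated with a minimal sketch. It assumes, purely for illustration, that each face has already been reduced to a numeric feature vector and that matching uses cosine similarity against stored samples; the description above does not specify a matching method, and the `identify` function, the threshold value, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    face_vector: Sequence[float],
    gallery: Dict[str, List[Sequence[float]]],
    threshold: float = 0.9,  # assumed cutoff, chosen only for this example
) -> Optional[str]:
    """Return the best-matching name from the gallery, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, samples in gallery.items():
        for sample in samples:
            score = cosine_similarity(face_vector, sample)
            if score >= best_score:
                best_name, best_score = name, score
    return best_name


# Toy gallery: sample vectors must be collected in advance for every person.
gallery = {
    "Person A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "Person B": [[0.0, 1.0, 0.0]],
}

known = identify([0.95, 0.05, 0.0], gallery)   # resembles the stored Person A samples
unknown = identify([0.0, 0.0, 1.0], gallery)   # a person with no stored samples
```

Under these assumptions, a face resembling a stored sample is matched, while a face belonging to a person absent from the gallery cannot be identified at all, regardless of how well the matching itself performs.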
What is needed is an approach for automatically determining the identity of persons detected in images found on media web pages and annotating the images accordingly, without requiring that the set of persons to be identified be known beforehand or that a database of sample images be collected and maintained.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.