Many users consume content through videos. In an example, a user of a smartphone may watch a movie trailer provided by a movie streaming service. In another example, a user of a tablet may watch a soccer game clip provided by a news website. Videos may depict various entities, such as people (e.g., a sports figure, an actor, a politician, etc.), places (e.g., a beach resort, a restaurant district of a city, etc.), and/or things (e.g., a consumer good, a car, a monument, a business, etc.). While watching a comedy video, a user may see something of interest, such as a particular monument that the user would like to identify and learn more about. If the user continues to watch the video without taking further action, the user may forget about the monument. Also, the video may not identify or describe the monument (e.g., the monument may be merely background in a scene of the comedy video), and thus the user may not know the identity or location of the monument. Thus, the user may forgo learning about the monument because the user lacks enough information to research it further.