Captions associated with images are useful in various contexts. For example, captions can be used to “explain” or annotate a scene in an image. In another example, a caption generated by a computer can be used to determine whether the computer has properly analyzed, or “understands,” the image. Determining the context of an image often requires determining the contents of the image (e.g., subjects, objects, and the like), as well as various aspects of a scene in the image, such as any actions occurring in the image, the relationships of objects within the image to one another, and the like.