Currently, as image recognition techniques develop, there have been a growing amount of research on determining the “deep meaning” expressed by images (i.e., a contextual description or explanation of the image content). However, existing automatic text generation systems for generating statements that describe content depicted by a digital image only provide simple descriptions separately for persons or objects in the digital image. Therefore, a user is not presented with information describing a relationship between persons depicted in the digital image.