Artificial conversational entities, also known as chat bots, commonly conduct conversations with human users via auditory or textual inputs across a wide range of social platforms. In some examples, chat bots can use natural language processing systems to process an auditory or textual input and generate a textual reply based on word patterning. As images have become a popular medium for communication, chat bots are now often relied upon to engage users in conversation about particular images displayed on social platforms.
A common approach is for a chat bot to generate comments on a user image based on textual captions or comments previously associated with the image. When a user introduces a new image, however, the textual captions or comments upon which to base a comment are extremely sparse, and in some cases non-existent. As a result, systems are often unable to generate comments on images introduced by users for the first time.