An issue exists regarding the generation of captions for videos. Video captioning has been studied for more than a decade, since the first rule-based system for describing human activities in natural language. Although promising results have been obtained, current methods focus on generating a single sentence for a video clip. How to generate multiple sentences or a paragraph for a long video remains largely unexplored.
Accordingly, what is needed are systems and methods that provide better captioning for videos.