A user device may use a media player to play a video. For example, a server may stream a video to the user device, and the device may render and display captions with the corresponding video. The captions may be, for example, subtitles for audio (e.g., spoken dialogue) associated with the video. In one example, a software renderer on the user device renders the caption text over the video being displayed. In this case, the server that streams the video to the user device also sends a text file, timecodes, and text formatting metadata to the user device. The text file includes the captions, represented either as text or in a proprietary binary format. The timecodes indicate when to render the text. The text formatting metadata describes the formatting used to render the text, such as font, location, and size. The software renderer on the user device then renders the text from the text file at the corresponding timecodes, using the formatting metadata, to display the captions with the video.
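The client-side rendering described above can be illustrated with a minimal sketch, assuming each caption is stored as a cue with start and end timecodes in seconds and a small dictionary of formatting metadata. The names `Cue`, `active_cues`, and `render` are hypothetical, and the renderer here only produces formatted strings; an actual software renderer would rasterize the text onto the video frame.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cue:
    """One caption: its text plus the timecodes (seconds) bounding its display."""
    start: float
    end: float
    text: str
    # Hypothetical formatting metadata, e.g. {"font": "Arial", "location": "bottom"}.
    style: Dict[str, str] = field(default_factory=dict)


def active_cues(cues: List[Cue], t: float) -> List[Cue]:
    """Return the cues whose [start, end) interval contains playback time t."""
    return [c for c in cues if c.start <= t < c.end]


def render(cues: List[Cue], t: float) -> List[str]:
    """Stand-in for a device's software renderer.

    Formats each active cue as a string using its formatting metadata,
    falling back to device defaults when a field is missing.
    """
    out = []
    for c in active_cues(cues, t):
        font = c.style.get("font", "default")
        location = c.style.get("location", "bottom")
        out.append(f"[{font}@{location}] {c.text}")
    return out
```

For example, with `cues = [Cue(1.0, 3.5, "Hello.", {"font": "Arial", "location": "bottom"}), Cue(4.0, 6.0, "Who's there?")]`, calling `render(cues, 2.0)` yields `["[Arial@bottom] Hello."]`, and `render(cues, 3.7)` yields an empty list. The fallback defaults are where device-specific behavior enters: two devices with different defaults would display the second cue differently.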
Different users may use different types of user devices, and different types of user devices may include different software renderers. The different software renderers may render the captions in different formats or at different locations, so the same captions may be displayed differently on different user devices. This makes problems with caption display hard for a company to debug: because each type of device renders the captions differently, a problem observed on one device may be difficult to reproduce on another. Also, some software renderers may be unable to render some captions; for example, a software renderer may not be able to render vertical text.
In another example, digital video disc (DVD) players display captions by displaying pictures of the captions with the video. The pictures of the captions are included in the same file as the video. For example, a DVD file may include video (V), audio (A), and captions (C) that are multiplexed together in the file, and the information may be sent in the multiplex sequence V, A, C, V, A, C, . . . , V, A, C. The DVD player then processes the file to display the video and the pictures of the captions together. Because the pictures of the captions are included in the same file as the video, the captions are sent in-band with the video. As a result, any change to the captions requires changing the DVD file, since the captions are multiplexed with the video and the audio. One such change may be adding caption languages. In this case, the DVD file may include the following information: V, A, C, C (French), C (Spanish), C (Japanese), . . . . When a new caption language, such as Chinese, needs to be added, the DVD file must be changed again to multiplex the Chinese captions into the file.
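Why in-band captions force the whole file to change can be sketched with a simplified model of the V, A, C interleaving described above. This is an illustration only, assuming one unit per stream per time interval and representing caption pictures as plain strings; it does not model the actual DVD packet or stream format, and the function name `multiplex` is hypothetical.

```python
from typing import Dict, List, Tuple

# (stream identifier, payload); payloads are strings standing in for
# video frames, audio samples, or pictures of captions.
Unit = Tuple[str, str]


def multiplex(video: List[str], audio: List[str],
              captions: Dict[str, List[str]]) -> List[Unit]:
    """Interleave one unit per stream per interval: V, A, C(lang1), C(lang2), ..."""
    n = len(video)
    if len(audio) != n or any(len(track) != n for track in captions.values()):
        raise ValueError("all streams must cover the same intervals")
    muxed: List[Unit] = []
    for i in range(n):
        muxed.append(("V", video[i]))
        muxed.append(("A", audio[i]))
        # Caption tracks follow in insertion order (dicts preserve it in Python 3.7+).
        for lang, track in captions.items():
            muxed.append((f"C({lang})", track[i]))
    return muxed
```

With `captions = {"French": ["fr0", "fr1"], "Spanish": ["es0", "es1"]}`, the second video unit sits at index 4 of the multiplexed sequence. After adding `captions["Chinese"] = ["zh0", "zh1"]` and multiplexing again, the same video unit moves to index 5. Every unit after the first interval shifts, which is why adding a caption language means rewriting the file rather than appending to it.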