The application of sub-titles is an important element in video stream production. Sub-titles give a user a better understanding of the content of the video, and particularly of the speech uttered, when that speech is difficult or impossible to understand. This is particularly useful when a program is transmitted in a language not known to the user or when the auditory perception of the language is disturbed by poor transmission, poorly articulated speech, or background noise. Sub-titles also enable the hard of hearing to achieve an understanding of the program.
The application of sub-titles may be performed either offline or in real time (so-called live sub-titles in the parlance of the art). The application of sub-titles offline is performed on a support prepared in advance. This type of sub-titling is found, for example, in DVDs, Blu-ray discs, or transmissions broadcast offline. This offline application does not present any particular time constraint. Thus, it is possible in this case to manually insert sub-titles into the video stream, an operator verifying that the sub-titles are perfectly synchronized with the video, while presenting a visual aspect perceived to be pleasant by the user. Although subjective, this notion of pleasant perception of the sub-titles can be reduced to objective elements, for example, obtaining sub-titles which are displayed at a moderate speed, or which retain a fixed position with respect to the screen. It is also possible to use audio analysis schemes which may turn out to be expensive in calculation time to best synchronize the sub-titles and the audio/video.
On the other hand, it is not possible to use these techniques in the case of live sub-titles, as in this case the video content produced is transmitted immediately. The production of live sub-titles is generally performed by an operator producing the sub-titles live and dispatching them in the stream. This operation produces an inevitable lag between the moment at which the video is transmitted and the moment at which the corresponding sub-title is produced. This lag is more significant still when the task of the operator involves translation from one language to another for the production of the sub-titles.
The approach generally used to process live sub-titles is to transmit each word of sub-titles as soon as it is available in order not to add any further lag. However, this approach has drawbacks: in addition to the inevitable lag, the words arrive one after another and do not always form a coherent whole. Moreover, when the sub-titles are formed on the basis of a teletext source, a line feed may entail an upwards shift of the words. This upwards shift, or “shift up” as it is called, consists, when a new line of sub-titles begins, in shifting the set of lines of sub-titles by one line upwards and in deleting the oldest line. This effect may be particularly disturbing for the viewer: if a word that the viewer was reading is shifted upwards, the viewer has to make an additional effort to follow this word while retaining the overall sense of the sub-title.
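By way of illustration only, the shift-up behaviour described above can be modelled in a few lines of Python. This is a minimal sketch, not a teletext decoder: the class name `RollUpDisplay`, the parameters `max_lines` and `max_chars`, and the rule that a line feed occurs when the next word no longer fits on the current line are all assumptions introduced for the example.

```python
from collections import deque


class RollUpDisplay:
    """Hypothetical model of a sub-title area that shifts up on a line feed."""

    def __init__(self, max_lines=2, max_chars=20):
        self.max_chars = max_chars
        # A bounded deque drops its oldest entry when a new one is appended,
        # which mirrors "shift every line up and delete the oldest line".
        self.lines = deque([""], maxlen=max_lines)

    def push_word(self, word):
        """Display one word as soon as it arrives; return the visible lines."""
        current = self.lines[-1]
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= self.max_chars:
            # The word fits (or the line is empty): extend the bottom line.
            self.lines[-1] = candidate
        else:
            # Assumed line-feed rule: start a new bottom line, which shifts
            # the existing lines up and discards the oldest when full.
            self.lines.append(word)
        return list(self.lines)


display = RollUpDisplay(max_lines=2, max_chars=10)
for w in "the quick brown fox jumps over".split():
    print(display.push_word(w))
```

In this run, the line "the quick" disappears as soon as "jumps" arrives, and "brown fox" moves from the bottom line to the top line, which is the displacement a viewer has to follow while reading.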
Live sub-titles are thus often perceived by users as unpleasant and of poor quality. In its report The quality of live subtitling (pp. 29-30), Ofcom (Office of Communications) lists the presentation of the words of sub-titles as one of the main causes of poor perception of live sub-titles by users. According to this study, users seem to prefer sub-titles presented in blocks. However, the definition of a “block” remains broad: certain users may prefer sub-titles presented line by line, others sentence by sentence, and others still word by word.
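The three presentation granularities mentioned above (word by word, line by line, sentence by sentence) can be sketched as a single grouping function. This is an illustrative assumption rather than a scheme described in the cited study: the function name `group_into_blocks`, the `max_chars` line width, and the use of terminal punctuation to detect the end of a sentence are all hypothetical choices made for the example.

```python
def group_into_blocks(words, mode, max_chars=37):
    """Group a stream of sub-title words into display blocks.

    mode: "word", "line" or "sentence" (hypothetical names for this sketch).
    max_chars: assumed maximum line width, used only in "line" mode.
    """
    if mode not in ("word", "line", "sentence"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "word":
        # Each word is its own block.
        return list(words)

    blocks, current = [], []
    for word in words:
        if mode == "line":
            # Close the current line when the next word would overflow it.
            if current and len(" ".join(current + [word])) > max_chars:
                blocks.append(" ".join(current))
                current = []
            current.append(word)
        else:  # "sentence"
            current.append(word)
            # Assumed heuristic: terminal punctuation ends a sentence.
            if word.endswith((".", "!", "?")):
                blocks.append(" ".join(current))
                current = []
    if current:
        blocks.append(" ".join(current))
    return blocks


words = "Hello there. How are you today?".split()
print(group_into_blocks(words, "sentence"))
print(group_into_blocks(words, "line", max_chars=12))
```

Note that the coarser the block, the longer the first word of the block has to wait before it is displayed, so any such grouping trades additional lag against readability.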
The known schemes of the prior art involve producing or updating a page of sub-titles in a video encoder. The sub-titles can be encoded in the form of images, such as for example in the DVB-SUB standard (the acronym standing for Digital Video Broadcasting Subtitles). In this case, the style of sub-titles (colour, size of the characters, etc.) is defined by the image.
Sub-titles may also be encoded in the form of textual characters, such as for example in the various standards based on W3C TTML (the acronym standing for World Wide Web Consortium Timed Text Markup Language). In this case, the sub-titles are stored in the form of characters forming words, as in a text file, optionally accompanied by header information that specifies a presentation style to be applied. The presentation style may contain information such as the size of characters to be applied, the font, and the like. By way of example, the presentation styles available in the EBU-TT standard (the acronym standing for European Broadcasting Union-Timed Text), arising from W3C TTML, are described in the document EBU-UER, EBU-TT Part 1, Subtitling format definition.
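To make the separation between text and presentation style concrete, the following sketch parses a deliberately simplified TTML-style document with the Python standard library. The sample document is an assumption written for this example: it uses the W3C TTML namespaces and the `tts:fontFamily`, `tts:fontSize` and `tts:color` styling attributes, but it has not been validated against EBU-TT or any other profile, and the function name `read_subtitles` is hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"           # TTML content namespace
TTS = "http://www.w3.org/ns/ttml#styling"  # TTML styling namespace
XML = "http://www.w3.org/XML/1998/namespace"

# Simplified sample: a style declared in the header, referenced by one sub-title.
DOC = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <head><styling>
    <style xml:id="s1" tts:fontFamily="monospace"
           tts:fontSize="100%" tts:color="white"/>
  </styling></head>
  <body><div>
    <p begin="00:00:01.000" end="00:00:03.000" style="s1">Good evening.</p>
  </div></body>
</tt>"""


def read_subtitles(doc):
    """Return (begin, end, text, style) tuples for each <p> element."""
    root = ET.fromstring(doc)
    # Collect the header styles, keyed by xml:id, keeping only tts:* attributes.
    styles = {}
    for style in root.iter(f"{{{TT}}}style"):
        styles[style.get(f"{{{XML}}}id")] = {
            key.split("}", 1)[1]: value
            for key, value in style.attrib.items()
            if key.startswith(f"{{{TTS}}}")
        }
    cues = []
    for p in root.iter(f"{{{TT}}}p"):
        text = "".join(p.itertext()).strip()
        cues.append((p.get("begin"), p.get("end"), text,
                     styles.get(p.get("style"), {})))
    return cues


for cue in read_subtitles(DOC):
    print(cue)
```

Because the characters and the style are stored separately, the same text can be rendered with a different font or size without being re-encoded, which is not the case for image-based sub-titles.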
U.S. Pat. No. 8,695,048 describes an approach for transcribing sub-titles for the deaf and hard of hearing from a first format to a second format independent of the platform. The second format can be based on the DVB-SUB standards or standards based on W3C TTML, such as for example EBU-TT or SMPTE-TT (the acronym standing for Society of Motion Picture and Television Engineers-Timed Text).
U.S. Pat. No. 5,497,241 (the '241 patent) describes a system making it possible to display sub-titles by blocks in a video. The blocks in the '241 patent are predefined blocks where each block is associated with a particular language. Thus, the choice of the language determines the form of the blocks and of the presentation.
The standards and techniques for coding sub-titles available today therefore make it possible to represent sub-titles with varied styles (essentially different character fonts), to separate these sub-titles into blocks, and to display the sub-title blocks according to a desired style.