There are several means for communicating and displaying audiovisual programs of educational, commercial or entertainment value. These means include, but are not limited to, TV broadcasting and DVD or VCR playback. Any such program is made up of audio and visual signals, or components, that are coded together using standard analog or digital formats. The well-known NTSC format, for example, is the standard for all analog audio and video TV broadcasting in the United States of America. In contrast, the DVD format uses a compressed digital representation of the video and audio signals.
Frequently the audiovisual information is also coded together with textual information. A prominent example of this practice is the closed captioning (CC) of TV broadcasting in the United States. In order to allow the hearing impaired to participate in the enjoyment of TV programs, the Federal Communications Commission requires that all TV broadcasting in the United States include a coded text signal that is representative of the audio events and the speech content of the audio component of the TV program. In CC, this textual information is embedded in the NTSC signal by representing the text with binary codes corresponding to the letters of the alphabet, and using these codes to modulate portions of the NTSC signal. An EIA standard (EIA-608, Revision A, Recommended practice for Line 21 data service) describes the technical details of this process. A TV receiver capable of decoding and displaying closed captioned programs must extract these codes and then display the corresponding text as subtitles on the TV screen. The process of displaying the text typically uses what is referred to as a font generator module; i.e., an electronic module that converts the binary codes into signals that drive the display of the text on the TV screen. All TV receivers presently sold in the United States are required to contain this type of font generator module.
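By way of illustration only, the extraction of text from closed caption codes may be sketched as follows. The sketch assumes that the caption data has already been demodulated from the NTSC signal into byte pairs, that each byte carries a 7-bit code plus an odd parity bit, and that a byte pair whose first code lies in the range 0x01 to 0x1F is a control or non-caption pair that can be skipped. The function names and the partial table of non-ASCII codes are hypothetical and are not taken from the standard's text; EIA-608 itself should be consulted for the complete character set and control code semantics.

```python
# Illustrative sketch only: names and the partial table below are assumptions.
# A few basic-set codes are not plain ASCII; this table is not exhaustive.
NON_ASCII = {0x2A: "á", 0x5C: "é", 0x5E: "í", 0x5F: "ó", 0x60: "ú",
             0x7B: "ç", 0x7C: "÷", 0x7D: "Ñ", 0x7E: "ñ", 0x7F: "█"}


def has_odd_parity(byte: int) -> bool:
    """Return True if the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def add_odd_parity(code: int) -> int:
    """Set the top bit of a 7-bit code, if needed, to make the parity odd."""
    code &= 0x7F
    return code if has_odd_parity(code) else code | 0x80


def decode_caption_bytes(data: bytes) -> str:
    """Extract displayable text from a sequence of caption byte pairs."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        pair = data[i:i + 2]
        # Discard pairs that fail the parity check.
        if not all(has_odd_parity(b) for b in pair):
            continue
        first, second = (b & 0x7F for b in pair)
        # Skip control pairs (assumed here: first code 0x01..0x1F).
        if 0x01 <= first <= 0x1F:
            continue
        for code in (first, second):
            if code >= 0x20:  # 0x00 is null padding
                chars.append(NON_ASCII.get(code, chr(code)))
    return "".join(chars)


if __name__ == "__main__":
    encoded = bytes(add_odd_parity(ord(c)) for c in "HELLO!")
    print(decode_caption_bytes(encoded))
```

A receiver's font generator module would consume the same codes to render glyphs; the sketch stops at recovering the character string, which is the form needed for any subsequent text processing.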
Another, more conventional method to convey audio information in textual form is through graphical subtitles. As opposed to CC, where the alphanumeric characters that form displayable text are directly coded and transmitted, in graphical subtitling these characters are first converted into a graphical representation and then transmitted or coded in that graphical form. The advantage of the latter method over the former is that no font generation is required to display the text, as the graphical representation as received can be directly overlaid on the TV screen. In addition to simplifying the implementation of the display electronics, transmitting graphical representations of subtitles also allows the producer of the audiovisual content to have total control over the appearance of the subtitle text. That is, the producer can choose fonts, colors and other characteristics for the subtitle text in view of artistic and other criteria, such as having the least intrusive effect on the audiovisual content being displayed, and the producer can also dynamically change the appearance and aesthetics of the subtitle text as desired.
Graphical subtitles are commonly used to transmit textual information representative of the audio component of the TV program. One prominent example is found in the Digital Video Broadcast (DVB) standards for terrestrial, satellite and cable transmission of digital TV used in most of the world, including in satellite digital broadcasting in the United States. Another prominent example of so-called graphical subtitling can be found in the standards for pre-recorded content.
In the case of pre-recorded DVD programs, graphical subtitling is often used to support subtitles for more than one target language. However, the number of supported target languages is generally limited to but a few, which has been found to be insufficient for those countries with highly diverse ethnic populations.
The presence of closed captioning or graphical subtitling, however, provides an opportunity to implement language translation services and subtitles for an arbitrary number of target languages. To implement this service, the textual information in the source language must be extracted and translated to the target language of the viewer using text-to-text translation technology. Examples of this technology are found in several translation services that are available on the Internet.
As can be appreciated, extracting the textual information from the stream of binary codes used in closed captioning is relatively simple. Performing this same task with graphical subtitles, however, is neither simple nor intuitive.
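The difference between the two cases can be illustrated with a minimal sketch. All names below (CaptionText, SubtitleBitmap, extract_text, translate_subtitle) are hypothetical, and the recognizer and translator are supplied by the caller as placeholders; the sketch assumes only that a closed caption already carries its text as character codes, whereas a graphical subtitle carries pixels from which text must first be recognized by some additional step.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class CaptionText:
    """Closed caption data already decoded into a character string."""
    text: str


@dataclass
class SubtitleBitmap:
    """Graphical subtitle: only pixel data is available, not characters."""
    pixels: bytes
    width: int
    height: int


Subtitle = Union[CaptionText, SubtitleBitmap]


def extract_text(sub: Subtitle,
                 recognize: Callable[[SubtitleBitmap], str]) -> str:
    """Return the source-language text carried by a subtitle."""
    if isinstance(sub, CaptionText):
        return sub.text          # text is directly available
    return recognize(sub)        # text must be recovered from the image


def translate_subtitle(sub: Subtitle,
                       recognize: Callable[[SubtitleBitmap], str],
                       translate: Callable[[str, str], str],
                       target_language: str) -> str:
    """Extract the source text, then apply text-to-text translation."""
    return translate(extract_text(sub, recognize), target_language)


if __name__ == "__main__":
    # Stand-in translator and recognizer, for demonstration only.
    glossary = {("hello", "es"): "hola"}
    translate = lambda text, lang: glossary.get((text, lang), text)
    recognize = lambda bitmap: "hello"
    print(translate_subtitle(CaptionText("hello"), recognize, translate, "es"))
```

In this arrangement the translation step is identical for both subtitle types; only the extraction step differs, and that is where the difficulty with graphical subtitles lies.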
With regard to solutions to the problems described above, Japanese patent 2000092460 (Mar. 31, 2000) describes a method for the automatic translation of CC text information from a source to a target language. This patent does not describe a method for the more common case of graphical subtitles and it does not describe the use of extensions for recognizing the source language automatically.