1. Field of Invention
The present invention relates generally to the field of delivering digital multimedia programs and associated data over networks such as the Internet, and specifically in one aspect to delivering closed captioning data in a broadcast “IPTV” network.
2. Description of Related Technology
The term “closed captioning” (CC) generally refers to the display of text, data and/or musical notations on a video display, such as a textual rendering of the audio portion of a television program. The CC text or data is typically synchronized with the presentation of the audio that it represents. CC display capability has been supported by almost all televisions sold within the U.S. for many years.
Closed captioning for television programs is typically generated by having a typist (or a speech recognition system) listen to the audio of a recorded program and transcribe it into a system that inserts the textual information into the vertical blanking interval (VBI); the VBI data is then embedded in the final recorded version of the video program. Alternatively, a pre-existing script or text file for the program can be used as the basis of the CC display. The same approaches apply to embedding CC information on VHS or DVD media.
A slightly different scenario applies to “live” TV programs such as local or national news broadcasts. In these cases, a typist may enter the CC information into the VBI data transmitted with the newscast as the newscast occurs. As a result, closed captions for live programming tend to appear on the screen several seconds after the associated audio and often contain many typographical errors. Here too, a pre-existing script for the program can be used to avoid this latency, although the actual live performance may deviate from the script.
Recently, network operators have begun using Internet protocol (IP) networks to distribute broadcast television programming to subscribers. This contrasts with more traditional radio frequency (over-the-air) broadcasts and with delivery via packetized MPEG-2 transport streams. IP delivery of broadcast television programming also requires a method for delivering CC data to subscriber units such as personal computers (PCs), as well as a method for displaying that information on the display monitors of those units.
In analog television distribution systems, CC data is transmitted in the VBI of the television signal. The VBI lines are also used to carry data other than CC, notably Vertical Interval Test Signals (VITS), Extended Data Services (EDS), and teletext information.
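To illustrate the kind of data carried on the caption line of the VBI, the following Python sketch recovers printable text from caption byte pairs. It assumes the EIA-608 convention in which each byte carries seven data bits plus an odd parity bit in the most significant position. The function names are hypothetical, and control codes, the special and extended character sets, and the few basic characters that differ from ASCII are deliberately not handled.

```python
def has_odd_parity(byte: int) -> bool:
    # A valid byte has an odd number of 1 bits across all eight bits.
    return bin(byte & 0xFF).count("1") % 2 == 1


def decode_pair(b1: int, b2: int) -> str:
    """Return the printable text carried by one caption byte pair."""
    if not (has_odd_parity(b1) and has_odd_parity(b2)):
        return ""  # Discard pairs that fail the parity check.
    c1, c2 = b1 & 0x7F, b2 & 0x7F  # Strip the parity bit.
    if 0x10 <= c1 <= 0x1F:
        return ""  # Control code or special/extended character pair; not decoded here.
    # 0x00 is padding; 0x20-0x7E is treated as plain ASCII for simplicity.
    return "".join(chr(c) for c in (c1, c2) if 0x20 <= c <= 0x7E)


def decode_pairs(pairs) -> str:
    """Concatenate the text from a sequence of (byte1, byte2) pairs."""
    return "".join(decode_pair(b1, b2) for b1, b2 in pairs)
```

For example, `decode_pairs([(0x94, 0x2C), (0xC8, 0x49), (0x80, 0x80)])` skips the leading control pair and the trailing padding and returns `"HI"`.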
Most digital television distribution systems in operation use the MPEG-2 transport stream format to distribute broadcast television programs. In such systems, CC and other VBI data are transmitted in digitized form along with the audio and video. The two most commonly employed methods are to send CC data as part of the video picture user data, or to send CC data on its own packet ID (PID) within the MPEG transport stream.
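The separate-PID method can be illustrated with a short Python sketch that selects, from a buffer of 188-byte MPEG-2 transport packets, the payloads carried on a given PID. The PID value and function names are hypothetical. The sketch assumes the packets are aligned to the start of the buffer; it neither reassembles nor parses the PES packets that the payloads carry, and it does not address the picture user data method.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packet_pid(packet: bytes) -> int:
    # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pid(stream: bytes, wanted_pid: int):
    """Yield the payload of every transport packet whose PID equals wanted_pid."""
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at offset {offset}")
        if packet_pid(packet) != wanted_pid:
            continue
        afc = (packet[3] >> 4) & 0x03  # adaptation_field_control
        if afc in (0, 2):
            continue  # 0 is reserved, 2 means adaptation field only (no payload)
        start = 4
        if afc == 3:
            # Skip the adaptation_field_length byte and the field itself.
            start += 1 + packet[4]
        yield packet[start:]
```

For example, `list(filter_pid(ts_bytes, cc_pid))` would return the raw payload chunks for a hypothetical caption PID `cc_pid`, ready for PES reassembly.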
In emerging Internet protocol television (IPTV) and similar distribution networks, including for example the so-called “Broadband TV” and “TV-over-DOCSIS” delivery paradigms, a wider choice of audio/video codecs is being considered. For example, MPEG-2, MPEG-4/H.264 (Advanced Video Coding or “AVC”), Windows Media Codec by Microsoft, and RealVideo by RealNetworks are a few of the audio/video compression formats that have been deployed. While these newer formats and their associated compression technologies are useful for streaming audio/video programs to end users, many of them do not support simultaneous delivery of caption data. Some video codecs (e.g., MPEG-2 and MPEG-4) can embed CC information within the video stream, but many others (e.g., RealVideo) cannot.
Accordingly, what is needed is the ability to transport CC information to the display client outside of the associated video stream. Some solutions to this problem exist. For example, Microsoft's SAMI (Synchronized Accessible Media Interchange) format makes possible the off-line processing of multimedia files and the generation of corresponding CC data. This type of solution has limited usefulness in a live broadcast environment, however, because it requires significant manual pre-processing of the CC data to create an out-of-band CC data feed for the Windows Media Player client.
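For context, a SAMI file pairs each caption with a start time in milliseconds inside `<SYNC>` elements. The Python sketch below, built around a hypothetical `to_sami` helper with a single hard-coded English caption class, shows the kind of file that must be prepared ahead of time for such an out-of-band feed. It illustrates the general shape of the format only and is not a complete or validated SAMI writer.

```python
from html import escape


def to_sami(captions):
    """Build a minimal SAMI document from (start_ms, text) pairs.

    Hypothetical helper: uses one hard-coded English caption class and
    assumes the captions are already sorted by start time.
    """
    lines = [
        "<SAMI>",
        "<HEAD>",
        '<STYLE TYPE="text/css">',
        "<!--",
        ".ENUSCC { Name: English; lang: en-US; }",
        "-->",
        "</STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in captions:
        lines.append(f"<SYNC Start={int(start_ms)}>")
        # Escape markup characters so caption text cannot break the document.
        lines.append(f"<P Class=ENUSCC>{escape(text)}</P>")
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)
```

Because the whole caption list must be known before the file is written, this style of processing fits pre-recorded content far better than a live broadcast.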
A variety of other approaches to the closed captioning of data are found in the prior art. For example, U.S. Pat. No. 6,240,555 issued May 29, 2001 to Shoff, et al. entitled “Interactive entertainment system for presenting supplemental interactive content together with continuous video programs” discloses an interactive entertainment system that enables presentation of supplemental interactive content alongside traditional broadcast video programs. The programs are broadcast in a conventional manner. The supplemental content is supplied as part of the same program signal over the broadcast network, or separately over another distribution network. A viewer computing unit located at the viewer's home presents the program and supplemental content to a viewer. When the viewer tunes to a particular channel, the viewer computing unit consults an electronic programming guide (EPG) to determine whether the program currently carried on the channel is interactive. If it is, the viewer computing unit launches a browser. The browser uses a target specification stored in the EPG to activate a target resource containing the supplemental content for enhancing the broadcast program. The target resource contains display layout instructions prescribing how the supplemental content and the video content program are to appear in relation to one another when displayed. When the data from the target resource is downloaded, the viewer computing unit responds to the layout instructions obtained from the target resource by displaying the supplemental content concurrently with the video content program. Embedding the layout instructions in the supplemental content places control of the presentation in the hands of the content developers.
U.S. Pat. No. 6,766,163 issued Jul. 20, 2004 to Sharma entitled “Method and system of displaying teletext information on mobile devices” discloses a communication system and method for communicating teletext information to mobile stations. A Wireless Application Protocol (WAP) server is coupled to a television station and receives from the station a signal that includes teletext information. The WAP server includes a teletext decoder that decodes the teletext information in the transmitted signal. The decoded information is stored in memory by a server controller. The controller receives information requests from a network interface coupled to the mobile stations, accesses the teletext information stored in memory, and transmits the information to the mobile station through the network interface.
U.S. Pat. No. 6,771,302 issued Aug. 3, 2004 to Nimri, et al. entitled “Videoconference closed caption system and method” discloses a system and method for closed captioning in a videoconference environment. In a method according to one embodiment of that invention, a connection is established with a videoconference device, and a closed caption page associated with the device is then selected. Text entered on the closed caption page is displayed on at least one device associated with a videoconference in which the videoconference device is participating.
U.S. Pat. No. 6,903,779 issued Jun. 7, 2005 to Dyer entitled “Method and system for displaying related components of a media stream that has been transmitted over a computer network” discloses a system and method for displaying such components using at least one storage device that communicates with a television decoder and with the video display. Information from one or more components of the media stream is extracted from the stream and delivered to one or more of the storage devices. The stored component is subsequently transmitted to the video display in response to an information release signal embedded in the information. The invention can be used to display closed captions and other information together with the associated audio and video signals using an audio-visual media player.
Automatic Sync Technologies, LLC offers a non-real-time CC generation technique (“CaptionSync™”) that produces CC data in RealText format by analyzing a RealVideo file containing a compressed video program. Because of the off-line processing involved, this technique cannot be applied to real-time broadcast television.
From the foregoing, it is clear that while the prior art has generally recognized the need (i) to extract CC data from television signals, (ii) to provide CC data to client devices over networks (e.g., IP networks), and (iii) to enable CC decode and display capability alongside a digital audio/video decoder on a client device, it fails to address several issues pertaining to IPTV deployments. For example, when CC data is embedded in packets belonging to a particular video format, decoders that receive video in another format cannot make use of that CC data stream. The IPTV operator must therefore repeat the CC data for each different video format anticipated in the network.
Similarly, the prior art does not exploit the fact that, in a managed IP network (e.g., DOCSIS), a priori knowledge about the performance of each element in the system (e.g., packet propagation delays) allows the CC delivery and synchronization mechanism to be simplified, such that two independent client software programs that do not necessarily share time information can be deployed: one for audio/video decoding and the other for CC data decoding.
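A minimal Python sketch of such a simplified mechanism follows. It assumes hypothetical, operator-supplied delay figures for the video path and the CC path; the CC client simply holds each caption for the difference, measured on its own local clock, so it never needs time information from the audio/video player. The class and field names are illustrative only.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDelays:
    """Hypothetical a priori delays (seconds) known to the network operator."""
    video_s: float  # encoder + network + player buffering/decoding
    cc_s: float     # CC extraction + CC streaming + CC client processing

    @property
    def hold_s(self) -> float:
        # How long the CC client holds each caption so it lines up with the video.
        return max(0.0, self.video_s - self.cc_s)


class CaptionScheduler:
    """Schedules captions on the CC client's own clock; no time is shared with the video player."""

    def __init__(self, delays: PathDelays):
        self._hold = delays.hold_s
        self._pending = []  # heap of (display_time, sequence, text)
        self._seq = 0

    def on_caption(self, text: str, arrival_time: float) -> None:
        heapq.heappush(self._pending, (arrival_time + self._hold, self._seq, text))
        self._seq += 1

    def due(self, now: float) -> list:
        """Pop every caption whose display time has been reached."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ready.append(heapq.heappop(self._pending)[2])
        return ready
```

With a video path delay of 4.0 s and a CC path delay of 1.5 s, a caption arriving at local time 10.0 becomes due at 12.5; the sequence number keeps captions with equal display times in arrival order.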
Moreover, the prior art fails to make effective use of the pervasive connectivity of an IP network to distribute the various functions of CC data extraction, streaming, service authentication, etc. across multiple servers that are located at different sites and communicate with each other over the IP network.
Furthermore, the prior art solutions lack adequate flexibility to implement one or more business policies relating to selectively offering CC data service to subscribers, service classification (e.g., the amount and type of VBI data delivered to a user), etc.
Accordingly, what is needed are apparatus and methods that receive multiple channels of baseband video in real time from content providers (typically via satellite or a local origination source), encode that video in real time, extract the CC data at the time of encoding, pass that CC data to a CC streaming server as it is extracted, and deliver that data to the end user's PC application for display as the associated video and audio are delivered. Such apparatus and methods should preferably be deployable over a packet-switched network (such as an IP network), so that subscribers can use the service with commonly available PC or similar software applications. To increase utility in the broadcast television environment, such apparatus and methods should provide for real-time extraction of CC data from broadcast television signals and for transfer of the extracted data to users over the IP network.
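The real-time flow described above, in which CC data is handed to a streaming server as it is extracted rather than after the program ends, can be sketched in Python with one thread per channel feeding a shared queue. The `encoder`, `cc_streaming_server` and `run_pipeline` names are hypothetical, and actual video encoding, CC extraction from the baseband signal, and network delivery are reduced to placeholders.

```python
import queue
import threading

_END = None  # end-of-stream sentinel placed on the queue by each encoder


def encoder(channel, captions, out_q):
    """Hypothetical real-time encoder for one channel.

    Each caption is handed to the CC streaming server as soon as it is
    extracted, rather than after the whole program has been processed.
    """
    for text in captions:
        # Video/audio encoding of the matching frames would happen here.
        out_q.put((channel, text))
    out_q.put(_END)


def cc_streaming_server(in_q, deliver, producer_count):
    """Forward extracted CC data to clients until every encoder has finished."""
    finished = 0
    while finished < producer_count:
        item = in_q.get()
        if item is _END:
            finished += 1
            continue
        channel, text = item
        deliver(channel, text)  # e.g., send to PC clients watching this channel


def run_pipeline(feeds):
    """Run one encoder thread per channel and collect what the server delivers."""
    q = queue.Queue()
    delivered = []
    threads = [
        threading.Thread(target=encoder, args=(ch, caps, q))
        for ch, caps in feeds.items()
    ]
    for t in threads:
        t.start()
    cc_streaming_server(q, lambda ch, text: delivered.append((ch, text)), len(threads))
    for t in threads:
        t.join()
    return delivered
```

Within a channel, captions reach the client in extraction order; the interleaving across channels depends on thread scheduling.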
These methods and apparatus should also ideally permit optimization of network bandwidth by providing broadcast/multicast transmission of CC data, and by eliminating the need to send CC data packets over the IP network when no client device has requested them.
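The bandwidth optimization can be sketched as a server-side manager that tracks which clients have requested CC data for each channel and suppresses CC packets for channels with no requesters. The `CCStreamManager` class and the `send_multicast` callable are hypothetical; a real deployment would typically rely on IP multicast group membership (e.g., IGMP) rather than an in-process set.

```python
from collections import defaultdict


class CCStreamManager:
    """Forward CC packets only for channels that at least one client requested."""

    def __init__(self, send_multicast):
        self._subscribers = defaultdict(set)  # channel -> set of client ids
        self._send = send_multicast  # hypothetical callable(channel, packet)
        self.dropped = 0  # packets suppressed because nobody asked for them

    def join(self, channel, client_id):
        self._subscribers[channel].add(client_id)

    def leave(self, channel, client_id):
        members = self._subscribers.get(channel)
        if members is not None:
            members.discard(client_id)
            if not members:
                del self._subscribers[channel]  # last viewer left: stop sending

    def publish(self, channel, packet):
        if channel in self._subscribers:
            # One send per channel, however many clients have joined.
            self._send(channel, packet)
        else:
            self.dropped += 1
```

Because `publish` issues a single send per channel regardless of the number of joined clients, the cost on the network is independent of audience size, and it falls to zero for channels nobody has requested.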