The present invention relates to digital television and, more particularly, to a method and an apparatus for synchronizing events originating at a digital television receiver with instants of the audio, video, or data elements of a digital television program.
Digital television (DTV) affords the opportunity to augment the basic audio and video content that characterizes an analog television program. For example, a DTV program may include multiple audio and video elements. A DTV transmission may also include a number of data elements to provide ancillary services in conjunction with the visual and audio elements of a program. In addition, applications residing at the DTV receiver can generate events related to transmitted television program elements or ancillary services. An event refers to some action or behavior initiated by the DTV receiver. Examples of events include displaying a graphic button, generating a sound, or rendering data, which may have been transmitted to or generated at the receiver.
The scope of the ancillary services and applications that might be provided with DTV is largely undefined. The ATSC DIGITAL TELEVISION STANDARD, Advanced Television Systems Committee, Doc A/53, 12 Apr. 1995, 16 Sep. 1995, incorporated by reference herein, provides for program subtitles and a program guide to be included in a DTV system. Program subtitles are analogous to the closed caption and emergency message services provided with NTSC (analog) television broadcasts. The program guide contains information about current and future programs.
The ATSC DATA BROADCAST SPECIFICATION, Advanced Television Systems Committee, ATSC T3-504, Jul. 30, 1999, incorporated by reference herein, is a draft ATSC standard that defines data transmission that is compatible with the digital multiplex, MPEG-2 bit streams of the DTV system. Further, it specifies mechanisms necessary to allow applications resident at the receiver to be associated with transmitted data elementary streams. Each data elementary stream is delivered in MPEG-2 transport stream packets referenced by a unique MPEG-2 transport stream packet identifier (PID). The PID is used to identify the elementary data streams included within the DTV transport bit stream. The specification does not attempt to define the nature of services and applications that might be provided with the DTV system, but does anticipate that many of these services and applications will be synchronized to some extent with the video and audio elements of a program. The specification provides a suite of packetization formats, error protection schemes, and communication protocols for the delivery of asynchronous, synchronous, and synchronized, streaming and non-streaming data services to facilitate these ancillary services and applications.
In the DTV system, samples of a system time generated by a system clock are transmitted to a receiver. The system time is recovered at the receiver and synchronized to the timeline generated at the transmitter. The common time base for the encoder at the transmitter or emission station and the decoder at the receiver facilitates synchronization of the presentation of the video, audio, and ancillary data service elements of a television presentation. The encoder periodically samples the value of the system time generated by the 27 MHz system clock and includes the sample values (42 bits per sample) as program clock references (PCRs) in the transport stream data packets of one of the elementary streams (typically, the video elementary stream). At the receiver, the decoder uses the PCRs recovered from the data stream to construct the receiver's version of the system clock and synchronize it with the encoder's system clock.
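By way of illustration only, the relationship between a 42-bit PCR sample and the 27 MHz system time can be sketched as follows. The split of the sample into a 33-bit base counted at 90 kHz and a 9-bit extension counting 0 to 299 at 27 MHz is taken from the MPEG-2 Systems layer (ISO/IEC 13818-1) rather than from the text above, and the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # 27 MHz system clock, as stated in the text


def pcr_to_ticks(pcr_base: int, pcr_ext: int) -> int:
    """Combine the two PCR fields into a count of 27 MHz clock ticks.

    Assumption (MPEG-2 Systems): the base is a 33-bit count at 90 kHz and
    the extension is a 9-bit field that counts 0..299 at 27 MHz.
    """
    if not 0 <= pcr_base < 2**33:
        raise ValueError("PCR base must fit in 33 bits")
    if not 0 <= pcr_ext < 300:
        raise ValueError("PCR extension must be in the range 0..299")
    return pcr_base * 300 + pcr_ext


def ticks_to_seconds(ticks: int) -> float:
    """Express a tick count as seconds elapsed on the system timeline."""
    return ticks / SYSTEM_CLOCK_HZ
```

A base of 90000 with an extension of 0 corresponds to 27,000,000 ticks, that is, one second on the system timeline. The value remains a counter reading and carries no wall-clock or program-relative meaning.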
The encoder also associates a presentation time stamp (PTS) and, optionally, a decoding time stamp (DTS) with each of the data access units within the elementary data streams. The PTS indicates the moment, on the system timeline provided by the PCRs, at which presentation is to occur. Typically, presentation refers to display of video and audio data, but presentation may refer to some other action related to the data in the data access unit. The DTS indicates the system time at which decoding of the data in the data access unit is to be undertaken. A DTS may not be provided with every data access unit, but in some cases may be inferred from the PTS assigned to another data access unit. In other cases, a DTS may be inferred from the PTS of the data access unit and a delay value, either transmitted or predetermined, indicating an elapsed time between the initiation of decoding and presentation. Therefore, the exact system time at which the data are to be decoded and presented, so that the data are properly synchronized with a program video frame, video field, or other instant of a program element, is specified in the data elementary stream packets.
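The inference of a DTS from a PTS and a delay value can be sketched as a simple subtraction on the time stamp counter. The sketch below assumes that time stamps are 33-bit counts in 90 kHz units, as in the MPEG-2 Systems layer, and therefore wrap around; the function name and the example delay are hypothetical.

```python
PTS_MODULUS = 1 << 33  # assumption: time stamps are 33-bit counts (90 kHz units)


def infer_dts(pts: int, decode_to_present_delay: int) -> int:
    """Infer a decoding time stamp from a presentation time stamp.

    decode_to_present_delay is the elapsed time, in the same units as the
    PTS, between the initiation of decoding and presentation. The result
    is reduced modulo 2**33 so that it stays valid across counter wrap.
    """
    if decode_to_present_delay < 0:
        raise ValueError("delay must be non-negative")
    return (pts - decode_to_present_delay) % PTS_MODULUS
```

For example, a PTS of 9000 with a delay of 3003 units yields a DTS of 5997. A PTS that has just wrapped past zero yields a DTS near the top of the 33-bit range.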
The system clock references and the time stamps used to synchronize the decoding and presentation of program elements in the transmitted data stream cannot easily be used to synchronize events or program elements outside the context of the data stream. The system clock is basically a counter and does not produce a value expressing time in any real sense. Further, the system clock time samples are inserted at the multiplexer that encodes the transport stream, and the value of the system time at an instant of an audio, visual, or data program element is not known until the data for that instant are encoded and inserted in the transport data stream. As a result, an author has no way of determining the appropriate system time corresponding to an instant of an audio, video, or data element of a program at the time of program authoring. Furthermore, many applications that might be resident at the receiver may have been authored to operate with a coarser time resolution than that provided by the 27 MHz system clock.
In program production, editors use timecode recorded on the videotape or other storage medium as an application clock to synchronize video, audio, and data instants from several sources. A common timecode format used in program production is the SMPTE (Society of Motion Picture and Television Engineers) 12M timecode. The SMPTE timecode format effectively applies an electronic address, one frame in duration, to each frame of the videotape. The video editor identifies an instant of the video content by a value of the SMPTE timecode corresponding to that instant. In a post production environment, video and audio from several sources can be synchronized at a particular instant by referencing a particular SMPTE timecode value. While the SMPTE 12M timecode provides resolution to one frame, other timecodes may provide resolution to a video field or even finer.
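The frame-addressing behavior of such a timecode can be illustrated with a short conversion between an HH:MM:SS:FF value and a frame index. This is a minimal sketch that assumes non-drop-frame counting at an integer frame rate; drop-frame timecode is not handled, and the function names are hypothetical.

```python
def timecode_to_frames(timecode: str, fps: int = 30) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {timecode}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(frame_index: int, fps: int = 30) -> str:
    """Convert a frame index back to a non-drop-frame HH:MM:SS:FF timecode."""
    total_seconds, frames = divmod(frame_index, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

Under these assumptions, the timecode 00:00:01:15 addresses frame 45 at 30 frames per second, so two sources can be aligned by referring to the same timecode value.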
Similarly, MPEG-4 (CODING OF MOVING PICTURES AND AUDIO, ISO/IEC JTC1/SC29/WG11 N2995, International Standards Organization) content production utilizes an authoring system working with a timeline to synchronize various MPEG-4 elementary streams. Like SMPTE timecodes, the MPEG-4 timeline (also called the MPEG-4 scene time) will be sampled and captured as application time clock references in the authoring system.
If the application timecodes were available at the receiver and synchronized to the presentation of the elements (i.e., access units) of the transmitted program, an author of an ancillary service could use the timecode to associate an event with an instant of a program element. An application at the receiver could then instigate the event in synchronization with presentation of that instant of the transmitted program element. However, the DTV system does not provide for transmission of a timecode in synchronization with the transmitted program or for construction of a synchronized application time clock at the DTV receiver.
The ATSC DATA BROADCAST SPECIFICATION provides for downloading data in a DTV system in accordance with the protocols of ISO/IEC 13818-6, INFORMATION TECHNOLOGY—GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO: DIGITAL STORAGE MEDIUM—COMMAND AND CONTROL, ISO/IEC JTC1/SC29/WG11 MPEG96/N1300pl, July, 1996. The digital storage media-command and control (DSM-CC) standard provides a protocol toolkit for data transmission. The DSM-CC standard provides a temporal addressing method called Normal Play Time (NPT) for associating the MPEG-2 program clock references (PCRs) of the DTV data stream with “VCR”-like application time stamps referred to as NPT time stamps. In the DSM-CC framework, PCRs (samples of the system time) are inserted into the payload of data packets containing NPT reference descriptors (samples of the application time) so the system and application times can be associated. The Normal Play Time can be reconstructed at the receiver from the NPT reference descriptors and locked to the system clock derived from the PCRs. The NPT time stamp is transmitted asynchronously to the receiver at some time before the event to identify the system time at which the event is to occur. An event is triggered when the system time matches the NPT time stamp for the event. Unfortunately, this method of transmitting a synchronizing application time moment requires that PCRs be generated and inserted into DSM-CC sections before or when the section enters the program multiplexer or encoder. Generating and inserting a PCR at an arbitrary location in the bitstream is a difficult task for an MPEG-2 multiplexer. In addition, the DSM-CC NPT reference descriptors and the stream event descriptors containing the NPT time stamp must be conveyed in DSM-CC section structures, which requires an additional packet identifier (PID) to identify the transport packets carrying these descriptors in the DTV transport bit stream. A typical receiver can filter only a limited number of PIDs, and adding PIDs may complicate the receiver and increase its cost.
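The association of system time with Normal Play Time described above can be sketched as a linear mapping anchored at a reference pair. This is a simplified model rather than the DSM-CC descriptor syntax: the class and field names are hypothetical, both clocks are assumed to be expressed in the same tick units, and the rate factor is an assumed parameter that allows for paused or fast playback.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class NptReference:
    """One sample pairing a system time with an application (NPT) time."""
    stc_ref: int                   # system time at the reference point
    npt_ref: int                   # NPT at the same reference point
    scale: Fraction = Fraction(1)  # assumed NPT rate relative to system time

    def npt_at(self, stc: int) -> Fraction:
        """Reconstruct the NPT corresponding to a given system time."""
        return self.npt_ref + (stc - self.stc_ref) * self.scale


def due_events(ref: NptReference, stc: int, pending: dict[str, int]) -> list[str]:
    """Return the names of events whose NPT time stamp has been reached."""
    now = ref.npt_at(stc)
    return [name for name, stamp in pending.items() if stamp <= now]
```

With a reference pairing system time 1000 with NPT 0 at normal rate, a system time of 1900 maps to NPT 900, so an event stamped at NPT 500 is due while one stamped at NPT 1000 is not. The sketch presumes that the reference pair has already been delivered to the receiver, which is the step the preceding paragraph identifies as burdensome for the multiplexer.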
What is desired, therefore, is a method and apparatus for synchronizing receiver instigated events to instants of the transmitted video, audio, or data elements of a television program.