This invention relates to the processing of digital audio and video data to be played through a television and, more particularly, to the processing of audio and video data during nonstandard playback modes.
Almost all televisions manufactured today are capable of interfacing with different sources of program materials, for example, a VCR, a digital versatile disk (“DVD”) player, cable, DSS, etc., that provide audio signals for creating sounds and associated video input signals for creating screen displays. Some of those sources provide digital audio and video input signals in accordance with the Moving Picture Experts Group MPEG-2 audio/video digital compression standard. Further, most televisions and/or their plug-compatible program sources have user-interactive capabilities with which a user may choose to have the program source provide a subpicture display of captions, subtitles, karaoke or simple animation on the screen along with the program material. Thus, contemporary televisions and/or DVD systems preferably have the capability of processing compressed digital input signals representing audio, video and subpicture and providing digital output signals representing the desired sound, video and subpicture images. Most often, those digital output signals are converted to analog signals for use by known analog television display units.
The implementation of digital signal processing for providing a video display and associated audio from an audio-video source of programmed material presents numerous design challenges that were not encountered in the prior processing of analog audio and video signals. For example, with digital signal processing, the audio signals and the video signals are separated and are processed independently. However, the playback of the audio and video must be synchronized, so that there is a coordinated and coherent reproduction of the desired audio and video provided from the source of program material.
The source, for example, a DVD, normally provides the audio and video data in respective data packets in an “MPEG-2” format. Each of the audio and video data packets is received from the source of video material in a continuous data stream. Each packet of video data includes a header block followed by a data block. The data block may include any number, for example one to twenty, of frames of video data that may include a full field of video data or be a coded group of pictures that includes its own header block identifying the picture type and display order. The header block for a video data packet includes control information, for example, the identity of the format of the video data, the type of compression, if used, picture size, display order, and other global parameters.
The audio data packet has a header block that again identifies the format of the audio data with instructions relating to how the audio data is to be decoded and processed to provide desired enhancements, if applicable. Following the header block, the audio data packet includes an audio data block that has any number of blocks or frames of audio data, for example, from one to approximately twenty blocks.
Subpicture data may be provided in a data packet in one of several formats. For purposes of this description, it will be assumed that the subpicture data is being provided in a Subpicture format that is defined by the known DVD standard. The Subpicture format includes a header block, a pixel data block, and a display control sequence (“DCSQ”) command data block. Generally, the header is used to identify the general nature of the data. For example, the header may be used to identify the format of the data, how the pixel data is compressed, if a command structure is used, how the data is to be read, etc. In the Subpicture format, the pixel data represents color and contrast information and is compressed using known compression techniques, for example, run length compression.
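By way of illustration only, run length compression of pixel data can be sketched as follows. The (count, value) pair representation and the name `rle_decode` are assumptions made for this example; the sketch does not reproduce the bit-level layout of the DVD Subpicture format.

```python
def rle_decode(runs):
    """Expand (count, value) pairs into a flat list of pixel values.

    Hypothetical, simplified run-length decoder: each run says
    "repeat this pixel value `count` times".
    """
    pixels = []
    for count, value in runs:
        if count < 1:
            raise ValueError("run length must be positive")
        pixels.extend([value] * count)
    return pixels


# Three pixels of value 0, two of value 1, one of value 3.
row = rle_decode([(3, 0), (2, 1), (1, 3)])
```

Long runs of identical pixels, which are common in captions and subtitles, collapse to a single pair in such a scheme.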
Selected ones of the header blocks of the audio, video and subpicture data packets include a presentation time stamp (“PTS”) value which is a time stamp that is applicable to the associated data. The PTS value is a time reference to a system time clock or counter that was running during the creation or recording of the audio and video data. A similar system time clock or counter (“STC”) is also running in real time during the playback of the audio and video data, and if the audio, video and subpicture data are played back at the times represented by their presentation time stamps, the audio, video and subpicture data will be presented to the user in the desired synchronized manner. Therefore, the PTS value represents a desired time and sequence of presentation of the audio, video and subpicture data and thus, is used to synchronize the playback of the audio, video and subpicture data.
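The relationship between PTS values and the STC described above may be illustrated with a minimal sketch. The names `Unit` and `units_due`, the `tolerance` parameter and the tick values are hypothetical and chosen only for this example; they are not taken from any particular decoder.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str  # "audio", "video" or "subpicture"
    pts: int   # presentation time stamp, in clock ticks


def units_due(queue, stc, tolerance=0):
    """Return the queued units whose PTS has been reached by the STC.

    `tolerance` (an assumption of this sketch) lets a unit be released
    slightly before the clock reaches its exact PTS value.
    """
    return [u for u in queue if u.pts <= stc + tolerance]


queue = [Unit("video", 3000), Unit("audio", 3000), Unit("video", 6000)]
due = units_due(queue, stc=3000)
```

Because the audio and video units carrying the same PTS value become due at the same STC value, they are released together, which is the synchronizing effect the time stamps are intended to provide.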
During the decoding of the audio data, it normally must be decompressed, reconstructed and enhanced in a manner consistent with the source of program material and the capabilities of the sound reproduction system. In some applications, audio data packets may contain up to six channels of raw audio data. Depending on the number of channels the sound reproduction system can reproduce, for example, from two to six, the sound reproduction system selectively uses the channels of raw audio data to provide a number of channels of audio which are then stored in an audio FIFO.
The decoding of the video data normally requires decompression, conversion of partial frames into full frames and the recognition of full frames. The decoding of subpicture data requires the decompression of run length compressed bit maps of subpicture data. Simultaneously with the decoding process, audio, video and subpicture data are being played back to the user. In that playback, the frames of audio and video data are output and the subpicture is overlaid on top of the video; the reconstructed audio, video and subpicture must be synchronized so that together they form a coordinated and coherent presentation.
As will be appreciated from the foregoing, demultiplexing the audio, video and subpicture data packets is a complex process of deconstructing the data packets and storing the necessary decoding instructions as well as the content data itself to permit the decoding and playback of the data in a synchronized manner. One such process is described in copending U.S. patent application Ser. No. 08/901,090 entitled Method and Apparatus for Audio-Video Synchronizing, filed on Jul. 28, 1997, and assigned to the assignee of the present application. U.S. patent application Ser. No. 08/901,090 is hereby incorporated by reference in its entirety.
The interactive nature of current entertainment equipment presents additional problems in a synchronized playback of audio, video and subpicture data. Normally, the audio and video data are played back in a standard playback mode; however, the user has the capability of interrupting the normal play mode of the video, for example, with a pause control, a fast forward control, or controls that allow the user to skip to another section of the video disk. Thus, the user can choose to play back the audio and video at different speeds and in different sequences than the speed and sequence of the audio and video recorded on the video disk. In those situations, it is necessary to automatically coordinate the decoding and playback of the audio, video and subpicture data so that it matches the current selection of the user. For example, if the user has selected the pause mode, the playback of frames of audio, video and subpicture data is halted, resulting in the video and subpicture being frozen in time and the audio muted. If the user selects slow forward, the playback of frames of audio, video and subpicture data is slowed to a speed selected by the user, which results in the video and subpicture being played in slow motion and the audio muted. Alternatively, if the user selects fast forward, the speed of playback of frames of audio, video and subpicture data is increased to a speed selected by the user, which results in the video and subpicture being played faster and the audio muted. All of the above nonstandard play or trick play modes may be selected in the forward and reverse playback directions. Therefore, it is required that the system time clock have the capability of incrementing or decrementing depending on the trick play mode selected by the user.
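The requirement that the system time clock increment or decrement according to the selected mode may be sketched as follows. The mode names, the rate values in `RATES` and the class `TrickPlayClock` are assumptions made for illustration; actual speeds would be those selected by the user.

```python
# Hypothetical playback rates per mode; negative means reverse.
RATES = {
    "play": 1.0,
    "pause": 0.0,
    "slow_forward": 0.5,
    "fast_forward": 2.0,
    "fast_reverse": -2.0,
}


class TrickPlayClock:
    """A time counter whose direction and speed follow the play mode."""

    def __init__(self, value=0):
        self.value = value
        self.mode = "play"

    def set_mode(self, mode):
        if mode not in RATES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def advance(self, ticks):
        """Move the clock by `ticks` of real time, scaled by the mode."""
        self.value += int(ticks * RATES[self.mode])
        return self.value

    @property
    def audio_muted(self):
        # Audio is muted in every mode other than standard play.
        return self.mode != "play"


clock = TrickPlayClock(value=9000)
clock.advance(3000)            # standard play: 9000 -> 12000
clock.set_mode("fast_reverse")
clock.advance(3000)            # reverse at 2x: 12000 -> 6000
```

A single signed rate covers pause, slow motion, fast motion and reverse playback without a separate code path for each mode.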
Further, at the beginning of a movie, it is possible for a user to choose different viewing angles for different scenes in the movie. During playback of the movie, when a different viewing angle is selected, it is possible that the new scene has a PTS that is earlier than the current value of the PTS from the scene just played. Therefore, in essence, for a seamless transition from one scene viewed at one angle to a second scene viewed at a different angle, the STC must move back in time. However, the STC is generated by the demultiplexer, and the STC time values are incremented successively in time. Therefore, the STC from the demultiplexer cannot be readily used by the video decoding process to satisfy all of the interactive requirements of current playback systems.
Consequently, in a video system having a wide range of options to the standard play mode, there is a need to provide an STC capability that is almost infinitely variable to meet the requirements of all of the possible play modes that may be selected by a user.
The present invention provides a method and apparatus for improving the processing of audio and video data in response to a user selecting trick play modes of operation. The invention permits the seamless concatenation of discontinuous audio and video streams. The present invention has the advantage of providing a smooth and seamless playback, with minimal distortion of both audio and video data, when both the audio and video data have a transition that moves backward in time.
In accordance with the principles of the present invention and in accordance with the described embodiments, the present invention provides a digital video processor receiving audio and video data representing images and sound to be played. Selected portions of the audio and video data include respective audio and video PTS values representing a desired time and sequence of presentation of the audio and video data. The processor is responsive to user selections to selectively play back the video data in a standard play mode or a trick play mode. The processor has a demultiplexer for receiving the raw audio and video data and providing demultiplexed audio and video data to a memory. A first system time clock provides first time values in response to being continuously clocked by the demultiplexer. A CPU decodes and plays back the audio and video data as a function of the audio and video PTS values. The processor further includes a second system time clock providing second time values in response to being periodically incremented by the CPU. The CPU periodically sets the second system time clock to a second time value equal to a current first time value of the first system time clock in response to the standard play mode, and the CPU periodically sets the first system time clock to a first time value equal to a current second time value of the second system time clock in response to the trick play mode.
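The relationship between the two system time clocks may be illustrated with a minimal sketch. The class `DualClock`, its method names and the tick values are hypothetical; the sketch assumes that the CPU may step the second clock by a signed amount during trick play.

```python
class DualClock:
    """Two time counters kept in step, with the master chosen by mode."""

    def __init__(self):
        self.stc1 = 0            # first clock, clocked by the demultiplexer
        self.stc2 = 0            # second clock, stepped by the CPU
        self.trick_play = False  # False means standard play mode

    def demux_tick(self, ticks=1):
        self.stc1 += ticks

    def cpu_tick(self, ticks=1):
        # Assumption of this sketch: a negative step models reverse play.
        self.stc2 += ticks

    def sync(self):
        """Periodic update: copy the master clock into the other one."""
        if self.trick_play:
            self.stc1 = self.stc2   # trick play: second clock is master
        else:
            self.stc2 = self.stc1   # standard play: first clock is master


clocks = DualClock()
clocks.demux_tick(100)
clocks.cpu_tick(97)
clocks.sync()                  # standard play: stc2 follows stc1 (100)

clocks.trick_play = True
clocks.cpu_tick(-40)           # CPU moves its clock backward to 60
clocks.sync()                  # trick play: stc1 follows stc2 (60)
```

Reversing the direction of the copy is what allows the CPU-controlled clock to take over whenever the demultiplexer's monotonically increasing clock cannot represent the requested playback.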
In another embodiment, the invention provides a method of incrementing a first system time clock with the demultiplexer to provide first time values. In the standard play mode, the first time values of the first system time clock are compared with the audio and video PTS values. A playback and display of the audio and video data associated with respective audio and video PTS values is generated in response to each of the respective audio and video PTS values being approximately equal to the first time values in the first system time clock. The second system time clock is periodically incremented by the CPU to a second time value equal to a current first time value of the first system time clock in response to the standard play mode.
In another aspect of the invention, in response to the trick play mode, the method detects a next audio PTS value being less than a current audio PTS value and sets the first system time clock to a value equal to the next audio PTS value. In addition, the method detects a next video PTS value being less than a current video PTS value and sets the second system time clock to a value equal to the next video PTS value.
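The detection of a backward step in the PTS values may be sketched as follows. The function `handle_discontinuity`, the dictionary keys `stc1` and `stc2` and the numeric values are hypothetical and serve only to illustrate the comparison described above.

```python
def handle_discontinuity(clocks, stream, current_pts, next_pts):
    """Reset the matching clock when the next PTS moves backward in time.

    `clocks` is a dict with keys "stc1" (first system time clock) and
    "stc2" (second system time clock). Audio discontinuities update
    "stc1"; video discontinuities update "stc2". Returns True if a
    backward transition was detected.
    """
    target = {"audio": "stc1", "video": "stc2"}[stream]
    if next_pts < current_pts:
        clocks[target] = next_pts
        return True
    return False


clocks = {"stc1": 90000, "stc2": 90000}
audio_jump = handle_discontinuity(clocks, "audio", 90000, 45000)
video_jump = handle_discontinuity(clocks, "video", 90000, 45000)
no_jump = handle_discontinuity(clocks, "video", 45000, 48000)
```

After both calls that detect a jump, each clock holds the earlier PTS value, so the new audio and video can be presented without waiting for the clocks to reach their old values again.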
These and other objects and advantages of the present invention will become more readily apparent during the following detailed description taken in conjunction with the drawings herein.