The present invention relates to a video recorder and, more particularly, to a method for constructing trick play mode video displays from an MPEG-2 digital video transport stream using a digital video recorder.
A conventional analog video recorder records the video signal in its transmitted analog format (such as the NTSC signal format). At play time, the recorded signal is transmitted over a cable to a display device which is capable of displaying signals of the transmitted format. In addition to the standard play mode (forward direction, standard speed), analog video recorders are capable of displaying video in several “trick play” modes. Trick play modes include fast forward play, slow forward play, fast reverse play, slow reverse play, and pause. Consumers are likely to expect that a video recorder used in conjunction with digital television (DTV) will have, at least, the same trick play mode capabilities as analog video recorders. However, the MPEG-2 data compression techniques used with DTV make creation of trick play modes from the DTV transport data stream problematic, particularly over a simple, bit rate limited, communication channel between a video recorder and a display device.
Motion video comprises a sequence of images or frames. The images are originally recorded as analog signals. For digital television, the analog signals for a video element of a program are input to an encoder that converts the signals to digital data, compresses the digital data, and combines the digital video data with data related to the audio and data elements of the program to output a single transport data stream. The transport data stream is transmitted to a receiver where a decoder reverses the process to produce a close approximation of the original analog signal for presentation to the viewer. The quantity of data resulting from converting analog signals to digital signals is so great that digital motion video would be impractical if the data could not be compressed. However, there is considerable data redundancy within an image and between the images of a video sequence. MPEG-2 provides a toolkit of techniques that can be used to reduce this redundancy and, thereby, reduce the quantity of data required to digitally describe the images of the video sequence.
The DTV system is based on the MPEG-2 Main profile which provides for three types of video frames (I-, P-, and B-frames). Typically, the succession of frames comprising a video sequence is divided for convenience into groups of frames or groups of pictures (GOP). Each GOP is anchored by an entirely self-coded (intracoded) frame or I-frame. Intracoding data compression techniques are used to reduce data redundancy within a single image, but all of the data necessary to decode and reconstruct an I-frame is transmitted. Since I-frames require a relatively large quantity of data, the number of I-frames is minimized. However, I-frames are periodically required in the data stream to enable recovery of the video stream after channel switching or error outages, and the MPEG-2 standard requires an I-frame at least every 132 frames. P-frames and B-frames are produced with interframe data compression as well as intraframe data compression. Interframe data compression uses motion estimation to predict the content of a frame from the content of one or more other reference frames. P-frames are frames which are forward predicted from a previous reference frame (either an I- or P-frame). Data for a P-frame includes motion estimation vectors describing movement of blocks of pixels between the current frame and the frame on which prediction is based and the differential data which must be added to the blocks of the earlier frame to construct the image of the later P-frame. A P-frame requires roughly half the data of an I-frame. On the other hand, a B-frame is bidirectionally predicted from earlier and later reference frames. B-frame data comprises motion estimation vectors describing where data should be taken from the earlier and later frames and typically requires about one-fourth the data of an I-frame. B-frames are used to increase the compression efficiency and perceived picture quality but cannot be used to predict future frames.
MPEG-2 provides flexibility as to the use, size, and makeup of the GOP, but a 12-frame GOP is typical for a system frame rate of 25 frames per second and a 15-frame GOP is typical for a 30 frames per second system. An exemplary 15-frame GOP might comprise the following frames transmitted in the following order:
. . . I0, B0, B1, P0, B2, B3, P1, B4, B5, P2, B6, B7, P3, B8, B9 . . .
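The relative data quantities stated above (a P-frame at roughly half, and a B-frame at about one-fourth, the data of an I-frame) can be applied to the exemplary GOP to estimate how much data it carries. The following Python sketch is illustrative only; the names RELATIVE_SIZE and gop_cost are hypothetical, and the ratios are the approximate figures from the text rather than measured values.

```python
from collections import Counter

# Approximate data per frame type, in units of one I-frame (figures from the text).
RELATIVE_SIZE = {"I": 1.0, "P": 0.5, "B": 0.25}

# The exemplary 15-frame GOP in transmission order.
GOP = ["I0", "B0", "B1", "P0", "B2", "B3", "P1", "B4", "B5",
       "P2", "B6", "B7", "P3", "B8", "B9"]


def gop_cost(frames):
    """Return the estimated GOP data quantity in I-frame equivalents."""
    counts = Counter(name[0] for name in frames)  # frame type is the first letter
    return sum(RELATIVE_SIZE[kind] * n for kind, n in counts.items())


cost = gop_cost(GOP)            # 1 I + 4 P + 10 B frames
fraction = cost / len(GOP)      # compared with coding all 15 frames as I-frames
print(f"{cost} I-frame equivalents, {fraction:.0%} of an all-I-frame GOP")
```

Under these assumed ratios the 15 frames carry about 5.5 I-frame equivalents of data, or a little over one-third of what 15 I-frames would require.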
At the decoder, the transport stream is decoded, decompressed and reordered to reconstruct the images of the original video image sequence. Since the data from earlier frames must be available to predict and reconstruct later frames, the order of transmission of frames will be different from the order in which the frames will be displayed. This requires the encoder and decoder to reorder the frames, even for standard play mode. In standard forward play mode the frames of this exemplary GOP would be displayed in the following order:
. . . B0, B1, I0, B2, B3, P0, B4, B5, P1, B6, B7, P2, B8, B9, P3 . . .
The I-frame (I0) is the third frame displayed but must be transmitted first so that P0, B0, and B1 can be decoded. Likewise, P0 is transmitted before B2 and B3 because P0 and I0 are necessary to decode the B-frames (B2 and B3). The exemplary GOP is an “open” GOP having a prediction link to a prior GOP. The initial B-frames (B0 and B1) are decoded from the data of frame I0 and the last P-frame (P3) of the previous GOP. MPEG also provides for a closed GOP with no prediction links to frames outside of the GOP. As a result of bidirectional prediction and the temporally forward nature of MPEG-2 compressed digital motion video, the trick play modes that can be created by selecting frames from the transport stream are very limited and reversing the order in which frames are transported is not useful for creating reverse play display modes.
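The reordering from transmission order to display order described above can be sketched in a few lines of Python. This is a minimal illustration, not a decoder: frames are represented only by their labels, the function name display_order is hypothetical, and the sketch assumes that each B-frame is displayed as soon as it arrives while each reference frame (I or P) is held back until the next reference frame arrives.

```python
def display_order(transmitted):
    """Reorder frame labels from transmission order to display order.

    B-frames are shown immediately; an I- or P-frame is held until the
    next reference frame arrives, then shown.
    """
    held = None   # most recent reference frame not yet displayed
    shown = []
    for frame in transmitted:
        if frame.startswith("B"):
            shown.append(frame)
        else:
            if held is not None:
                shown.append(held)
            held = frame
    if held is not None:
        # End of the input: flush the last reference frame. In a continuous
        # stream this would happen when the next GOP's I-frame arrives.
        shown.append(held)
    return shown


TRANSMITTED = ["I0", "B0", "B1", "P0", "B2", "B3", "P1", "B4", "B5",
               "P2", "B6", "B7", "P3", "B8", "B9"]
print(display_order(TRANSMITTED))
```

Applied to the exemplary GOP, the sketch yields the display order listed above, with B0 and B1 ahead of I0 and with P3 last.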
One method used to provide trick play with recorders of MPEG-2 digital video is to first decode and store an entire GOP in the forward direction. The trick play system can then select an appropriate number of frames and a display order to create the trick play video display from the decompressed and decoded frames. However, the decoder must have large and costly frame buffers to store the decompressed versions of all the frames in the GOP. Since this is not required for normal forward play, the cost of the decoder would be substantially increased which would increase the cost of the receiver or video recorder. In addition, the transmission channel between the recorder and the display could easily be overwhelmed by the quantity of decompressed data required for a trick play display, especially in a fast play mode. Further, this technique requires that the entire GOP be decoded, even during fast play modes. To do this, the decoder must be capable of decoding multiple frames in a single normal frame decoding period. Most decoders do not have this capability.
A second method of providing trick play modes is to decode and display only the I-frames of each GOP. An I-frame includes all of the necessary data to decode the frame and, therefore, the I-frames of a video sequence can be decoded and displayed in any order. Since I-frames are typically only one frame in 12 to 15 frames, each I-frame would be displayed for as many frame periods as are required to create the desired frame rate. However, video produced by displaying only the I-frames has a jerky quality because of the large gaps in the content produced by discarding the intervening P- and B-frames.
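The frame repetition used by this second method can be illustrated with a short Python sketch. The function name iframe_only_sequence, the use of a signed integer speed factor (negative for reverse play), and the rule of holding each I-frame for the GOP length divided by the speed are all assumptions made for illustration; speeds that exceed the GOP length would additionally require skipping I-frames, which the sketch does not attempt.

```python
def iframe_only_sequence(iframes, gop_length, speed):
    """Build a display sequence that shows only I-frames.

    iframes    -- I-frame labels, one per GOP, in forward order
    gop_length -- number of frames per GOP (e.g. 12 or 15)
    speed      -- signed integer speed factor; negative means reverse play
    """
    if speed == 0:
        raise ValueError("speed must be non-zero (pause is not modeled here)")
    # Each I-frame stands in for a whole GOP, so it is held for
    # gop_length / |speed| frame periods (at least one).
    repeats = max(1, gop_length // abs(speed))
    ordered = iframes if speed > 0 else list(reversed(iframes))
    sequence = []
    for frame in ordered:
        sequence.extend([frame] * repeats)
    return sequence


# Three GOPs of 15 frames played at 3x forward: each I-frame is held 5 periods.
print(iframe_only_sequence(["I0", "I1", "I2"], 15, 3))
```

Because the displayed picture changes only once per held interval, the sketch also makes the source of the jerky quality apparent: at 3x speed the content jumps by 15 source frames every 5 display periods.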
In a third method of creating a trick play video display sequence, frames are decoded but are not displayed until a frame that has been selected for the trick play video display is reached. The desired frame is then decoded and displayed. Since the method does not produce an MPEG-2 transport stream for transmission between the recorder and receiver, the recorder and the video decoder must reside in the same device so that bit rate control and timing are not issues.
In a fourth method of producing a trick play display, additional I-frames are generated during the recording process and stored on a separate track of the storage medium. The additional I-frames are used to assist in reverse play. However, generating additional I-frames may require an additional MPEG-2 encoder to be included in the video recorder, substantially increasing its cost.
What is desired, therefore, is a method of constructing from an MPEG-2 compliant transport stream a trick play video display frame sequence that can be decoded in a standard MPEG-2 decoder. Further, it is desired that the trick play display video sequence produce a smooth display, minimize memory requirements, and be capable of transmission over a bit rate limited transmission channel between the recorder and a display device.
The present invention overcomes the aforementioned drawbacks of the prior art by providing a method of creating a trick play video display from a group of MPEG video transport frames comprising the steps of including at least one transport frame in the trick play video display; determining the transmission time for the trick play video display; and reducing the number of frames included in the trick play video display if the transmission time exceeds a maximum transmission time. The number of frames included in the trick play video display is determined by conformance of the transmission time for the trick play display to the data handling limitations of the MPEG compliant decoder, including the elementary buffer, and the communication channel between the recorder and the display device. At least one transport frame is included in the trick play display, but, within the data handling limitations of the system, additional frames may be included in the trick play as required for decoding the displayed frames or to optimize the smoothness of the trick play display.
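The three steps recited above (include at least one frame, determine the transmission time, and reduce the number of frames if the time exceeds a maximum) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: the function names, the frame sizes, the single max_time limit standing in for all decoder and channel limitations, and the order in which frames are dropped (B-frames first, then the last P-frame, never the I-frame) are all hypothetical choices made here so that the remaining frames stay decodable.

```python
def transmission_time(frames, bit_rate):
    """Seconds needed to send the frames over a channel of bit_rate bits/s."""
    return sum(bits for _, bits in frames) / bit_rate


def select_trick_play_frames(frames, bit_rate, max_time):
    """Reduce a candidate frame list until it fits within max_time.

    frames -- list of (label, size_in_bits) in transmission order,
              beginning with the I-frame
    """
    selected = list(frames)
    # At least one frame is always kept, even if it alone exceeds max_time.
    while len(selected) > 1 and transmission_time(selected, bit_rate) > max_time:
        victim = None
        # Assumed drop order: B-frames first (nothing is predicted from them),
        # then the last P-frame (no remaining frame depends on it).
        for kind in ("B", "P"):
            indexes = [i for i, (label, _) in enumerate(selected)
                       if label.startswith(kind)]
            if indexes:
                victim = indexes[-1]
                break
        if victim is None:
            break  # only I-frames remain; nothing further to drop here
        del selected[victim]
    return selected


# Hypothetical sizes: I = 400 kbit, P = 200 kbit, B = 100 kbit.
CANDIDATES = [("I0", 400_000), ("B0", 100_000), ("B1", 100_000),
              ("P0", 200_000), ("B2", 100_000), ("B3", 100_000),
              ("P1", 200_000)]
kept = select_trick_play_frames(CANDIDATES, bit_rate=1_000_000, max_time=0.7)
print([label for label, _ in kept])
```

With these hypothetical numbers the full candidate list would take 1.2 seconds to transmit, so the sketch discards the four B-frames and then P1, leaving I0 and P0 at 0.6 seconds.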