The present invention relates to a video recorder and, more particularly, to a method of constructing trick play mode video displays from an MPEG-2 digital video transport stream using a digital video recorder.
A conventional analog video recorder records a video signal in its transmitted analog format (such as the NTSC television signal format). At play time, the recorded signal is transmitted over a cable to a display device which is capable of displaying signals of that format. In addition to the standard play mode (forward direction, standard speed), analog video recorders are capable of displaying video in several “trick play” modes. Trick play modes include fast forward play, slow forward play, fast reverse play, slow reverse play, and pause. Consumers are likely to expect that video recorders used in conjunction with digital video will have at least the same trick play mode capabilities as analog video recorders. However, the MPEG-2 data compression techniques used with digital motion video make creation of trick play modes from the MPEG transport data stream problematic. The limited data rates and capacities of the decoder and of a simple communication channel between a video recorder and a display device further complicate the creation of trick play video displays.
Motion video comprises a sequence of fields or frames (collectively referred to herein as frames) containing images or pictures. The images are originally recorded as analog signals and the analog signals are converted to digital data. The quantity of data resulting from converting analog signals to digital data is so great that digital motion video would be impractical if the data could not be compressed. However, there is considerable spatial redundancy within the data for an image and temporal redundancy between the images of a video sequence. MPEG-2 provides a toolkit of techniques that can be used to eliminate redundancy and, thereby, reduce the quantity of data required to digitally describe the images of the video sequence.
Typically, the succession of frames comprising a video sequence is divided for convenience into groups of frames or groups of pictures (collectively, GOP). The MPEG-2 standard provides for three types of video frames (I-, P-, and B-frames) based on the compression process used to encode the frame's data. Each GOP is anchored by an entirely self-coded (intracoded) frame or I-frame. Intracoding data compression techniques are used to reduce data redundancy within a single image, but all of the data necessary to decode and reconstruct an I-frame remain available in the frame after compression. Since I-frames require a relatively large quantity of data, the number of I-frames is minimized. However, I-frames are periodically required in the data stream to enable recovery of the video stream after channel switching or error outages and the MPEG-2 standard requires an I-frame at least every 132 frames. P-frames and B-frames are produced with interframe data compression as well as intraframe data compression. Interframe data compression uses motion estimation to predict the picture in a frame from the picture in one or more other reference frames (either I- or P-frames). P-frames are frames that are forward predicted from a previous reference frame. Data for a P-frame includes motion estimation vectors describing movement of blocks of pixels between the current frame and the frame upon which prediction is based and the differential data which must be added to the blocks of the earlier frame to construct the image of the later P-frame. A P-frame requires roughly half the data of an I-frame. On the other hand, a B-frame is bidirectionally predicted from earlier and later reference frames. B-frame data comprises motion estimation vectors describing where data should be taken from the earlier and later frames and typically requires about one-fourth the data of an I-frame.
B-frames are used to increase the compression efficiency and perceived picture quality but cannot be used to predict future frames. A GOP begins with an I-frame and comprises the frames from the intracoded (I-frame) anchor frame to the frame preceding the next I-frame in the data stream. A 12-frame GOP is typical for a system with a 25 frames per second display rate and a 15-frame GOP is typical for a 30 frames per second system. An exemplary 15-frame transport stream GOP might comprise the frames transmitted in the order illustrated in FIG. 1.
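The GOP definition and the relative frame sizes described above can be illustrated with a minimal sketch. The function names, the one-letter frame labels, and the assumed 15-frame pattern (one I-frame, four P-frames, and ten B-frames) are hypothetical and chosen only for illustration; the relative data quantities are the approximate ratios stated above (a P-frame at roughly one-half and a B-frame at roughly one-fourth of an I-frame).

```python
# Approximate data per frame type, in units of one I-frame (ratios from the text).
RELATIVE_DATA = {"I": 1.0, "P": 0.5, "B": 0.25}


def split_into_gops(frame_types):
    """Split a sequence of frame types into GOPs.

    Each GOP runs from an I-frame to the frame preceding the next I-frame.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append([])  # an I-frame anchors a new GOP
        gops[-1].append(frame_type)
    return gops


def relative_gop_data(gop):
    """Total data in a GOP, expressed in I-frame units."""
    return sum(RELATIVE_DATA[frame_type] for frame_type in gop)


def gop_duration_seconds(gop_length, frames_per_second):
    """Display time covered by one GOP at the given display rate."""
    return gop_length / frames_per_second


# Assumed pattern: two consecutive 15-frame GOPs in transmission order.
stream = list("IBBPBBPBBPBBPBB" * 2)
gops = split_into_gops(stream)
print(len(gops), [len(g) for g in gops])
print(relative_gop_data(gops[0]))
print(gop_duration_seconds(15, 30), gop_duration_seconds(12, 25))
```

Under these assumptions a 15-frame GOP carries about 5.5 I-frame units of data, compared with 15 units if every frame were intracoded, and both typical GOP lengths cover about half a second of video.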
At the decoder, the transport stream is decoded, decompressed, and the frames are reordered to reconstruct the images of the original video image sequence in their correct temporal order. Since the data from earlier frames must be available to predict and reconstruct later frames, the frame transmission order will be different from the order in which the frames will be displayed. This requires that the encoder and decoder reorder the frames, even for standard speed, forward play mode. In standard speed, forward play mode the frames of the exemplary GOP illustrated in FIG. 1 would be displayed in the order illustrated in FIG. 2. The I-frame (I0) is the third frame displayed but must be transmitted first so that P0, B0, and B1 can be decoded. Likewise, P0 is transmitted before B2 and B3 because P0 and I0 are necessary to decode the B-frames (B2 and B3). The exemplary GOP is an “open” GOP having a prediction link to a prior GOP. The initial B-frames (B0 and B1) are decoded from the data of frame I0 and the last P-frame (P3) of the previous GOP. MPEG also provides for a “closed” GOP with no prediction links to frames outside of the GOP.
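The reordering described above can be sketched as follows. The transmission order used here is an assumption chosen to be consistent with the description (I0 transmitted first and displayed third, P0 transmitted before B2 and B3); it is not taken from FIG. 1, and the function name is hypothetical. The sketch shows B-frames being displayed as soon as they are decoded, while each reference frame (I- or P-frame) is held until the next reference frame arrives.

```python
def display_order(transmitted):
    """Reorder frames from transmission order to display order.

    B-frames are displayed immediately. A reference frame (I or P) is held
    back and displayed only when the next reference frame arrives.
    """
    shown = []
    held = None  # most recent reference frame not yet displayed
    for frame in transmitted:
        if frame.startswith("B"):
            shown.append(frame)
        else:
            if held is not None:
                shown.append(held)
            held = frame
    if held is not None:
        shown.append(held)  # flush the last reference frame (simplification)
    return shown


# Assumed transmission order for a 15-frame open GOP.
transmitted = ["I0", "B0", "B1", "P0", "B2", "B3", "P1", "B4", "B5",
               "P2", "B6", "B7", "P3", "B8", "B9"]
print(display_order(transmitted))
```

In a continuous stream the last reference frame would be released by the arrival of the next GOP's I-frame rather than by the explicit flush used in this sketch.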
As a result of the bidirectionally predicted, temporally forward nature of MPEG-2 compressed digital motion video, selecting transport stream frames or reversing the order of frames in the transport stream is of limited usefulness in producing trick play video displays. The creation of a trick play video display requires additional sequencing of the transport stream frames. For example, the frames of the exemplary transport stream GOP of FIG. 1 might be displayed as illustrated in FIG. 3 for a reverse direction, standard speed trick play display. Repetition of a frame in the illustration indicates that the frame is displayed for a number of frame periods equal to the number of times the frame appears in the illustration. For example, frame P3 is repeated for three frame periods. Frames are repeatedly displayed in the trick play video display because the decoder is designed with capacity limitations dictated by the normal speed, forward direction decoding of the transport stream. The order in which frames might be decoded to produce the display illustrated in FIG. 3 is illustrated in FIG. 4. The forward prediction of MPEG-2 video may require that a number of frames be decoded to decode a displayed frame, although the decoded frames are not necessarily displayed. For example, frames I0, P0, P1, and P2 must be decoded so that frame P3 can be decoded for display. Data storage limitations in the decoder and the quantity of data that must be decoded to display a frame out of the normal forward sequence may necessitate repeated decoding of frames. For example, approximately 70% more data must be decoded to display frame P3 of the trick play display of FIG. 4 than is required to display frame B0, the first frame of the standard speed, forward play GOP. Since the decoder is not designed to store this data or decode frames any faster than one frame per frame period, repeated display of a frame may be required to avoid overflowing the system and losing data.
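The chain of frames that must be decoded before a forward-predicted frame can be displayed may be sketched as follows. This is a simplified illustration that handles only reference frames (I- and P-frames); the function names are hypothetical, and the relative data quantities are the approximate ratios given earlier for I- and P-frames, not measured values.

```python
# Approximate data per frame type, in units of one I-frame (ratios from the text).
RELATIVE_DATA = {"I": 1.0, "P": 0.5}


def decode_chain(target, references):
    """Return the reference frames that must be decoded to display `target`.

    `references` lists the I- and P-frames of one GOP in decoding order.
    Each P-frame is predicted from the reference frame before it, so the
    chain runs from the I-frame up to and including the target.
    """
    position = references.index(target)
    return references[: position + 1]


def relative_cost(frames):
    """Data that must be decoded for `frames`, in I-frame units."""
    return sum(RELATIVE_DATA[frame[0]] for frame in frames)


references = ["I0", "P0", "P1", "P2", "P3"]
chain = decode_chain("P3", references)
print(chain, relative_cost(chain))
```

For reverse play, where P3 is the first reference frame to be displayed, the whole chain must be decoded before anything can be shown, which is one reason a frame may have to be held on screen for several frame periods.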
While transport data streams are commonly divided into GOP, the MPEG-2 standard does not require the use of GOP. Further, the MPEG-2 standard does not specify the structure (frame types and numbers) of a GOP, if used. Since the sequence of frames required to create a trick play display depends upon the structure (frame types and sequence) of the input transport data stream, the trick play mode selected, and the design limitations of a decoder designed for standard speed, forward play, creation of a trick play display for an MPEG-2 compressed digital video program is difficult and can be computationally and resource intensive.
One method used to provide trick play video displays with recorders of MPEG-2 digital video is to first decode and store an entire GOP in the forward direction. The trick play system can then select a number of frames and a display order appropriate to create the trick play video display from the decompressed and decoded frames. However, the decoder must have large and costly frame buffers to store the decompressed versions of all the frames in the GOP. Since this is not required for normal forward play, the cost of the decoder or recorder would be substantially increased. In addition, the transmission channel between the recorder and the display could easily be overwhelmed by the quantity of data required to present a trick play display from decompressed data, especially in a fast play mode. Further, this technique requires that the entire GOP be decoded, even during fast play modes. To do this, the decoder must be capable of decoding multiple frames in a single normal frame decoding period. Most decoders do not have this capability.
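The buffer cost of this first method can be estimated with a rough sketch. All figures here are assumptions for illustration only: a 720 by 480 pixel frame, 8-bit samples with 4:2:0 chroma subsampling (1.5 bytes per pixel), and a baseline of three frame buffers for normal forward play. None of these values is taken from the description above, and the function names are hypothetical.

```python
def frame_bytes(width, height):
    """Bytes for one decompressed frame, assuming 8-bit 4:2:0 sampling.

    One luma sample per pixel plus two chroma planes at quarter resolution
    gives 1.5 bytes per pixel.
    """
    return width * height * 3 // 2


def buffer_bytes(num_frames, width=720, height=480):
    """Memory needed to hold `num_frames` decompressed frames."""
    return num_frames * frame_bytes(width, height)


whole_gop = buffer_bytes(15)    # every frame of a 15-frame GOP
forward_play = buffer_bytes(3)  # assumed baseline for normal forward play
print(whole_gop, forward_play, whole_gop / forward_play)
```

Under these assumptions, storing a full decompressed 15-frame GOP takes five times the frame memory of the assumed forward play baseline.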
A second method of providing trick play video displays is to decode and display only the I-frames of each GOP. An I-frame includes all of the data necessary to decode the frame and, therefore, the I-frames of a video sequence can be decoded and displayed in any order. Since I-frames are typically only one frame in 12 to 15 frames, each I-frame would be displayed for as many frame periods as are required to create the desired frame rate. However, video produced by displaying only the I-frames has a jerky quality because of the large gaps in the content produced by discarding the intervening P- and B-frames.
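The I-frame-only approach can be sketched as follows. The rule used to compute the number of frame periods for which each I-frame is held (GOP length divided by the speed factor) is an assumed interpretation of the description above, and the function name and frame labels are hypothetical. A negative speed factor denotes reverse play.

```python
def iframe_only_display(i_frames, gop_length, speed):
    """Build a display sequence from I-frames only.

    `i_frames` lists the I-frames of consecutive GOPs in forward order.
    `speed` is an integer speed factor; a negative value means reverse play.
    Each I-frame stands in for a whole GOP, so it is held for
    gop_length // abs(speed) frame periods (at least one).
    """
    if speed == 0:
        raise ValueError("speed must be non-zero")
    hold = max(1, gop_length // abs(speed))
    order = i_frames if speed > 0 else list(reversed(i_frames))
    return [frame for frame in order for _ in range(hold)]


# Three times normal speed, forward, with 15-frame GOPs:
# each I-frame is held for five frame periods.
print(iframe_only_display(["Ia", "Ib"], gop_length=15, speed=3))

# Five times normal speed, reverse: I-frames are shown in reverse order.
print(iframe_only_display(["Ia", "Ib"], gop_length=15, speed=-5))
```

Because each displayed picture jumps a whole GOP ahead of the previous one, the sequence illustrates the large content gaps that give this method its jerky appearance.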
In a third method of creating a trick play video display sequence, frames are decoded but are not displayed until a frame that has been selected for the trick play video display is reached. The desired frame is then decoded and displayed. Since the method does not produce an MPEG-2 transport stream for transmission between the recorder and receiver, the recorder and the video decoder must reside in the same device so that bit rate control and timing are not issues.
In a fourth method of producing a trick play display, additional I-frames are generated during the recording process and stored on a separate track of the storage medium. The additional I-frames are used to assist in reverse play. However, generating additional I-frames may require an additional MPEG-2 encoder to be included in the video recorder, substantially increasing its cost.
What is desired, therefore, is a method of constructing a trick play video display frame sequence that can be decoded in a standard MPEG-2 decoder from an MPEG-2 compliant transport stream. Further, it is desired that the trick play display video sequence produce a smooth display, minimize memory and processing requirements, and be capable of transmission over a bit rate limited transmission channel between the recorder and a display device.