Generally, VTRs are designed to receive and store data signals representing video (and audio) information by recording the data on a magnetic tape in a series of tracks. In addition, most VTRs are designed to support both normal and trick playback operation, i.e., fast forward and reverse operation.
The use of digital video signals, e.g., digital high definition television ("HDTV") signals, which are normally transmitted in a compressed format, presents problems with regard to the implementation of trick playback operation in VTRs.
Various systems have been proposed that would locate data selected to be used during trick play operation in specific locations within the tracks on a tape so that at least a minimum amount of data required to produce recognizable images during trick playback operation can be read in a reliable manner from the tape. Tape locations which are dedicated to storing data intended to be read from the tape at a particular speed and direction of trick play operation may generally be referred to as fast scan tracks for the particular speed and direction of trick play VTR operation. For example, the phrase "3× fast scan track" may be used to refer to a series of tape locations containing data for 3× fast forward trick play operation, while the phrase "-2× fast scan track" may be used to refer to a series of tape locations containing data for 2× reverse speed trick play operation.
Because of limitations on the amount of data that can be read back from the tape during trick play operation using such systems, video images used during trick play operation must usually be represented using considerably less data than is used to represent images, e.g., frames, that are displayed during VTR normal playback operation.
Accordingly, because of the data constraints imposed during trick playback operation, it is important that the data used to represent video frames during trick playback operation be carefully selected.
Thus, the proposed digital VTR systems offer a number of possible solutions to the problem of how to record digital data on a tape so that it can be read from the tape in a reliable manner during trick play. However, there is still a need for an improved method and apparatus for selecting data from a compressed video data stream to represent a video frame that can be recorded on the video tape and read back and displayed during trick playback operation.
Because the method of selecting data from a video data stream for use during trick playback operation will depend in large part on the content of the compressed video data stream from which the data must be selected, it is important to have an understanding of the various elements of a compressed digital video data stream, how those elements, e.g., video frames, slices, macroblocks, motion vectors, DCT coefficients, etc., relate to each other, and how the compressed video data stream is originally created.
The International Standards Organization has set a standard for video data compression that is suitable for generating a compressed digital data stream such as a digital HDTV data stream. This standard is referred to as the ISO MPEG-2 (International Standards Organization--Moving Picture Experts Group) ("MPEG-2") standard.
While various versions of this data compression standard exist, and new versions are expected in the near future, all versions of the MPEG-2 standard are expected to use the same basic data compression techniques. For the purposes of this application, unless indicated otherwise, terms will be used in a manner that is consistent with the MPEG-2 standard that is described in the International Standards Organization--Moving Picture Experts Group, Draft of Recommendation H.262, ISO/IEC 13818-2 titled "Information Technology--Generic Coding Of Moving Pictures and Associated Audio" (hereinafter "the November 1993 ISO-MPEG Committee draft") hereby expressly incorporated by reference. Any references to MPEG-2 data streams in this application are to be understood to refer to data streams that comply with MPEG-2 standards.
In accordance with the MPEG standard, analog video signals are digitized and compressed in accordance with an MPEG data compression algorithm to produce the digital video data stream.
In accordance with the MPEG data compression algorithm, after the analog video signals are digitized, the digital data is organized into macroblocks and the macroblocks are then encoded.
In accordance with the MPEG standard, within a given frame, each macroblock may be coded using one of several different encoding techniques, e.g., motion compensation techniques and intra-frame coding techniques. Intra-frame coding refers to a macroblock coding technique in which only spatial information is used. Intra-coded macroblocks are produced using this coding technique.
Inter-frame coding, unlike intra-frame coding, uses motion compensation techniques which utilize data from other frames when performing the encoding operation. Accordingly, inter-coded macroblocks which are produced using inter-frame coding techniques are dependent on preceding and/or subsequent frames and include motion vectors which are the result of the motion compensation operation. The MPEG-2 standard allows for the optional use of both intra-coded and inter-coded macroblocks in a video frame.
In accordance with the MPEG compression algorithm, after motion vectors have been calculated in video frames that are to be inter-coded, each of the intra-coded and inter-coded macroblocks which comprise the video data is transform encoded by performing a discrete cosine transform ("DCT") operation. As a result of the DCT operation, blocks of DCT coefficients are produced. These coefficients include both DC and higher frequency (AC) coefficients.
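The relationship between a block of samples and its DC and AC coefficients can be illustrated with a minimal sketch. The function name `dct_2d` and the sample values are assumptions made for illustration only. The code is a direct, unoptimized 8x8 type-II DCT with orthonormal scaling, not the implementation of any particular encoder.

```python
import math

N = 8  # DCT block size: 8x8 samples

def dct_2d(block):
    """Return the 8x8 type-II DCT (orthonormal scaling) of an 8x8 block."""
    def scale(k):
        # Normalization factor: the zero-frequency term is scaled differently.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

# Hypothetical input: a perfectly flat block (every sample equals 100).
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)

dc = coeffs[0][0]                       # DC coefficient (block average, scaled)
ac_energy = sum(abs(coeffs[u][v])       # sum of all AC coefficient magnitudes
                for u in range(N) for v in range(N) if (u, v) != (0, 0))
```

For a flat block, all of the information ends up in the single DC coefficient and the AC coefficients are essentially zero, which is why low-frequency coefficients are natural candidates when only a small amount of data can be kept.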
After the DCT operation is performed, the resulting coefficients are adaptively quantized and then variable length encoded, with the quantization factor (mquant) used being indicated by header information included in the encoded video data stream that is produced as a result of the encoding operation.
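The effect of the quantization factor can be sketched as follows. This is a deliberately simplified uniform quantizer, not the exact arithmetic defined by MPEG-2. The function name `quantize`, the `weights` matrix, and all numeric values are assumptions chosen for illustration.

```python
def quantize(coeffs, weights, mquant):
    """Simplified quantizer: divide each coefficient by a step size that
    grows with both its matrix weight and the scale factor mquant."""
    levels = []
    for coeff_row, weight_row in zip(coeffs, weights):
        levels.append([int(round(c * 16.0 / (w * mquant)))
                       for c, w in zip(coeff_row, weight_row)])
    return levels

# Hypothetical 2x2 corner of a coefficient block and a flat weight matrix.
coeffs = [[800.0, 40.0],
          [13.0, 3.0]]
weights = [[16, 16],
           [16, 16]]

fine = quantize(coeffs, weights, mquant=8)     # small step, more detail kept
coarse = quantize(coeffs, weights, mquant=32)  # large step, more zeros

def count_zeros(levels):
    return sum(1 for row in levels for value in row if value == 0)
```

A larger mquant drives more of the small, high-frequency coefficients to zero, so the variable length encoder that follows has less data to represent.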
The MPEG standard provides for the arrangement of macroblocks into slices, with each frame being made up of one or more slices. A slice is an integer number of consecutive macroblocks from a raster of macroblocks. Video frames which include only intra-coded macroblocks are referred to as intra-coded ("I-") frames. Video frames which include predictively coded macroblocks are referred to as P-frames, while frames which include bi-directionally coded macroblocks are referred to as B-frames. P- and B-frames are, because of the type of encoding used, inter-coded frames.
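The frame naming described above can be sketched as a small classifier. The macroblock labels (`"intra"`, `"predicted"`, `"bidirectional"`) and the function name `classify_frame` are hypothetical. A real decoder would normally read the picture type from the picture header rather than infer it from the macroblocks.

```python
def classify_frame(macroblock_types):
    """Name a frame from the coding types of its macroblocks.

    macroblock_types: iterable of labels, each one of
    "intra", "predicted" or "bidirectional" (hypothetical labels).
    """
    kinds = set(macroblock_types)
    if not kinds:
        raise ValueError("a frame must contain at least one macroblock")
    if "bidirectional" in kinds:
        return "B"   # contains bi-directionally coded macroblocks
    if "predicted" in kinds:
        return "P"   # contains predictively coded macroblocks
    return "I"       # only intra-coded macroblocks

def is_inter_coded(frame_type):
    """P- and B-frames depend on other frames; I-frames do not."""
    return frame_type in ("P", "B")
```

Note that a P- or B-frame may still contain some intra-coded macroblocks, which is why the check starts with the most dependent type.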
In accordance with the MPEG proposal, frames may be arranged into ordered groups referred to as groups-of-pictures ("GOPs"). GOPs may be of any size, where the GOP size is the distance between I-frames in the encoded bitstream. The use of groups-of-pictures, which is optional in MPEG-2, is intended to assist random access into the sequence.
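The definition of GOP size as the distance between I-frames can be made concrete with a short sketch. The function name `gop_sizes` and the example frame sequence are assumptions for illustration. The sequence is given as a string of frame types in bitstream order.

```python
def gop_sizes(frame_types):
    """Return the distance between each pair of consecutive I-frames.

    frame_types: string of 'I', 'P' and 'B' characters in bitstream order.
    """
    i_positions = [index for index, kind in enumerate(frame_types)
                   if kind == "I"]
    return [later - earlier
            for earlier, later in zip(i_positions, i_positions[1:])]

# Hypothetical stream whose GOP size changes from 9 frames to 6 frames.
sizes = gop_sizes("IBBPBBPBBIBBPBBI")
```

In this example the I-frames fall at positions 0, 9 and 15, so the GOP size is not constant across the stream.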
To summarize, an MPEG data stream generated using the encoding technique described above includes a series of variable length encoded video frames, each frame being represented by a series of intra-coded and/or inter-coded macroblocks, where each macroblock includes DCT coefficients and possibly motion vectors. Furthermore, the data representing the video frames may be arranged as groups-of-pictures, while the macroblocks representing each video frame may be arranged into slices which represent a portion of a frame.
Because MPEG-2 allows for a wide latitude in the encoding techniques used, an MPEG-2 data stream may include I-frames on a routine basis or may not include any routine I-frames.
When I-frames are used at regular intervals, e.g., every ninth frame, the picture will be refreshed on a regular basis.
In the case where intra-coded frames are not used at regular intervals it is expected that progressive refresh will be used instead of I-frames. Both modes of refreshing the picture are allowed within MPEG-2.
In addition to permitting I-frames or progressive refresh to be used, MPEG-2 also allows for various other encoding options that complicate the selection of data for use during trick play. For example, MPEG-2 permits DC coefficients to be represented with 8, 9 or 10 bits of precision. It also permits pictures to be represented in a field picture format or a frame picture format. In addition, MPEG-2 provides two different patterns to be used for converting a 2-dimensional DCT block into a 1-dimensional sequence, the default being a zig-zag scan pattern with the optional alternative being an alternate_scan pattern. It also provides two different tables of quantization scale factor (mquant) values to be used to encode the video data, i.e., a default q_scale_type table and an alternate q_scale_type table. MPEG-2 also allows for a change of the quantization matrix from a default quantization matrix.
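The default zig-zag conversion of a 2-dimensional block into a 1-dimensional sequence can be sketched as follows. The function names `zigzag_order` and `scan_block` are hypothetical, and the alternate scan table is not reproduced here.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for diagonal in range(2 * n - 1):
        # Positions on one anti-diagonal, listed with the row increasing.
        positions = [(row, diagonal - row) for row in range(n)
                     if 0 <= diagonal - row < n]
        if diagonal % 2 == 0:
            positions.reverse()  # even diagonals are walked bottom-left to top-right
        order.extend(positions)
    return order

def scan_block(block):
    """Flatten a square 2-D block into a 1-D list using the zig-zag order."""
    return [block[row][col] for row, col in zigzag_order(len(block))]

# Hypothetical block whose entries encode their own raster position.
raster = [[row * 8 + col for col in range(8)] for row in range(8)]
sequence = scan_block(raster)
```

The scan starts at the DC coefficient and moves toward progressively higher frequencies, so low-frequency coefficients appear first in the 1-dimensional sequence.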
For a more detailed discussion of the above possible variations between MPEG-2 encoded bitstreams, see the November 1993 ISO-MPEG Committee draft referred to above.
Because intra-coded frames can be decoded without data from other frames, they are particularly well suited for use during trick play. However, because of the data constraints imposed by the recording media for data selected for trick play operation, it is often difficult or impossible to record all of the I-frames for later playback during trick play.
Furthermore, because I-frames may not occur in a predictable pattern, e.g., because the GOP size is permitted to vary in MPEG, it becomes difficult to determine which I-frames should be selected for use during different modes of trick play operation.
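The difficulty can be illustrated with a naive selection rule. This is only a sketch to show the problem and is not a method proposed in this application. The function name `nearest_i_frames`, the example stream, and the rule of picking the closest I-frame to each ideal display position are all assumptions.

```python
def nearest_i_frames(frame_types, speed):
    """For each frame ideally shown at `speed` times normal playback,
    return the position of the closest available I-frame.

    frame_types: string of 'I', 'P' and 'B' characters.
    speed: positive integer speed-up factor (forward direction only).
    """
    i_positions = [index for index, kind in enumerate(frame_types)
                   if kind == "I"]
    if not i_positions:
        return []
    targets = range(0, len(frame_types), speed)
    # On a tie, min() keeps the earlier I-frame.
    return [min(i_positions, key=lambda pos: abs(pos - target))
            for target in targets]

# Hypothetical stream with I-frames at positions 0, 9 and 15, played at 3x.
picks = nearest_i_frames("IBBPBBPBBIBBPBBI", 3)
```

With irregular I-frame spacing, the same I-frame is chosen for several consecutive display positions while others are up to three frames away from their target, so the displayed motion would be uneven.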
Selection of I-frames for trick play operation is further complicated by the fact that the variable length encoded I-frames, of the type expected to be included in an HDTV bitstream, may vary in size, making it difficult to efficiently record the I-frames in the limited-size tape segments allocated for trick play data.
Accordingly, there is a need for a method and apparatus that can process a compressed video bitstream, such as an MPEG-2 video bitstream, to produce, from the data in the bitstream, a sufficient number of intra-coded video frames to support trick play operation.
Furthermore, it is desirable that the fully intra-coded trick play video frames produced by such a method and apparatus require less data to store than comparable fully intra-coded frames intended to be displayed during VTR normal playback operation.