A VTR can receive and store images (and sounds) received as signals from various sources, for example, a television tuner, an antenna or a cable. The VTR stores the received signal information, i.e. the data, by recording the data on a magnetic tape, such as a video cassette tape. The VTR can also reproduce images (and sounds) that are stored on a tape as data by reading the data on the tape and generating a signal from the data which can then be provided to a display device such as a television monitor.
To facilitate fast forward, search, and reverse capabilities, VTRs normally provide a limited number of playback speeds in both the forward and reverse directions, in addition to the VTR's standard playback speed which is used during normal playback operation.
VTR systems for recording and reproducing analog video signals are well known in the art. Such systems commonly use rotary head, helical scan recording methods to record data on a tape. In such systems, record/playback heads are mounted on a rotary head cylinder. The rotary head cylinder is inclined relative to the lengthwise portion of a magnetic tape which surrounds the rotary head cylinder for approximately 180°.
During normal operation of such video recording devices, the tape moves in a lengthwise direction while the record/playback heads rotate along with the inclined rotary head cylinder in a circular direction. As the record/playback heads rotate with the head cylinder they contact the moving tape in a manner which permits the recording or reading of data from the tape along evenly spaced tracks located diagonally relative to the length of the tape. A servo mechanism is used to control head positioning relative to the tape's position to ensure that the heads contact the tape along the diagonals which form each track of data.
FIG. 1(a) is a top view of a conventional two head video recording system. As illustrated in FIG. 1(a), first and second record/playback heads HA 2 and HB 3 are mounted opposite each other on a rotary head cylinder 4. To reduce crosstalk between adjacent tracks written by heads HA 2 and HB 3, the heads are of mutually different azimuth angles.
A tape 1 surrounds the rotary head cylinder 4 for approximately 180°. The tape moves relative to the rotary head cylinder as indicated by V_T. Similarly, the rotary head cylinder, and thus the record/playback heads HA 2 and HB 3, rotates as indicated by V_H. As the rotary head cylinder 4 rotates, the tape moves in a lengthwise direction as illustrated in FIG. 1(a). The rotating record/playback heads HA 2, HB 3 contact the tape in a manner which permits reading or writing, i.e. scanning, of data along diagonal tracks as illustrated in FIG. 1(b).
In the two head system of FIG. 1(a), a single head, either HA 2 or HB 3, contacts the tape 1 during each 180° period of head cylinder rotation. During this period of tape contact, during standard operation, each head reads or writes one normal play track of data. Each track comprises a plurality of tape segments. Each tape segment may contain one or more blocks of data. The data on the tape forms a series of parallel tracks as illustrated in FIG. 1(b). The gaps between the tracks are shown only for the purpose of clarity; there are normally no actual gaps between tracks recorded on a tape. The slope of the tracks depends on the speed of the tape when the tracks are recorded. References to data tracks or normal play data tracks hereinafter refer to data tracks written with a slope corresponding to the slope of data tracks written during standard record mode, i.e., data tracks written when the tape is moving at a standard speed for normal play operations.
In order to aid in the differentiation between tracks, data in each individual track is written at a mutually different azimuth from the preceding track. This results in a series of data tracks containing data written at alternating azimuths which correspond to the mutually different azimuths of the first and second heads HA 2, HB 3. The slanted lines within each data track of FIG. 1(b) are used to indicate the azimuth at which the data in each track was written.
Each of the heads HA 2 and HB 3 can only read data written at an azimuth corresponding to the head's own azimuth. Thus, neither head is able to read the data contained in the tracks written by the other head, since that data was written at an azimuth corresponding to the other head's azimuth.
Data tracks are normally written on the tape along diagonals which correspond to the diagonals traced by the heads across the width of the tape during normal, i.e., standard record/playback mode. During modes of operation such as playback during reverse or fast forward, referred to as trick play modes, the tape velocity is different than the tape velocity during standard record/playback mode. In trick play modes the tape speed is a function of the selected fast forward or reverse speed.
Because the tape moves relative to the record/playback heads at a speed other than the standard tape speed during trick play mode, the heads will trace over the tape along a diagonal path different than the path traced during the standard record/playback mode. In fast forward mode, the heads will trace over the tracks created during standard record/playback mode at a shallower angle than the angle of the data tracks. In reverse mode, the heads will trace across the tracks recorded during standard mode at an angle opposing the angle of the tracks recorded during standard record/playback mode. Accordingly, during VTR operation in trick play mode, the VTR's heads may cross over several different tracks of data during each pass across the width of the tape, e.g., during each 180° period of head cylinder rotation, with the angle at which the tracks are crossed being a function of tape speed. FIG. 1(c) illustrates the paths traced out by the record/playback heads HA 2, HB 3 across the magnetic tape 1 during trick play mode operation at three times (3X) the standard playback tape speed (hereinafter referred to as 3X playback operation). In FIG. 1(c) reference numerals 1-A through 12-B are used to indicate tracks on the magnetic tape 1. Odd numbered tracks 1-A through 11-A contain data written at an azimuth corresponding to the azimuth of head HA 2 while even numbered tracks 2-B through 12-B contain data written at an azimuth corresponding to the azimuth of head HB 3.
During 3X playback operation, heads HA 2, HB 3 trace across the tracks on the tape 1 at a shallower angle than during standard playback operation. As illustrated in FIG. 1(c), head HA 2 traces across paths 13 and 15 while head HB 3 traces across paths 14 and 16. As described above, each head can only read data written at an azimuth corresponding to the head's own azimuth. Thus, during 3X playback operation, head HA 2 can only read the portions of data which the head passes over in the odd numbered tracks, i.e. the areas of the odd numbered tracks indicated by the letters a, b, e and f. Similarly, during 3X playback operation, head HB 3 can only read the portions of data which it passes over in the even numbered tracks, i.e. the areas of the even numbered tracks indicated by the letters c, d, g, and h.
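The azimuth restriction described above can be sketched in a few lines of Python. This is only an illustrative model, not the geometry of FIG. 1(c): it assumes that each head pass crosses exactly `speed` whole tracks and that successive passes start where the previous one ended. The function names are hypothetical.

```python
def track_azimuth(track_number):
    """Odd tracks carry head HA's azimuth, even tracks carry head HB's."""
    return "A" if track_number % 2 == 1 else "B"


def readable_portions(speed, num_tracks):
    """List, for each head pass, the tracks crossed and the tracks readable.

    Assumption (simplified model): a pass at `speed` times the standard
    tape speed crosses `speed` consecutive tracks, and the single heads
    HA and HB alternate from one pass to the next.
    """
    passes = []
    for p in range(num_tracks // speed):
        head = "A" if p % 2 == 0 else "B"
        crossed = list(range(p * speed + 1, (p + 1) * speed + 1))
        # A head can only read tracks written at its own azimuth.
        readable = [t for t in crossed if track_azimuth(t) == head]
        passes.append((head, crossed, readable))
    return passes


if __name__ == "__main__":
    for head, crossed, readable in readable_portions(speed=3, num_tracks=12):
        print(f"head H{head}: crosses {crossed}, can read {readable}")
```

Under these assumptions, four passes over twelve tracks yield eight readable track portions, four for each head, which is consistent with the eight areas a through h described above.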
As FIG. 1(c) shows, during fast forward playback and other trick play modes of operation where the tape moves at a speed faster than the standard tape speed, it will not be possible for a two head video tape recorder to read all the data contained in each track because there will be areas of track that the heads do not pass over at all. The amount of track that is covered by the heads when the tape speed exceeds the standard tape speed is only a fraction of the total track area with the track area covered being directly proportional to the ratio of the standard tape speed to the actual tape speed. For example, in a two head VTR system, during 3X playback operation, the heads will pass over approximately 1/3 of the tape area comprising the recorded tracks which are used during standard playback operation. At 9X playback, the heads will pass over approximately 1/9 of the tape area comprising the recorded tracks.
Furthermore, as discussed above, during trick play mode in a two head VTR, the heads pass over track areas where they cannot read the recorded data because it was recorded by a head having a different azimuth from the azimuth of the head passing over the track during trick play mode. As illustrated in FIG. 1(c), single heads can read only approximately fifty percent of the data which they pass over during trick play mode, thus greatly reducing the amount of data that can be read during trick play modes.
To increase the amount of data that can be read during trick play modes additional record/playback heads may be used. There are two approaches for using additional record/playback heads to increase the amount of data that is read during trick play mode. The first approach is to use pairs of co-located heads. The second approach is to add additional pairs of non-co-located heads to the rotary head cylinder, each head in a pair of non-co-located heads being mounted 180° from the other head in the pair. These two approaches may be used independently to increase the amount of data that can be read during trick play mode. Alternatively, they can be combined to provide for maximum data recovery.
The first approach which may be used to permit the reading of virtually all data in tracks passed over by a head during trick play mode requires that single heads be replaced with co-located heads, i.e. pairs of heads arranged at mutually different azimuths, in such a manner that each track area passed over by the heads is passed over by at least one head of each possible azimuth. Because of the physical proximity of each head in a pair of co-located heads, both heads pass over the same data on the tape. Thus, by using pairs of co-located heads it is possible to read all data passed over by the co-located heads with each head in the pair reading data from each alternating track which has data written at the same azimuth as the head doing the read operation.
Since this approach requires the use of pairs of heads as opposed to single read/write heads, this doubles the number of heads required to implement a VTR using co-located heads as opposed to individual heads. For example, instead of having a two head VTR system with two heads spaced 180° apart, a similar VTR with co-located heads would comprise 2 pairs of co-located heads spaced 180° apart resulting in a four head VTR system.
FIG. 2(a) illustrates a four head VTR system comprising two pairs of co-located heads. As illustrated, a first and second pair of co-located heads HA-HB 20, 30 are mounted 180° apart on a rotary head cylinder 25. The magnetic tape 1 wraps around the rotary head cylinder 25 for approximately 180° contacting one pair of the co-located heads HA-HB 20, 30 at any given time.
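The two losses described above multiply. As a rough arithmetic sketch (the function names are hypothetical, and the one-half azimuth factor is the approximate figure given above rather than an exact value):

```python
from fractions import Fraction


def fraction_passed_over(speed):
    """Approximate fraction of recorded track area the heads pass over."""
    return Fraction(1, speed)


def fraction_readable_single_heads(speed, azimuth_match=Fraction(1, 2)):
    """Approximate fraction readable by single (non-co-located) heads.

    Assumption: about half of the area passed over was written at the
    azimuth of the head passing over it.
    """
    return fraction_passed_over(speed) * azimuth_match


if __name__ == "__main__":
    for speed in (3, 9):
        print(speed, fraction_passed_over(speed),
              fraction_readable_single_heads(speed))
```

For a two head VTR this suggests that roughly 1/6 of the recorded track area is readable at 3X playback and roughly 1/18 at 9X playback.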
FIG. 2(b) illustrates the paths traced out by the pairs of co-located heads HA-HB 20, 30 across the tape 1 during 3X playback operation. In FIG. 2(b), as in FIG. 1(c), reference numerals 1-A through 12-B are used to indicate tracks on tape 1. Odd numbered tracks 1-A through 11-A contain data written at an azimuth corresponding to the azimuth of head HA while even numbered tracks 2-B through 12-B contain data written at an azimuth corresponding to the azimuth of head HB.
During 3X playback operation, the first pair of co-located heads HA-HB 20 traces across paths 33 and 35 while the second pair of co-located heads HA-HB 30 traces across paths 34 and 36. Because co-located heads are used instead of individual heads, the data which is passed over by either pair of co-located heads can be read by one of the heads in the pair regardless of the azimuth at which the data is written. For example, head HA of the first pair of co-located heads HA-HB 20 reads the data in track portions a, b, e, and f of FIG. 2(b) while head HB of the first pair of co-located heads HA-HB 20 reads the data in track portions i and k. Similarly, head HA of the second pair of co-located heads HA-HB 30 reads the data in track portions j and l while head HB of the second pair of co-located heads HA-HB 30 reads the data in track portions c, d, g, and h. Thus, by using pairs of co-located heads virtually all the data in paths 33, 34, 35, and 36 which are traced by the heads during trick play mode operation can be read.
The second approach to increase the amount of data that is read during trick play mode also requires the use of additional heads beyond the two heads used in a basic VTR system. In accordance with this second approach N heads, where N>1, may be arranged so that the N heads are equally distributed over the range of the rotary head cylinder used to read/write a track of data, i.e. a 180° portion of the rotary head cylinder. Accordingly, the total number of heads in such a system is 2N since there are N heads on each 180° portion of the rotary head cylinder.
In such a system, there are N heads in contact with the tape at any given time. During standard playback operation, N-1 heads provide redundant information which can be used for error checking or other purposes. However, during trick play modes where the tape moves at a speed faster than the standard speed, each of the N heads will pass over a different portion of the tracks and read some data not read by the other heads. When the tape moves at N times the standard speed, during NX playback operation for example, each one of the N heads will pass over a different 1/Nth of a track written on the magnetic tape so that at least one of the N heads will pass over each section of the track. Thus, by using additional heads in this manner, additional data may be read during trick play operation.
Referring now to FIG. 3(a) there is illustrated an 8 head VTR system having four heads distributed evenly over each 180° portion of a rotary head cylinder 40. Thus, in the illustrated system N=4. As illustrated in FIG. 3(a), in a system where N=4, there are four heads in contact with the tape 1 at any given time.
When the system of FIG. 3(a) is operated in 4X playback operation the tape 1 moves at 4 times the standard tape speed. In such a case, during each pass, at least one of four heads will trace over each 1/4 section of a track. Thus, as illustrated in FIG. 3(b), the heads of the 8 head VTR of FIG. 3(a) will trace over all sections of the tape's tracks as the heads trace over one track after another during 4X playback operation.
Thus, if each head in the VTR system of FIG. 3(a) could read all of the data over which it passes, all the data on the tape could be read during 4X playback operation. However, as described above in regard to two head VTR systems, data in alternating tracks in a VTR system using helical scanning methods are written on the tape by heads with different azimuths. Accordingly, each one of the N heads in a system having N heads on each 180° portion of a rotary head cylinder, such as the system of FIG. 3(a), will only be able to read data in tracks written using a head having the same azimuth as the head attempting to read the data. Thus, while all portions of the tracks will be traced over by one of the N heads while operating in NX trick play mode, not all the data, i.e. only about 1/2 of the data, will be read because each head will only be able to read data from every other track written at a standard speed due to the fact that the data in alternating tracks were written by heads having different azimuths.
In order to read all the data passed over by the individual heads, pairs of co-located heads can be substituted for each of the N individual heads on each 180° portion of a rotary head cylinder. The use of N pairs of co-located heads equally spaced from each other on each 180° portion of a rotary head cylinder provides a VTR system capable of reading almost all of the data during NX playback operation. Such a system generally requires 4N heads to implement. Thus, for example, reading virtually all the data from tracks at 4X playback speed requires a sixteen head VTR.
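The head counts quoted above follow from a simple product, sketched here with a hypothetical helper: two 180° portions of the cylinder, N head positions per portion, and either one head or a co-located pair at each position.

```python
def total_heads(n, co_located):
    """Total heads on the rotary head cylinder.

    n          -- head positions on each 180 degree portion of the cylinder
    co_located -- True if each position holds a pair of heads with
                  mutually different azimuths, False for single heads
    """
    heads_per_position = 2 if co_located else 1
    return 2 * n * heads_per_position


if __name__ == "__main__":
    print(total_heads(1, co_located=False))  # basic two head VTR
    print(total_heads(1, co_located=True))   # four head VTR of FIG. 2(a)
    print(total_heads(4, co_located=False))  # eight head VTR of FIG. 3(a)
    print(total_heads(4, co_located=True))   # sixteen heads for 4X playback
```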
While known VTRs are primarily directed to recording of analog signals, current advances in technology enable images to be encoded and decoded in digital form and transmitted as a digital data stream. Accordingly, VTRs must be able to store and retrieve images that can be represented in digital form.
The digital representation of images, especially moving images with accompanying sound, requires a high digital data rate. Thus, digital television signals require a high data rate. High Definition Television ("HDTV"), which includes systems capable of displaying higher resolution images with greater clarity than is possible with the current National Television Systems Committee (NTSC) standard, will require an even higher digital data rate to represent video images than is required to digitally represent images of a similar quality to those transmitted in accordance with the current NTSC standard.
In order to provide the high data rate needed to support HDTV recording and playback, VTRs capable of recording two data channels per track may be used. Referring now to FIG. 4(a), there is illustrated a 2 channel, 4 head VTR system. As illustrated, a 2 channel VTR uses a pair of heads to write to or read from each track of data. Each pair of heads HA1-HB1, HA2-HB2, in a 2 channel VTR comprises two heads HA, HB of mutually different azimuths mounted on a rotary head cylinder 4 in such a manner that the heads in each pair of heads are capable of simultaneously writing to, or reading from, the two channels of a track on the tape 1. Thus, in such a system, the data rate that the VTR can support is nearly double the data rate a single channel VTR can support. As illustrated in FIG. 4(b), the tracks, T1 through T6, written by a 2 channel VTR each comprise a first and second data channel, channel A and channel B, respectively.
Compression and decompression techniques may be used to reduce the amount of digital data needed to represent images and sound. Accordingly, such techniques are important in reducing the amount of digital data which must be transmitted for television signals and the amount of data which must be recorded by VTRs. However, even with such data compression, HDTV will still require large amounts of digital data to be transmitted at high data rates to achieve HDTV picture and sound quality. For example, one proposed HDTV system requires 24 million bits per second of digital data to be transmitted to achieve HDTV picture and sound quality.
The International Standards Organization has set a standard for compression which includes the use of motion compensation principles. The standard is referred to as the ISO-MPEG (International Standards Organization, Moving Picture Experts Group) standard. MPEG compression uses an adaptive motion-compensated Discrete Cosine Transform (DCT) that perceptually optimizes picture encoding on a block-by-block basis. The MPEG motion compression technique has both unidirectional and bidirectional prediction capabilities (both forward and backward in time) to accurately predict frames. This allows more bytes to be used for picture detail.
In accordance with the MPEG standard, analog video signals are digitized, matrixed and filtered to produce an internal format used for the compression process. The compression process performs compression using the MPEG compression algorithm.
In summary, the MPEG compression operations that are implemented in the compression process include motion compensated predictive coding and adaptive Discrete Cosine Transform (DCT) quantization. MPEG utilizes data structures known as frames. A frame contains picture information and defines one complete video picture. For example, a frame of video can consist of an array of luminance pixels (Y) and two arrays of chrominance pixels (Cr, Cb).
According to the MPEG compression algorithm, frames are classified into one of three types: intracoded-frames (I-frames), predictively coded frames (P-frames) and bidirectionally coded frames (B-frames). I-frames use purely spatial compression, and are processed independently of other frames. Thus, I-frames are processed entirely by intra-frame operations. A complete picture can be generated from an I-frame alone.
P-frames are coded using the previous I- or P-frames. The compression of P-frames relies on temporal prediction from previous I- or P-frames. Only forward motion estimation/compensation is used in the temporal prediction. While P-frames may contain some intra-coded data, a complete picture, of the same quality as a picture which can be generated from an I-frame, cannot be generated from a P-frame alone because of the use of forward motion estimation/compensation in a P-frame.
B-frames are coded by a bidirectional motion compensated predictive encoder using the two adjacent I- or P-frames. B-frames are temporally predicted from two adjacent anchor frames. Both I-frames and P-frames serve as anchor (or reference) frames for the motion compensation of other frames. The B-frame temporal prediction uses motion compensation in forward and/or backward directions. B-frames are never used to predict other frames. Because of the dependence of B-frames on the two adjacent anchor frames, B-frames alone do not contain sufficient data from which to generate a recognizable picture.
The above three types of frames differ in their use of motion estimation. Motion estimation refers to the process of computing the spatial displacement of blocks of pixels due to motion. The resultant motion vectors are used in motion-compensated predictive coding. MPEG uses both forward motion estimation (in which the estimation is of the future referenced to the past), and backward motion estimation (in which the estimation is of the past referenced to the future). Forward and backward motion estimation are also combined to produce bidirectional motion estimation.
In accordance with the MPEG proposal, frames are arranged in ordered groups. A typical group is a series of frames containing, e.g., in the order of their being displayed, one I-frame, two B-frames, a P-frame, two B-frames, a P-frame and then two B-frames. FIG. 5 illustrates a typical Group of Pictures in the order they are displayed and the temporal prediction relationship between the various frames which comprise the group.
A group of pictures is intended to assist random access into the sequence. In the stored bit stream, the first coded frame in the group is normally an I-frame.
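The prediction relationships within the typical group described above can be sketched as follows. This is a minimal illustration, not part of the MPEG proposal: the function name is hypothetical, frames are identified by their position in display order, and a reference of `None` marks an anchor frame that lies outside the group (for example, the I-frame of the following group).

```python
def prediction_references(display_order):
    """Map each frame index to the anchor frames it is predicted from.

    display_order -- string of frame types in display order, e.g. "IBBPBBPBB"
    I-frames use no references, P-frames use the previous anchor, and
    B-frames use the nearest anchor on each side.
    """
    anchors = [i for i, t in enumerate(display_order) if t in "IP"]
    refs = {}
    for i, frame_type in enumerate(display_order):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if frame_type == "I":
            refs[i] = ()
        elif frame_type == "P":
            refs[i] = (prev_anchor,)
        else:  # B-frame
            refs[i] = (prev_anchor, next_anchor)
    return refs


if __name__ == "__main__":
    group = "IBBPBBPBB"
    for index, references in prediction_references(group).items():
        print(index, group[index], references)
```

In this sketch the I-frame at index 0 has no references, which is why it can serve as a random access point, while every other frame in the group depends directly or indirectly on it.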
In accordance with the MPEG proposal, after the analog video signals are digitized, the digital data is organized into macroblocks. A macroblock is the unit of motion compensation and adaptive quantization. A number of macroblocks comprise a frame. Each macroblock defines a predetermined spatial region in a picture, and contains luminance and chrominance information.
The MPEG proposal provides for the arrangement of macroblocks into slices. A slice is an integer number of consecutive macroblocks from a raster of macroblocks. A slice represents the boundary within which differential coding of macroblock parameters, e.g. DC coefficients of a DCT, and motion vectors, is performed. Each slice has its own header information and can be independent of other slices. Each slice contains at least one macroblock. Slices do not overlap and there are no gaps between slices. The position of slices may change from picture to picture. The first slice starts with the first macroblock in the picture and the last slice ends with the last macroblock in the picture. The first macroblock in a slice has its macroblock parameters, e.g. DC coefficients of a DCT (if intra-coded) and motion vectors, differentially coded from a constant value. Each subsequent macroblock in a slice has its macroblock parameters measured as an offset from the previous macroblock in the slice. Accordingly, the size of the slice is the minimum size for which a piece of data can be recovered and correctly decoded. If part of a slice is lost, it may not be possible to decode the differences in motion vectors and DC coefficients contained in the remaining part of the slice.
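The differential coding within a slice, and the reason a partially lost slice may be undecodable, can be illustrated with a short sketch. The starting constant of 128 and the function names are assumptions chosen for illustration only; the actual constants and bitstream syntax are defined by the standard.

```python
def encode_slice(dc_values, start=128):
    """Differentially code the DC values of consecutive macroblocks.

    The first value is coded relative to a constant (assumed 128 here);
    each later value is coded as an offset from the previous one.
    """
    diffs, previous = [], start
    for value in dc_values:
        diffs.append(value - previous)
        previous = value
    return diffs


def decode_slice(diffs, start=128):
    """Invert encode_slice by accumulating the differences."""
    values, previous = [], start
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values


if __name__ == "__main__":
    original = [130, 135, 133, 140]
    coded = encode_slice(original)
    print(coded)                    # differences within the slice
    print(decode_slice(coded))      # full slice decodes correctly
    print(decode_slice(coded[1:]))  # first macroblock lost: wrong values
```

Decoding the slice with its first difference missing produces values that are all offset from the originals, which is why the slice is the smallest unit that can be recovered and correctly decoded.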
FIG. 6 illustrates a macroblock in accordance with the MPEG proposal which may be used, e.g. for HDTV signals. As illustrated in FIG. 6, a macroblock comprises four 8×8 luminance blocks (Y0, Y1, Y2, Y3) and two 8×8 color difference blocks (Cr and Cb). The four luminance blocks (Y0, Y1, Y2, Y3) and two color difference (Cr, Cb) blocks, which form a single macroblock are used to encode a 16×16 picture element array covering the same spatial region in a picture. As described above, a macroblock serves as the unit of motion compensation and adaptive quantization.
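The macroblock layout can be sketched as follows. Two details here are assumptions not stated above: the assignment of Y0 through Y3 to the top-left, top-right, bottom-left and bottom-right quadrants, and the use of 2×2 averaging to reduce a 16×16 color difference array to a single 8×8 block.

```python
def split_luminance(y16):
    """Split a 16x16 luminance array into four 8x8 blocks.

    Assumed quadrant order: Y0 top-left, Y1 top-right,
    Y2 bottom-left, Y3 bottom-right.
    """
    def block(row0, col0):
        return [row[col0:col0 + 8] for row in y16[row0:row0 + 8]]

    return {"Y0": block(0, 0), "Y1": block(0, 8),
            "Y2": block(8, 0), "Y3": block(8, 8)}


def subsample_chroma(c16):
    """Reduce a 16x16 color difference array to 8x8 by 2x2 averaging.

    The averaging filter is an illustrative assumption.
    """
    return [[(c16[2 * r][2 * c] + c16[2 * r][2 * c + 1]
              + c16[2 * r + 1][2 * c] + c16[2 * r + 1][2 * c + 1]) / 4
             for c in range(8)]
            for r in range(8)]


if __name__ == "__main__":
    luminance = [[r * 16 + c for c in range(16)] for r in range(16)]
    blocks = split_luminance(luminance)
    chroma = subsample_chroma([[10] * 16 for _ in range(16)])
    samples = sum(len(b) * len(b[0]) for b in blocks.values()) + 2 * 64
    print(samples)  # total samples carried by one macroblock
```

Under these assumptions one macroblock carries 6 × 64 = 384 samples for a 16×16 picture element region.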
In accordance with the MPEG proposal, motion-compensated predictive coding is carried out by calculating motion vectors for every macroblock in a P-frame or B-frame. MPEG compression encodes motion vectors on a macroblock basis, but does not specify the technique for computing them. Thus, a variety of different motion estimation techniques can be implemented consistent with the MPEG standard. One technique, for example, is to compute motion vectors from the frame-to-frame correlation of blocks of pixels in the luminance signal, resulting in a motion vector for the luminance component of the macroblock.
The best mode for encoding each macroblock is selected. Within a given picture, each macroblock is coded in one of several different modes. The intraframe coding mode refers to macroblock coding in which only spatial information is used. Conversely, the interframe coding modes (forward motion, backward motion and bi-directional motion) refer to macroblock coding in which information from frames other than the current frame is used in the coding, typically for temporal prediction in motion-compensated predictive coding. For I-frame macroblocks, only intraframe coding mode is available.
P-frame macroblocks are first checked to determine if interframe coding without motion compensation is appropriate. This decision is made by computing the luminance energy of a forward prediction residual for the macroblock that results from an interframe coding without motion compensation, and comparing it to a threshold value. If the residual energy is below the threshold, then the macroblock will be coded using interframe coding without motion compensation. Otherwise, the residual macroblock from interframe coding with forward motion compensation will be derived and used in the final step of the coding mode selection.
B-frame macroblocks are similarly processed to determine whether interframe coding is appropriate. Since B-frames may be bidirectionally coded, interframe coding can be either forward or backward, based on the preceding and following anchor (i.e., I- or P-) frames. It may also be based on the average of those macroblocks from the preceding and the following anchor frames. In interframe coding using motion compensation, there are three possible modes: forward, backward, and bidirectional. The choice of coding mode for B-frame macroblocks is also determined on the basis of luminance prediction residual energy.
The final step in the coding mode selection for both P- and B-frame macroblocks is to choose between interframe coding and intraframe coding. Generally, P-frames and B-frames are encoded using interframe encoding. This selection is made by comparing the luminance energy of the original macroblock to the energy of the luminance interframe (with or without motion compensation) prediction residual macroblock. If the original macroblock has less energy than the prediction residual macroblock, the intraframe coding mode is selected.
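The P-frame mode decision described in the preceding paragraphs can be sketched as a two-step comparison. This is a simplified illustration under stated assumptions: energy is taken to be the sum of squared luminance values, the threshold is a free parameter, and the function and mode names are hypothetical. An actual encoder may define these quantities differently.

```python
def energy(block):
    """Sum of squared sample values (assumed definition of energy)."""
    return sum(v * v for row in block for v in row)


def residual(current, prediction):
    """Sample-by-sample prediction error."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


def select_p_mode(current, colocated, motion_compensated, threshold):
    """Choose a coding mode for one P-frame macroblock.

    colocated          -- same-position macroblock in the previous anchor
    motion_compensated -- motion-compensated prediction from that anchor
    """
    # Step 1: is interframe coding without motion compensation good enough?
    no_mc = residual(current, colocated)
    if energy(no_mc) < threshold:
        chosen, mode = no_mc, "inter_no_mc"
    else:
        chosen, mode = residual(current, motion_compensated), "inter_forward_mc"
    # Final step: fall back to intraframe coding if the original
    # macroblock has less energy than the prediction residual.
    if energy(current) < energy(chosen):
        return "intra"
    return mode


if __name__ == "__main__":
    flat = [[10, 10], [10, 10]]
    print(select_p_mode(flat, flat, flat, threshold=5))
```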
After the motion vectors have been calculated, each macroblock is transform encoded. In summary, the macroblocks are transformed from pixel domain to the DCT coefficient domain. The picture information in each frame (i.e., pixel values for I-frames, and residual error after prediction for B and P-frames) is transformed using the DCT and then adaptively quantized. For the purpose of performing the DCT, a frame picture is divided, for example, into blocks of values (i.e., arrays of DCT coefficients). Each quantized DCT coefficient along with other MPEG-specific data is variable length encoded by the video encoder module to form MPEG codewords.
The DCT process generates blocks of DCT coefficients in a zigzag scanned format (i.e., the low-frequency coefficients are followed by the higher frequency coefficients). This zigzag scan arrangement facilitates the subsequent run-length coding process. The DCT coefficient for which the frequency is zero in both dimensions is called the DC coefficient.
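The zigzag scan and its benefit for run-length coding can be sketched as follows. The scan order generated here is the conventional zigzag pattern over anti-diagonals; the (zero run, value) pair format is a simplification for illustration, not the MPEG codeword syntax.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal, alternating direction on each diagonal.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def zigzag_scan(block):
    """Read a square block of coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(scanned):
    """Code non-zero values as (preceding zero count, value) pairs.

    Trailing zeros are dropped, as if followed by an end-of-block marker.
    """
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[1][0], block[1][1] = 50, 3, -2, 1
    print(run_length(zigzag_scan(block)))
```

Because quantized high-frequency coefficients are often zero, scanning from low to high frequency tends to leave a long run of zeros at the end of each block, which the run-length step represents compactly.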
Next, adaptive quantization is performed on each block of DCT coefficients. After adaptive quantization has been applied to the DCT coefficients, the coefficients undergo further compression involving such known techniques as differential coding, run-length coding and variable length coding. As a result, the video compression encoder module produces encoded data, in the form of variable length codewords, and information concerning the number of header and coded data bits per macroblock. The header provides, inter alia, a mechanism for dynamic specification of the picture size, in terms of pixels per line and a pixel aspect ratio. The video compression encoder module also outputs information that states which frame the encoded data represents and which macroblock and slice the encoded data represents.
The codewords are then further encoded by, for example, a transport encoder, to provide reliable delivery of the variable length encoded compressed video.
The MPEG compression standard also produces D-pictures, also referred to as D-frames. A D-picture is coded using only intraframe encoding. Of the DCT coefficients in the coded representation of a D-picture, only the DC-coefficients are present. Thus, D-pictures comprise the DC coefficient of each DCT block in the frame. D-pictures are not used in sequences containing other frame types, such as I-, P-, or B-frames.
D-pictures are thus stored separately from the normal MPEG bitstream and must appear in a separate picture sequence that cannot contain any other type of picture.
Furthermore, D-pictures must be encoded and transmitted separately. They must also be decoded using a separate algorithm from the algorithm used to decode the other frames, i.e. the I-, P- and B-frames. Thus, D-pictures cannot be decoded in conjunction with other MPEG data such as I-frames.
A proposed standard for HDTV using motion compensation compression techniques is the Advanced Digital Television ("AD HDTV") system developed by the Advanced Television Research Consortium. The proposed AD HDTV system is described in the Advanced Television Research Consortium's "Advanced Digital Television, System Description" of Jan. 20, 1992 and in the Advanced Television Research Consortium's "Advanced Digital Television, Prototype Hardware Description" of Feb. 12, 1992 which are both herein expressly incorporated by reference. The proposed AD HDTV system uses a modified data compression technique based on the ISO-MPEG standard, called MPEG++.
MPEG++ compression uses a two-pass encoding system that has the function of adaptively segregating video data produced by the compression processor into a subjectively important high priority ("HP") bit stream and a less important standard priority ("SP") bit stream. The high priority bit stream provides data sufficient to produce a viewable picture and the additional standard priority bit stream provides the additional data needed to produce full HDTV quality video.
Separation into high and standard-priority data streams is carried out using an adaptive prioritization algorithm which takes into account, inter alia, the MPEG frame type (i.e., I, B or P), and the relative occupancies of HP and SP rate buffers at the output of the MPEG++ encoding system. Highest priority is given to the MPEG headers that indicate the start of video data blocks (e.g., slices and macroblocks), which are needed to initiate the decoding of received video data. I-frame data and P-frame motion vectors are also given relatively high priority, while most B-frame data is transmitted with standard priority. The adaptive prioritization algorithm outputs the data stream of codewords and a signal representing the priority level for each codeword stream.
The AD HDTV system uses a Prioritized Data Transport (PDT) format to provide reliable delivery of variable length encoded compressed video data. The PDT format supports flexible multiplexing of video, audio and data services without requiring preselection of operating bit-rates. The AD HDTV system accordingly formats all data into a sequence of fixed length packets, each with appropriate headers for identification and error recovery. These packets are called transport cells.
The data stream of codewords and the priority level for each codeword, i.e. HP or SP, are received and the transport cells are filled with the data as appropriate to its priority. Each transport cell is tagged with an Adaptation Header which includes information necessary to restart video decoding if synchronization is lost prior to the current transport cell. This information might include macroblock number, block position within the macroblock, frame number, field or frame coding, quantization level, priority of the data, and a pointer to a data boundary within the cell. Cells at different priority levels, i.e. HP or SP, may have different header information as appropriate to decode data of the given priority level.
As described above, the proposed priority encoder of the AD HDTV system separates a single encoded video codeword stream from the compression processor into two data streams corresponding to two priority levels: the high priority (HP) and the standard priority (SP) data streams. The goal of the priority encoder is to produce a HP codeword stream that represents a viewable picture. This HP codeword stream can be transmitted at a higher power and in a separate frequency range to increase the area of reception for at least the basic video picture.
The proposed AD HDTV system allows different approaches and criteria to be employed in the construction of the HP and SP codeword streams. An allocation process takes place once at the beginning of every frame to determine the fraction of data for that frame that should be transmitted on the high priority channel. This decision is based on the type of frame being transmitted (I-, P- or B-frame), the number of bits generated for that frame (available from the compression processor) and the state of HP and SP buffers. In general, I-frame information tends to be the most important, and is generally transmitted on the high priority channel. There are two reasons for this. First, the effect of transmission error on I-frame data lasts longer than that on a P- or a B-frame because it is the basis of prediction for both P- and B-frames. Second, since there is no temporal prediction for I-frames, errors on DCT coefficients may result in complete loss of picture information for a macroblock.
P- and B-frames, on the other hand, can rely on partial motion information to produce reasonable images, even in the event of complete loss of the DCT coefficients due to transmission errors. Hence, the general objective is to transmit as large a fraction of the I-frame data as possible on the high priority channel. For P-frames, most if not all motion vector data, and possibly some DCT coefficients, are transmitted on the HP channel. More DCT coefficients are transmitted on the HP channel if additional capacity is available. It is important to at least transmit motion information for these frames on the HP channel since the effect of losses tends to propagate until the next I-frame. Finally, B-frames are considered the least important because they are not used for prediction purposes. Therefore, B-frame errors are constrained to a single frame and do not propagate to other frames. In general, the amount of B-frame data that is transmitted on the high priority channel is the smallest among the three types of frames.
While the AD HDTV priority assignment process does not specify exactly what must appear in the HP data stream, the AD HDTV proposal provides general guidelines of priority assignments that can be used for each frame type. The AD HDTV proposal states that for all frame types, the three most important types of information are frame headers, slice headers and macroblock information (addresses, types and quantization). For I-frames, next in priority are (in order) DC DCT coefficients, low frequency DCT coefficients and finally high frequency DCT coefficients. For B- and P-frames, next in priority are (in order) motion vectors, DC DCT coefficients, low frequency DCT coefficients and finally high frequency DCT coefficients. As stated above, the codewords are prioritized into DCT coefficients of increasingly higher spatial frequency.
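The ordering guidelines described above can be illustrated with a lookup table. The following Python sketch is not taken from the AD HDTV proposal: the names `PRIORITY_ORDER` and `split_by_priority`, the codeword labels, and the use of a fixed per-frame HP bit budget are all assumptions made for illustration, whereas the proposed system determines the HP fraction adaptively from frame type, bit count and buffer state.

```python
# Hypothetical labels for the codeword categories named in the guidelines.
COMMON = ["frame_header", "slice_header", "macroblock_info"]
COEFFS = ["dc_coeff", "low_freq_coeff", "high_freq_coeff"]
PRIORITY_ORDER = {
    "I": COMMON + COEFFS,                      # no motion vectors in I-frames
    "P": COMMON + ["motion_vector"] + COEFFS,
    "B": COMMON + ["motion_vector"] + COEFFS,
}

def split_by_priority(frame_type, codewords, hp_budget_bits):
    """Split (kind, bits) codewords into HP and SP lists.

    Codewords are ranked by the guideline order for `frame_type`; the
    highest ranked ones go to HP until the assumed bit budget is reached,
    and everything after that breakpoint goes to SP.
    """
    order = PRIORITY_ORDER[frame_type]
    ranked = sorted(codewords, key=lambda cw: order.index(cw[0]))  # stable sort
    hp, sp, used = [], [], 0
    for kind, bits in ranked:
        if not sp and used + bits <= hp_budget_bits:
            hp.append((kind, bits))
            used += bits
        else:
            sp.append((kind, bits))
    return hp, sp
```

For example, a P-frame with a 30-bit HP budget would place its header, motion vector and DC codewords in the HP stream and leave high frequency coefficients for the SP stream.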
In the proposed AD HDTV system, the HP data rate is one fourth the SP data rate. Accordingly, the ratio of HP to SP data is 1:4.
FIG. 7 illustrates the structure of a transport cell in accordance with the AD HDTV system proposal. Each cell contains an error correction layer and a prioritized data transport (PDT) format layer. As illustrated in FIG. 7, there are three sublayers within the PDT format layer.
They are a data link layer, an MPEG++ adaption layer and the service data layer. The data link layer comprises a service type byte which carries the priority level indicator (HP or SP) and a frame check sequence for error detection. Accordingly, the service type byte allows immediate identification of a transport cell to be either high or standard priority. The service type byte also identifies the data type for video, audio, and auxiliary data and contains a 4-bit continuity counter (CC) component. This counter increments by one for each cell transmitted. The continuity counter allows the receiver buffers to detect cell discontinuity (due to uncorrectable cell errors) for a particular transport service.
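The way a receiver might use the 4-bit continuity counter to detect missing cells can be sketched as follows. The function name `count_lost_cells` is hypothetical, and the sketch assumes that the counter wraps from 15 back to 0, as a 4-bit field incremented once per cell would.

```python
def count_lost_cells(prev_cc, cc):
    """Return the number of cells missing between two received cells.

    `prev_cc` and `cc` are the 4-bit continuity counter values (0..15) of
    two cells received back to back for the same transport service.
    The result is only known modulo 16: a loss of exactly 16 cells is
    indistinguishable from no loss at all.
    """
    return (cc - prev_cc - 1) % 16
```

A result of zero indicates consecutive cells; any other value signals a discontinuity for that service.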
The MPEG++ adaption layer allows a decoder to synchronize to variable length codes within the MPEG compressed video service. The first usable entry-point in each cell is identified and stored in an Adaptation Header (AH) of the MPEG++ adaption layer. For high priority data, the AH contains slice entry point information (i.e., a pointer to the first bit of the entry point of the slice in the transport data), frame type information, the frame number and the slice number within the frame. For standard priority data, the AH contains a pointer to the start of a macroblock, frame type information, the frame number and the macroblock number within the frame.
The video service layer of each transport cell contains transport data which may include video, audio and/or control data. The transport data includes video-specific parameters that can be used for resynchronization after a long burst of errors. A record header (RH) appears at the beginning of each slice, and is sent in the high priority transport cells only. Any number of record headers may appear in a cell, but only the first is used as an entry-point in the AH. The entry-point feature in the AH for a HP cell, as stated above, contains information regarding the location of the start of a data block (which is always a RH), as well as other information such as the frame type and slice number. The RH can include a priority breakpoint (specifying the break between HP and SP information), a vertical position, a quantization scale factor, and a record header extension.
To summarize, in accordance with the AD HDTV system proposal, each HP cell contains data arranged in slices. Each SP cell contains data arranged in macroblocks. Entry points allow these data blocks to be segmented across cell boundaries. However, the AH information only contains one pointer to the start of the macroblock or slice. There may, however, be more than one macroblock or slice starting in each cell. Thus, at least one of these blocks will not have an entry point recorded in the AH. Alternatively, a macroblock or slice may take up many cells, and thus there is not an entry point for the block in subsequent cells. In the event of a cell loss, the entry point information can be used for the rapid resynchronization of the transport data. In the event of an error leading to the loss of a cell without an entry point, the receiver will restart decoding at the next block with an entry point.
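The resynchronization behavior summarized above can be sketched as a search for the next cell that carries an entry point. In this Python sketch the cell representation (a dict with an `entry_point` key holding a bit offset or `None`) and the function name `find_resync_point` are assumptions made for illustration only.

```python
def find_resync_point(cells, lost_index):
    """Return (cell_index, bit_offset) at which decoding can restart.

    `cells` is a list of dicts; cell["entry_point"] is the AH pointer to
    the first slice or macroblock starting in that cell, or None when a
    block begun earlier fills the whole cell. Returns None if no later
    cell in the list has an entry point.
    """
    for i in range(lost_index + 1, len(cells)):
        offset = cells[i]["entry_point"]
        if offset is not None:
            return i, offset
    return None
```

All data between the lost cell and the returned position is discarded, which is why blocks that span many cells lengthen the recovery gap.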
Another proposed standard for HDTV is the DigiCipher™ system (also referred to as the ATVA-Interlaced system) developed by General Instrument Corporation. This proposed system is described in General Instrument Corporation's "DigiCipher HDTV System Description" of Aug. 22, 1991 which is hereby expressly incorporated by reference. The DigiCipher system uses transform encoding as the technique of data compression.
The DigiCipher system does not have complete, temporally coincident frames of intra-coded data. Rather, intraframe data updates an image on a regular basis in vertical columns on the screen.
In the DigiCipher system, a pixel is an 8 bit active video sample (luminance or chrominance) while a block is an image area of 8×8 pixels. A superblock is an image area comprising 4 luminance blocks horizontally by 2 luminance blocks vertically and one associated chrominance block each for U and V values derived from that image area. A macroblock is an image area of eleven horizontally arranged superblocks.
The DigiCipher system transforms a block of pixels into a new block of transform coefficients using the DCT. The transform is applied to each block until the entire image has been transformed.
Next the number of bits required to represent the DCT coefficients is reduced. Accordingly, a coefficient quantization process gives weights to each of the DCT coefficients. Each coefficient is divided by its weighting factor. Then a quantization factor is determined based on scene complexity and perceptual characteristics, and additional scaling takes place by dividing the weighted coefficients by the quantization factor.
The quantization method of the DigiCipher system, however, is not applied to the DC coefficient. The most significant bits of the DC coefficient are always selected, independent of the quantization level.
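The two-stage scaling described above can be sketched in a few lines of Python. The function name `quantize_block`, the rounding to the nearest integer, and the treatment of the DC coefficient (passed through unchanged rather than reduced to its most significant bits) are assumptions of this sketch; the weight values and the rule for choosing the quantization factor are not specified here.

```python
def quantize_block(coeffs, weights, qfactor):
    """Quantize a square block of DCT coefficients.

    Each AC coefficient is divided by its weighting factor and then by the
    block-wide quantization factor. Rounding to the nearest integer is an
    assumption. The DC coefficient at [0][0] is exempt from this scaling
    and is simply copied through in this sketch.
    """
    out = [
        [round(c / w / qfactor) for c, w in zip(crow, wrow)]
        for crow, wrow in zip(coeffs, weights)
    ]
    out[0][0] = coeffs[0][0]  # DC is not subject to the AC quantization
    return out
```

Larger weights and a larger quantization factor drive more coefficients to zero, which is what makes the subsequent run length coding effective.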
Next a statistical coding technique, such as a Huffman coding, is used that does not degrade the image. The DCT coefficients are serialized into a sequence and amplitude/run length coded. A codeword is assigned indicating the amplitude of the coefficient and the number of zeros preceding it (runlength).
In addition, the DC coefficient is Huffman coded after it is differentially coded within a superblock. The efficiency of this coding process is heavily dependent on the order in which the coefficients are scanned. By scanning from high amplitude to low amplitude, it is possible to reduce the number of runs of zero coefficients typically to a single long run at the end of the block. The coefficients are zigzag scanned going down first from the DC coefficient.
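The scan and run length steps can be sketched as follows. The function names `zigzag_down_first` and `run_amplitude_pairs` are hypothetical, the exact DigiCipher scan table is assumed to be a conventional zigzag whose first step from the DC position is downward, and the Huffman assignment of codewords to the resulting pairs is omitted.

```python
def zigzag_down_first(n=8):
    """Return (row, col) positions of an n x n block in zigzag order,
    moving down from the DC position first."""
    order = []
    for d in range(2 * n - 1):                 # d = row + col of a diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 1:                         # odd diagonals run bottom to top
            rows = reversed(rows)
        order.extend((r, d - r) for r in rows)
    return order

def run_amplitude_pairs(block):
    """Return (zero_run, amplitude) pairs for the AC coefficients of `block`.

    The DC coefficient is skipped because it is coded separately. Trailing
    zeros produce no pair; they are implied by the end of the block.
    """
    pairs, run = [], 0
    for r, c in zigzag_down_first(len(block))[1:]:
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

Because quantized high frequency coefficients are mostly zero, the scan tends to leave a single long zero run at the end of the block, which costs no pairs at all.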
There is a limit to the amount of compression possible by spatial processing alone. An interframe coder, however, can benefit from temporal correlation as well as spatial correlation. A very high degree of temporal correlation exists whenever there is little movement from one frame to the next.
In the DigiCipher system, the signal is compressed by first predicting how the next frame will appear and then sending the difference between the prediction and the actual image. A reasonable predictor is the previous frame. This sort of temporal differential encoding will perform very well if little movement occurs or if there is little spatial detail. At other times, it will be less effective and occasionally worse than if the next frame had simply been encoded without prediction.
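Temporal differential encoding with the previous frame as predictor can be sketched as below. The function names are hypothetical and the sketch deliberately omits motion compensation, the transform and quantization, so it shows only the prediction and difference step.

```python
def encode_difference(current, previous):
    """Return the sample-wise difference between `current` and the
    prediction, here simply the previous frame."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, previous)]

def decode_difference(residual, previous):
    """Rebuild the current frame by adding the residual to the prediction."""
    return [[d + p for d, p in zip(drow, prow)]
            for drow, prow in zip(residual, previous)]
```

When little changes between frames the residual is mostly zeros and compresses well; when the scene changes, the residual can be as costly as the frame itself.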
Instead of transform coding an image directly, an estimate of the image is first generated using motion compensation. The difference between this estimate and the actual image is then transform coded and the transform coefficients are then normalized and statistically coded as before. The second of the two frames from which the motion estimates are derived is always the previous frame as it appears after reconstruction by the decoder.
Differential processing in general causes a basic problem for the decoder. When a decoder is tuned to a new channel, it has no "previous frame" information. Acquisition would be delayed until at least one pulse code modulation ("PCM") version of every block is received, which would result in an unbounded acquisition time if PCM blocks were not sent on a regular basis.
Thus, in the DigiCipher system, during each 0.37 second interval, all blocks are processed once in PCM form on a distributed basis. This technique results in a 0.37 second differential pulse code modulation ("DPCM") based acquisition time component, but spreads the resulting increase in channel bits uniformly over time.
The 0.37 second parameter would imply a forced PCM block once every 11 frames, and there is a necessary but non-trivial reduction in the overall compression efficiency. The 0.37 second parameter can be varied to trade off acquisition time versus efficiency.
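The relationship between the refresh interval and the forced PCM period is simple arithmetic. In the Python sketch below the function name `refresh_period_frames` is hypothetical and the frame rate of 29.97 frames per second is an assumption, since no frame rate is stated here.

```python
def refresh_period_frames(refresh_seconds, frame_rate):
    """Return the number of frames over which every block is sent once in
    PCM form, rounded to the nearest whole frame."""
    return round(refresh_seconds * frame_rate)

# Assumed frame rate of 29.97 frames/s: 0.37 s * 29.97 is about 11.09 frames.
period = refresh_period_frames(0.37, 29.97)
pcm_fraction_per_frame = 1 / period  # share of blocks forced to PCM in each frame
```

Under that assumption roughly one block in eleven is forced to PCM in every frame; lengthening the interval lowers this fraction but increases acquisition time.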
The DigiCipher system also has very little tolerance for errors or missing information in the data stream. The DigiCipher system will repeat a macroblock from the previous frame when an error is detected. Errors are detected by checking whether all the compressed data has been used when processing of a macroblock is finished. Because of the variable length encoding of data, resynchronization must take place after an error occurs. There is no place for resynchronization, however, except at the start of the next frame using a next frame pointer.
The above-described systems do not specify the data formats and compression techniques to make the systems suitable for VTR applications. Requirements peculiar to VTRs include the need for the ability to record for normal speed playback as well as fast forward playback at a variety of speeds, reverse playback at "normal" speed and other speeds, slow motion playback and freeze-frame viewing. A VTR must be able to receive data and arrange it so that it can be stored on a tape in a suitable format to allow playback at different speeds and in different modes.
The playback of recorded compressed digital video data is difficult at speeds faster than the normal forward speed and in reverse direction. The reason is that digital compression systems, such as those systems described above (i.e., the AD HDTV system and the DigiCipher system) produce very compact non-redundant descriptions of images. Consequently, the delivery of only a portion of the compressed data (such as occurs at higher than normal playback speeds) results in a playback data stream that is largely incomprehensible to a video decoder.
The use of the MPEG standard for supporting fast play modes in a VTR has been suggested by a report titled "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s", ISO 2-11172 rev 1, Jun. 10, 1992, hereinafter "the MPEG report", which is hereby expressly incorporated by reference. In the MPEG report, at pp. D-52 to D-54, it is suggested that MPEG D-frames and I-frames, both of which contain only intra-coded material, can be used to support fast forward play.
As described above, MPEG D-frames, which are an extension of the normal MPEG data stream, contain only the DC coefficients of a DCT transform. Therefore, D-frames contain only information encoded using intra-frame processing. In MPEG, D frames are completely independent of the normal bitstream of I-, B- and P-frames and thus must be encoded, transmitted and stored separately from the normal data stream. Furthermore, D-frames must be decoded by a different algorithm which requires the use of a separate decoder circuit from the decoder circuit used to decode I-, B-, and P-frames.
Such requirements of separate encoding, decoding and storage of D-frames add to the cost and complexity of a VTR which uses D-frames for fast play modes of operation. In addition, the picture quality that can be reproduced using intra-coded D-frames alone is relatively poor compared to pictures which can be reproduced from I-frames, for example.
Further, the MPEG report suggests that the MPEG standard can be used to support fast play if I-frames are appropriately spaced in a sequence. As an example, the MPEG report states that if I-frames were spaced regularly every ten frames, then a decoder might be able to play the sequence at ten times the normal speed by decoding and displaying only I-frames.
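The I-frame-only fast play suggested in the MPEG report can be sketched as a simple selection over the frame sequence. The function names and the example group-of-pictures pattern below are hypothetical and chosen only so that I-frames fall every ten frames.

```python
def i_frame_indices(frame_types):
    """Return the positions of the I-frames in a sequence of frame types."""
    return [i for i, t in enumerate(frame_types) if t == "I"]

def fast_play_speedup(frame_types):
    """Return the apparent speed-up when only I-frames are displayed,
    one per normal frame period."""
    return len(frame_types) / len(i_frame_indices(frame_types))

# Hypothetical sequence with an I-frame every ten frames.
sequence = list("IBBPBBPBBP" * 3)
```

The speed-up equals the I-frame spacing, but the medium must still pass over all of the intervening data in the same display time to reach each I-frame.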
While suggesting the above use of I-frames for fast play, the MPEG report recognizes that this concept places considerable burdens on the media and the decoder. To use I-frames as suggested, the media must be capable of speeding up and delivering ten times the data rate and the decoder must be capable of accepting this higher data rate and decoding the I-frames. While the MPEG report recognizes these problems, it fails to teach how to overcome these burdens on the media and decoder so that a VTR can actually be implemented using the suggested approach.
The MPEG report further suggests that the media itself might be used to somehow sort out the I-frames and transmit them to produce a valid MPEG video bitstream during fast play. However, the MPEG report does not suggest how the media might actually implement such a system.
In addition to the problems encountered during fast play, there are several problems associated with reverse play by a VTR which stores information in accordance with the MPEG standard or other highly compressed data formats. For a VTR to decode an inter-frame encoded bitstream and play in reverse, the VTR's decoder must decode each group of pictures in the forward direction, store the decoded pictures, then display them in reverse order. This places severe storage requirements on the decoder and further complicates the problem of gaining access to the coded bitstream in the correct order. Furthermore, similar problems to the ones discussed above in regard to fast play arise if reverse playback is to be performed at different speeds.
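The reverse play procedure described above can be sketched as a generator. The function name `reverse_play` and the `decode_gop` callback are hypothetical; the callback stands in for a real decoder that must process a group of pictures from its start.

```python
def reverse_play(gops, decode_gop):
    """Yield decoded pictures in reverse display order.

    `gops` is a list of coded groups of pictures in recorded order.
    `decode_gop` (a stand-in for a real decoder) decodes one group in the
    forward direction and returns its pictures as a list, so a whole
    group must be held in memory before any picture can be shown.
    """
    for gop in reversed(gops):          # fetch groups last to first
        pictures = decode_gop(gop)      # forward decode, fully buffered
        yield from reversed(pictures)   # display newest picture first
```

The buffer must hold every decoded picture of a group at once, which illustrates the storage burden, and the groups themselves must be fetched from the medium in reverse order.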
Accordingly, there are several problems which need to be addressed when the MPEG or similar standards are used for recording video information on a tape by a VTR.
One known VTR which supports high speed playback receives an analog video signal, digitizes the signal, and converts each picture frame in the signal into main information (for rough formation of the whole image during high speed playback) and subinformation (for forming details of the image). The main information and subinformation corresponding to each picture frame are recorded on a single track with each track on a tape storing data corresponding to a different picture frame. Each block of main information, corresponding to a particular frame, is recorded at the center of the recording track which contains all the data corresponding to the particular frame. The subinformation corresponding to the particular frame is recorded on regions on both sides of the center of the track containing the main information belonging to the particular frame. During trick play, the main information is used to generate images which are displayed.
The known VTR does not receive data in a compressed format and, to make its conversion to main and subinformation, requires that the received analog video signal be digitized and encoded before the data can be recorded on tape. Furthermore, the encoding and one frame per track recording processes used support only intra-frame encoding of pictures. Such a system has serious drawbacks where the picture information for an intra-coded frame of video, such as in the case of HDTV, may not be able to be stored in a single tape track because of the large amount of data involved. Furthermore, such a system fails to take advantage of or address the use of inter-frame coding techniques to reduce the amount of data which must be stored for a series of frames.