Video data signals may be analog or digital. In analog video standards, e.g., NTSC or PAL for television, video sequences are typically divided into frames and raster lines. With reference to FIG. 1A, an analog video frame 102 is shown to facilitate discussion. The analog video frame 102 includes two interleaved fields; the first field includes a first set of raster lines 104 and the second field includes a second set of raster lines 106. Each field includes half the raster lines 104, 106 of a full frame 102. The frame repetition rate is therefore half the field rate.
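By way of illustration only, the relationship between a frame and its two interleaved fields may be sketched as follows. The function names are hypothetical, and the choice of which field holds the even numbered raster lines is an assumption made for the example rather than something stated above.

```python
def split_fields(frame):
    """Split a frame (a list of raster lines) into its two interleaved fields."""
    first_field = frame[0::2]   # raster lines 0, 2, 4, ... (assumed first field)
    second_field = frame[1::2]  # raster lines 1, 3, 5, ... (assumed second field)
    return first_field, second_field


def weave_fields(first_field, second_field):
    """Rebuild the full frame by alternating lines from the two fields."""
    frame = []
    for line_a, line_b in zip(first_field, second_field):
        frame.extend([line_a, line_b])
    return frame


frame = [f"line {n}" for n in range(6)]
first, second = split_fields(frame)
```

Each field carries half of the lines, so two fields must be received for every complete frame, which is why the frame rate is half the field rate.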
The analog video signals shown in FIG. 1A may be converted to digital video signals via, for example, analog to digital conversion (ADC). Common applications for digital video signals include, for example, multimedia, video-conferencing, television, magnetic resonance imaging, and remote-sensed imaging. As is well known, digital video signals may be processed via a computer system for display as video images on a suitable video display device.
Color video images typically include three components. Each component may be thought of as a two dimensional array of samples, with each sample being a digital representation of the intensity of a component at a point on a raster line. The sample values at a point in a picture form a picture element (pixel or pel). If all three components use the same sampling grid, each pixel has three samples, one from each component. The color of a pixel may be expressed in terms of a luminance component and two chrominance components. These components may, as is well known, be readily converted into the familiar red, green, and blue (RGB) primary colors or other sets of primary colors.
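As a minimal sketch of such a conversion, the following assumes 8 bit full range samples and the commonly used ITU-R BT.601 style coefficients. Neither the coefficients nor the sample range is specified above, and a particular decoder may use different scaling or offsets; the function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and limit the result to the 8 bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one luminance/chrominance sample triple to RGB.

    Assumes full range 8 bit samples with chrominance centered on 128
    and BT.601 style weights (an assumption, not taken from the text).
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)
```

With both chrominance samples at their neutral value of 128, the result is a gray level equal to the luminance sample.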
The quality of a digital video image is typically a function of its resolution, which may be measured in a number of horizontal and vertical pixels in the image. As the number of pixels increases, the resolution of the image increases. As the resolution of an image increases, unfortunately, the amount of digital video data required to store or transmit the image correspondingly increases.
To reduce the number of digital bits required to store and/or transmit an image, compression is often performed. One of the objectives of video data compression is to maximize picture quality while minimizing the data required per pixel. By way of background, current video compression techniques include two major categories: entropy processes, or information preserving processes, and information losing, or so called "lossy" processes. Entropy processes introduce no errors in the encoding/decoding process so that the original signal may be reconstructed exactly. However, entropy processes tend to have a small compression ratio, i.e., a small reduction in video bit rate. Conversely, "lossy" compression processes tend to introduce errors in the encoding/decoding process but achieve much higher compression ratios. To strike an acceptable compromise between quality and compression ratio, typical digital video compression processes may combine both entropy coding processes and "lossy" coding processes.
A video frame may be compressed in accordance with either an intra compression technique or a nonintra compression technique. Intra compression techniques compress a video frame using information only from that video frame. In contrast, nonintra compression techniques compress a video frame using information from one or two other video frames displaced in time. Nonintra compression techniques include, for example, predictive coding techniques. Predictive coding techniques are based on previously transmitted and decoded spatial and/or temporal information. Predictors may employ pixels from a current frame or from previous frames. Motion compensated predictive coding is a type of predictive coding which takes into account the frame to frame displacement of moving objects in a sequence.
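The simplest form of temporal predictive coding may be sketched as follows, under the assumption that the predictor is simply the co-located sample of the previous frame. No motion compensation is attempted, and the function names are hypothetical.

```python
def prediction_residual(current, previous):
    """Return the sample-by-sample difference between a frame and its predictor."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def reconstruct(previous, residual):
    """Rebuild the current frame from the previous frame plus the residual."""
    return [[p + r for p, r in zip(prev_row, res_row)]
            for prev_row, res_row in zip(previous, residual)]


previous_frame = [[10, 10, 10], [20, 20, 20]]
current_frame = [[10, 11, 10], [20, 20, 22]]
residual = prediction_residual(current_frame, previous_frame)
```

When little changes between frames, most residual values are zero or small, which is what makes them cheaper to code than the samples themselves.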
A variety of standards have emerged in the video industry for digital video compression. Digital Video (DV) is one such compression standard. The DV compression standard is commonly employed for compressing digital video data generated by, for example, a DV camcorder for storage on digital video tape (DV tape). DV includes versions which are designed to interface with specific analog video standards. DV includes an NTSC version (DV-NTSC) and a PAL version (DV-PAL). In addition, DV includes a standard definition version (DVSD), a reduced bit rate version, a high definition version, and a Sony version developed by Sony Corporation.
In the DV format, two basic forms of a discrete cosine transform (DCT) are used to transform picture sample values into frequency domain components, which may then be transmitted with a reduced bandwidth. These include a still DCT process used for a still type sample block and a motion DCT process used for a motion type sample block. As the terms are used herein, a still type sample block is a two dimensional array of samples in which there is not much difference between interleaved fields of an analog video frame. A motion type sample block is a two dimensional array of samples in which there is a significant difference between interleaved fields of an analog video frame.
To facilitate discussion, a diagram representing a still type DCT based video encoding process for a still type 8×8 video sample block 108 is shown in FIG. 1B. The still type 8×8 sample block 108 includes 64 samples 110. In the DV format, the still type 8×8 sample block 108 is transformed, using a standard DCT form, into an 8×8 DCT block 112. The 8×8 DCT block 112 has 64 spatial frequency patterns including a DC spatial frequency pattern 114 and 63 AC spatial frequency patterns 116.
The DC spatial frequency pattern 114 is located in row zero, column zero of the 8×8 DCT block 112. The DC spatial frequency pattern 114 has a DC coefficient value and each of the 63 AC spatial frequency patterns 116 has an AC coefficient value. The DC coefficient value of the DCT block 112 is proportional to the average of the 64 sample values of the sample block 108. In the DV format, further encoding steps are typically included for implementing a zigzag based scan 118 of the DCT block 112.
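The 8×8 transform and the zigzag based scan may be illustrated with the following sketch. It uses a naive orthonormal DCT-II and a conventional zigzag ordering; the exact scaling and scan order defined by the DV standard are not reproduced here, and the function names are hypothetical.

```python
import math

N = 8  # block dimension


def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical spatial frequency
        for v in range(N):      # horizontal spatial frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def zigzag_order(n=N):
    """Return (row, column) positions in a conventional zigzag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(positions,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


flat_block = [[10] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
```

With this scaling, the coefficient at row zero, column zero equals eight times the mean sample value, and a block of identical samples produces AC coefficients that are all zero to within rounding error.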
With reference to FIG. 1C, a diagram representing a motion type DCT based encoding process for a motion type 8×8 video sample block 120 is shown. The motion type 8×8 video sample block 120 includes 64 samples 110. The motion type 8×8 video sample block 120 is encoded in the DV format by performing two 8×4 DCTs; one on a sum of interleaved lines from separate fields of the video frame and one on a difference between interleaved lines from the separate fields of the video frame. The DV format motion type DCT based encoding process yields a first zigzag based 8×4 DCT block 122 and a second zigzag based 8×4 DCT block 124 when performed on the motion type 8×8 video sample block 120. In the DV format, further encoding steps are typically included for implementing a zigzag based scan 123 of the first 8×4 DCT block 122 and a zigzag based scan 125 of the second 8×4 DCT block 124.
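The sum and difference step that precedes the two 8×4 DCTs may be sketched as follows. The sketch assumes that vertically adjacent lines of the block come from different fields and omits any scaling; the precise pairing and normalization used by the DV standard are not reproduced, and the function name is hypothetical.

```python
def field_sum_and_difference(block):
    """Form the two 4-row arrays that feed the pair of 8x4 DCTs.

    `block` is a list of 8 rows of 8 samples. Rows r and r + 1 are assumed
    to belong to different fields (an assumption made for illustration).
    """
    sums, diffs = [], []
    for r in range(0, 8, 2):
        line_a, line_b = block[r], block[r + 1]
        sums.append([a + b for a, b in zip(line_a, line_b)])
        diffs.append([a - b for a, b in zip(line_a, line_b)])
    return sums, diffs


# A block whose two fields are identical: every odd row repeats the row above it.
still_like = [[row // 2 * 10 + col for col in range(8)] for row in range(8)]
sums, diffs = field_sum_and_difference(still_like)
```

When the two fields are identical, the difference array is entirely zero, which is consistent with reserving the motion type process for blocks in which the fields differ significantly.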
In addition to the above described frequency domain encoding processes, the DV format encoding process typically utilizes a form of entropy coding called run length encoding (RLE). In the DV format, specific RLE codes are derived from a table which is set forth in the DV standard "Blue Book." RLE codes are used to compress a digital video bit stream by taking advantage of repetitive patterns of zeros and ones. The RLE codes, which are of a variable length, are segmented in the DV format within a DV encoded video bit stream. The outermost layer of the DV encoded video bit stream includes DV format video segments.
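The general idea of run length encoding may be illustrated with the following simplified sketch, which represents a scanned coefficient list as pairs of a zero run length and a nonzero value. This pair format and the end marker are hypothetical; the actual variable length codes come from the table in the DV standard and are not reproduced here.

```python
END_OF_BLOCK = "EOB"  # hypothetical marker meaning "all remaining values are zero"


def run_length_encode(coeffs):
    """Encode a list of coefficients as (zero_run, value) pairs plus an end marker."""
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append(END_OF_BLOCK)  # trailing zeros are implied, not stored
    return pairs


def run_length_decode(pairs, length):
    """Expand (zero_run, value) pairs back into a list of `length` coefficients."""
    coeffs = []
    for item in pairs:
        if item == END_OF_BLOCK:
            break
        run, value = item
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * (length - len(coeffs)))
    return coeffs


ac_values = [5, 0, 0, -3, 1, 0, 0, 0, 0, 2] + [0] * 53  # 63 AC coefficients
encoded = run_length_encode(ac_values)
```

Here 63 coefficients collapse to four pairs and an end marker, and decoding restores the original list exactly, which is why this stage is information preserving.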
With reference to FIG. 1D, a DV format video segment 126 is shown. The video segment 126 typically includes 5 compressed macroblocks (CMBs) 128. Each CMB 128 usually includes six DCT blocks 130. Of the six DCT blocks 130 in each CMB 128, four are luminance DCT blocks 131 (Y0, Y1, Y2, and Y3) and two are chrominance DCT blocks 132 (Cr and Cb). Therefore, the video segment 126 typically includes 30 DCT blocks 130. In the DV format, each luminance DCT block 131 (Y0, Y1, Y2, and Y3) typically includes 100 bits and each chrominance DCT block 132 (Cr and Cb) typically includes 68 bits. The DV format typically uses 4:1:1 sampling.
In the DV format, RLE coded data is typically distributed throughout each video segment according to a unique protocol. With reference still to FIG. 1D, the RLE data is distributed throughout the video segment 126 using three passes. In the first pass, RLE data is stored in allotted areas for each DCT block 130 in a video segment 126. The second pass finds unused areas in each CMB 128 and stores further RLE bits into those areas. The third pass finds any free space in the video segment 126 and stores any remaining RLE bits in that space until the space runs out or until there are no more bits left.
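The three pass distribution may be sketched as follows. The sketch tracks bit counts only, rather than actual bit patterns, and assumes that free areas are filled in order starting from the first CMB; the function name, argument layout, and fill order are assumptions made for illustration.

```python
def distribute_bits(capacity, needed):
    """Simulate the three passes using bit counts only.

    capacity[m][b]: bits allotted to DCT block b of CMB m.
    needed[m][b]:   RLE bits that block actually produced.
    Returns (free, left): unused bits per block and bits that never fit.
    """
    free = [list(cmb) for cmb in capacity]
    left = [list(cmb) for cmb in needed]

    # Pass 1: each DCT block fills its own allotted area.
    for m in range(len(left)):
        for b in range(len(left[m])):
            used = min(left[m][b], free[m][b])
            free[m][b] -= used
            left[m][b] -= used

    def spill(sources, targets):
        """Move leftover bits from source blocks into free target areas, in order."""
        for sm, sb in sources:
            for tm, tb in targets:
                moved = min(left[sm][sb], free[tm][tb])
                left[sm][sb] -= moved
                free[tm][tb] -= moved

    # Pass 2: overflow goes into unused areas of the same CMB.
    for m in range(len(left)):
        blocks = [(m, b) for b in range(len(left[m]))]
        spill(blocks, blocks)

    # Pass 3: remaining overflow goes into any free space in the segment.
    everything = [(m, b) for m in range(len(left)) for b in range(len(left[m]))]
    spill(everything, everything)
    return free, left


CMB_AREAS = [100, 100, 100, 100, 68, 68]      # four luminance, two chrominance
capacity = [list(CMB_AREAS) for _ in range(5)]
needed = [[700, 0, 0, 0, 0, 0],               # one very busy block
          [0, 100, 100, 100, 68, 68]] + [list(CMB_AREAS) for _ in range(3)]
free, left = distribute_bits(capacity, needed)
```

In this example the busy block uses its own 100 bits, borrows 436 bits from its own CMB in the second pass and 100 bits from the next CMB in the third pass, and the final 64 bits are discarded because the segment is full.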
The above described process of distributing RLE coded data in the DV format provides advantages in terms of video picture quality and integrity. With reference still to FIG. 1D, the top CMB 128 represents samples corresponding to a center of a display screen. Since the above described bit packing scheme always starts at the top CMB 128 when looking for extra space, the top CMB 128 receives first priority and therefore is most likely to contain more information than the other four CMBs 128 of a video segment 126. Because the top CMB 128 corresponds to the center of the display screen, the center of the display screen is well defined. This feature of the DV format minimizes loss of picture quality in the event that drop out occurs on a DV video tape.
Video data encoded in the DV format, as described above, must be decoded in order to render a video image. One problem in decoding video images compressed in the DV format is that a very large amount of processing time is required to fully decode the video data. The amount of processing time required to decode compressed video data is proportional to the resolution of the image represented by the data. The full DV-NTSC format encodes a 720×480 image and the full DV-PAL format encodes a 720×576 image. For full resolution decoding of DV encoded data, a Pentium microprocessor operating at 133 Megahertz (Pentium 133) may process about two frames per second. A real time decoding rate refers to a rate fast enough to satisfy human visual perception of motion. A decoding rate of 20 to 30 frames per second (fps) is generally accepted as a satisfactory real time decoding rate.
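The gap between these figures may be made concrete with a short calculation. The sketch below uses only the approximate rates quoted above; the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to decode one frame at the given rate, in milliseconds."""
    return 1000.0 / fps


def required_speedup(current_fps, target_fps):
    """Factor by which decoding must be accelerated to reach the target rate."""
    return target_fps / current_fps


FULL_DECODE_FPS = 2.0  # approximate full resolution rate quoted for a Pentium 133

for target_fps in (20, 30):
    print(f"{target_fps} fps: {frame_budget_ms(target_fps):.1f} ms per frame, "
          f"about {required_speedup(FULL_DECODE_FPS, target_fps):.0f}x faster than "
          f"{FULL_DECODE_FPS:.0f} fps")
```

At about two frames per second, each frame takes roughly 500 milliseconds to decode, so full resolution decoding would have to be roughly 10 to 15 times faster to reach the 20 to 30 fps range.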
It is well known to those skilled in the art of video compression that there are typically three decoding subprocesses in the DV format which require a large amount of processing time: (1) performing the inverse discrete cosine transform (IDCT), (2) unpacking the run length encoded (RLE) AC coefficient values from each video segment, and (3) color space conversion. The large amount of processing time required to decode DV-formatted video data renders quality video display all but impossible on some computer systems. For example, typical personal computers which include central processing units such as the Intel Pentium Processor require too much processing time to decode full resolution DV-formatted video data.
Regarding the first two subprocesses, to provide a partial solution to the above described processing time problems, the DV format allows for a reduced resolution fast preview decoding option. To accomplish this, the DV format embeds a DC coefficient of each DCT block in an easy-to-read location and format, allowing a DC preview which is effectively one pixel per DCT block. DC preview decoding typically corresponds to a 90×60 resolution for DV-NTSC and a 90×72 resolution for DV-PAL. As an example of DC preview decoding, a Pentium microprocessor operating at 133 Megahertz (Pentium 133) may require about 3 to 4 milliseconds to process a single video frame. Although a preview based on displaying these DC values provides real time decoding of the image, at 30 frames per second (fps), on a Pentium 133, the DC preview image may still have too low a resolution and sacrifice too much detail to be acceptable for some video applications. Faster decoding of video data is desirable in video editing applications in order to provide faster video editing functions. Faster decoding of video data also allows for displaying higher resolution real time video images.
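The preview resolutions follow directly from the block size, as the following sketch shows. It considers only the 8×8 luminance block grid and ignores chrominance subsampling; the function names are hypothetical.

```python
BLOCK_SIZE = 8  # each DCT block covers 8 x 8 samples


def dc_preview_size(width, height, block=BLOCK_SIZE):
    """Preview dimensions when each DCT block contributes a single pixel."""
    return width // block, height // block


def frames_per_second(ms_per_frame):
    """Decoding rate implied by a per-frame processing time in milliseconds."""
    return 1000.0 / ms_per_frame


ntsc_preview = dc_preview_size(720, 480)
pal_preview = dc_preview_size(720, 576)
fastest = frames_per_second(3.0)
slowest = frames_per_second(4.0)
```

A per-frame cost of 3 to 4 milliseconds corresponds to roughly 250 to 333 frames per second, well above the real time range, while the preview retains only one sixty-fourth of the full resolution luminance samples.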
As can be appreciated from the foregoing, full resolution decoding of DV encoded data requires an undesirably large amount of processing time due to the processing intensive requirements of performing IDCTs and unpacking RLE coded data. While DC preview decoding of DV encoded data may be achieved without requiring an undesirably large amount of processing time, DC preview decoding tends to provide low resolution video images. Therefore, what is needed is a reduced-quality resolution DV codec which provides video images having adequate resolution without demanding an undesirably large amount of processing time to decode.