An uncompressed digital video signal represents an enormous amount of digital data. In general, a video signal is a succession of frames, or pictures, and the number of frames per second is called the frame rate. Each frame is a two-dimensional array of picture elements, or pixels, and the size of this array determines the frame's width and height. In a digital video signal, each pixel is represented by a number of bits that defines the pixel's color, brightness, etc.
Standard definition (SD) television (TV) may be broadcast as pictures of 640 pixels×480 lines, a 4:3 aspect ratio, that refresh at 24, 30, or 60 frames per second. High definition (HD) TV pictures include much more data than SD TV pictures. An HD TV picture has a 16:9 aspect ratio and may include, for example, either 1920 pixels×1080 lines that are interlaced or 1280 pixels×720 lines that are progressively scanned. Each video frame in an interlaced system consists of two fields, and the fields are transmitted at twice the frame rate.
The aggregate data rate of a digital video signal is simply the number of bits per pixel times the number of pixels per frame times the frame rate. For example, an HD digital video signal may have 20 bits/pixel, a size of 1920×1080 pixels, and a frame rate of 30 frames per second. The aggregate data rate of such an HD digital video signal is 1,244,160,000 bits per second (about 1.2 Gigabits/s (Gbps)), which is high enough that it can be difficult to handle today. To put this into further perspective, a 2-hour movie at a rate of about 1.2 Gbps is roughly 8640 Gigabits, or 1080 Gigabytes (GB). One of today's common digital video disks (DVDs) can store 4.7 GB of data, and thus about 230 such DVDs would be needed to store one 2-hour uncompressed HD movie.
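The arithmetic above can be sketched in a few lines of Python. The function and variable names are illustrative assumptions, not part of any standard. Because the sketch uses the unrounded data rate rather than a rounded 1.2 Gbps, its totals come out slightly higher than estimates based on the rounded figure.

```python
def aggregate_data_rate(bits_per_pixel, width, height, frames_per_second):
    """Uncompressed data rate in bits per second."""
    return bits_per_pixel * width * height * frames_per_second


def discs_needed(total_bytes, disc_capacity_bytes):
    """Whole discs needed to hold total_bytes (integer ceiling division)."""
    return -(-total_bytes // disc_capacity_bytes)


# Example values taken from the text: 20 bits/pixel, 1920x1080, 30 frames/s.
rate_bps = aggregate_data_rate(20, 1920, 1080, 30)

# A 2-hour movie at that rate, in bits and in bytes.
movie_bits = rate_bps * 2 * 60 * 60
movie_bytes = movie_bits // 8

# A 4.7 GB DVD, taking 1 GB as 10**9 bytes (an assumption of this sketch).
dvd_capacity_bytes = 4_700_000_000
dvds = discs_needed(movie_bytes, dvd_capacity_bytes)

print(rate_bps)            # bits per second
print(movie_bytes / 1e9)   # movie size in GB
print(dvds)                # number of DVDs
```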
Simply because of a digital video signal's high data rate and enormous size, nearly all current digital video applications employ compression, or encoding, to minimize the space and bandwidth required to store and transmit video. Before compressed video can be viewed or displayed, the compressed video must be decompressed, or decoded, back to a raw form. There exist video compression techniques or formats that are lossless, which means that bit for bit the original pixel data can be reconstructed from the compressed form. Nevertheless, lossless compression formats are not nearly as effective at reducing a video signal's data rate and size as are lossy compression techniques. With lossy compression, there is no guarantee that bit for bit the original pixel data can be reconstructed from the compressed form.
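The distinction between lossless and lossy compression can be illustrated with a minimal Python sketch. It uses the general-purpose zlib (DEFLATE) compressor from the Python standard library, which is not a video codec, and a crude quantization step as a stand-in for the lossy stage. The sample values and the step size are assumptions chosen only for illustration.

```python
import zlib

# Hypothetical 8-bit sample values standing in for a run of pixel data.
pixels = bytes([16, 17, 17, 18, 120, 121, 119, 200] * 32)

# Lossless: the decompressed data matches the original bit for bit.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy: coarse quantization discards low-order detail before packing,
# so only an approximation of each sample can be rebuilt afterwards.
STEP = 8  # assumed quantization step
quantized = bytes(p // STEP for p in pixels)
lossy_packed = zlib.compress(quantized)
approx = bytes(q * STEP + STEP // 2 for q in zlib.decompress(lossy_packed))

max_error = max(abs(a - b) for a, b in zip(approx, pixels))

print(restored == pixels)  # True: exact reconstruction
print(approx == pixels)    # False: reconstruction is only approximate
print(max_error)           # bounded by half the quantization step
```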
Lossy encoding algorithms include MPEG-2, which is standardized as ISO/IEC 13818 by the Moving Picture Experts Group (MPEG), officially designated ISO/IEC JTC1/SC29 WG11. MPEG-2 is used for digital SD TV signals and HD TV signals. More advanced algorithms include MPEG-4 Part 10, also known as Advanced Video Coding (AVC); Windows Media 9 (WM9), which is promulgated by Microsoft Corp.; and WM9 as adapted and standardized by the Society of Motion Picture and Television Engineers (SMPTE), currently identified as VC-9 or VC-1.
A video image input to the MPEG-2 algorithm is separated into a luminance (Y) channel that represents brightness information in the image and two chrominance (U, V) channels that represent color information in the image. An input image is also divided into “macroblocks”, with each macroblock comprising four 8 pixel×8 pixel luminance blocks and, depending on the image's chrominance format, a number of 8 pixel×8 pixel chrominance blocks. For example, a macroblock may include six blocks: four luminance blocks for the Y channel and one chrominance block for each of the U and V channels. An 8×8 discrete cosine transform (DCT) is applied to each 8×8 block of a macroblock. The resulting DCT coefficients are then quantized, re-ordered to increase the occurrence of long runs of zeroes, and run-length coded. Run-length coding compresses the image by storing runs of data (i.e., sequences of the same data value) as single data values and associated counts. The result is then Huffman-coded.
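The transform, quantization, re-ordering, and run-length steps can be sketched for a single 8×8 block as follows. This is a simplified illustration under stated assumptions, not the MPEG-2 procedure itself: it uses one uniform quantization step (an assumed value of 16) where real encoders use per-coefficient quantization matrices, it stores runs as plain (value, count) pairs as described above, and it omits the final Huffman coding. All function names are hypothetical.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Straightforward (unoptimized) 8x8 type-II DCT of one block."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization with a single assumed step size."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag(block):
    """Re-order a block along its anti-diagonals, low frequencies first."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[r][c] for r, c in order]


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


# A flat block: all of its energy lands in the first (DC) coefficient.
flat_block = [[128] * N for _ in range(N)]
encoded = run_length_encode(zigzag(quantize(dct_2d(flat_block))))
print(encoded)  # [(64, 1), (0, 63)]
```

For the flat block, the 64 input samples reduce to two (value, count) pairs, which is the kind of long zero run that the re-ordering step is meant to expose.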
The bitstream generated by an MPEG-2 encoder is organized into frames that are intra coded (I-pictures), forward predictive coded (P-pictures), or bidirectional predictive coded (B-pictures). I-pictures in an MPEG-2 bitstream result from encoding actual input images. P- and B-pictures result from motion-compensating input images before encoding. Motion compensation involves correlating an input image with the previous image, for P- and B-pictures, and with the next image, for B-pictures. Thus, each macroblock in a P- or B-picture is associated with an area in the next and/or previous image that is well-correlated with it. A “motion vector” that maps the macroblock to its correlated area is encoded, and then the difference between the two areas is encoded. It will be appreciated that adjacent frames in a video stream can be well correlated, and so P-pictures may have 90% less data than I-pictures and B-pictures may have 98% less data than I-pictures. On the other hand, an encoder needs significantly more time to encode B-pictures than it does to encode I-pictures. This sort of processing is typical of many compression algorithms besides MPEG-2.
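The search for a well-correlated area can be sketched as an exhaustive block-matching search. The sketch below is an illustration under assumptions, not the method of any particular encoder: it scores candidate areas by the sum of absolute differences (SAD), uses a small 4×4 block and a ±2 pixel search window, and works on frames stored as lists of rows. All names are hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size areas."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(current, reference, x, y, size=4, search=2):
    """Search `reference` for the area that best matches one block of `current`.

    Returns the motion vector (dx, dy) from the block's position to the
    matching area, plus the residual (block minus matching area).
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate areas that fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, x, y, reference, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    _, dx, dy = best
    residual = [[current[y + j][x + i] - reference[y + dy + j][x + dx + i]
                 for i in range(size)] for j in range(size)]
    return (dx, dy), residual


# Hypothetical 8x8 reference frame, and a current frame in which the
# content has moved one pixel to the right.
reference = [[7 * r * r + 3 * c * c for c in range(8)] for r in range(8)]
current = [[row[c - 1] if c > 0 else 0 for c in range(8)] for row in reference]

vector, residual = find_motion_vector(current, reference, x=2, y=2)
print(vector)    # (-1, 0): the matching area lies one pixel to the left
print(residual)  # all zeros, so almost nothing remains to be encoded
```

When the match is good, the residual is mostly zeros, which is why predictive-coded pictures can carry far less data than intra-coded pictures.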
Compression may be done in one pass or multiple passes over the video signal. During typical one-pass compression, the compressor, or encoder, examines the video signal once on a frame-by-frame basis and decides how best to compress each frame based on what it learns about each frame. Video for live broadcast TV applications is typically compressed in one pass. A common form of multi-pass compression is two-pass compression, in which during the first pass, a compressor examines the video on a frame-by-frame basis and gathers information about each frame. After the first pass, the compressor can use the information gathered about all frames to decide how best to compress each frame.
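The two-pass idea can be sketched as follows. This is a toy illustration under assumptions, not the rate control of any real encoder: the first pass measures a crude per-frame "complexity" (the sum of absolute differences from the previous frame), and the second pass divides a fixed bit budget among frames in proportion to that measure. The function names, the complexity measure, and the sample frames are all hypothetical.

```python
def first_pass(frames):
    """Gather a per-frame complexity figure without compressing anything."""
    stats = []
    previous = None
    for frame in frames:
        if previous is None:
            complexity = sum(abs(p) for p in frame)
        else:
            complexity = sum(abs(a - b) for a, b in zip(frame, previous))
        stats.append(max(complexity, 1))  # keep every frame above zero
        previous = frame
    return stats


def second_pass(stats, total_bits):
    """Split a bit budget across frames in proportion to complexity."""
    total = sum(stats)
    return [total_bits * s // total for s in stats]


# Four tiny hypothetical frames, each a flat list of sample values.
frames = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],  # identical to the previous frame
    [50, 60, 10, 10],  # scene change
    [50, 60, 10, 10],  # identical to the previous frame
]

stats = first_pass(frames)
budgets = second_pass(stats, total_bits=1320)
print(stats)    # [40, 1, 90, 1]
print(budgets)  # [400, 10, 900, 10]
```

A one-pass compressor would have to choose the budget for the first frame without knowing that a scene change follows two frames later; the two-pass version sees all of the statistics before allocating any bits.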
While two-pass compression can typically generate higher quality video than one-pass compression, it usually does not generate the highest quality possible. Particularly where the compressed video asset will have a long shelf life, such as DVDs of popular movies, it is often the desire of the video content owner to obtain the highest possible visual quality for a given aggregate data rate. Nearly all video compression processes have constraints that regulate the aggregate data rates of their compressed signals. The successful use of lossy video compression usually involves simultaneously minimizing the aggregate data rate of the compressed signal, or at least selectively controlling the aggregate data rate to allocate more bits to selected scenes, and maximizing the visual quality of the decompressed result, which is to say, minimizing perceivable differences between original and decompressed video frames.
Various devices and methods have been developed that permit or facilitate the manipulation of streams of data, such as digital video. One example is described in U.S. Patent Application Publication No. US 2004/0240844 by Ostermann et al., which states that it describes a method for editing a data stream on a DVD stream recorder that includes parsing stream object files and identifying logical sections based on the parsing.