Standard-definition (SD) television (TV) images may be broadcast as images of 640 pixels×480 lines, i.e., a 4:3 aspect ratio, that vertically refresh at 24, 30, or 60 images, or frames, per second. High-definition (HD) TV images include much more data than SD TV images. An HD TV image has a 16:9 aspect ratio and may include, for example, either 1920 pixels×1080 lines that are scanned in interlaced fashion or 1280 pixels×720 lines that are progressively scanned. Each video frame, or image, in an interlaced system consists of two fields that are transmitted at twice the frame rate.
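The difference in data volume between the formats above can be made concrete with a short calculation. The following is a minimal sketch that uses only the frame dimensions quoted in the text; the dictionary keys and function names are illustrative assumptions, not part of any standard.

```python
from fractions import Fraction

# Frame formats quoted in the text: (width in pixels, height in lines).
FORMATS = {
    "SD 640x480": (640, 480),
    "HD 1280x720": (1280, 720),
    "HD 1920x1080": (1920, 1080),
}


def pixels_per_frame(width, height):
    """Number of pixels in one full frame."""
    return width * height


def aspect_ratio(width, height):
    """Width-to-height ratio reduced to lowest terms."""
    return Fraction(width, height)


SD_PIXELS = pixels_per_frame(*FORMATS["SD 640x480"])

for name, (w, h) in FORMATS.items():
    n = pixels_per_frame(w, h)
    ratio = aspect_ratio(w, h)
    print(f"{name}: {n} pixels/frame, aspect {ratio.numerator}:{ratio.denominator}, "
          f"{n / SD_PIXELS:.2f}x the SD pixel count")
```

Under these figures a 1920×1080 frame carries 6.75 times as many pixels as a 640×480 frame, and a 1280×720 frame carries 3 times as many.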
Encoding algorithms, such as MPEG-2, have been developed for SD video and audio signals and have been standardized by the Moving Picture Experts Group (MPEG), which is officially designated ISO/IEC JTC1/SC29 WG11. MPEG-2, for example, is published as ISO/IEC standard 13818 and is used for digital SD TV signals and HD TV signals.
Information to be encoded with the MPEG-2 algorithm may be an analog video sequence of frames that have a pre-set pixel resolution at a pre-set frame rate, such as 29.97 frames/second with audio. The resulting MPEG-2 bitstream is a series of data frames that are encoded, e.g., compressed, versions of respective input images and sounds.
A video image input to the MPEG-2 algorithm is separated into a luminance (Y) channel that represents brightness information in the image and two chrominance (U, V) channels that represent color information in the image. An input image is also divided into “macroblocks”, with each macroblock comprising four 8-pixel×8-pixel luminance blocks and, depending on the image's chrominance format, a number of 8-pixel×8-pixel chrominance blocks. For example, a macroblock may include six blocks: four luminance blocks for the Y channel and one chrominance block for each of the U and V channels. An 8×8 discrete cosine transform (DCT) is applied to each block of a macroblock. The resulting DCT coefficients are then quantized, re-ordered to increase the occurrence of long runs of zeroes, and run-length coded. Run-length coding compresses the image by storing runs of data (i.e., sequences of the same data value) as single data values and associated counts. The result is then Huffman-coded.
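The transform, quantization, re-ordering, and run-length steps described above may be sketched for a single 8×8 block as follows. This is an illustrative sketch only: the uniform quantizer step of 16 and the function names are assumptions, the re-ordering shown is the conventional zigzag scan, and the output uses the simple (value, count) form of run-length coding described in the text rather than the actual MPEG-2 bitstream syntax. The Huffman-coding step is omitted.

```python
import math
from itertools import groupby

N = 8  # block size used in the text


def dct_8x8(block):
    """Forward 2-D DCT-II of an 8x8 block (direct, unoptimized form)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantizer (an assumption; MPEG-2 uses quantization matrices)."""
    return [[round(v / step) for v in row] for row in coeffs]


def zigzag(block):
    """Read the block along anti-diagonals, low frequencies first."""
    order = sorted(
        ((r, c) for r in range(N) for c in range(N)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )
    return [block[r][c] for r, c in order]


def run_length(values):
    """Store each run of equal values as a (value, count) pair."""
    return [(v, len(list(g))) for v, g in groupby(values)]


# A flat mid-gray block: all of its energy lands in the DC coefficient.
flat_block = [[128] * N for _ in range(N)]
encoded = run_length(zigzag(quantize(dct_8x8(flat_block))))
print(encoded)  # [(64, 1), (0, 63)]
```

For the flat block, 64 quantized coefficients collapse into two (value, count) pairs, which illustrates why the re-ordering step aims to produce long runs of zeroes.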
The bitstream generated by an MPEG-2 encoder is organized into frames that are intra coded (I-pictures), forward predictive coded (P-pictures), or bidirectional predictive coded (B-pictures). I-pictures in an MPEG-2 bitstream result from encoding actual input images. P- and B-pictures result from motion-compensating input images before encoding. Motion compensation involves correlating an input image with the previous image, for P- and B-pictures, and with the next image, for B-pictures. Thus, each macroblock in a P- or B-picture is associated with an area in the next and/or previous image that is well-correlated with it. A “motion vector” that maps the macroblock to its correlated area is encoded, and then the difference between the two areas is encoded. It will be appreciated that adjacent frames in a video stream can be well correlated, and so P-pictures may have 90% less data than I-pictures and B-pictures may have 98% less data than I-pictures. On the other hand, an encoder needs significantly more time to encode B-pictures than it does to encode I-pictures.
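The correlation step described above can be illustrated with an exhaustive block-matching search. The following is a minimal sketch, assuming a sum-of-absolute-differences (SAD) cost, a 4×4 block, and a ±3-pixel search window; all of these choices and all names are hypothetical and chosen for brevity (MPEG-2 macroblocks cover 16×16 luminance pixels, and practical encoders use faster search strategies).

```python
def sad(prev, cur, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `cur` and a displaced block in `prev`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - prev[top + dy + r][left + dx + c])
    return total


def find_motion_vector(prev, cur, top, left, size=4, search=3):
    """Return ((dy, dx), residual) for the best-matching area in `prev`."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            # Skip candidate areas that fall outside the previous frame.
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue
            cost = sad(prev, cur, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    # The residual is the difference that would be encoded after the vector.
    residual = [[cur[top + r][left + c] - prev[top + dy + r][left + dx + c]
                 for c in range(size)] for r in range(size)]
    return (dy, dx), residual


def make_frame(top, left, size=4, dim=16):
    """Black frame containing one textured patch at (top, left)."""
    frame = [[0] * dim for _ in range(dim)]
    for r in range(size):
        for c in range(size):
            frame[top + r][left + c] = 10 + 4 * r + c
    return frame


prev_frame = make_frame(4, 4)  # patch at row 4, column 4
cur_frame = make_frame(5, 6)   # same patch moved down 1 and right 2
vector, residual = find_motion_vector(prev_frame, cur_frame, top=5, left=6)
print(vector)  # (-1, -2)
```

In this example the vector points back to the patch's location in the previous frame and the residual is all zeroes, which is the ideal case behind the data reduction attributed to P- and B-pictures.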
The frames in an MPEG-2 bitstream are arranged in a specified order that is called a group of pictures (GOP). The ratio of I-, P-, and B-pictures in the GOP structure is determined by the nature of the input video stream, the bandwidth constraints on the output stream, and the encoding time, which can limit the use of the MPEG-2 algorithm in real-time environments having limited computing resources. Encoding time becomes an even more serious problem when the MPEG-2 and similar algorithms are used for encoding signals, such as HD signals, that have much higher resolution and therefore much higher data rates than SD formats.
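The effect of the picture-type ratio on output size can be estimated with a short calculation. The sketch below assumes a commonly used 12-frame pattern with an I- or P-picture every third frame, shown in display order, and applies the reductions quoted earlier in the text (P-pictures 90% smaller and B-pictures 98% smaller than I-pictures). The pattern, the function names, and the parameter names are illustrative assumptions, and real sizes depend on the content.

```python
from collections import Counter

# Relative sizes taken from the figures quoted in the text (I-picture = 1.0).
RELATIVE_SIZE = {"I": 1.0, "P": 0.10, "B": 0.02}


def gop_pattern(length, anchor_spacing):
    """Display-order picture types: one I-picture, then P-pictures every
    `anchor_spacing` frames, with B-pictures in between."""
    types = []
    for i in range(length):
        if i == 0:
            types.append("I")
        elif i % anchor_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def relative_gop_size(pattern):
    """Total size of the GOP in units of one I-picture."""
    counts = Counter(pattern)
    return sum(RELATIVE_SIZE[t] * n for t, n in counts.items())


pattern = gop_pattern(length=12, anchor_spacing=3)
size = relative_gop_size(pattern)
print(pattern)  # IBBPBBPBBPBB
print(f"{size:.2f} I-picture units, versus {len(pattern)} if every frame were an I-picture")
```

Under these assumptions the 12-frame GOP occupies about 1.46 I-picture units, which is the bandwidth saving that must be weighed against the longer encoding time of P- and B-pictures.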
Despite these problems, MPEG-2 encoding has been applied to HD video signals. For example, U.S. Patent Application Publication No. 20030174768 states that it describes a system and method for processing an HD TV image that use six programmable encoders connected in parallel. According to the Publication, each encoder receives the HD TV signal at a data rate of 74.25 megahertz (MHz) and processes a respective vertical portion of each HD TV image. The encoders do not communicate with one another, so the portion processed by one encoder overlaps the adjacent portion(s) processed by other encoder(s). This facilitates assembly of complete encoded images but is inefficient at least in that the overlapping portions of each image are encoded twice.
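The cost of the overlap can be illustrated by partitioning one image dimension among six encoders. This is a sketch under stated assumptions: the Publication's actual overlap width is not given here, so the 16-pixel margin, the choice of splitting the 1920-pixel width into column strips, and all names below are hypothetical.

```python
def split_with_overlap(extent, parts, overlap):
    """Split [0, extent) into `parts` strips, each widened by `overlap`
    on every side that borders another strip. Returns (start, end) pairs."""
    base = extent // parts
    strips = []
    for i in range(parts):
        start = max(0, i * base - overlap)
        # The last strip always runs to the edge of the image.
        end = extent if i == parts - 1 else min(extent, (i + 1) * base + overlap)
        strips.append((start, end))
    return strips


def redundancy(extent, strips):
    """Fraction of extra samples encoded because of the overlaps."""
    encoded = sum(end - start for start, end in strips)
    return (encoded - extent) / extent


strips = split_with_overlap(extent=1920, parts=6, overlap=16)
print(strips)
print(f"redundant work: {redundancy(1920, strips):.1%}")
```

With these hypothetical numbers the six encoders together process 2080 columns to cover a 1920-column image, so roughly 8% of the encoding effort is spent on samples that another encoder also processes.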
MPEG-2 encoding was developed in 1994, and newer, more advanced, and more computationally complex algorithms have been developed that are substantially more efficient in compressing motion video, i.e., video with relative movement between camera and scene. These advanced algorithms include MPEG-4 Part 10, also called Advanced Video Coding (AVC), Windows Media 9 (WM9) that is promulgated by Microsoft Corp., and WM9 as adapted and standardized by the Society of Motion Picture and Television Engineers (SMPTE), currently identified as VC9 or VC1. The efficiencies of these advanced algorithms reduce the bandwidth needed for encoded high-resolution video, thereby making high-resolution video less expensive and easier to transmit, store, and manipulate.
In general, these advanced compression formats are based on the idea of encoding SD video using one, two, or four processors. Such processor arrangements are commonly found in today's personal computers (PCs) and server computers. A WM9 encoder, for example, can beneficially use two or four multi-threaded processors that share a memory and that are necessarily synchronized to enable generation of a single output encoded data stream. Like an MPEG-2 encoder, a WM9 encoder generates I-, P-, and B-pictures, but some of WM9's advantages arise from how the images and motion vectors are encoded.
Nevertheless, if WM9, for example, is to be used to encode HD video signals, today's PCs and servers are unable to do the encoding in real time, i.e., fast enough to keep up with the input frame rate. A single processor operating at 3.0 GHz can take up to fifty hours to encode one hour of common HD TV, depending on encoding options and image size.
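The size of the real-time shortfall follows directly from the figures above. The sketch below assumes a 30 frames/second input and, for the processor count, perfectly linear scaling with no shared-resource contention; that idealization and the function name are assumptions made for illustration only.

```python
import math


def realtime_shortfall(encode_hours, content_hours, frame_rate):
    """Return (slowdown factor, achieved frames/second, processors needed
    under idealized linear scaling)."""
    slowdown = encode_hours / content_hours
    achieved_fps = frame_rate / slowdown
    # Idealized lower bound: assumes the work divides evenly with no overhead.
    ideal_processors = math.ceil(slowdown)
    return slowdown, achieved_fps, ideal_processors


slowdown, fps, processors = realtime_shortfall(encode_hours=50, content_hours=1, frame_rate=30)
print(f"{slowdown:.0f}x slower than real time, about {fps:.1f} frames/second")
print(f"at least {processors} such processors would be needed even with ideal scaling")
```

In the worst case quoted, a single processor therefore encodes well under one frame per second, and even an ideal parallel arrangement would need on the order of fifty such processors to keep pace with the input.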
Current implementation limitations restrict WM9 to use with a maximum of four processors or processing elements (PEs), which may reflect the practical consideration that PC platforms having more than four processors are currently rare at best. Modification, or porting, of WM9 to a multiprocessor array of more than four processors might be done, but although a port to, say, a hypothetical 32-processor PC would be straightforward, performance would drop off radically as contention for the one memory bus would become significant. The increased complexity of advanced encoding algorithms such as WM9 combined with the substantially greater data sizes of high-resolution video images such as HD TV makes it impractical or impossible to encode high-resolution video with an advanced encoding format such as WM9 in real time with commonly available, inexpensive processors.
This inability to process high-resolution video in real time increases the difficulty of transmitting, storing, and manipulating high-resolution video. A complete video file must be stored in an uncompressed form in a capacious storage medium, such as a video tape or one or more hard disk drives, and encoded one frame at a time. This is time-consuming and expensive, precluding the use of advanced encoding algorithms like WM9 in common broadcast-TV applications like sports events that require real-time recording and transmission, and reducing the use in non-real-time applications like non-linear editing (NLE) and authoring of digital video disks (e.g., DVDs).