Higher-speed networks have made it practical to transmit large files such as video segments. These video segments are compressed before transmission to reduce bandwidth requirements. Consequently, video transmission and compression are fast-growing areas of development.
Video data can greatly enhance the quality of a computing or communication experience. Consumer use of the Internet took off once graphics were added to earlier text-based web pages. Portable consumer devices such as cell phones and personal digital assistants (PDAs) are being equipped with small cameras to allow for capture of still or even video pictures. Television shows are being sent over cell phone networks to mobile viewers. Efficient transmission of captured images and video segments over limited-bandwidth links requires some sort of compression of the images.
A number of video-compression techniques are known. Compression standards, such as those developed by the Moving Picture Experts Group (MPEG), have been widely adopted. These techniques are lossy, since some of the picture information is discarded to increase the compression ratio. However, compression ratios of 99% or more have been achieved with minimal noticeable picture degradation.
Next-generation compression standards have been developed for transmitting video over wireless networks. The MPEG-4 standard provides a robust compression technique for transmission over wireless networks. Recovery can occur when parts of the MPEG-4 bit stream are corrupted. Enhancements to the MPEG standard beyond MPEG-4 continue to be made.
These MPEG standards ultimately break the image up into small 16×16 pixel macroblocks or even smaller 8×8 or 4×4 pixel blocks. Each block can then be compressed more or less independently of other blocks, and movement of blocks can be described as highly compressed “motion vectors” rather than large bitmaps of pixels.
FIG. 1 shows an image frame divided into rows and columns of blocks. The MPEG standard uses a divide-and-conquer technique in which the video sequence is divided into individual image frames known as video object planes (VOPs), and each frame is divided into rows and columns of macroblocks. Each macroblock is a rectangle of 16 by 16 pixels.
Various window sizes and image resolutions can be supported by MPEG standards. For example, an image frame may have 352 by 288 pixels. The image frame is divided into 18 rows, each containing 22 blocks of 16×16 pixels. A total of 396 blocks are contained in each frame.
The blocks are arranged in a predetermined order, starting in the upper left with the first block (BLK #0). The second block, BLK #1, is to the right of BLK #0 in the first row, followed by BLK #2 to BLK #21 in the first row. The second row contains BLK #22 to BLK #43. The last row contains BLK #374 to BLK #395. Of course, other image sizes and formats can have the blocks in rows of various lengths, and various numbers of rows.
When an image frame is encoded, each block is encoded in macroblock order, starting with the first macroblock, BLK #0, in the first row, and continuing on until BLK #395.
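The block numbering described above can be checked with a short sketch. The function names `block_grid` and `block_position` are hypothetical helpers introduced only for illustration, and the sketch assumes frame dimensions that are exact multiples of the 16-pixel macroblock size.

```python
MB_SIZE = 16  # macroblock edge length in pixels, as stated in the text


def block_grid(width, height, mb=MB_SIZE):
    """Return (rows, cols, total) macroblock counts for a frame.

    Assumes both frame dimensions are exact multiples of the block size.
    """
    if width % mb or height % mb:
        raise ValueError("frame dimensions must be multiples of the block size")
    cols = width // mb
    rows = height // mb
    return rows, cols, rows * cols


def block_position(index, cols, mb=MB_SIZE):
    """Map a raster-order block number to (row, col, (x, y) of top-left pixel)."""
    row, col = divmod(index, cols)
    return row, col, (col * mb, row * mb)


rows, cols, total = block_grid(352, 288)
print(rows, cols, total)           # 18 22 396
print(block_position(22, cols))    # (1, 0, (0, 16)): first block of the second row
print(block_position(395, cols))   # (17, 21, (336, 272)): last block of the last row
```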
The blocks are arranged in the bit stream into one or more video packets (VP) with a header. In this example Y values of pixels are shown.
FIG. 2 highlights video compression using a motion vector for a macroblock. When a video stream is compressed prior to transmission, each frame or video object plane (VOP) of the video stream is divided into rectangular regions known as macroblocks. Each macroblock is 16 by 16 pixels in size, so a 160×160 frame has 100 macroblocks.
While some macroblocks in some frames may be encoded simply by transmitting the 256 pixels in each macroblock, or by some other encoding, compression occurs when the same image in a macroblock can be found in 2 or more frames. Since video typically has 20 or more frames per second, movement of image objects is usually slow enough that similar images or macroblocks can be found in several successive frames, although with some movement or change. Rather than re-transmit all 256 pixels in a macroblock, only the changed pixels in the macroblock can be transmitted, along with a motion vector that indicates the movement of the macroblock from frame to frame. The amount of data in the bitstream is reduced since most of the macroblock's pixels are not re-transmitted for each frame.
In FIG. 2, macroblock 16′ is a 16×16 pixel region of a first video object plane 10. All 256 pixels in macroblock 16′ are transmitted in the bitstream for first video object plane 10. In next video object plane 12, the same image as in macroblock 16′ appears, but in a different position in the frame. The same image in macroblock 16 in video object plane 12 is offset from the original location of macroblock 16′ in first video object plane 10. The amount and direction of the offset is known as motion vector 20.
Rather than transmit all 256 pixels in macroblock 16, motion vector 20 is encoded into the bitstream. Since one vector replaces up to 256 pixels, a significant amount of data compression occurs. The same image in macroblock 16 may also be found in successive video object planes, and motion vectors can be encoded for these video object planes, further increasing compression.
During compression, a search can be made of all pixels in first VOP 10 within a certain range of the position of macroblock 16. The closest match in first video object plane 10 is selected as macroblock 16′ and the difference in location is calculated as motion vector 20. When the image in macroblock 16 differs somewhat from the original image in original macroblock 16′, the differences can be encoded and transmitted, allowing macroblock 16 to be generated from original macroblock 16′.
The receiver that receives the encoded bitstream performs decoding rather than encoding. Motion vectors and error terms for each macroblock are extracted from the bitstream and used to move and adjust macroblocks from earlier video object planes in the bitstream. This decoding process is known as motion compensation since the movement of macroblocks is compensated for.
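The relationship between the encoder's difference terms and the decoder's motion compensation can be illustrated with a minimal round-trip sketch. It is not taken from any MPEG reference implementation. The helpers `extract`, `encode_block`, and `decode_block` are hypothetical, frames are plain lists of pixel rows, and the motion vector is assumed to be the (dx, dy) displacement from the current block's position to the matching block in the earlier video object plane. A 4×4 block is used to keep the demonstration small; the text uses 16×16.

```python
def extract(frame, x, y, n):
    """Copy the n-by-n block whose top-left pixel is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def encode_block(cur, ref, x, y, mv, n):
    """Encoder side: error terms = current block minus the displaced reference block."""
    dx, dy = mv
    cur_blk = extract(cur, x, y, n)
    ref_blk = extract(ref, x + dx, y + dy, n)
    return [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur_blk, ref_blk)]


def decode_block(ref, x, y, mv, residual, n):
    """Decoder side: fetch the displaced reference block and add the error terms."""
    dx, dy = mv
    ref_blk = extract(ref, x + dx, y + dy, n)
    return [[r + e for r, e in zip(rr, er)] for rr, er in zip(ref_blk, residual)]


N = 4  # demo block size (assumption; the text uses 16)
ref = [[(3 * x + 7 * y) % 256 for x in range(12)] for y in range(12)]
cur = [row[:] for row in ref]
# The image at (6, 5) in the earlier frame now appears at (4, 4), slightly changed.
for j in range(N):
    for i in range(N):
        cur[4 + j][4 + i] = ref[5 + j][6 + i]
cur[4][4] += 5  # one pixel differs from the original image

mv = (2, 1)
residual = encode_block(cur, ref, 4, 4, mv, N)
rebuilt = decode_block(ref, 4, 4, mv, residual, N)
print(rebuilt == extract(cur, 4, 4, N))  # True
```

Only the motion vector and the one non-zero error term would need to be sent for this block, rather than all of its pixels.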
FIG. 3 shows that each macroblock can be divided into 4 smaller blocks. The MPEG-4 standard allows for a finer resolution of motion compensation. A 16×16 macroblock 16 can be further divided into 4 blocks 22, 23, 24, 25. Each block 22, 23, 24, 25 has 8×8, or 64 pixels, which is one-quarter the size of macroblock 16.
FIG. 4 shows that separate motion vectors can be encoded for each of the 4 blocks in a macroblock. When the image in a macroblock remains intact, a single motion vector may be encoded for the entire macroblock. However, when the image itself changes, smaller size blocks can often better match the parts of the image.
A macroblock 16 contains four smaller images in blocks 22, 23, 24, 25. In current video object plane 12, these images occur within a single macroblock 16. However, in the previous or first video object plane 10, these images were separated. They have since moved by different amounts toward one another, so that they now all fit within a single 16×16 pixel area of second video object plane 12. The images of blocks 22, 23, 24, 25 have become less fragmented in second video object plane 12.
During encoding, four motion vectors 26, 27, 28, 29 are separately generated for each of blocks 22, 23, 24, 25 respectively. This allows each block to move by a different amount, whereas when only one motion vector is used for all 4 blocks in a macroblock, all blocks must move by the same amount. In this example, block 25′ has shifted more to the left than other blocks 22′, 23′, 24′. Motion vector 29 is slightly larger than the other motion vectors 26, 27, 28.
Better accuracy can be achieved when block-level motion vectors are used within a macroblock, at the expense of more data (four motion vectors instead of one). Of course, not all macroblocks need to be encoded with four motion vectors, and the encoder can decide when to use block-level motion compensation.
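One way an encoder might weigh this trade-off is sketched below. The rule is purely illustrative and is not taken from the MPEG-4 standard: the function `choose_mode` and the `extra_vector_penalty` parameter are assumptions, standing in for whatever cost the encoder assigns to sending three additional motion vectors.

```python
def choose_mode(cost_16x16, costs_8x8, extra_vector_penalty):
    """Pick macroblock-level ("1MV") or block-level ("4MV") motion compensation.

    cost_16x16: matching error of the best single vector for the macroblock.
    costs_8x8: matching errors of the best vector for each of the four blocks.
    extra_vector_penalty: hypothetical cost charged per additional vector sent.
    """
    if len(costs_8x8) != 4:
        raise ValueError("expected one cost per 8x8 block")
    cost_one = cost_16x16
    cost_four = sum(costs_8x8) + 3 * extra_vector_penalty  # three extra vectors
    return "4MV" if cost_four < cost_one else "1MV"


# Blocks that moved by different amounts: four vectors match much better.
print(choose_mode(1000, [100, 100, 100, 100], 50))  # 4MV
# Image moved as a whole: the small gain does not pay for the extra vectors.
print(choose_mode(500, [120, 120, 120, 120], 50))   # 1MV
```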
Some newer standards or extensions may allow a 16×16 macroblock to be divided into as many as 16 4×4 blocks. A single macroblock may be encoded as one to sixteen motion vectors. In order to determine the best encoding for a macroblock, an encoder may need to try all possibilities, requiring searches for a total of 21 or more motion vectors for all the 16×16, 8×8, and 4×4 blocks. Then the best combination of motion vectors for encoding the macroblock can be selected.
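The count of 21 follows from one 16×16 block, four 8×8 blocks, and sixteen 4×4 blocks. The following sketch enumerates them; the function name `partitions` and the (size, x, y) tuple layout are hypothetical choices made for illustration, and only the three square sizes named above are considered.

```python
def partitions(mb=16, sizes=(16, 8, 4)):
    """List every square block of each size that tiles an mb-by-mb macroblock.

    Each entry is (size, x, y), with (x, y) the block's top-left offset
    inside the macroblock.
    """
    blocks = []
    for size in sizes:
        for y in range(0, mb, size):
            for x in range(0, mb, size):
                blocks.append((size, x, y))
    return blocks


blocks = partitions()
for size in (16, 8, 4):
    print(size, sum(1 for b in blocks if b[0] == size))  # 16 1, then 8 4, then 4 16
print(len(blocks))  # 21 candidate motion vectors per macroblock
```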
However, each of the 21 motion vectors needs to be found before the best combination can be determined. Each of the 21 motion vectors requires a search over an area of the frame to locate the best-matching pixels. Each possible search location may require complex calculations such as a sum-of-absolute-differences (SAD) of pixel values, and these SAD values have to be compared for each possible location searched. The location with the lowest SAD identifies the best-matching motion vector for that sub-block.
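An exhaustive search of this kind can be sketched as follows. This is a minimal illustration rather than an optimized or standard-conformant search. The names `full_search` and `search_range`, the square search window, and the (dx, dy) vector convention (displacement from the current block to the candidate in the reference frame) are assumptions, and frames are plain lists of pixel rows.

```python
def block_sad(cur, ref, cx, cy, rx, ry, n):
    """SAD between the n-by-n block at (cx, cy) in cur and at (rx, ry) in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n, search_range):
    """Try every offset within +/- search_range; return (lowest SAD, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would fall outside the reference frame
            cost = block_sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic test frames: the 8x8 image at (15, 10) in ref appears at (12, 12) in cur.
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [row[:] for row in ref]
for j in range(8):
    for i in range(8):
        cur[12 + j][12 + i] = ref[10 + j][15 + i]

print(full_search(cur, ref, 12, 12, n=8, search_range=4))  # (0, 3, -2)
```

Repeating this search for each of the 21 blocks of every macroblock is what makes motion estimation so costly.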
An enormous number of calculations can be required for a good search during motion estimation. Each SAD may require many calculations: subtracting one pixel in the current frame's block from the corresponding pixel in the reference frame's block after translation by the search location (proposed motion vector), taking the absolute value of the difference, then repeating for all pixels in the block, and finally summing all absolute differences to get the SAD. Parallel processing may be used, but ordinary algorithms for motion estimation may not be optimal for use on parallel processors.
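The steps listed above, and the way their cost multiplies, can be made concrete with a short sketch. The function names are hypothetical, blocks are passed as lists of pixel rows that have already been aligned by the proposed motion vector, and the ±16 pixel search range used in the example is an assumption chosen only to give a sense of scale.

```python
def sad(cur_block, ref_block):
    """Sum of absolute differences between two equally sized blocks."""
    total = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        for cur_pixel, ref_pixel in zip(cur_row, ref_row):
            diff = cur_pixel - ref_pixel   # subtract corresponding pixels
            total += abs(diff)             # take the absolute value and accumulate
    return total


def abs_diff_count(block_size, search_range):
    """Per-pixel absolute differences needed to search one block exhaustively.

    Assumes a square window of +/- search_range pixels in each direction.
    """
    positions = (2 * search_range + 1) ** 2
    return positions * block_size * block_size


print(sad([[1, 2], [3, 4]], [[2, 2], [1, 7]]))  # 6
print(abs_diff_count(16, 16))                   # 278784 for one 16x16 macroblock
```

Because each of the three block sizes covers the same 256 pixels of the macroblock, evaluating all 21 blocks independently roughly triples the per-macroblock figure, and that total is then repeated for all 396 macroblocks of a 352 by 288 frame.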
What is desired is a parallel processing system that performs motion estimation. A parallel motion estimation procedure is also desired that efficiently searches for and evaluates both macroblock and sub-block motion vectors.