Compression of digital video data is used for many applications, including transmission over bandwidth-constrained channels, such as direct broadcast satellite, and storage on optical media. To achieve very efficient compression, complex, computationally intensive processes are used for encoding and decoding video. For example, although MPEG-2 (Moving Picture Experts Group) is known as a very efficient method for compressing video, a newer, more efficient standard, H.264 (Advanced Video Coding, or AVC), is being developed.
The AVC standard uses a number of techniques to compress video streams, such as motion-based compensation to reduce temporal redundancy. The AVC standard encodes each frame using three main picture types: intra-coded pictures (I-pictures), inter-coded pictures (P-pictures), and bi-predictive pictures (B-pictures). I-pictures are coded without reference to other pictures and can provide access points to the coded sequence where decoding can begin. P-pictures are coded more efficiently using motion-compensated prediction of each block of sample values from a previously decoded picture selected by the encoder. B-pictures use both forward and backward motion-compensated prediction, so both previous and future frames serve as reference frames. B-pictures may be predicted using a weighted average of two blocks of motion-compensated sample values.
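The weighted average used for B-picture prediction can be illustrated with a minimal sketch. The function name `bipredict`, the integer weights, the rounding shift, and the 8-bit clipping range are assumptions made for illustration; with the default arguments the sketch reduces to a plain rounded average of the two blocks.

```python
def bipredict(block0, block1, w0=1, w1=1, shift=1):
    """Combine two motion-compensated blocks into one bi-predicted block.

    block0, block1: lists of rows of 8-bit sample values (hypothetical layout).
    w0, w1, shift:  illustrative integer weights and normalizing shift;
                    the defaults give (a + b + 1) >> 1, a rounded average.
    """
    rounding = (1 << (shift - 1)) if shift > 0 else 0
    out = []
    for row0, row1 in zip(block0, block1):
        out.append([
            # Weighted sum, rounded, normalized, then clipped to 0..255.
            min(255, max(0, (w0 * a + w1 * b + rounding) >> shift))
            for a, b in zip(row0, row1)
        ])
    return out
```

For example, `bipredict([[100, 200]], [[110, 201]])` averages each pair of co-located samples, while `bipredict(b0, b1, w0=3, w1=1, shift=2)` would favor the first reference block three to one.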
The H.264 standard allows for using a large number of reference frames to reconstruct a single picture and using reordering schemes that transmit many “future frames” with a display schedule later than a current picture before the current picture is transmitted. By contrast, MPEG-1 and MPEG-2 allow for at most two reference frames for reconstructing a picture and only a single future frame.
Decoding video often involves processing the video as a stream of pictures, each of which may be a field or a frame (a frame typically consisting of two interleaved fields). Each field or frame further includes a number of slices of macroblocks (MBs), wherein a slice is a sequence of macroblocks of flexible size. Where multiple slice groups are used, the allocation of the macroblocks is determined by a macroblock-to-slice-group map that indicates the slice group to which each macroblock belongs. The video sequence is read blockwise, and an interface is offered for bitwise stream reading as well as for parsing common syntax elements, such as Exp-Golomb codes and static code tables.
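As an illustration of bitwise stream reading and Exp-Golomb parsing, the following is a minimal sketch. The class name `BitReader` and its method names are hypothetical, and the reader assumes most-significant-bit-first ordering within each byte. An unsigned Exp-Golomb code consists of n leading zero bits, a one bit, and n further bits; the signed variant maps the unsigned values 1, 2, 3, 4 to 1, -1, 2, -2.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bit(self) -> int:
        if self.pos >= 8 * len(self.data):
            raise EOFError("read past end of bitstream")
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, then read that many bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self) -> int:
        """Signed Exp-Golomb: odd code numbers are positive, even are negative."""
        k = self.read_ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)
```

For instance, the bit string 1 010 011 00100 (bytes 0xA6 0x40 after zero padding) holds the unsigned values 0, 1, 2, and 3 in sequence, so four successive calls to `read_ue()` recover them.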
In various video encoding/decoding standards, the encoded video is organized in accordance with certain syntactical rules, whose units may be referred to as syntax elements. In a video codec such as H.264/AVC, the syntax elements at and below the slice layer are adaptively coded. The syntax elements include higher-layer syntax elements for video sequence, picture, and slice headers, slice payload data, reference frame indexes, and so forth.
FIG. 1 is a flowchart illustrating a conventional process for decoding a picture stream. The conventional decoding process decodes the picture stream only serially, and the picture stream may include one or more slices per picture. Referring to FIG. 1, the process 100 is carried out by a video decoder. In step 110, a video stream is received by the video decoder either from a network or from an external storage device. In step 120, the picture header and slice header are read to obtain information about the target slice, and the syntax elements of one slice of a picture are read.
In step 130, the decoder is initialized and decodes the slice of the picture. Then, in step 140, it is determined whether all of the slices of the picture have been decoded. If so, the decoded data of the picture is output (step 150). Otherwise, the process 100 returns to step 130 to decode the next slice of the picture. In step 160, it is determined whether the end of the picture stream has been reached. If not, the process returns to step 130 to decode a slice of the next picture. Finally, when all the pictures in the stream have been decoded, the process 100 is completed.
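The serial control flow of process 100 can be summarized in a short sketch. The names `decode_slice` and `decode_stream_serially` are hypothetical, the stream is modeled as an already-parsed list of pictures, each a list of slices, and the slice decoder is a placeholder rather than a real decoder. The point of the sketch is only that each slice is handled strictly after the previous one.

```python
def decode_slice(slice_data):
    """Placeholder for step 130; a real decoder would reconstruct pixels here."""
    return f"decoded({slice_data})"


def decode_stream_serially(pictures):
    """Decode every slice of every picture one at a time, in order.

    pictures: list of pictures, each a list of slices (assumed pre-parsed,
              standing in for the header reading of steps 110 and 120).
    """
    output = []
    for picture in pictures:                 # loop ends at end of stream (step 160)
        decoded_slices = []
        for slice_data in picture:           # loop until all slices done (step 140)
            decoded_slices.append(decode_slice(slice_data))  # step 130
        output.append(decoded_slices)        # output the decoded picture (step 150)
    return output
```

Because the inner loop never starts a slice before the previous one finishes, the total decoding time in this sketch is the sum of the individual slice decoding times, regardless of how many processor threads are available.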
FIG. 2 is a schematic diagram illustrating a conventional video decoding process. In detail, the decoding process for each picture of a video contains several essential procedures: entropy decoding; inverse quantization (IQ); inverse transform, which can take the form of the inverse discrete cosine transform (IDCT) used in MPEG-1, MPEG-2 and MPEG-4 or the Hadamard-like integer transform used in H.264; and motion compensation (MC). Referring to FIG. 2, an entropy decoding 210 process is executed when a bitstream of a video is input. In the entropy decoding 210 process, the input bits are parsed into syntax elements by referring to code tables or Exp-Golomb codes, depending on the codec type. The syntax elements include information about a picture or a slice, as well as motion vectors, wherein the picture or slice information is used to determine the picture type while the motion vectors are used for motion compensation.
After the entropy decoding process, each macroblock in the bitstream is processed through inverse quantization (IQ) 220 and inverse transform 230, which transform the macroblock into pixel values in the spatial domain. For a reference picture (I picture), and in the case of H.264 only, the result of the transformation is optionally added to prediction information derived from a directional index (0-8) that is part of the Intra-MB information. This index selects one of 8 possible prediction directions or 1 average (also known as DC) mode, each of which forms a pixel prediction for the current block from neighboring pixel values. A duplicate of the pixel data is stored in a frame buffer 250 as the reference for motion compensation of subsequent predictive pictures.
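The average (DC) mode mentioned above can be sketched for a 4x4 block as follows. The function names are hypothetical, and the sketch assumes that both the row of four pixels above the block and the column of four pixels to its left are available, and that samples are 8 bits wide. The eight directional modes are not shown.

```python
def intra_dc_predict_4x4(above, left):
    """Predict a 4x4 block as the rounded mean of its eight neighboring pixels.

    above: the 4 reconstructed pixels directly above the block.
    left:  the 4 reconstructed pixels directly to the left of the block.
    """
    dc = (sum(above) + sum(left) + 4) >> 3  # +4 rounds before dividing by 8
    return [[dc] * 4 for _ in range(4)]


def reconstruct(prediction, residual):
    """Add the inverse-transformed residual to the prediction, clipping to 0..255."""
    return [
        [min(255, max(0, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

With neighbors `above = [10, 10, 10, 10]` and `left = [20, 20, 20, 20]`, every predicted pixel is 15, and the residual from the inverse transform is then added pixel by pixel.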
For a predictive picture (P picture or B picture), the motion vectors obtained by the entropy decoding 210 process are used to locate the corresponding blocks in the reference picture. The prediction differences recovered by IQ 220 and inverse transform 230 are added to the referenced blocks to compose the predictive picture. As with a reference (I) picture, the decoded pixel values of the predictive picture are output, and a duplicate is also sent to the frame buffer 250 for storage.
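The reconstruction of a predicted block can be sketched as follows. The function name `motion_compensate` is hypothetical, and the sketch makes simplifying assumptions: a single reference picture, whole-pixel motion vectors (no sub-pixel interpolation), 8-bit samples, and coordinates clamped to the picture edges when a vector points outside the reference.

```python
def motion_compensate(reference, x, y, mv, residual):
    """Rebuild one block from a reference picture, a motion vector, and a residual.

    reference: 2-D list of pixel rows from the frame buffer.
    x, y:      top-left position of the current block.
    mv:        (mv_x, mv_y) whole-pixel displacement into the reference.
    residual:  2-D list of prediction differences; its shape sets the block size.
    """
    height, width = len(reference), len(reference[0])
    mv_x, mv_y = mv
    block = []
    for dy, res_row in enumerate(residual):
        row = []
        for dx, res in enumerate(res_row):
            # Clamp so that vectors pointing outside the picture reuse edge pixels.
            ref_y = min(height - 1, max(0, y + mv_y + dy))
            ref_x = min(width - 1, max(0, x + mv_x + dx))
            row.append(min(255, max(0, reference[ref_y][ref_x] + res)))
        block.append(row)
    return block
```

A B-picture block would call such a routine once per reference picture and then combine the two results.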
According to the foregoing description, video decoding may be predictive and may need to make forward or backward references to other pictures. However, conventional video decoders are adapted to decode the slices of a video serially, so conventional video processing systems decode inefficiently when executed on computing systems with parallel computational ability.
With the rollout of multi-threaded processors, decoding of video slices in parallel can be implemented and executed in multiple threads. The present invention may be employed to significantly improve the efficiency of the decoding process by combining a multi-threaded processor with innovative software solutions.
Further limitations of conventional approaches will become apparent to one of skill in the art through comparison of such approaches with some embodiments of the present invention as set forth in the remainder of the present application with reference to the drawings.