As the quality and resolution of a video stream increase, the demands placed on the video decoder to produce a high-quality output from the compressed video stream also increase. A compressed video stream includes a series of video images, generally referred to as video frames. Patterns corresponding to objects and background tend to “move” within the video frames to form corresponding objects or background from one video frame to the next. An object in the current frame may generally correspond to the same object in a reference frame, but may be in a different location.
In video codecs, each video frame is commonly divided into blocks or macroblocks. The size of a macroblock is typically 16×16 pixels, but can be any size, for example, down to 4×4 pixels, according to various standards. Such standards can include, for example, Moving Picture Experts Group (MPEG) MPEG-1, MPEG-2, MPEG-4, and H.264/MPEG-4 advanced video coding (AVC) (hereinafter referred to as H.264, the standard of which is expressly incorporated by reference herein).
In the encoding process, macroblocks (or smaller blocks within each macroblock) in the current frame are compared to regions in previous frames to locate the best matching macroblock. In other words, video encoders use motion estimation to search one or more previous reference frames to find the area that best matches the macroblock of the current frame that is being encoded. Video decoders carry out the same process in reverse order. Motion estimation is just one of many techniques used in optimizing the encoding and decoding of video frames.
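The search described above can be illustrated with a minimal block-matching sketch. It assumes an exhaustive (full) search over a small window and the sum of absolute differences (SAD) as the matching cost; neither choice is specified in this document, and practical encoders typically use faster search strategies. The function names `sad` and `best_match`, the toy 8×8 frames, and the 2×2 block size are hypothetical and chosen only for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n-by-n blocks.

    (cx, cy) is the top-left corner of the block in the current frame,
    (rx, ry) is the top-left corner of the candidate in the reference frame.
    """
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def best_match(cur, ref, cx, cy, n, search):
    """Full search: return (motion_vector, cost) for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


# Toy 8x8 frames: a bright 2x2 patch moves from (x=2, y=3) in the
# reference frame to (x=4, y=4) in the current frame.
ref_frame = [[0] * 8 for _ in range(8)]
cur_frame = [[0] * 8 for _ in range(8)]
for dy in range(2):
    for dx in range(2):
        ref_frame[3 + dy][2 + dx] = 200
        cur_frame[4 + dy][4 + dx] = 200

mv, cost = best_match(cur_frame, ref_frame, cx=4, cy=4, n=2, search=3)
print(mv, cost)
```

In this sketch the returned motion vector points from the block in the current frame to its best match in the reference frame, so a decoder holding the same reference frame could fetch the matching area using only that vector.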
A video decoder is designed with a target number of clocks per macroblock. Each stage in a decoder pipeline is designed to process each macroblock within this target number of clocks. Conventionally, the decoder outputs one decoded macroblock every target number of clocks. This is referred to as the throughput of the decoder.
The target number of clocks per macroblock and the operating frequency of the decoder together determine the maximum performance that the decoder can deliver. The operating frequency divided by the target number of clocks yields the number of macroblocks the decoder can process in one second. For example, a 1080p video stream, i.e., one having 1080 progressive horizontal scan lines and 1920 pixels per horizontal scan line, requires around 486,000 macroblocks of 16×16 pixels to be processed per second at a frame rate of 60 Hz, a significant load for a decoder.
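The arithmetic behind these figures can be worked through in a short sketch. The 16×16 macroblock size, the 1920×1080 frame, and the 60 Hz frame rate come from the text above; the 250 MHz operating frequency and the 500 clocks per macroblock are hypothetical values chosen only to make the numbers concrete, and the function names are likewise illustrative.

```python
import math


def approx_mb_rate(width, height, fps, mb_size=16):
    """Macroblocks per second, estimated from the raw pixel count."""
    return width * height * fps // (mb_size * mb_size)


def whole_mb_rate(width, height, fps, mb_size=16):
    """Macroblocks per second when partial macroblock rows are rounded up."""
    per_frame = math.ceil(width / mb_size) * math.ceil(height / mb_size)
    return per_frame * fps


def decoder_mb_rate(freq_hz, clocks_per_mb):
    """Throughput: operating frequency divided by target clocks per macroblock."""
    return freq_hz / clocks_per_mb


required = approx_mb_rate(1920, 1080, 60)        # 486000, the figure cited above
required_whole = whole_mb_rate(1920, 1080, 60)   # 489600 (120 x 68 macroblocks per frame)

# Hypothetical decoder: 250 MHz clock, 500 clocks per macroblock.
capacity = decoder_mb_rate(250_000_000, 500)     # 500000.0 macroblocks per second

# Lowest clock frequency that sustains the stream at 500 clocks per macroblock.
min_freq_hz = required_whole * 500               # 244800000 Hz

print(required, required_whole, capacity, min_freq_hz)
```

The estimate from the pixel count reproduces the roughly 486,000 macroblocks per second cited above. Because 1080 is not a multiple of 16, counting whole macroblock rows gives a slightly higher figure of 489,600.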
To achieve higher performance from a decoder, two approaches are conventionally attempted. First, the operating frequency of the decoder can be increased. The number of macroblocks a decoder can process per second is directly proportional to the operating frequency. Increasing the operating frequency allows the decoder to process more macroblocks per second, and hence, a higher frame resolution and frame rate can be supported.
The operating frequency, however, cannot be increased arbitrarily. At a specific process node, a decoder design can only be over-clocked to the extent that the design allows without breaking timing rules or specifications. As the operating frequency is increased, the clock period shortens, which makes meeting the timing specifications increasingly difficult. If the operating frequency is increased until the clock period becomes shorter than the delay of the critical path in the design, timing rules or specifications will be violated and the decoder design will fail. This problem can be slightly mitigated by moving the decoder design to a smaller transistor geometry. But such an approach is limited by the state of the current fabrication technology, which advances according to the well-known Moore's law.
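The timing constraint described above can be expressed as a simple check. This is a simplified sketch that compares only the clock period against the critical-path delay and ignores setup time, clock skew, and design margins; the 4 ns critical-path delay and the function names are hypothetical.

```python
PS_PER_SECOND = 1_000_000_000_000  # picoseconds in one second


def clock_period_ps(freq_hz):
    """Clock period in picoseconds for a given operating frequency."""
    return PS_PER_SECOND / freq_hz


def meets_timing(freq_hz, critical_path_ps):
    """True if the clock period is at least as long as the critical-path delay."""
    return clock_period_ps(freq_hz) >= critical_path_ps


def max_frequency_hz(critical_path_ps):
    """Highest frequency whose period still covers the critical path."""
    return PS_PER_SECOND / critical_path_ps


critical_path_ps = 4000  # hypothetical 4 ns critical path

ok_200 = meets_timing(200_000_000, critical_path_ps)    # period 5000 ps: passes
ok_300 = meets_timing(300_000_000, critical_path_ps)    # period ~3333 ps: fails
f_max = max_frequency_hz(critical_path_ps)              # 250 MHz under these assumptions

print(ok_200, ok_300, f_max)
```

Under these assumptions, raising the clock beyond 250 MHz would require shortening the critical path, for example by moving to a smaller transistor geometry as noted above.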
Another conventional approach for achieving higher performance from a video decoder is to decrease the target number of clocks per macroblock. The performance of a video decoder is inversely proportional to the target number of clocks per macroblock. Reducing this number causes more macroblocks to be processed per second, and hence, a higher resolution and frame rate can be supported. This approach, however, presents many hurdles. For example, a video decoder design is architected for a specific target number of clocks per macroblock. Changing the number of clocks per macroblock requires at least a major overhaul of the architecture and design of the video decoder, if not a total re-design.
Depending on the state of the video and broadcasting industry, the turnaround time for such activities may well be large enough to make the new design of the video decoder obsolete, and trigger yet another overhaul for even higher resolutions and higher frame rates. This approach may cause a video decoder design house to constantly play catch-up with the industry without being able to make a high-performance decoder sufficient to meet market demands.
Embodiments of the invention address these and other limitations in the prior art.