This invention relates generally to digital video signal processing. More particularly, this invention relates to a technique for efficiently scalable digital video decoding of a Motion Compensated-Discrete Cosine Transform (MC-DCT) video signal.
Many video applications utilize data compression. More particularly, many video applications utilize transform code compressed domain formats, which include the Discrete Cosine Transform (DCT) format, the interframe predictive code format, such as the Motion Compensation (MC) algorithm, and hybrid compressed formats. The combination of Motion Compensation and Discrete Cosine Transform (MC-DCT) is used in a number of standards, including: MPEG-1, MPEG-2, MPEG-4, H.261, and H.263. Although the present invention is disclosed in the context of MPEG-1 and MPEG-2 decoders for the purpose of illustration, the present invention is equally applicable to any other MC-DCT scheme.
The MPEG-2 video coder is used in a variety of applications, including: (1) medium-resolution video conferencing, where the resolution of the video frames is 352×288 pixels; (2) Standard Definition Digital Television (SDTV) (720×480/576 pixels); and (3) High Definition Digital Television (HDTV) (1920×1080 pixels). A single implementation of a digital video decoder need not support all the resolutions, formats, and bit rates required by every video application that uses a particular video algorithm. Nevertheless, it would be highly desirable if both hardware-based and software-based implementations could be easily and efficiently scaled to support a wide range of applications. Achieving this type of scalability requires addressing several design goals.
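To put these resolutions in perspective, the following sketch counts 16×16 macroblocks per frame. It assumes each dimension is padded up to a multiple of 16, so 1080 lines are coded as 1088. It compares per-frame work only and ignores differences in frame rate and bit rate.

```python
import math

MB_SIZE = 16  # MPEG-2 macroblocks cover 16x16 luma samples


def macroblocks_per_frame(width: int, height: int) -> int:
    """Macroblocks per frame, padding each dimension up to a multiple of MB_SIZE."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


FORMATS = {
    "CIF conferencing": (352, 288),
    "SDTV 480-line": (720, 480),
    "SDTV 576-line": (720, 576),
    "HDTV 1080-line": (1920, 1080),
}

for name, (w, h) in FORMATS.items():
    print(f"{name:17s} {w}x{h}: {macroblocks_per_frame(w, h)} macroblocks")

ratio = macroblocks_per_frame(1920, 1080) / macroblocks_per_frame(720, 480)
print(f"HDTV / 480-line SDTV macroblocks per frame: {ratio:.2f}")
```

Under these assumptions a 1920×1080 frame carries about six times as many macroblocks as a 720×480 frame (8160 versus 1350).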
One design goal is efficiency. In particular, it is important to efficiently process shared parameters in a bit stream. All known video standards include syntactic elements that allow the bit stream to be split into multiple independent parts. For example, an MPEG-2 video bit stream can use slice headers to identify portions of a video frame that can be independently processed. A limitation on such splitting is that the parts still belong to the same video frame and hence share a wide range of parameters with the rest of the frame. If access to these shared parameters is not properly managed, individual decoders stall, reducing the efficiency of the overall implementation. Therefore, it is important to efficiently share parameters in a video bit stream that has been split into independent parts.
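As an illustration of splitting at slice headers, the sketch below scans one coded MPEG-2 picture for slice start codes (the prefix 0x000001 followed by a code byte from 0x01 to 0xAF) and cuts the data into independently decodable slices. For pictures of up to 2800 lines, the code byte equals slice_vertical_position. The function name and the toy byte string are hypothetical, and the picture-level parameters that each slice still depends on are not parsed here.

```python
START_PREFIX = b"\x00\x00\x01"        # MPEG start code prefix
SLICE_FIRST, SLICE_LAST = 0x01, 0xAF  # MPEG-2 slice_start_code range


def split_slices(data: bytes):
    """Return (slice_vertical_position, slice_bytes) for each slice in a coded picture."""
    # Offsets of every start code that has a code byte after the prefix.
    positions = []
    i = data.find(START_PREFIX)
    while i != -1 and i + 3 < len(data):
        positions.append(i)
        i = data.find(START_PREFIX, i + 3)

    slices = []
    for k, pos in enumerate(positions):
        code = data[pos + 3]
        if SLICE_FIRST <= code <= SLICE_LAST:
            # A slice runs until the next start code of any kind, or the end of data.
            end = positions[k + 1] if k + 1 < len(positions) else len(data)
            slices.append((code, data[pos:end]))
    return slices


picture = (
    b"\x00\x00\x01\x00" + b"\x10\x20"  # picture_start_code plus toy header bytes
    + b"\x00\x00\x01\x01AAA"           # slice at vertical position 1
    + b"\x00\x00\x01\x02BB"            # slice at vertical position 2
    + b"\x00\x00\x01\xb3"              # next start code ends the last slice
)

for position, payload in split_slices(picture):
    print(position, len(payload))
```

Each returned slice could then be handed to a separate decoder together with a copy of the picture-level parameters it shares with the other slices.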
A second design goal is synchronization. In particular, it is necessary to observe the inherent sequential constraints associated with a bit stream. Even though the syntax of the video bit stream may allow decoding to be split into multiple independent parts, there is still an inherent sequential constraint in the decoding process. For example, if the decoding process is split into multiple independent portions, a monitor program (e.g., an Operating System (OS) or a state machine) needs to ensure that all the individual decoders complete their tasks before display processing can be initiated. If the monitor does not ensure that the individual decoders have completed their processing before starting the display process, an incomplete video frame is displayed at the output. Synchronization is therefore critical in this multiprocessing environment, and so is the number of synchronization points. As an example, if the decoding process is split into a Bitstream Decoding and Inverse Quantization process (BDIQ), an Inverse Discrete Cosine Transform process (IDCT), and a Motion Compensation and Write Back process (MCWB), the IDCT process cannot start before the BDIQ process ends, and the MCWB process cannot start before the IDCT process ends. If synchronization is required repeatedly at the block and macroblock (a group of blocks) levels, the implementation will be inefficient. Therefore, a system is needed that improves processing performance while still observing these sequential constraints.
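A minimal sketch of the BDIQ, IDCT, and MCWB split, assuming Python threads connected by FIFO queues. The per-slice ordering between stages comes from the queues themselves, so the monitor synchronizes only once per picture, when it joins the stage threads before display. The stage bodies are hypothetical placeholders. Under CPython's global interpreter lock this shows the control structure rather than a real speedup.

```python
import queue
import threading

END = object()  # sentinel: no more work items for this picture


def run_stage(work, inbox, outbox):
    """Consume items in FIFO order, apply one decoding stage, pass results downstream."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate end-of-picture to the next stage
            return
        outbox.put(work(item))


# Hypothetical stage bodies that only record which stages touched each slice.
def bdiq(slice_id):
    return slice_id, ["BDIQ"]


def idct(item):
    return item[0], item[1] + ["IDCT"]


def mcwb(item):
    return item[0], item[1] + ["MCWB"]


def decode_picture(slice_ids):
    q_in, q_bdiq, q_idct, q_out = (queue.Queue() for _ in range(4))
    stages = [(bdiq, q_in, q_bdiq), (idct, q_bdiq, q_idct), (mcwb, q_idct, q_out)]
    threads = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for s in slice_ids:
        q_in.put(s)
    q_in.put(END)
    for t in threads:
        t.join()  # the only picture-level synchronization point before display
    results = []
    while (item := q_out.get()) is not END:
        results.append(item)
    return results


for slice_id, trace in decode_picture(range(1, 4)):
    print(slice_id, "->".join(trace))
```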
A third design goal is scalability. It should be possible to split the decoding process into a wide range of independent sub-processes. This constrains how the decoding process is split. For example, typical hardware for MPEG-2 video decoding splits the decoding process into three or more parts: BDIQ, IDCT, and MCWB. Splitting the video decoding process into these three processes in a multiprocessing environment yields at most a threefold improvement over a uni-processor implementation (ignoring synchronization effects). Such an improvement is not sufficient, for example, to scale an SDTV video decoder to an HDTV video decoder. Thus, it is important to partition the decoding process more finely so that a system can be truly scalable.
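The factor-of-three ceiling can be made concrete with a small calculation. The sketch assumes that a functional pipeline can at best reach its total work divided by its slowest stage, and that equal-cost groups of macroblock rows scale with the number of processors up to the number of groups. The stage times and the 68-row figure (a 1080-line picture coded as 1088 lines) are illustrative assumptions.

```python
import math


def functional_speedup(stage_times):
    """Best case for a pipeline of functional stages: total work / slowest stage."""
    return sum(stage_times) / max(stage_times)


def data_parallel_speedup(n_groups, n_processors):
    """Best case for equal-cost groups of macroblock rows spread over processors."""
    return n_groups / math.ceil(n_groups / n_processors)


print(functional_speedup([1.0, 1.0, 1.0]))     # perfectly balanced BDIQ/IDCT/MCWB
print(functional_speedup([2.0, 1.0, 1.0]))     # an unbalanced pipeline does worse
print(round(data_parallel_speedup(68, 8), 2))  # 68 macroblock rows on 8 processors
print(data_parallel_speedup(68, 68))           # one row per processor
```

Since a 1920×1080 frame has roughly six times as many macroblocks as a 720×480 frame, a three-stage functional split alone cannot close the gap, whereas a split by groups of macroblocks can in principle.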
A fourth design goal is flexibility. It is very advantageous if the decoding process can be dynamically partitioned into individual sub-processes. Such a partition allows better use of the system resources. For example, in a typical video application, the video decoding process is accompanied by audio decoding and a system stream demultiplexing process. Although these two processes have the same or higher priority than the accompanying video process, their processing requirements are much smaller. Instead of statically dedicating some hardware to the audio and system tasks, it is more economical to give these tasks a higher priority so that they are completed in time, after which the resources they were using can be used by the video process. Thus, it is highly desirable to provide a system in which processing bandwidth can be dynamically reassigned to the more computationally intensive tasks.
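One way to realize this is sketched below, assuming a simple greedy list scheduler: ready tasks are taken in priority order (a lower value is more urgent), and each goes to whichever worker becomes free first. The short audio and system tasks finish early, and the same workers then pick up video slices. Task names, priorities, and costs are hypothetical.

```python
import heapq


def list_schedule(tasks, n_workers):
    """tasks: (name, priority, cost) tuples. Returns {name: (worker, finish_time)}."""
    # Ready queue ordered by priority; submission order breaks ties.
    ready = [(prio, i, name, cost) for i, (name, prio, cost) in enumerate(tasks)]
    heapq.heapify(ready)
    # Min-heap of (time_when_free, worker_id): the earliest-free worker is on top.
    workers = [(0, w) for w in range(n_workers)]
    heapq.heapify(workers)

    finish = {}
    while ready:
        _, _, name, cost = heapq.heappop(ready)
        free_at, w = heapq.heappop(workers)
        done = free_at + cost
        finish[name] = (w, done)
        heapq.heappush(workers, (done, w))
    return finish


tasks = [("audio", 0, 1), ("system", 0, 1)] + [
    (f"video_slice_{i}", 1, 2) for i in range(6)
]
finish = list_schedule(tasks, n_workers=4)
for name, (worker, done) in finish.items():
    print(f"{name:14s} worker {worker} finishes at t={done}")
```

No worker is reserved for audio or system processing; once those tasks complete, their workers simply continue with video slices.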
A fifth design goal is additional functionality. The flexibility to run the video process and its accompanying processes asynchronously, synchronizing only when needed, facilitates the support of additional functions. For example, by properly isolating the video frame decoding and display processes, the decoding delay, and hence the number of video frames that have to be buffered in memory, can be more efficiently controlled. Therefore, it is important to provide a system that operates asynchronously when such operation can be exploited to achieve additional functions.
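A sketch of decoupled decoding and display, assuming a decoder thread and a display thread that share only a bounded queue of decoded frames. The queue capacity (max_buffered, a hypothetical parameter) limits how far decoding can run ahead of display, and therefore bounds both the decoding delay and the number of frames held in memory.

```python
import queue
import threading


def run_decode_display(n_frames, max_buffered):
    """Decode and display on separate threads, coupled only by a bounded frame queue."""
    frames = queue.Queue(maxsize=max_buffered)  # caps how far decoding runs ahead
    shown = []
    peak = 0

    def decoder():
        nonlocal peak
        for n in range(n_frames):
            frames.put(f"frame{n}")  # blocks while max_buffered frames are waiting
            peak = max(peak, frames.qsize())
        frames.put(None)  # end-of-stream marker

    def display():
        while (frame := frames.get()) is not None:
            shown.append(frame)  # a real display process would present the frame here

    threads = [threading.Thread(target=decoder), threading.Thread(target=display)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown, peak


shown, peak = run_decode_display(10, max_buffered=3)
print(len(shown), "frames shown, at most", peak, "buffered")
```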
In view of the foregoing, it would be highly desirable to provide a technique for efficiently scalable digital video decoding which facilitates the efficiency, synchronization, scalability, flexibility, and extended functionality goals set forth above.
The invention includes an apparatus for decoding a Motion Compensated-Discrete Cosine Transform (MC-DCT) video stream. The apparatus includes an input port to receive an MC-DCT video stream with an associated hierarchy of data structures including a sequence data structure, a picture data structure, a slice data structure, and a macroblock data structure. A monitor processor splits the MC-DCT video stream into a set of video streams. A set of sub-processors processes the set of video streams. Each sub-processor has an assigned computational task performed on a specified hierarchical level of the associated hierarchy of data structures. Each sub-processor performs the assigned computational task with a designated data structure including all parameter data required at the specified hierarchical level.
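The designated data structures can be pictured as immutable records that bundle everything a sub-processor needs at its level. The sketch below assumes Python frozen dataclasses and covers the sequence, picture, and slice levels. The field names follow MPEG-2 syntax elements, but which fields belong in each record is an assumption for illustration, and the macroblock level is omitted for brevity.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SequenceParams:
    horizontal_size: int
    vertical_size: int


@dataclass(frozen=True)
class PictureParams:
    sequence: SequenceParams
    picture_coding_type: int  # MPEG-2: 1 = I, 2 = P, 3 = B
    intra_dc_precision: int   # 0..3, i.e. 8..11 bits


@dataclass(frozen=True)
class SliceTask:
    """Everything a slice-level sub-processor needs, with no lookups into shared state."""
    picture: PictureParams
    slice_vertical_position: int
    quantiser_scale_code: int
    payload: bytes

    @property
    def mb_width(self) -> int:
        return (self.picture.sequence.horizontal_size + 15) // 16


seq = SequenceParams(horizontal_size=1920, vertical_size=1080)
pic = PictureParams(seq, picture_coding_type=1, intra_dc_precision=0)
tasks = [SliceTask(pic, v, quantiser_scale_code=8, payload=b"") for v in range(1, 69)]
print(len(tasks), "slice tasks,", tasks[0].mb_width, "macroblocks per row")

try:
    pic.intra_dc_precision = 2  # sub-processors cannot mutate shared parameters
except FrozenInstanceError:
    print("picture parameters are read-only")
```

Because the records are read-only, many sub-processors can reference the same picture-level record without locking.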
In another aspect of the invention, an apparatus includes an input port to receive an MC-DCT video stream with an associated hierarchy of data structures including a sequence data structure, a picture data structure, a slice data structure, and a macroblock data structure. A monitor processor splits the MC-DCT video stream into a set of video streams. A set of sub-processors processes the set of video streams. Each sub-processor has an assigned computational task performed on a specified hierarchical level of the associated hierarchy of data structures. A synchronous processor combines the set of video streams received from the set of sub-processors. The synchronous processor ensures that parameters produced by a sub-processor associated with a lower hierarchical level of the hierarchy of data structures are final prior to combining the parameters with values produced by a sub-processor at a higher hierarchical level of the hierarchy of data structures.
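The synchronous processor's ordering rule behaves like a barrier between hierarchical levels. In this sketch, which assumes a thread pool and a hypothetical decode_slice function, slice-level results are waited on as a group and only then combined into picture-level output.

```python
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait


def decode_slice(row):
    """Hypothetical slice decoder: returns reconstructed samples for one macroblock row."""
    return [row] * 4


def decode_and_combine(n_rows, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(decode_slice, r) for r in range(n_rows)]
        # Barrier: every slice-level result must be final before picture-level assembly.
        wait(futures, return_when=ALL_COMPLETED)
    # All futures are done, so result() returns at once (or re-raises a slice error).
    return [f.result() for f in futures]


frame = decode_and_combine(6)
print(len(frame), "rows assembled")
```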
Another embodiment of the invention includes an input port to receive an MC-DCT video stream. A monitor processor splits the MC-DCT video stream into a set of video streams. The monitor processor is configurable to alternately produce a set of video streams in accordance with either a first partition according to functional sub-processes or a second partition according to groups of macroblocks.
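A sketch of the configurable monitor, assuming work is expressed as macroblock rows. In functional mode it emits one assignment per stage, so parallelism is capped at three regardless of the worker count. In group-of-macroblocks mode it splits the rows into contiguous, near-equal groups, one per worker. The enum, function, and worker labels are hypothetical.

```python
from enum import Enum


class Partition(Enum):
    FUNCTIONAL = "functional"
    MACROBLOCK_GROUPS = "group-of-macroblocks"


STAGES = ("BDIQ", "IDCT", "MCWB")


def partition(mb_rows, mode, n_workers):
    """Return a list of (worker_label, macroblock_rows) assignments."""
    if mode is Partition.FUNCTIONAL:
        # One sub-process per stage, each touching every row; n_workers is ignored.
        return [(stage, list(range(mb_rows))) for stage in STAGES]
    # Contiguous groups of rows, sized as evenly as possible.
    base, extra = divmod(mb_rows, n_workers)
    groups, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        groups.append((f"worker{w}", list(range(start, start + size))))
        start += size
    return groups


for label, rows in partition(68, Partition.MACROBLOCK_GROUPS, 8):
    print(label, f"rows {rows[0]}..{rows[-1]}")
```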
The method of the invention includes the step of receiving an MC-DCT video stream with an associated hierarchy of data structures including a sequence data structure, a picture data structure, a slice data structure, and a macroblock data structure. The MC-DCT video stream is split into a set of video streams. The video streams in the set are processed in accordance with an assigned computational task performed on a specified hierarchical level of the associated hierarchy of data structures. Each video stream is processed in accordance with a designated data structure including all parameter data required at the specified hierarchical level.
The invention provides a technique for efficiently scalable digital video decoding. In particular, the technique of the invention facilitates efficiency, synchronization, scalability, flexibility, and extended functionality in a video decoder.
The invention permits decoding of video streams (e.g., MPEG and other video streams with DCT and motion compensation) in a number of contexts. For example, the invention is advantageously exploited in connection with servers and workstations that provide multiprocessing capabilities. In particular, the invention is advantageously exploited in emerging servers and workstations that provide HDTV video decoders using symmetric multiprocessing techniques. The invention is also advantageously exploited in connection with multiprocessing system-on-a-chip architectures that are commercially available at this time, such as the MAJC-5200 from SUN MICROSYSTEMS, INC. and the IBM POWER4 from INTERNATIONAL BUSINESS MACHINES CORPORATION. In addition, the invention can be exploited in connection with specialized ASICs that can use lower-frequency, and hence lower-power, designs.