The invention relates to coded signals that represent groups of pictures using fewer bits than conventional picture signals and, in particular, to a transcoding method and transcoder that transcodes a predictively-coded object-based picture signal representing a group of pictures to a predictively-coded block-based picture signal representing the group of pictures to allow a conventional block-based picture signal decoder to decode the predictively-coded object-based picture signal.
Communication using picture signals that electronically represent moving pictures is becoming ubiquitous, together with the use of signal coding to increase the efficiency with which such signals can be transmitted and stored. Signal coding is crucial to overcome the many limitations that exist on transmission bandwidth and storage capacity. Most of the popular and successful picture signal coding techniques, such as those known as MPEG-1, MPEG-2, ITU H.261 and ITU H.263, code the original picture signal by subjecting it to block-based processing. In block-based processing, each picture in a group of pictures constituting at least part of a moving picture is expressed as an array of picture elements (pixels), e.g., an array of 640×480 pixels, each of which has a pixel value. The pixel values for the picture collectively constitute a frame of the picture signal. Each picture is divided into regularly-sized and located square or rectangular blocks of pixels. Processing, such as a block discrete cosine transform (block-DCT), is then individually applied to each block of pixel values constituting the picture to code the picture signal representing the picture. The picture is divided into blocks regardless of the sizes and shapes of the objects represented by the picture.
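As a minimal illustration of the block-based processing described above, the following Python sketch divides a picture into blocks on a fixed grid and applies an orthonormal two-dimensional DCT to each block. The function names, the 8×8 block size, and the assumption that the picture dimensions are exact multiples of the block size are illustrative choices made for this sketch and are not taken from any of the named standards.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def split_into_blocks(picture, size=8):
    """Yield (top, left, block) on a fixed grid, regardless of picture content.

    Assumes (for this sketch) that height and width are multiples of size.
    """
    height, width = len(picture), len(picture[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            yield top, left, [row[left:left + size]
                              for row in picture[top:top + size]]


def block_dct_picture(picture, size=8):
    """Apply the block DCT independently to every block of the picture."""
    return {(top, left): dct_2d(block)
            for top, left, block in split_into_blocks(picture, size)}
```

For a uniformly gray picture, every block transforms to a single non-zero (DC) coefficient, which is why a transform of this kind concentrates the information in smooth regions into few values.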
Although picture signals representing moving pictures can be and are coded simply by applying block DCTs to the blocks of pixel values constituting the frame of the picture signal representing each picture, the coding efficiency is substantially increased by eliminating the substantial temporal redundancy that exists in such picture signals. In such coding schemes as MPEG-1 and MPEG-2, the temporal redundancy is substantially reduced by applying predictive coding with motion compensation. As a result of such coding, the picture signal represents only the differences between the current picture and its reference picture, i.e., the picture or pictures that form the basis for predictively coding the current picture. A picture signal that represents a moving picture and that has been predictively coded using motion compensation is called a predictively-coded picture signal in this disclosure on the understanding that such a picture signal has additionally been subject to spatial coding. In conventional block-based coding schemes such as MPEG-1 and MPEG-2, block-based motion estimation and block-based motion compensation are used. A picture signal coded in this manner will be called a predictively-coded block-based picture signal.
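The idea that a predictively-coded signal carries only differences from a reference picture can be sketched as follows. This is a simplified Python illustration under stated assumptions: motion vectors have whole-pixel precision, a single reference picture is used, and the names `predict_block` and `residual_block` are hypothetical. The standards named above define considerably more elaborate prediction modes.

```python
def predict_block(reference, top, left, motion_vector, size=8):
    """Fetch the block of the reference picture displaced by the motion vector."""
    dy, dx = motion_vector
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def residual_block(current, reference, top, left, motion_vector, size=8):
    """Difference between a block of the current picture and its prediction.

    Only this residual (plus the motion vector) needs to be spatially coded.
    """
    prediction = predict_block(reference, top, left, motion_vector, size)
    return [[current[top + y][left + x] - prediction[y][x]
             for x in range(size)]
            for y in range(size)]
```

When the motion vector matches the actual displacement of the picture content, the residual is close to zero and is far cheaper to code than the block itself.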
Recently, techniques have been developed for generating object-based picture signals that represent a picture as a number of objects arranged in a scene. In an object-based picture signal, a picture, which may be a single still picture, or one of a group of sequential still pictures constituting a moving picture, is decomposed into objects having arbitrary shapes, unlike the regularly-sized and located blocks of current block-based representations. Each object is represented by a portion of the picture signal.
Techniques have also been proposed for coding such object-based picture signals, the foremost example of which is that embodied in the recent MPEG-4 standard. In a coded object-based picture signal, spatial coding is applied to each signal portion representing an object. When the object-based picture signal represents a moving picture, each signal portion representing an object is additionally predictively coded using, for example, object-based motion estimation and object-based motion compensation to increase the coding efficiency.
Decomposing the picture into signal portions representing arbitrarily-shaped, movable objects provides a more natural decomposition of the picture signal that enables a number of new or enhanced functionalities, such as user interaction with the objects in the picture, greater content-creation flexibility, and potentially improved coding efficiency and fidelity. The advantages of representing pictures using object-based picture signals are especially likely to appeal to content creators.
Object-based picture signals require object-based coding techniques such as MPEG-4 to code, manipulate, and distribute them. However, an object-based decoder, such as an MPEG-4 decoder, that is required to decode a coded object-based picture signal, is inherently more complex than conventional block-based MPEG-1 or MPEG-2 decoders. Moreover, the spread of DVD, Digital TV and HDTV has put MPEG-2 decoders into widespread use. Therefore, for users who already have an MPEG-1 or -2 decoder, and who do not want or cannot afford the additional functionalities offered by an object-based picture signal, the need arises to transcode the MPEG-4 object-based picture signal to an MPEG-1 or -2 block-based picture signal. Moreover, while program content may be developed using object-based picture signals, it may be desirable to distribute the object-based content to people who only have conventional block-based decoders, such as the MPEG-1 or -2 decoders used in DVD, satellite and terrestrial digital television. Consequently, a need exists to be able to transcode predictively-coded object-based picture signals to predictively-coded block-based picture signals that are compatible with the standard decoders of such predictive, block-based coding techniques as MPEG-1, MPEG-2, H.261 and H.263.
FIG. 1 is a block diagram of a conventional transcoder 10 capable of transcoding an MPEG-4 or other predictively-coded object-based picture signal to an MPEG-2 or other predictively-coded block-based picture signal. The system is composed of the MPEG-4 decoder 12 and the MPEG-2 encoder 14. The output 18 of the MPEG-4 decoder is connected to the input 20 of the MPEG-2 encoder. The output 22 of the MPEG-2 encoder provides a predictively-coded block-based picture signal that is compliant with the standard MPEG-2 decoder.
The input 16 of the MPEG-4 decoder receives a predictively-coded object-based picture signal that is compliant with the MPEG-4 standard decoder. The MPEG-4 decoder decodes the predictively-coded object-based picture signal to generate a conventional picture signal, which it feeds to its output 18. The conventional picture signal may be a set of RGB signals, a set of YIQ or YUV signals or some other suitable form of conventional picture signal.
The MPEG-2 encoder receives the conventional picture signal at its input 20 and applies conventional block-based spatial and temporal coding thereto. The MPEG-2 encoder delivers a predictively-coded block-based picture signal that is compliant with the MPEG-2 standard decoder to its output 22.
The conventional transcoder 10, although simple in concept, is complex in execution. The spatial and temporal coding processing performed by the MPEG-2 encoder is complex and requires substantial computational resources to perform in real time. The demand for computational resources is particularly severe because the MPEG-2 encoder performs motion estimation from scratch. Furthermore, the decoding and subsequent encoding performed by the transcoder 10 often degrades the quality of the picture.
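To give a sense of why motion estimation from scratch is costly, the following Python sketch performs an exhaustive block-matching search using the sum of absolute differences (SAD) as the matching cost. It is an illustration only: the search strategy, the SAD criterion, the whole-pixel precision, and the names `sad` and `full_search` are assumptions made for this sketch, and practical MPEG-2 encoders typically use faster search strategies.

```python
def sad(current, reference, top, left, dy, dx, size):
    """Sum of absolute differences between a block and a displaced candidate."""
    return sum(abs(current[top + y][left + x]
                   - reference[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def full_search(current, reference, top, left, size=8, search_range=4):
    """Try every whole-pixel displacement within +/- search_range.

    Returns (best motion vector, its SAD, number of candidates evaluated).
    """
    height, width = len(reference), len(reference[0])
    best_cost, best_vector, evaluated = None, None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(current, reference, top, left, dy, dx, size)
            evaluated += 1
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost, evaluated
```

With a search range of R pixels, up to (2R+1)² candidates are evaluated per block, each requiring a comparison of every pixel in the block, and this is repeated for every block of every predicted picture.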
An alternative approach is to perform the transcoding in the coded domain. This would eliminate the need to perform at least part of the re-encoding. Transcoding in the coded domain has the potential to reduce significantly the processing complexity, and also to eliminate partially or completely the generation loss suffered by conventional transcoding.
Some approaches to transcoding block-based picture signals in the coded domain are described by S. F. Chang and D. Messerschmitt in Manipulation and Compositing of MC-DCT Compressed Video, 13 IEEE J. on Selected Areas in Communications (January 1995); B. Natarajan and B. Vasudev in A Fast Approximate Algorithm for Scaling Down Digital Images in the DCT Domain, PROC. IEEE Intl. Conf. on Image Processing (Washington D.C.) (October 1995); N. Merhav and B. Vasudev, Fast Algorithms for DCT-Domain Image Down Sampling and for Inverse Motion Compensation, 7 IEEE Trans. on Circuits and Systems for Video Technology, 468-475 (June 1997); B. Shen and I. Ishwar in Block-based Manipulations on Transform-Compressed Images and Videos, 6 Multimedia Systems (March 1998); S. Wee and B. Vasudev in Splicing MPEG Video Streams in the Compressed Domain, PROC. IEEE Intl. Conf. on Multimedia Signal Processing (Princeton, N.J.) (June 1997).
However, none of the above-cited references describes a transcoder that operates in the coded domain and transcodes a predictively-coded object-based picture signal to a predictively-coded block-based picture signal. What is needed, therefore, is a transcoder and transcoding method that operate in the coded domain and are capable of transcoding a predictively-coded object-based picture signal into a corresponding predictively-coded block-based picture signal. What is also needed is such a transcoder and transcoding method that operate in real time or in real time with a delay of several frames. Finally, what is needed is such a transcoder and transcoding method that have modest and affordable hardware requirements.
The invention provides a method for transcoding a predictively-coded object-based picture signal representing a group of pictures to a predictively-coded block-based picture signal representing the group of pictures. In the method, a coded scene descriptor and coded object descriptors are extracted from the predictively-coded object-based picture signal and the coded scene descriptor is decoded to generate a scene descriptor. The coded object descriptors are partially decoded to generate respective partially-decoded object descriptors. The partial decoding extracts coding information that describes the coding of the coded object descriptors. In response to the scene descriptor, a frame of a partially-coded block-based picture signal representing one of the pictures as a current picture is generated from the partially-decoded object descriptors. Finally, a frame of the predictively-coded block-based picture signal representing the current picture is generated by predictively coding the partially-coded block-based picture signal to a uniform coding state in response to the coding information.
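The order of the steps in the method can be sketched as a toy data flow in Python. Everything in this sketch is a hypothetical stand-in introduced for illustration: the dictionary layout of the signal, the function names, the assumption that objects are aligned to the block grid, the rule that a later object overwrites an earlier one, and the use of motion vectors as the only coding information. It shows only how the stages connect and is not the disclosed implementation.

```python
def extract_descriptors(signal):
    """Extract the coded scene descriptor and the coded object descriptors."""
    return signal["scene"], signal["objects"]


def decode_scene_descriptor(coded_scene):
    """Decode the scene descriptor: object id -> (row, col) block-grid offset."""
    return {obj_id: tuple(offset) for obj_id, offset in coded_scene.items()}


def partially_decode_object(coded_object):
    """Leave block data in coded form and extract the coding information."""
    blocks = dict(coded_object["blocks"])
    coding_info = dict(coded_object.get("motion_vectors", {}))
    return blocks, coding_info


def generate_block_frame(scene, decoded_objects):
    """Place each object's blocks on the picture grid as the scene dictates."""
    frame, frame_info = {}, {}
    for obj_id, (row0, col0) in scene.items():
        blocks, coding_info = decoded_objects[obj_id]
        for (r, c), data in blocks.items():
            position = (row0 + r, col0 + c)
            frame[position] = data               # later objects overwrite earlier
            frame_info.pop(position, None)       # old coding info no longer valid
            if (r, c) in coding_info:
                frame_info[position] = coding_info[(r, c)]
    return frame, frame_info


def partially_encode(frame, frame_info):
    """Bring all blocks to one coding state, reusing coding information."""
    coded, needs_estimation = {}, []
    for position, data in sorted(frame.items()):
        vector = frame_info.get(position)
        if vector is None:
            needs_estimation.append(position)    # no reusable vector available
            vector = (0, 0)                      # placeholder, not a real estimate
        coded[position] = {"data": data, "motion_vector": vector}
    return coded, needs_estimation


def transcode_frame(signal):
    """Run the stages in the order given in the summary above."""
    coded_scene, coded_objects = extract_descriptors(signal)
    scene = decode_scene_descriptor(coded_scene)
    decoded = {obj_id: partially_decode_object(obj)
               for obj_id, obj in coded_objects.items()}
    frame, frame_info = generate_block_frame(scene, decoded)
    return partially_encode(frame, frame_info)
```

In this toy model, only blocks for which no coding information survives composition are flagged as needing fresh motion estimation; all other blocks reuse the vectors extracted during partial decoding.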
The invention also provides a transcoder for transcoding a predictively-coded object-based picture signal representing a group of pictures to a predictively-coded block-based picture signal representing the group of pictures. The transcoder comprises a partial decoder, a block-based picture signal generator, and a partial encoder. The partial decoder includes a demultiplexer, a scene descriptor decoder and an object descriptor decoder. The demultiplexer extracts a coded scene descriptor and coded object descriptors from the predictively-coded object-based picture signal. The scene descriptor decoder decodes the coded scene descriptor to generate a scene descriptor. The object descriptor decoder partially decodes the coded object descriptors to generate respective partially-decoded object descriptors and extracts coding information that describes the coding of the coded object descriptors. The block-based picture signal generator operates in response to the scene descriptor to generate from the partially-decoded object descriptors a frame of a partially-coded block-based picture signal representing one of the pictures as a current picture. The partial encoder is configured to generate a frame of the predictively-coded block-based picture signal representing the current picture by predictively coding, in response to at least part of the coding information, the partially-coded block-based picture signal to a uniform coding state.
Finally, the invention provides a computer-readable medium in which is fixed a computer program that instructs a computer to perform the above-described transcoding method.
The transcoder and transcoding method according to the invention operate in the coded domain and transcode a predictively-coded object-based picture signal into a corresponding predictively-coded block-based picture signal. Operating in the coded domain saves substantial processing resources since considerable amounts of decoding processing and encoding processing are not performed compared with the conventional approach. Moreover, the transcoder and transcoding method according to the invention use coding information extracted from the predictively-coded object-based picture signal to apply predictive coding to the block-based picture signal. This saves additional processing since the need to perform resource-intensive motion estimation is eliminated for all but a few blocks. Thus, the transcoder and transcoding method according to the invention can operate in real time or near real time and can be implemented using modest and affordable hardware.