The invention relates to coded signals that represent pictures using fewer bits than conventional picture signals and, in particular, to a transcoding method and transcoder that transcodes coded object-based picture signals to coded block-based picture signals to allow a conventional block-based picture signal decoder to decode the coded object-based picture signals.
Communication using picture signals that electronically represent still and moving pictures is becoming ubiquitous, together with the use of signal coding to increase the efficiency with which such signals can be transmitted and stored. Signal coding is crucial to overcome the many limitations that exist on transmission bandwidth and storage capacity. Most of the popular and successful conventional picture signal coding techniques, such as those known as JPEG, MPEG-1, MPEG-2, ITU H.261 and ITU H.263, code the original picture signal by subjecting it to block-based processing. In block-based processing, each picture is expressed as an array of picture elements (pixels), e.g., an array of 640×480 pixels, each of which has a pixel value. The pixel values collectively constitute the picture signal. The picture is divided into regularly-sized and located square or rectangular blocks of pixels. Processing, such as block discrete cosine transforms (block-DCT), block-based motion estimation and block-based motion compensation, is then individually applied to the corresponding blocks of pixel values to code the picture signal. The picture is divided into blocks regardless of the sizes and shapes of the objects represented by the picture.
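A minimal sketch of the block-based processing just described, assuming 8×8 blocks (the DCT block size used by JPEG and MPEG-2) and a picture whose dimensions are multiples of 8. The helper names are illustrative only, and the direct-form orthonormal DCT-II is written for readability rather than speed; practical coders use fast, fixed-precision transforms.

```python
import math

BLOCK = 8  # block size assumed here (8x8, as in JPEG and MPEG-2)

def split_into_blocks(picture, block=BLOCK):
    """Divide a 2-D list of pixel values into block x block sub-arrays in raster order.

    Assumes the picture height and width are multiples of `block`.
    """
    h, w = len(picture), len(picture[0])
    blocks = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blocks.append([row[bx:bx + block] for row in picture[by:by + block]])
    return blocks

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, in direct form for readability."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs
```

For a flat 16×16 picture of value 128, `split_into_blocks` returns four blocks, and `dct2` of each has a single nonzero coefficient, the DC term 1024, because the orthonormal 8×8 DCT scales the block mean by 8. Note that the block grid is fixed by the picture dimensions alone, independent of any object boundaries.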
Recently, techniques have been developed for generating object-based picture signals that represent the picture as a number of objects arranged to form a scene. Techniques have also been proposed for coding such object-based picture signals, the foremost example of which is that embodied in the emerging MPEG-4 standard. In an object-based picture signal, a picture, which may be a single still picture, or one of a group of sequential still pictures constituting a moving picture, is decomposed into objects having arbitrary shapes, unlike the regularly-sized and located blocks of current block-based representations. Each object is represented by a portion of the picture signal. This technique provides a more natural decomposition of the picture signal that may enable a number of new functionalities, such as user interaction with the objects in the picture, greater content-creation flexibility, and potentially improved coding efficiency and fidelity. These advantages of representing pictures using object-based picture signals are likely to appeal especially to content creators.
Object-based picture signals require object-based coding techniques such as MPEG-4 to code, manipulate, and distribute them. However, an MPEG-4 decoder, which is required to decode a coded object-based picture signal, is inherently more complex than conventional block-based MPEG-1 or MPEG-2 decoders. Moreover, the spread of DVD, Digital TV and HDTV has put MPEG-2 decoders into widespread use. JPEG still picture decoders are also widely used. Therefore, for users who already have a JPEG or MPEG-2 decoder, and who do not want or cannot afford the additional functionalities offered by an object-based picture signal, the need arises to transcode the MPEG-4 object-based picture signal to an MPEG-2 block-based picture signal. A similar need exists with respect to still pictures. Moreover, while program content may be developed using object-based picture signals, it may be desirable to distribute the object-based content to people who only have conventional block-based decoders, such as the MPEG-2 decoders used in DVD, satellite and terrestrial digital television. Consequently, a need exists to be able to transcode coded object-based picture signals to coded block-based picture signals that are compatible with the standard decoders of such block-based coding techniques as JPEG, MPEG-1, MPEG-2, H.261 and H.263.
FIG. 1 is a block diagram of a conventional transcoder 10 capable of transcoding an MPEG-4 or similar coded object-based picture signal to an MPEG-2 or similar coded block-based picture signal. The system is composed of the MPEG-4 decoder 12 and the MPEG-2 encoder 14. The output 18 of the MPEG-4 decoder is connected to the input 20 of the MPEG-2 encoder. The output 22 of the MPEG-2 encoder provides a coded block-based picture signal that is compliant with the standard MPEG-2 decoder.
The input 16 of the MPEG-4 decoder receives a coded object-based picture signal that is compliant with the MPEG-4 standard decoder. The MPEG-4 decoder decodes the coded object-based picture signal to generate a conventional picture signal, which it feeds to its output 18. The conventional picture signal may be a set of RGB signals, a set of YIQ or YUV signals or some other suitable form of conventional picture signal.
The MPEG-2 encoder receives the conventional picture signal at its input 20 and applies conventional block-based coding thereto. The MPEG-2 encoder delivers a coded block-based picture signal that is compliant with the MPEG-2 standard decoder to its output 22.
The conventional transcoder 10, although simple in concept, is complex in execution. The coding processing performed by the MPEG-2 encoder is complex and requires substantial computational resources to perform in real time. Furthermore, the decoding and subsequent encoding performed by the transcoder 10 often degrades the quality of the picture. An alternative approach is to attempt to perform the transcoding in the coded domain. This would eliminate the need to perform at least part of the re-encoding. Transcoding in the coded domain has the potential to reduce significantly the processing complexity, and also to eliminate partially or completely the generation loss suffered by conventional transcoding.
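The generation loss mentioned above can be illustrated with a toy scalar quantizer. This is an assumption made for illustration; it is not the MPEG-2 or MPEG-4 quantizer. A value that is quantized, decoded, and then requantized at a different step size can end up farther from the original than if it had been quantized once at the second step size.

```python
import math

def quantize(x, step):
    """Toy uniform quantizer: map x to the nearest multiple of step (ties round up)."""
    return math.floor(x / step + 0.5) * step

def cascaded(x, first_step, second_step):
    """Decode-then-re-encode: quantize at first_step, then requantize the result at second_step."""
    return quantize(quantize(x, first_step), second_step)

def direct(x, second_step):
    """Quantize the original value once at second_step (a single generation)."""
    return quantize(x, second_step)
```

With the value 14, quantizing at step 4 and then at step 6 gives 18, an error of 4, whereas quantizing 14 once at step 6 gives 12, an error of 2. When the two step sizes are equal, the second pass changes nothing, which suggests why reusing already-quantized data in the coded domain can avoid part of this loss.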
Some approaches to transcoding conventional block-based picture signals in the coded domain are described by S. F. Chang and D. Messerschmitt, Manipulation and Compositing of MC-DCT Compressed Video, 13 IEEE J. on Selected Areas in Communications (January 1995); B. Natarajan and B. Vasudev, A Fast Approximate Algorithm for Scaling Down Digital Images in the DCT Domain, Proc. IEEE Int'l Conf. on Image Processing (Washington, D.C.) (October 1995); N. Merhav and B. Vasudev, Fast Algorithms for DCT-Domain Image Down Sampling and for Inverse Motion Compensation, 7 IEEE Trans. on Circuits and Systems for Video Technology 468-475 (June 1997); B. Shen and I. K. Sethi, Block-based Manipulations on Transform-Compressed Images and Videos, 6 Multimedia Systems (March 1998); and S. Wee and B. Vasudev, Splicing MPEG Video Streams in the Compressed Domain, Proc. IEEE Int'l Conf. on Multimedia Signal Processing (Princeton, N.J.) (June 1997).
However, none of the above-cited references describes a coded-domain transcoder capable of transcoding a coded object-based picture signal to a coded block-based picture signal. What is needed, therefore, is a coded-domain transcoder capable of transcoding in real time a coded object-based picture signal representing a still or moving picture into a corresponding coded block-based picture signal. What is also needed is such a coded-domain transcoder having modest and affordable hardware requirements.
The invention provides a transcoder for transcoding a coded object-based picture signal that represents a picture to a coded block-based picture signal that also represents the picture. The coded object-based picture signal may be an MPEG-4 picture signal, for example, and the coded block-based picture signal may be an MPEG-2 picture signal, for example. The transcoder comprises a culling module, a picture composer and a partial encoder. The culling module receives the coded object-based picture signal and culls signal portions from the coded object-based picture signal to generate a culled object-based picture signal. The signal portions culled are those that represent objects not visible in the picture. The picture composer receives the culled object-based picture signal, partially decodes selected portions of the culled object-based picture signal and generates from them blocks of a partially-coded block-based picture signal in which the blocks have different coding states. The partial encoder receives the partially-coded block-based picture signal and encodes the blocks of the partially-coded block-based picture signal to generate the coded block-based picture signal in which the blocks have a uniform coding state. The coded block-based picture signal is capable of being decoded by a conventional block-based decoder.
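The three-stage structure of the transcoder can be sketched as a chain of functions. This is an illustrative skeleton only: the `Block` type, the coding-state labels and the stage signatures are hypothetical, and each stage is passed in as a callable rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Block:
    """One block of the partially-coded block-based picture signal (hypothetical type)."""
    index: int          # raster position of the block in the output picture
    coding_state: str   # hypothetical labels, e.g. "pixel", "dct", "quantized"
    payload: object     # block data in that coding state

def transcode(coded_objects: Sequence[object],
              cull: Callable[[Sequence[object]], List[object]],
              compose: Callable[[List[object]], List[Block]],
              partial_encode: Callable[[List[Block]], List[Block]]) -> List[Block]:
    """Chain the stages: culling module -> picture composer -> partial encoder."""
    culled = cull(coded_objects)             # drop portions representing objects not visible
    partially_coded = compose(culled)        # blocks may differ in coding state here
    coded = partial_encode(partially_coded)  # every block brought to one coding state
    if len({b.coding_state for b in coded}) > 1:
        raise ValueError("partial encoder must produce a uniform coding state")
    return coded
```

Passing the stages as callables keeps the sketch neutral about how each module works internally; only the data handed between them (culled objects, then blocks in mixed coding states, then blocks in a single state) mirrors the description above.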
The culling module may include an object culling module that culls, from the coded object-based picture signal, signal portions that represent objects that are not present in the picture and signal portions that represent objects that are present in the picture but are hidden.
The object-based picture signal may include a scene descriptor that describes the arrangement of the objects in the picture and may additionally include a coded shape descriptor for each of the objects. The object culling module may use the scene descriptor to identify the signal portions that represent the objects not present in the picture and may decode the coded shape descriptors of the objects identified as being present in the picture to identify the signal portions that represent the objects that are present in the picture, but are hidden.
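A simplified sketch of this two-step object culling, under stated assumptions: the scene descriptor is reduced to a back-to-front list of `(object_id, x, y, w, h)` placements, shapes are already-decoded binary masks, and all objects are opaque. None of these structures is the actual MPEG-4 scene-description (BIFS) or binary-alpha shape syntax, and the function name is hypothetical.

```python
def cull_objects(shapes, scene, width, height):
    """Return the ids of objects visible in a width x height picture.

    shapes: object_id -> binary shape mask (rows of 0/1), standing in for a decoded shape descriptor
    scene:  list of (object_id, x, y, w, h) placements in back-to-front order,
            a hypothetical simplified stand-in for the scene descriptor
    """
    # Step 1: scene descriptor only. Objects never placed in the scene, or placed
    # wholly outside the picture, are not present; their shapes are never touched.
    present = [(oid, x, y) for oid, x, y, w, h in scene
               if x < width and y < height and x + w > 0 and y + h > 0]
    # Step 2: paint the shapes of the present objects back to front, so each pixel
    # ends up recording the front-most opaque object that covers it.
    owner = [[None] * width for _ in range(height)]
    for oid, x, y in present:
        for r, row in enumerate(shapes[oid]):
            for c, bit in enumerate(row):
                if bit and 0 <= y + r < height and 0 <= x + c < width:
                    owner[y + r][x + c] = oid
    # Present objects that own no pixel are hidden, so they are culled as well.
    visible = {oid for row in owner for oid in row if oid is not None}
    return [oid for oid, _, _ in present if oid in visible]
```

In the test case below, an object placed off-screen and an object never placed in the scene are removed by the first step, while a small object completely covered by a nearer one is removed by the second.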
The object-based picture signal may additionally include, for each of the objects, an object descriptor comprising a coded amplitude descriptor including interior tiles and boundary tiles. The culling module may additionally include a tile culling module that culls, from the object-based picture signal, signal portions that represent interior tiles and boundary tiles that are hidden in the picture.
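Tile culling can be sketched in the same spirit. Assumptions, all hypothetical: tiles are `tile`×`tile` squares on the object's own grid, the object's shape mask is sized in whole tiles, and a per-pixel map `owner` of the front-most object is already available (for example, from painting the visible objects back to front). A tile whose shape bits are all set is treated as interior, one with only some bits set as boundary, and one with none as exterior, carrying no object data.

```python
def cull_tiles(mask, ox, oy, owner, oid, tile=8):
    """Return (tile_row, tile_col, kind) for the tiles of object `oid` that remain visible.

    mask:   decoded binary shape of the object (rows of 0/1), sized in whole tiles
    ox, oy: position of the object's top-left pixel in the picture
    owner:  per-pixel id of the front-most object in the picture
    kind:   "interior" if every shape bit in the tile is 1, otherwise "boundary"
    """
    h, w = len(owner), len(owner[0])
    kept = []
    for tr in range(0, len(mask), tile):
        for tc in range(0, len(mask[0]), tile):
            bits = [(r, c) for r in range(tr, tr + tile)
                    for c in range(tc, tc + tile) if mask[r][c]]
            if not bits:
                continue  # exterior tile: no object pixels to cull or keep
            kind = "interior" if len(bits) == tile * tile else "boundary"
            # Keep the tile only if at least one of its object pixels is front-most.
            if any(0 <= oy + r < h and 0 <= ox + c < w and owner[oy + r][ox + c] == oid
                   for r, c in bits):
                kept.append((tr // tile, tc // tile, kind))
    return kept
```

Hidden interior tiles and hidden boundary tiles are culled by the same test; the interior/boundary label is returned only because later processing may treat the two kinds differently.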
The culled object-based picture signal may include a culled amplitude descriptor for each object visible in the picture, and the picture composer may include a tile-oriented picture composition module, a shift, mask and merge module, processing modules and a processing selection module. The culled amplitude descriptor comprises tiles representing portions of the object visible in the picture. The tile-oriented picture composition module receives the culled object-based picture signal and identifies, for each tile of the culled amplitude descriptors, at least one block of the partially-coded block-based picture signal to which the tile contributes. The shift, mask and merge module calculates shift, mask and merge parameters for each tile. The processing modules are each capable of receiving the tile or tiles contributing to each block of the partially-coded block-based picture signal and of decoding the tile or tiles to the extent that allows the block-generating processing defined by the shift, mask and merge parameters to be applied to them. The processing modules are also capable of applying the block-generating processing defined by the respective shift, mask and merge parameters to the tile or tiles to generate the block. The processing modules are each capable of decoding the tile or tiles contributing to the block to a coding state that differs among the processing modules. The processing selection module selects one of the processing modules to generate the block of the partially-coded block-based picture signal and, hence, selects the coding state in which the block is generated.
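The tile-oriented mapping can be sketched as follows, assuming 8×8 tiles and 8×8 output blocks and a tile whose top-left pixel lies at a non-negative position inside the picture. Because a tile need not be aligned with the block grid, it can overlap up to four blocks. The `(shift, rect)` pair returned per block is one plausible, hypothetical form of the shift and mask parameters; merge parameters such as depth order are not shown.

```python
def blocks_for_tile(x, y, tile=8, block=8):
    """List every output block overlapped by a tile whose top-left pixel is at (x, y).

    Returns a list of (block_row, block_col, shift, rect), where
      shift = (dy, dx): offset of the tile origin relative to the block origin
      rect  = (r0, c0, r1, c1): half-open overlap region in block coordinates
    """
    out = []
    for br in range(y // block, (y + tile - 1) // block + 1):
        for bc in range(x // block, (x + tile - 1) // block + 1):
            by, bx = br * block, bc * block          # block origin in picture coordinates
            r0, c0 = max(y, by) - by, max(x, bx) - bx
            r1 = min(y + tile, by + block) - by
            c1 = min(x + tile, bx + block) - bx
            out.append((br, bc, (y - by, x - bx), (r0, c0, r1, c1)))
    return out
```

A tile at (3, 5) overlaps four blocks, and the four overlap rectangles together cover exactly the tile's 64 pixels; a block-aligned tile at (0, 0) maps to a single block with zero shift.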
Alternatively, the culled object-based picture signal may include an amplitude descriptor for each object visible in the picture and the picture composer may include a block-oriented picture composition module, a shift, mask and merge module, processing modules and a processing selection module. Each amplitude descriptor comprises tiles. The block-oriented picture composition module receives the culled object-based picture signal and identifies, for each block of the partially-coded picture signal, the tile or tiles of the culled object-based picture signal that contribute to the block. The shift, mask and merge module calculates shift, mask and merge parameters for each tile that contributes to the block. The processing modules are each capable of receiving the tile or tiles that contribute to each block of the partially-coded block-based picture signal and of partially decoding the tile or tiles and applying thereto the respective shift, mask and merge parameters to generate the block. The processing modules are each capable of decoding the tile or tiles contributing to the block to a coding state that differs among the processing modules. The processing selection module selects one of the processing modules to generate the block of the partially-coded block-based picture signal, and, hence, the coding state in which the block is generated.
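The block-oriented alternative inverts the lookup: for a given output block, find the tiles that overlap it. A minimal sketch, assuming square tiles and blocks of the same size and tile positions given as hypothetical `(x, y)` top-left picture coordinates:

```python
def tiles_for_block(br, bc, tile_positions, tile=8, block=8):
    """Return the indices of tiles that overlap output block (br, bc).

    tile_positions: list of (x, y) top-left pixel positions of the tiles in the picture
    """
    by, bx = br * block, bc * block  # block origin in picture coordinates
    hits = []
    for i, (x, y) in enumerate(tile_positions):
        # Two axis-aligned rectangles overlap iff they overlap on both axes.
        if x < bx + block and x + tile > bx and y < by + block and y + tile > by:
            hits.append(i)
    return hits
```

One plausible attraction of iterating over blocks rather than tiles is that each block can be finished as soon as its contributing tiles are processed, which fits output produced in block order; the linear scan over all tiles here is for clarity, and a spatial index would be needed for large pictures.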
The invention also provides a method for transcoding a coded object-based picture signal representing a picture to a coded block-based picture signal representing the picture. In the method, signal portions that represent objects not visible in the picture are culled from the coded object-based picture signal to generate a culled object-based picture signal. Portions of the culled object-based picture signal are partially decoded and from them are generated blocks of a partially-coded block-based picture signal in which the blocks have different coding states. Finally, the blocks of the partially-coded block-based picture signal are re-encoded to generate the coded block-based picture signal in which the blocks have a uniform coding state.
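The final re-encoding step can be sketched as advancing each block only as far as needed to reach one common coding state. Assumptions, all hypothetical: three coding states labelled `"pixel"`, `"dct"` and `"quantized"`, where `"quantized"` means uniformly quantized orthonormal DCT coefficients; MPEG-2 quantization matrices, zigzag scanning and entropy coding are omitted.

```python
import math

def dct1(v):
    """Orthonormal 1-D DCT-II."""
    n = len(v)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(v[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
            for k in range(n)]

def dct2(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct1(r) for r in block]
    cols = [dct1(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def quantize(coeffs, step):
    """Toy uniform quantizer returning integer levels."""
    return [[math.floor(c / step + 0.5) for c in row] for row in coeffs]

def partial_encode(blocks, step):
    """blocks: list of (coding_state, data). Bring every block to the "quantized" state."""
    out = []
    for state, data in blocks:
        if state == "pixel":        # fully decoded block: transform first
            data, state = dct2(data), "dct"
        if state == "dct":          # coefficients not yet quantized
            data, state = quantize(data, step), "quantized"
        if state != "quantized":
            raise ValueError(f"unknown coding state: {state}")
        out.append(data)            # already-quantized blocks pass through unchanged
    return out
```

A flat pixel block of value 128, the matching DCT block (DC term 1024) and the matching quantized block (DC level 64 at step 16) all come out identical, but only the pixel block pays for a transform, and the already-quantized block is not touched at all.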
Finally, the invention provides a computer-readable medium in which is fixed a computer program that instructs a computer to perform a transcoding operation in which a coded object-based picture signal representing a picture is transcoded to a coded block-based picture signal representing the picture. In the transcoding operation, signal portions that represent objects not visible in the picture are culled from the coded object-based picture signal to generate a culled object-based picture signal. Portions of the culled object-based picture signal are partially decoded and from them are generated blocks of a partially-coded block-based picture signal in which the blocks have different coding states. Finally, the blocks of the partially-coded block-based picture signal are re-encoded to generate the coded block-based picture signal in which the blocks have a uniform coding state.
Culling the signal portions that represent objects not visible in the picture may include culling signal portions that represent objects that are not present in the picture and culling signal portions that represent objects that are present in the picture, but are hidden.
The object-based picture signal may include a scene descriptor that describes an arrangement of the objects in the picture and may additionally include a coded shape descriptor for each object. Culling the signal portions that represent objects that are not present in the picture may include identifying, using the scene descriptor, the signal portions that represent the objects not present in the picture, and decoding the coded shape descriptors of the objects that the identifying operation identifies as present in the picture to generate respective shape descriptors. In culling the signal portions that represent objects that are present in the picture, but are hidden, the shape descriptors are used to identify the signal portions that represent the objects that are present in the picture, but are hidden.
The object-based picture signal may include an object descriptor for each object. The object descriptor comprises a coded amplitude descriptor including interior tiles and boundary tiles. Culling the signal portions that represent objects that are present in the picture, but are hidden, may include culling, from the object-based picture signal, signal portions that represent interior tiles and boundary tiles that are hidden in the picture.
The culled object-based picture signal may include a culled amplitude descriptor for each object visible in the picture. The culled amplitude descriptor for each object comprises tiles that represent the portions of the object that are visible in the picture. In this case, partially decoding portions of the culled object-based picture signal and generating from them the blocks of the partially-coded block-based picture signal proceeds as follows. For each tile of the culled amplitude descriptors, the block or blocks of the partially-coded block-based picture signal to which the tile contributes are identified. Shift, mask and merge parameters are calculated for each tile. One of a predetermined number of coding states in which to generate each block of the partially-coded block-based picture signal is selected as a selected coding state. Finally, the tile or tiles that contribute to each block of the partially-coded block-based picture signal are decoded to the selected coding state and the block-generating processing defined by the respective shift, mask and merge parameters is applied to the tile or tiles in the selected coding state to generate the block in the selected coding state.
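The description leaves the selection criterion open. The sketch below shows one hypothetical policy over three assumed coding states: pass quantized coefficients straight through when a single block-aligned tile fills the block, work on dequantized DCT coefficients when all contributing tiles are block-aligned (coded-domain compositing of this kind is discussed in the literature cited earlier, e.g. Chang and Messerschmitt), and otherwise decode fully to pixels. The "quantized" case further assumes the tile's quantization is directly usable in the output, which a real transcoder would have to check.

```python
QUANTIZED, DCT, PIXEL = "quantized", "dct", "pixel"

def select_coding_state(contributions):
    """Pick the least-decoded coding state this (hypothetical) policy allows for one block.

    contributions: non-empty list of (shift, covers_block) for the tiles feeding the block,
      where shift = (dy, dx) is the tile origin relative to the block origin and
      covers_block says whether the tile's visible pixels fill the whole block.
    """
    if not contributions:
        raise ValueError("block has no contributing tiles")
    aligned = all(shift == (0, 0) for shift, _ in contributions)
    if aligned and len(contributions) == 1 and contributions[0][1]:
        return QUANTIZED  # copy the tile's quantized coefficients through unchanged
    if aligned:
        return DCT        # mask and merge aligned tiles on dequantized DCT coefficients
    return PIXEL          # arbitrary shifts: decode to pixels, then shift, mask and merge
```

The ordering reflects the trade-off described in the summary: the less a block has to be decoded, the less processing it costs and the more of the original coding it retains.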
Alternatively, the culled object-based picture signal may include an amplitude descriptor for each object visible in the picture. The amplitude descriptor comprises tiles. In partially decoding selected portions of the culled object-based picture signal and generating from them the blocks of the partially-coded block-based picture signal, for each block of the partially-coded picture signal, the tile or tiles of the culled object-based picture signal that contribute to the block are identified. Shift, mask and merge parameters are calculated for each of the tile or tiles that contribute to the block. One of a predetermined number of coding states in which to generate each block of the partially-coded block-based picture signal is selected as a selected coding state. Finally, the tile or tiles contributing to the block of the partially-coded block-based picture signal are decoded to the selected coding state and the block-generating processing defined by the respective shift, mask and merge parameters is applied to the tile or tiles in the selected coding state to generate the block in the selected coding state.
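When the selected coding state is the pixel domain, the block-generating processing can be sketched directly. Assumptions: binary shapes (no gray-scale alpha blending), contributing tiles already decoded to pixels and listed back to front, and a hypothetical `background` value for pixels that no tile covers. DCT-domain equivalents of shifting and masking exist but are not shown.

```python
def shift_mask_merge(block_size, contributions, background=0):
    """Build one output block in the pixel domain.

    contributions: list of (pixels, shape, shift) in back-to-front order, where
      pixels, shape: decoded tile samples and binary shape (rows of equal length)
      shift: (dy, dx) of the tile origin relative to the block origin
    """
    out = [[background] * block_size for _ in range(block_size)]
    for pixels, shape, (dy, dx) in contributions:
        for r, (prow, srow) in enumerate(zip(pixels, shape)):
            for c, (p, s) in enumerate(zip(prow, srow)):
                y, x = r + dy, c + dx  # shift: tile coordinates -> block coordinates
                # mask: keep only object pixels (shape bit set) that fall inside the block
                if s and 0 <= y < block_size and 0 <= x < block_size:
                    out[y][x] = p      # merge: a nearer tile overwrites a farther one
    return out
```

For example, with a 4×4 block, a full tile of 1s at shift (0, 0), a full tile of 9s at shift (2, 2), and a tile of 5s whose only shape bit is its bottom-right pixel at shift (−2, −2), the result has 9s in the bottom-right quadrant, a 5 at row 1, column 1, and 1s elsewhere.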
The transcoder and transcoding method according to the invention and the transcoding program fixed in the computer-readable medium according to the invention cull portions of the coded object-based picture signal that represent objects that are not visible in the picture before generating the coded block-based picture signal. Compared with conventional transcoders, transcoding methods and transcoding programs, this reduces the processing resources required to process the coded object-based picture signal to generate the coded block-based picture signal, or enables other constraints, such as processing time, to be met more easily since the culled portions of the object-based picture signal are not processed further. Moreover, the transcoder, transcoding method and transcoding program according to the invention process the culled object-based picture signal to generate at least a fraction of the blocks of the coded block-based picture signal in a partially-coded state. Compared with conventional transcoders, transcoding methods and transcoding programs, this further reduces the processing resources required to generate the coded block-based picture signal, or enables other constraints, such as processing time, to be met even more easily. The transcoder, transcoding method and transcoding program according to the invention perform less decoding of the coded object-based picture signal, and perform less encoding to generate the coded block-based picture signal. Moreover, the reduced decoding and encoding applied to the coded object-based picture signal preserve more of the original encoding of the coded object-based picture signal in the block-based picture signal. This reduces the generational quality loss compared with conventional transcoders, transcoding methods and transcoding programs.