(1) Field of the Invention
The present invention relates to a coding and decoding apparatus for coding information such as an image signal, transmitting it, and decoding the coded data. More specifically, the present invention relates to a coding and decoding apparatus which enables communication between coding and decoding tools having different processing capacities, in which the coding apparatus transmits not only the coded data but also coding information for constructing a decoding scheme as the means of decoding the coded data, and the decoding apparatus receives the coding information together with the coded data and reconstructs the decoding scheme based on the coding information so as to decode the received coded data. Further, the present invention is directed to a coding and decoding technology for performing communication between transmitting and receiving devices having different capacities in the case where an algorithm includes various coding and decoding tools, as in near-future image coding schemes represented by MPEG4, and more particularly relates to a coding and decoding apparatus which enables simultaneous transmission of coded data and tool information for constructing the algorithm for decoding the coded data in order to realize a hierarchical coding and decoding operation.
(2) Description of the Background Art
In recent years, the widespread adoption of ISDN (Integrated Services Digital Network) has realized image communication services as a new communication service. Examples of such services include the video phone and the video conference system. On the other hand, the development of mobile communication networks, represented by the PHS and the FPLMTS, is accelerating demand for further improvement and variation of the services and for portability of the devices.
In general, in the case where image information as in the video phone or video conference system is transmitted, the amount of image information is very large. However, due to the line speed used for the transmission and the cost problem, the image information to be transmitted needs to be compressed and coded so that the amount of information can be reduced.
As to the coding schemes for compressing image information, JPEG (Joint Photographic Experts Group) has already been standardized internationally as a still image coding scheme, H.261 as a motion picture coding scheme, and MPEG1 (Moving Picture Experts Group) and MPEG2 as motion picture coding schemes. Further, MPEG4 is now being standardized as a coding scheme for very low bit rates of 64 kbps or below.
In the current coding schemes such as JPEG, H.261, MPEG1 and MPEG2, coding is performed following the specified algorithm. However, MPEG4 is planned to deal flexibly with various applications and to encode each of the applications in its optimal scheme. For this purpose MPEG4 needs to have many tools (such as a transformer, quantizer, inverse transformer, inverse quantizer, etc.) for its coder so that a suitable combination of them can be selected to perform coding.
FIG. 1A is a conceptual view showing the structure of a coding data stream which is formed by coding (compressing) image data based on the H.261 scheme. Each piece of the coded data, such as motion vector information, DCT-coefficient, quantization step, etc., shown in FIG. 1A is image data which has been coded (compressed) based on a fixed coding algorithm in the coder, while the decoder has a fixed decoding algorithm corresponding to the coding algorithm so that the received pieces of the coded data will be decoded.
FIG. 1B is a conceptual view showing the structure of a coding data stream which is formed by coding (compressing) image data based on a coding scheme such as MPEG4 etc. whose algorithm is flexible. The coding data stream as shown in FIG. 1B is composed of coded (compressed) image data such as motion vector information 2, transform coefficient 4, motion vector information 6, transform coefficient 8 and quantization step 10 etc., and tool information such as motion compensation tool 1, inverse transform tool 3, motion compensation tool 5, inverse transform tool 7 and quantizing tool 9, etc., for decoding respective image data. FIG. 1B illustrates the details of the motion vector information, DCT-coefficient and quantization step at the leading end of the coding data stream of FIG. 1A. In this case, each piece of the tool information such as motion compensation 1 etc., is allowed to be selected from a number of types of the tool information so that it is possible to freely select a desired combination of the tool information. Accordingly, the coder transmits the tool information which has been used for coding as well as the image data to the decoder. The decoder, upon the decoding of the image data received, will decode the coded image data using the tool information transmitted from the coder.
FIG. 1C is a block diagram showing an example of a conventional coding and decoding apparatus based on H.261. This coding and decoding apparatus is composed of a controller 6a for controlling the entire apparatus, a coder 7a for coding based on H.261, a decoder 8a for decoding the information which has been coded based on H.261, and a tool storage section 9a consisting of memories for storing tool information.
These coding and decoding processes can be realized by a dedicated hardware device with software installed therein or by an appropriate program executed in a general-purpose processor with a compiler.
First, description will be made of a method using a dedicated hardware device with software installed therein. FIG. 2 is a block diagram showing the configuration of coder 7a of FIG. 1C for yielding the coded data shown in FIG. 1A, based on H.261. In FIG. 2, the coder is composed of: a coding controller 11 for the control of coding; a transformer 12 for performing the DCT; a quantizer 13 for quantizing the coefficients transformed by transformer 12; an inverse quantizer 14 for performing inverse quantization of the coefficients quantized in quantizer 13; an inverse transformer 15 for performing the inverse DCT; a memory 16; and a loop filter 17. Here, memory 16 has the function of causing a variable delay for motion compensation, used when the inter-frame prediction for motion compensation is performed. Filter 17 is the loop filter capable of performing the on/off operation for each of macro blocks.
When the coding algorithm for generating the coding data stream shown in FIG. 1A is executed by the dedicated hardware device with software, the tool functions constituting the algorithm are carried out by software and the dedicated hardware components shown in FIG. 2, namely, coding controller 11, transformer 12, quantizer 13, inverse quantizer 14, inverse transformer 15, memory 16 having the function of causing a variable delay for motion compensation, and loop filter 17. FIG. 3 is a block diagram showing the configuration of decoder 8a shown in FIG. 1C for decoding the coded data based on H.261. This decoder shares constituents with the coder shown in FIG. 2, and the same components as those in the coder of FIG. 2 are designated by the same reference numerals. Specifically, in FIG. 3, reference numeral 14 designates an inverse quantizer, 15 an inverse transformer, 16 a memory having the function of causing a variable delay for motion compensation, and 17 a loop filter.
The coded data by the coder shown in FIG. 2 is inverse quantized by inverse quantizer 14, and the signal is then made to undergo the inverse DCT in inverse transformer 15. Here, memory 16 and loop filter 17 are used when the motion compensated prediction coding data is decoded.
When several kinds of algorithms need to be processed using the scheme which performs the coding operation based on a fixed algorithm such as H.261 etc. as stated above, an individual hardware device with software is needed to execute each of the algorithms. FIG. 4 is a block diagram showing the structure of a coder which codes the signal of a motion picture based on H.261 and the signal of a still image based on JPEG. For example, when a motion picture is coded based on H.261 and a still image is coded based on JPEG, the coder should have the configuration shown in FIG. 4, which includes two individual coders, namely an H.261 coder 20 and a JPEG coder 21. In FIG. 4, H.261 coder 20 and JPEG coder 21 receive the motion picture data and the still image data, respectively, and output the compressed data as coded data.
When the algorithm for generating the coded data shown in FIG. 1B is executed by a dedicated hardware device with software, a coder for executing this algorithm is realized by the one shown in FIG. 2 in which the circuit block designated at 18 is replaced by the configuration shown in FIG. 5. In this case, the coder has plural types for each of the tools, namely transformer 12, quantizer 13, inverse quantizer 14, and inverse transformer 15. In this configuration, one desired type is selected for each of the tools (one type from transformer tools A to X, one type from quantizer tools A to X, one type from inverse quantizer tools A to X and one type from inverse transformer tools A to X) to perform a coding process.
The decoder for decoding the coding data stream shown in FIG. 1B is realized by the decoder shown in FIG. 3 in which the circuit block designated at 19 is replaced by a circuit block 22 in FIG. 5. In this case, the decoder has plural types for each of the tools, namely inverse quantizer 14 and inverse transformer 15. In this configuration, one desired type is selected for each of the tools (one type from inverse quantizer tools A to X and one type from inverse transformer tools A to X) to perform a decoding process.
In this decoding process, each piece of the tool information shown in FIG. 1B, namely motion compensation tool 1, inverse transforming tool 3, motion compensation tool 5, inverse transforming tool 7 and quantizing tool 9, is sent to a controller 23, and each piece of the image data which follows the corresponding tool information, specifically motion vector information 2, transform coefficient 4, motion vector information 6 and transform coefficient 8, is sent to the corresponding tools where each piece of image data is processed. At this time, controller 23 selects one of the tools (one from inverse quantizing tools A to X and one from inverse transforming tools A to X shown in FIG. 5) based on the corresponding tool information. In this way, each piece of the image data is processed through the tool selected by controller 23 and is decoded.
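The selection performed by controller 23 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each piece of tool information is a pair of a tool kind and a variant letter, and that the tool bodies are simple placeholder functions; none of the names below appear in the original description.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[List[int]], List[int]]

# Hypothetical tool variants (FIG. 5 offers variants A to X); the bodies are
# placeholders for illustration, not real codec operations.
TOOL_TABLES: Dict[str, Dict[str, Tool]] = {
    "inverse_quantizer": {
        "A": lambda levels: [v * 2 for v in levels],
        "B": lambda levels: [v * 4 for v in levels],
    },
    "inverse_transformer": {
        "A": lambda coeffs: list(coeffs),
        "B": lambda coeffs: list(reversed(coeffs)),
    },
}


def decode_stream(
    stream: List[Tuple[Tuple[str, str], List[int]]]
) -> List[List[int]]:
    """Process each data piece with the tool named by the preceding tool information."""
    decoded = []
    for (kind, variant), data in stream:
        variants = TOOL_TABLES.get(kind, {})
        if variant not in variants:
            # Corresponds to receiving data coded with a tool the decoder lacks.
            raise LookupError(f"tool {kind}/{variant} is not provided in this decoder")
        decoded.append(variants[variant](data))
    return decoded
```

A stream naming a variant that is absent from the tables raises an error, which corresponds to the case in which the decoder is not provided with the tool used by the coder.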
However, this method needs a dedicated device with software for each of the tools, and thus the scale of the decoder tends to become large. To make matters worse, if the decoder receives data which has been processed by a tool that is not provided in the decoder, it is impossible to decode the data at all. To solve this problem, one conceivable approach is to compile the received parts into a processing program and to decode the data with a general-purpose processor.
Next, description will be made of a method of achieving the decoding process by executing a suitable program using a general-purpose processor with a compiler. Now, referring to FIG. 6, description will be made of a case where the coding data stream having the structure shown in FIG. 1B is decoded. FIG. 6 is a block diagram showing the structure of a decoder composed of a general-purpose processor 24 and a compiler 25. When all the tool information as shown in FIG. 1B, which includes a motion compensation tool 1, inverse transforming tool 3, motion compensation tool 5, inverse transforming tool 7 and quantizing tool 9, etc., is given to compiler 25, the compiler will prepare a processing program for controlling the operation of general-purpose processor 24. Each piece of the image data, which follows the corresponding tool information, specifically, motion vector information 2, transform coefficient 4, motion vector information 6, transform coefficient 8, quantization step 10, is given to general-purpose processor 24. Then, general-purpose processor 24 processes, with the processing program prepared by the compiler 25, the coded image data following the tool information so as to decode it for producing its decoded data.
In the case where the capacity of the decoding apparatus for processing a certain algorithm is lower than the total processing capacity for all the tools constituting the algorithm requested by the coder side, even if the tools transmitted from the coder are stored at the decoder side, the received data cannot be decoded exactly due to the inferior processing capacity of the decoder side. Thus the memory in the tool storage is also used in vain.
Also in the conventional coding and decoding apparatus, when the tools which were used on the coder side are compared to the tools which are stored on the decoder side, the tools themselves must be compared to each other; this process requires a very long period of time.
In the case where a new algorithm is used to decode the coded information, even if the tools for the algorithm are equivalent to those which have been previously stored, the decoder must receive the tools once again; this process also considerably lengthens the transmission/reception time.
In this way, when a video signal etc. is coded, the coding tools having suitable coding capacities to the quality of the reproduction image required by the decoder side, are selected to perform the coding operation. When the thus obtained coded data is decoded, it is necessary that the decoder should use decoding tools having decoding capacities (i.e. processing capacities) which correspond to the coding capacities (i.e. processing capacities) for the coding tools which were used for the coding operation. Processing capacity indicates a resource that is necessary for coding, decoding or both. For example, coding capacity may be expressed as processing capacity for coding. If these tools on the decoder side do not have the processing capacities for the tools on the coder side, the coded data cannot be decoded, thus making it impossible to establish the communication.
An example of the algorithm for the frame predictive coding will hereinbelow be described. Frame predictive coding shall mean inter-frame predictive coding, intra-frame predictive coding, or both as the context requires. Inter-frame predictive coding refers to any technique for data compression in which a subsequent frame, or a portion thereof, is encoded as differential data with respect to an earlier reference frame. Pixel data may be expressed as such differential data if inter-frame predictive coding is used. Illustratively, description will be made of the influence on the communication when the processing capacities for the frame predictive coding tools are not in agreement with those for the frame predictive decoding tools. The frame predictive coding may be considered as improving the quality of a display image on the decoder side since inter-frame predictive coding is an image data processing technology. Based on the data of the pixels directly obtained by sampling the video signal on the coder side, the pixel data for the display pixels on the decoder side are defined more minutely than the sampling pixels of the coder side since the pixels on the decoder side are predictively interpolated.
FIGS. 7A through 7C are conceptual diagrams for illustrating pixel data arrangements resulting from frame predictive coding. FIGS. 7A, 7B and 7C show the arrangements of pixel data (a1 to d1, a2 to i2, a4 to y4) produced based on the image data A to D which is directly obtained by coding the video signal inputted from a visual sensor such as a camera etc., by means of the frame predictive coding tools of sampling the data per single pixel, per ½ pixel and per ¼ pixel, respectively. This pixel data is transmitted as the coded data obtained by the frame predictive coding, from the coder side to the decoder side. Here, in each of the frame predictive coding tools, the arithmetic operation for each piece of the pixel data is made based on the calculating formulae shown in Table 1.
TABLE 1
Operation formula | Pixel data based on the sampling per ½ pixel | Pixel data based on the sampling per ¼ pixel
A | a2 | a4
B | c2 | e4
C | g2 | u4
D | i2 | y4
(A + B)/2 | b2 | b4, c4, d4
(A + C)/2 | d2 | f4, k4, p4
(B + D)/2 | f2 | j4, o4, t4
(C + D)/2 | h2 | v4, w4, x4
(A + B + C + D)/4 | e2 | g4, h4, i4, l4, m4, n4, q4, r4, s4
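The formulae of Table 1 can be sketched in Python as follows. The sketch assumes that the pixel data labels run row by row (a2 to i2 over a 3 x 3 grid, a4 to y4 over a 5 x 5 grid) with A, B, C and D at the upper left, upper right, lower left and lower right corners; this layout is inferred from the table and is not stated explicitly. The function names are hypothetical.

```python
def half_pixel_grid(A, B, C, D):
    """Return the 3 x 3 grid a2..i2 computed with the Table 1 formulae."""
    return [
        [A, (A + B) / 2, B],                              # a2 b2 c2
        [(A + C) / 2, (A + B + C + D) / 4, (B + D) / 2],  # d2 e2 f2
        [C, (C + D) / 2, D],                              # g2 h2 i2
    ]


def quarter_pixel_grid(A, B, C, D):
    """Return the 5 x 5 grid a4..y4 computed with the Table 1 formulae.

    Per Table 1, all intermediate positions on one edge share the same
    average, and all nine interior positions share the four-point average.
    """
    top, bottom = (A + B) / 2, (C + D) / 2
    left, right = (A + C) / 2, (B + D) / 2
    centre = (A + B + C + D) / 4
    middle_row = [left, centre, centre, centre, right]
    return [
        [A, top, top, top, B],           # a4 .. e4
        list(middle_row),                # f4 .. j4
        list(middle_row),                # k4 .. o4
        list(middle_row),                # p4 .. t4
        [C, bottom, bottom, bottom, D],  # u4 .. y4
    ]
```

For example, with A = 10, B = 20, C = 30 and D = 40, the centre value e2 is (10 + 20 + 30 + 40)/4 = 25.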
In FIG. 7A, pixel data a1 to d1 for the pixels indicated by ‘+’ is the pixel data (corresponding to the (n+1)-ranked coded data when the pixel data obtained by the aftermentioned frame predictive coding tool of sampling per ½ pixel is assumed as the n-ranked coded data) produced by the frame predictive coding tool (corresponding to the (n+1)-ranked coding tool when the aftermentioned frame predictive coding tool of sampling per ½ pixel is assumed as the n-ranked coding tool) of sampling per single pixel. In this case, pixel data a1 to d1 obtained by the frame predictive coding is equivalent to image data A to D which is directly obtained by coding the video signal.
In FIG. 7B, pixel data a2 to i2 for the pixels indicated by ‘+’ and ‘o’ is the pixel data (corresponding to the (n+1)-ranked coded data when the pixel data obtained by the aftermentioned frame predictive coding tool of sampling per ¼ pixel is assumed as the n-ranked coded data) produced by the frame predictive coding tool (corresponding to the (n+1)-ranked coding tool when the aftermentioned frame predictive coding tool of sampling per ¼ pixel is assumed as the n-ranked coding tool) of sampling per ½ pixel. Of these, the pixel data for the pixels indicated by ‘+’ is equivalent to the pixel data obtained by the frame predictive coding tool of sampling per single pixel, while the pixel data for the pixels indicated by ‘o’ is the interpolated pixel data which has been predicted based on image data A to D.
In FIG. 7C, pixel data a4 to y4 for the pixels indicated by ‘+’, ‘o’ and ‘Δ’ is the pixel data produced by the frame predictive coding tool of sampling per ¼ pixel. Of these, the pixel data for the pixels indicated by ‘+’ and ‘o’ is equivalent to the pixel data obtained by the frame predictive coding tool of sampling per ½ pixel. The pixel data for the pixels indicated by ‘+’ is equivalent to the pixel data obtained by the frame predictive coding tool of sampling per single pixel, while the pixel data for the pixels indicated by ‘o’ and ‘Δ’ is the interpolated pixel data which has been predicted based on image data A to D.
As understood from FIGS. 7A to 7C, pixel data a2 to i2 obtained by the frame predictive coding per ½ pixel sampling includes pixel data a1 to d1 (image data A to D) obtained by the frame predictive coding per single pixel sampling; therefore the pixel data is distributed four times as densely as the sampling pixels of the video signal. Pixel data a4 to y4 obtained by the frame predictive coding per ¼ pixel sampling includes the pixel data obtained by the frame predictive coding per single pixel sampling and per ½ pixel sampling; therefore the pixel data is distributed sixteen times as densely as the sampling pixels of the video signal.
In this way, the pixel data obtained by the inter-frame predictive coding tools for producing the pixel data of high-density display pixels, hierarchically includes the pixel data obtained by the frame predictive coding tools for producing the pixel data of the lower density display pixels. For instance, pixel data a4 to y4 obtained by the frame predictive coding tool of sampling per ¼ pixels hierarchically includes pixel data a2 to i2 as well as pixel data a1 to d1.
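The hierarchical inclusion described above can be checked with a short Python sketch. It assumes the same row-by-row grid layout as Table 1 suggests (A, B, C and D at the four corners) and uses hypothetical helper names; it is an illustration rather than part of the described apparatus.

```python
def build_grid(A, B, C, D, size):
    """Build a size x size pixel grid using the Table 1 averages.

    size = 2 gives single pixel sampling, 3 gives 1/2 pixel sampling and
    5 gives 1/4 pixel sampling (layout assumed, see the lead-in).
    """
    last = size - 1

    def pick(index, first, middle, end):
        # Choose the corner/edge/interior entry for a row or column index.
        return first if index == 0 else end if index == last else middle

    top = (A, (A + B) / 2, B)
    mid = ((A + C) / 2, (A + B + C + D) / 4, (B + D) / 2)
    bottom = (C, (C + D) / 2, D)

    grid = []
    for r in range(size):
        left, centre, right = pick(r, top, mid, bottom)
        grid.append([pick(c, left, centre, right) for c in range(size)])
    return grid


def subsample(grid, step):
    """Keep every step-th row and column, starting from the upper left pixel."""
    return [row[::step] for row in grid[::step]]


single = build_grid(10, 20, 30, 40, 2)   # a1..d1
half = build_grid(10, 20, 30, 40, 3)     # a2..i2
quarter = build_grid(10, 20, 30, 40, 5)  # a4..y4
```

Taking every second pixel of the ¼ pixel grid yields the ½ pixel grid, and taking every fourth pixel yields the single pixel grid, which is the inclusion relation the paragraph describes.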
FIGS. 8A to 8C are illustrations explaining the effects on the image display by the inter-frame predictive coding and showing the display images of a pattern TA obtained by decoding the coded data based on the inter-frame predictive coding per single pixel sampling, per ½ pixel sampling and per ¼ pixel sampling, respectively. In FIGS. 8A to 8C, ‘○’ and ‘●’ represent pixels of ‘light’ and ‘dark’ display states, respectively, when the coded data obtained by subjecting the video signal of pattern TA as a subject to the inter-frame predictive coding is decoded. Here, in FIGS. 8A to 8C, pattern TA as a subject is assumed to move in the lower right direction, and for easy understanding of the positional relation between pattern TA and the pixels, display images are laid over the pixels, for reference.
Referring to FIGS. 8A through 8C, according to the frame predictive coding per single pixel, pattern TA is represented with three pixels before movement and with one pixel after movement. On the other hand, according to the frame predictive coding per ½ pixel, pattern TA is represented with six pixels before movement and with three pixels after movement. Further, according to the frame predictive coding per ¼ pixel, pattern TA is represented with fifteen pixels before movement and with ten pixels after movement. In this way, as the number of pixel divisions in the frame predictive coding is increased so that the density of the display pixels at the decoder side is increased, it becomes possible to reproduce an image of high quality with high precision.
Next, description will be made of a means for practicing the decoding scheme by the combination of individual functional tools (functional modules) independent of one another in the coder described above.
FIG. 9 shows an example of a coding data stream to be used when the coded data based on H.261 is transmitted to a device which does not have the decoding function based on H.261. As stated above, since it is assumed that the coding scheme is not fixed and the combination of the functional tools in the coder can be freely selected, it is necessary to transmit the information on the type of the coding scheme based on which the signal was coded and the types of the functional tools used in the coding process (this information will hereinbelow be referred to as coding information), together with the coded data. In FIG. 9, the data stream includes: coding information composed of motion compensation tool 112a, inverse transforming tool 112b, quantizing tool 112c and decoding scheme constructing information 111; and coded data of motion vector information 113a, transform coefficient 113b and quantization step 113c, which follow the corresponding coding information. Each of the aforementioned functional tools 112a to 112c designates the order of decoding the corresponding coded data 113a to 113c, and may contain operation specifications in some cases or may just indicate the identifying numbers of the functional tools in other cases. Decoding scheme constructing information 111 specifies the functional tools to be used, the methods of using the resultant outputs from the tools, and other information. In the case shown in FIG. 9, the result after the motion compensation is used to handle the data of a certain image block decoded immediately before, for instance. That is, this result indicates the information relating to the order of procedure of the coding scheme H.261 in this case. The device on the decoding side, which has received the coding data stream shown in FIG. 9, is able to construct the decoding scheme by interpreting the decoding scheme constructing information, the motion compensation tool, the inverse transforming tool and the quantizing tool so that it can exactly decode the coded data that follows.
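The construction of a decoding scheme from received coding information can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the coding information is reduced to a list of tool identifiers plus an ordering, each tool is a function from one block to another, and the class, function and tool names are all hypothetical. The actual stream of FIG. 9 pairs each tool with its own coded data, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Block = List[int]
Tool = Callable[[Block], Block]


@dataclass
class CodingInformation:
    tool_ids: List[str]  # stands in for functional tools 112a to 112c
    order: List[str]     # stands in for decoding scheme constructing information 111


def construct_decoder(info: CodingInformation, library: Dict[str, Tool]) -> Tool:
    """Assemble a decoding function by chaining tools in the specified order."""
    missing = [t for t in info.order if t not in library or t not in info.tool_ids]
    if missing:
        raise LookupError(f"tools not available: {missing}")
    stages = [library[t] for t in info.order]

    def decode(block: Block) -> Block:
        for stage in stages:
            block = stage(block)
        return block

    return decode


# Placeholder tool bodies for illustration only.
library: Dict[str, Tool] = {
    "inverse_quantize": lambda block: [v * 2 for v in block],
    "add_offset": lambda block: [v + 1 for v in block],
}
info = CodingInformation(
    tool_ids=["inverse_quantize", "add_offset"],
    order=["inverse_quantize", "add_offset"],
)
decoder = construct_decoder(info, library)
```

Here `decoder([1, 2])` first doubles each value and then adds one, so the order carried in the coding information determines the result.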
As stated above, the coding information may contain the processing order of the tools, how to use the result obtained from each tool, etc., so that the decoder will be able to decode the received coded data even if it receives a signal which requires tools, or is based on a decoding scheme, not provided on the decoder side. In order to improve the efficiency in the use of the line, however, it is preferable to use a decoding scheme which is able to work with a smaller amount of data to be transmitted, such as the specifications on the construction of the decoding scheme and the tool information. In practice, since the purpose of the usage and the required quality will be determined to a certain degree depending on the coding and decoding apparatus, it is realistic that each coding and decoding apparatus has, in advance, a number of coding and decoding schemes which are expected to be used more frequently.
FIG. 10 shows an example of the coding data stream which can be used for the communication between two devices both having some coding and decoding schemes which are expected to be used more frequently. For the coding information, the same decoding scheme incorporated in the decoder is called up by transmitting a predetermined identification code 121a so that the coded data received can be decoded. Comparing with the example of FIG. 9, since this method will not need the transmission of the information on functional tools and the decoding scheme constructing information, it is possible to drastically reduce the transmitted amount of data and therefore the improvement of the efficiency in the use of the communication line can be expected.
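A minimal Python sketch of calling up a previously incorporated decoding scheme by its identification code is given below. The code values, the scheme table and the function name are assumptions made for illustration; the description does not specify how identification code 121a is encoded.

```python
from typing import Dict, Optional

# Hypothetical identification codes for schemes incorporated in the decoder.
PREINSTALLED_SCHEMES: Dict[int, str] = {
    0x01: "H.261",
    0x02: "JPEG",
}


def call_up_scheme(identification_code: int) -> Optional[str]:
    """Return the incorporated scheme for the code, or None if it is unknown.

    A None result corresponds to the case in which the full coding
    information of FIG. 9 would have to be transmitted instead.
    """
    return PREINSTALLED_SCHEMES.get(identification_code)
```

With such a table, only the short code needs to be sent ahead of the coded data when both devices hold the same scheme.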
However, if the divided number of pixels in the frame predictive coding is dissimilar (or the frame predictive coding tools are different), the structure of the coded data becomes quite different and thus it becomes impossible to interchange the coded data. For this reason, in accordance with the conventional coding and decoding system (method and devices), the decoding side needs to perform its decoding operation using a decoding tool suitable to the structure of the coded data. That is, the decoding tool should have the decoding capacity in one-to-one correspondence to the coding capacity for the coding tool to perform the decoding operation. Therefore, when the processing capacity for the decoding tool is not in agreement with that for the coding tool, it is totally impossible to decode the coded data received.
When the data which is coded using an algorithm provided with various tools (represented by MPEG4, for example) is attempted to be decoded by the device just having a single algorithm such as MPEG1, the decoding side needs additional hardware and/or software for operating the algorithm (coding tools) used in the coding. Therefore, the device is increased in size and cost.
As seen also in the H.261 coding scheme etc., the detailed specifications of the coding scheme are usually switched depending on which is more important, the efficiency of coding or the quality of the image, or depending upon the nature etc. of the input image. Further, the usage will be limited if the system is equipped in advance with only limited types of coding schemes, as stated above. Therefore, it becomes necessary to change over the coding scheme in accordance with the usage. In this case, if the coder side tries to transmit data coded based on a scheme with which the decoder side is not equipped, the coding information should be simultaneously transmitted, as already mentioned above. At this moment, in accordance with the aforementioned method, all the coding information, as shown in FIG. 9, including the information on the functional tools used in the decoding scheme previously provided on the decoder side, needs to be transmitted regardless of whether the difference of the decoding scheme from the coding scheme is small or great. That is, even when the coding scheme is not much different from the decoding scheme that is previously provided, the communication may require a large transmission rate, thus possibly reducing the efficiency in the use of the line. In practice, however, since there are some functional tools which can be commonly used with little dependence on the difference in coding schemes, such as the transform coding in the motion picture, etc., it is possible to develop different kinds of coding schemes by adding other functional tools to such basic functional tools.
Further, in recent years, it has become possible to download the tools for JPEG and MPEG1 stated above over personal computer communication networks etc., and to receive an image signal and decode it based on the downloaded tools. Therefore, it can readily be expected that in near-future video communications, communication will be performed by downloading the tools for coding and decoding. However, in the aforementioned coding and decoding system of the conventional scheme, the communication can be performed based only on the limited kinds of coding and decoding algorithms. In the case of the next generation image coding scheme (such as MPEG4) which can flexibly deal with various applications and can code the signal in the manner most suitable to each of the applications, if an attempt is made to process several kinds of algorithms by a scheme which performs coding with a fixed algorithm such as JPEG, H.261, MPEG1, MPEG2, etc., it becomes necessary to provide hardware and/or software for executing each algorithm. In this way, it is preferable that all the various kinds of algorithms are provided for both the transmitting and receiving sides. However, if all the tools are provided to deal with all the algorithms, the hardware and software become bulky, and the apparatus will increase in cost and inevitably become large. On the other hand, if the apparatus is reduced in cost and size and therefore does not have adequate capacities, the risk of failure to perform communications becomes high.
In the coding and decoding apparatus which does not have the above capacity, the decoder will download the tools for the required algorithm so as to be able to flexibly deal with the various kinds of applications and decode the signal. In such a coding and decoding apparatus which downloads the tools for the algorithm and is able to store the tools previously used, if the tools stored are not the ones which are required for the next communication, the required tools must be downloaded again before the transmission of the coded data. Therefore, the delay before the start of transmission to the decoding of the coded data becomes long.
In the above coding and decoding apparatus which is able to store the tools previously used, if the coding and decoding tools are provided in such a hierarchical manner that the tools for high quality are provided at the lower rank and the tools for assuring minimum quality which are not replaceable with other tools are provided at the higher rank, it becomes possible to decode the signal using those tools for minimum quality even if the capacity of the decoding apparatus is different from that of the coding apparatus. In this case, the delay before the start of transmission due to the downloading of the tool information can be eliminated; however, it is impossible to decode the signal with the anticipated quality. When the signal is to be decoded with the anticipated quality, it is necessary to download the tools for the anticipated quality beforehand. Therefore, the situation is quite similar to the case where the tools are not provided in the hierarchical manner. That is, the delay from the start of transmission to the decoding of the coded data becomes long and therefore it is impossible to make use of the merit of the hierarchical structure of the tools.