This invention relates to an information processing apparatus and an information processing method, a recording medium, and a program, and more particularly to an information processing apparatus and an information processing method, a recording medium, and a program suitable for use where video data compressed using bidirectional interframe prediction are edited.
In order to edit an MPEG (Moving Picture Coding Experts Group/Moving Picture Experts Group) stream, a technique is conventionally used wherein pictures in the proximity of an editing point (splicing point) are decoded once and resulting non-compressed image signals are joined together at the editing point, whereafter the resulting signal is re-encoded. The technique is disclosed, for example, in International Publication No. WO99/05864 (hereinafter referred to as Patent Document 1).
In the MPEG system, a compression coding system which uses bidirectional interframe prediction and involves I pictures, P pictures and B pictures is called compression of the Long GOP (Group Of Pictures) system.
An I picture is an intraframe coded picture coded independently of any other picture, and an image can be decoded from the information of the I picture alone. A P picture is an interframe forward predictive coded picture represented by a difference from a temporally preceding frame (in the forward direction). A B picture is a bidirectional predictive coded picture coded by motion compensated interframe prediction making use of a temporally preceding picture (forward direction), a temporally succeeding picture (reverse direction), or both preceding and succeeding pictures (bidirectional).
Since the P picture and the B picture have a smaller data amount than the I picture, if the GOP is made longer (that is, if the number of pictures which form a Long GOP is increased), then the compression ratio of the image can be raised. Therefore, Long GOP compression is suitable for utilization in digital broadcasting applications and DVD (Digital Versatile Disc) video applications. However, if the GOP is excessively long, then editing control with frame accuracy becomes difficult, and this poses an operational problem for editing in business applications.
An encoder of a type which is conventionally in wide use is shown in a block diagram in FIG. 1.
Referring to FIG. 1, the encoder 1 shown includes an image re-arrangement section 11 which re-arranges frame images of image data successively inputted thereto as occasion demands or divides frame images of image data into macro-blocks each formed from luminance signals of 16 pixels×16 lines to produce macro-block data. The image re-arrangement section 11 supplies the produced macro-block data to an arithmetic operation section 13 and a motion detection section 12.
The motion detection section 12 receives the macro-block data as an input thereto, calculates motion vectors of the individual macro-blocks and signals the motion vectors as motion vector data to a motion compensation section 20.
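The calculation of a motion vector for one macro-block can be illustrated by full-search block matching. The following is a minimal sketch only: the source does not state which search method the motion detection section 12 uses, and the function names `sad` and `find_motion_vector`, the sum-of-absolute-differences cost and the search range are all assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, n=16, search=2):
    """Full-search block matching: return the (dx, dy) with the lowest SAD.

    `cur` and `ref` are frames given as lists of rows of luminance samples.
    The search window of +/- `search` pixels is a hypothetical parameter.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

A practical encoder would typically restrict or accelerate this search; the exhaustive loop is used here only because it is the simplest correct form of block matching.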
The arithmetic operation section 13 performs motion compensation for the macro-block data supplied thereto from the image re-arrangement section 11 based on the image types of the macro-blocks. More particularly, the arithmetic operation section 13 performs motion compensation for the I picture using the intra-prediction mode, performs motion compensation for the P picture using the forward prediction mode, and performs motion compensation for the B picture using the bidirectional prediction mode.
Here, the intra-prediction mode is a method wherein a frame image of an object of encoding is used as it is as transmission data. The forward prediction mode is a method wherein predictive residuals between a frame image of an object of encoding and a reference image in the past are used as transmission data. The bidirectional prediction mode is a method wherein predictive residuals between a frame image of an object of encoding and reference images in the past and in the future are used as transmission data.
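The three modes can be contrasted by the data each of them transmits. The sketch below is illustrative only: it treats a block as a flat list of samples, omits motion compensation, and assumes that the bidirectional prediction is a rounded average of the past and future references, which the source does not specify. The function name `transmission_data` is hypothetical.

```python
def transmission_data(block, mode, past=None, future=None):
    """Return the data transmitted for `block` under the given prediction mode.

    block, past, future: equally long lists of sample values.
    """
    if mode == "intra":
        # The block itself is used as it is as transmission data.
        return list(block)
    if mode == "forward":
        # Prediction comes from a reference image in the past.
        prediction = past
    elif mode == "bidirectional":
        # Assumption: the past and future references are averaged with rounding.
        prediction = [(p + f + 1) // 2 for p, f in zip(past, future)]
    else:
        raise ValueError("unknown prediction mode: " + mode)
    # Predictive residuals: block minus prediction.
    return [b - p for b, p in zip(block, prediction)]
```

For example, with `block = [10, 20, 30]`, `past = [8, 20, 33]` and `future = [12, 22, 29]`, the forward mode yields the residuals `[2, 0, -3]`, while the bidirectional mode yields `[0, -1, -1]`.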
First, if macro-block data represent an I picture, then the macro-block data are processed using the intra-prediction mode. In particular, the arithmetic operation section 13 signals a macro-block of the macro-block data inputted thereto as it is as arithmetic operation data to a DCT (Discrete Cosine Transform) section 14. The DCT section 14 performs a DCT process on the arithmetic operation data inputted thereto to convert the data into DCT coefficients and signals the DCT coefficients as DCT coefficient data to a quantization section 15.
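The transform itself can be sketched in one dimension. This is a simplified illustration under stated assumptions: MPEG encoders apply a two-dimensional DCT to 8×8 blocks, whereas the sketch shows only an orthonormal one-dimensional DCT and its inverse, and the function names `dct_1d` and `idct_1d` are hypothetical.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i, s in enumerate(samples):
            acc += s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(scale * acc)
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out
```

A flat input concentrates all of its energy in the first (DC) coefficient, which is why smooth image regions yield few significant coefficients after the transform.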
The quantization section 15 performs a quantization process for the DCT coefficient data inputted thereto and signals resulting quantized DCT coefficient data to a VLC (Variable Length Code) section 16 and a dequantization section 17.
The quantized DCT coefficient data signaled to the dequantization section 17 undergo a dequantization process by the dequantization section 17 with a quantization step size equal to that used in the quantization section 15 and are signaled as DCT coefficient data to an inverse DCT section 18. The inverse DCT section 18 performs an inverse DCT process for the DCT coefficient data supplied thereto and signals resulting data to an arithmetic operation section 19.
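The relationship between quantization and dequantization with an equal step size can be sketched as follows. A single uniform step size and round-to-nearest are assumptions made for illustration; an actual MPEG encoder uses quantization matrices and a quantizer scale, which are omitted here. The function names are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to the nearest multiple of step."""
    levels = []
    for c in coeffs:
        magnitude = int(abs(c) / step + 0.5)  # round half away from zero
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Dequantization using the same step size as the quantizer."""
    return [level * step for level in levels]
```

With `coeffs = [100, -37, 12, 3]` and `step = 8`, quantization gives the levels `[13, -5, 2, 0]` and dequantization gives `[104, -40, 16, 0]`. The restored values differ from the originals by at most half a step, which is the loss that the local decoding path of the encoder must reproduce.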
On the other hand, if the macro-block data represent a P picture, then the arithmetic operation section 13 performs a motion compensation process according to the forward prediction mode for the macro-block data, but if the macro-block data represent a B picture, then the arithmetic operation section 13 performs a motion compensation process according to the bidirectional prediction mode for the macro-block data.
In either mode, the motion compensation section 20 performs motion compensation in accordance with the motion vector data supplied thereto from the motion detection section 12 to calculate forward prediction picture data or bidirectional prediction picture data. The arithmetic operation section 13 executes a subtraction process for the macro-block data using the forward prediction picture data or bidirectional prediction picture data supplied thereto from the motion compensation section 20.
In particular, in the forward prediction mode, the motion compensation section 20 supplies forward prediction picture data to the arithmetic operation section 13 and the arithmetic operation section 19. The arithmetic operation section 13 subtracts the forward prediction picture data from the macro-block data supplied thereto to obtain difference data as predictive residuals. Then, the arithmetic operation section 13 signals the difference data to the DCT section 14.
The forward prediction picture data are supplied from the motion compensation section 20 to the arithmetic operation section 19. The arithmetic operation section 19 adds the forward prediction picture data to the arithmetic operation data supplied thereto from the inverse DCT section 18 to locally reproduce the reference image data.
On the other hand, in the bidirectional prediction mode, the motion compensation section 20 supplies bidirectional prediction picture data to the arithmetic operation section 13 and the arithmetic operation section 19. The arithmetic operation section 13 subtracts the bidirectional prediction picture data from the macro-block data supplied thereto to obtain difference data as predictive residuals. Then, the arithmetic operation section 13 signals the difference data to the DCT section 14.
The bidirectional prediction picture data are supplied from the motion compensation section 20 to the arithmetic operation section 19, and the arithmetic operation section 19 adds the bidirectional prediction picture data to the arithmetic operation data supplied thereto from the inverse DCT section 18 to locally reproduce the reference picture data.
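The local reproduction of the reference data can be sketched as a loop of subtraction, quantization, dequantization and addition. The sketch is a simplification under stated assumptions: the DCT and inverse DCT stages are omitted so that the residuals are quantized directly, a single uniform step size is used, and the function name `encode_block` is hypothetical.

```python
def _round_div(value, step):
    """Divide by step and round half away from zero."""
    magnitude = int(abs(value) / step + 0.5)
    return magnitude if value >= 0 else -magnitude


def encode_block(block, prediction, step):
    """Return (quantized levels, locally reproduced reference samples).

    The subtraction corresponds to the arithmetic operation section 13 and the
    final addition to the arithmetic operation section 19. The DCT and inverse
    DCT stages are left out of this sketch.
    """
    residual = [b - p for b, p in zip(block, prediction)]
    levels = [_round_div(r, step) for r in residual]
    # Local decoding: the same dequantization a decoder would perform.
    decoded_residual = [level * step for level in levels]
    reconstructed = [p + r for p, r in zip(prediction, decoded_residual)]
    return levels, reconstructed
```

Because the reference is rebuilt from the quantized data rather than from the original input, the encoder predicts later pictures from the same picture that a decoder will reconstruct.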
Thus, the picture data inputted to the encoder 1 undergo the motion compensation prediction process, DCT process and quantization process and are supplied as quantized DCT coefficient data to the VLC section 16. The VLC section 16 performs a variable length coding process based on a predetermined conversion table for the quantized DCT coefficient data and signals resulting variable length coded data to a buffer 21. The buffer 21 temporarily buffers and then outputs the variable length coded data supplied thereto.
Now, a process of joining two image data compressed by the Long GOP method to each other at predetermined editing points is described with reference to FIG. 2.
First, for each of editing object compressed image data 1 and editing object compressed image data 2, partial decoding of a portion in the proximity of an editing point is performed. Consequently, a partial non-compressed image signal 1 and a partial non-compressed image signal 2 are obtained. Then, the non-compressed image signal 1 and image signal 2 are joined to each other at the editing points, an effect is applied to the portion in the proximity of the editing point as occasion demands, and then re-encoding is performed. Then, the re-encoded compressed image data are joined to the compressed image data which have not undergone the decoding and re-encoding processes (compressed image data other than the portion for which the partial decoding is performed).
The method described above with reference to FIG. 2 is advantageous in that deterioration of the picture quality by re-encoding can be suppressed locally and the editing processing time can be reduced significantly when compared with those of an alternative method wherein all image data of compressed editing materials are decoded and then the image signals are connected to each other at the editing points, whereafter all of the image signals are re-encoded to obtain edited compressed video data.
However, if the encoder 1 having such a common configuration as described above with reference to FIG. 1 is used to perform editing and re-encoding by such a method as described above with reference to FIG. 2, then this gives rise to a problem that a picture cannot be referred to at a joint between a portion for which re-encoding is performed and another portion for which no re-encoding is performed.
The following method is known as a countermeasure for the problem described. In particular, where compression is performed using a method (Long GOP) which involves predictive encoding between frames, in order to implement editing comparatively simply, the interframe prediction is limited so as to adopt a Closed GOP structure such that a picture is referred to only within a GOP but is not referred to across GOPs.
A case wherein such a limitation is applied to interframe prediction is described with reference to FIG. 3. In order to indicate the relationship between interframe prediction and editing, FIG. 3 illustrates, in display order, the pictures of the compressed material image 1 and the compressed material image 2 which are an object of editing, the partially re-encoded data of the compressed pictures in the proximity of the editing points after the editing, and the data of the compressed images of the portions for which re-encoding is not performed. Arrow marks in FIG. 3 indicate a referencing direction of a picture (this similarly applies also to the other figures). In FIG. 3, 15 pictures of BBIBBPBBPBBPBBP in the display order form one GOP, and referencing to a picture is performed only within the GOP. This method inhibits prediction across GOPs and thereby eliminates any dependence of compressed data between GOPs, which allows compressed data to be re-joined in a unit of a GOP (that is, the range within which re-encoding is to be performed can be determined in a unit of a GOP).
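The effect of the limitation on the referencing relationships can be sketched by listing, for each picture in display order, the pictures it refers to. The referencing rule used here is an assumption for illustration (a P picture refers to the nearest preceding I or P picture, and a B picture refers to the nearest preceding and succeeding I or P pictures), and the function name `references` is hypothetical.

```python
def references(types, gop_len, closed):
    """Map each display-order picture index to the indices it refers to.

    types: string of picture types in display order, e.g. "BBIBBP...".
    gop_len: number of pictures per GOP.
    closed: if True, drop any reference that crosses a GOP boundary.
    """
    anchors = [i for i, t in enumerate(types) if t in "IP"]
    refs = {}
    for i, t in enumerate(types):
        if t == "I":
            refs[i] = []  # coded independently of any other picture
            continue
        prev = max((a for a in anchors if a < i), default=None)
        nxt = min((a for a in anchors if a > i), default=None)
        candidates = [prev] if t == "P" else [prev, nxt]
        candidates = [c for c in candidates if c is not None]
        if closed:
            # Closed GOP: a picture is referred to only within its own GOP.
            candidates = [c for c in candidates if c // gop_len == i // gop_len]
        refs[i] = candidates
    return refs
```

For two consecutive GOPs of BBIBBPBBPBBPBBP, the first B picture of the second GOP (index 15) refers only to the I picture at index 17 when `closed` is true. Without the limitation it would also refer to the last P picture of the preceding GOP (index 14), which is the dependence that prevents re-joining in a unit of a GOP.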
In particular, the range for re-encoding is determined in a unit of one GOP including an editing point for data of the compressed material image 1 and data of the compressed material image 2 which are an object of editing, and the data of the compressed material image 1 and the data of the compressed material image 2 which are an object of editing within the re-encoding ranges determined in a unit of one GOP are decoded to produce signals of the non-compressed material image 1 and the non-compressed material image 2. Then, the signals of the non-compressed material image 1 and the non-compressed material image 2 are joined to each other at the cut editing point, and the material image 1 and the material image 2 joined together in this manner are partly re-encoded to produce compressed image data. Then, the compressed image data are joined to the compressed video data of the portions which have not been re-encoded thereby to produce compressed edited image data.
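The determination of the re-encoding ranges in a unit of one GOP can be sketched as follows. The sketch assumes fixed-length GOPs, frame indices counted from 0 in display order, an out point that is the last frame used from the material image 1 and an in point that is the first frame used from the material image 2. These conventions and the function names are assumptions made for illustration.

```python
def gop_range(frame, gop_len):
    """Return (first, last) frame index of the GOP containing `frame`."""
    first = (frame // gop_len) * gop_len
    return first, first + gop_len - 1


def plan_splice(out_point, in_point, gop_len):
    """Plan a cut edit joining material 1 (up to out_point) to material 2
    (from in_point), re-encoding only the GOPs that contain the editing points.
    """
    first1, last1 = gop_range(out_point, gop_len)
    first2, last2 = gop_range(in_point, gop_len)
    return {
        # Ranges that must be decoded to non-compressed signals.
        "decode_material1": (first1, last1),
        "decode_material2": (first2, last2),
        # Frames actually joined at the editing point and re-encoded.
        "reencode_material1": (first1, out_point),
        "reencode_material2": (in_point, last2),
        # Compressed data reused without re-encoding.
        "keep_material1_before": first1,  # frames 0 .. first1 - 1
        "keep_material2_after": last2,    # frames last2 + 1 onward
    }
```

With 15 pictures per GOP, an out point at frame 37 and an in point at frame 20, frames 30 to 44 of the material image 1 and frames 15 to 29 of the material image 2 are decoded, and frames 30 to 37 and 20 to 29 are joined and re-encoded.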
A Long GOP structure which does not have the Closed GOP structure, that is, a Long GOP structure where an image is referred to across GOPs, is called Open GOP.