This invention relates to an information processing apparatus and an information processing method, a recording medium, and a program, and more particularly to an information processing apparatus and an information processing method, a recording medium, and a program suitable for use where video data compressed using bidirectional interframe prediction are edited.
Image compression methods represented by the MPEG (Moving Picture Coding Experts Group/Moving Picture Experts Group) method achieve a high compression efficiency by compression encoding an image signal using interframe prediction. However, where images are to be edited, compressed image materials formed using interframe prediction cannot be spliced together while they remain in the form of a compressed image signal, because the compressed signals of different frames depend on one another through prediction. Therefore, a system which is configured in advance with editing of image materials taken into consideration usually performs encoding using only compression within a frame, without using interframe prediction.
However, where a high definition image signal having a large amount of information is handled, for example, a high definition (HD) signal, if only intraframe compression is used for encoding, then only a low compression efficiency is obtained. Therefore, in order to transmit or store a large amount of data, an expensive system is required because a high transfer speed, a large storage capacity or a high processing speed is required. In other words, in order to allow a high definition image signal having a large amount of information to be handled by a less expensive system, it is necessary to use interframe prediction to assure a high compression efficiency.
In the MPEG system, a compression coding system which uses bidirectional interframe prediction and involves I pictures, P pictures and B pictures is called compression of the Long GOP (Group of Pictures) system.
An I picture is an intraframe coded picture coded independently of any other picture, and an image can be decoded from information only of an I picture. A P picture is an interframe forward predictive coded picture represented by a difference from a preceding frame (in the forward direction) with respect to time. A B picture is a bidirectional predictive coded picture coded by motion compensation interframe prediction making use of preceding (in the forward direction), succeeding (in the reverse direction) or preceding and succeeding (in the opposite directions (bidirectional)) pictures with respect to time.
Since the P picture and the B picture have a smaller data amount than the I picture, if the GOP is made longer (that is, if the number of pictures which form a Long GOP is increased), then the compression ratio of the image can be raised. Therefore, compression using P pictures and B pictures is suitable for utilization in digital broadcasting applications and DVD (Digital Versatile Disk) video applications. However, if the GOP is excessively long, then editing control with frame accuracy becomes difficult, and an operational problem arises in editing in business applications.
A process of splicing two sets of image data compressed by the Long GOP method to each other at predetermined editing points (splicing points) is described with reference to FIG. 1.
First, for each of editing object compressed image data 1 and editing object compressed image data 2, partial decoding of a portion in the proximity of an editing point is performed. Consequently, partial non-compressed image signal 1 and image signal 2 are obtained. Then, the non-compressed image signal 1 and image signal 2 are spliced to each other at the editing points, an effect is applied to the portion in the proximity of the editing point as occasion demands, and then re-encoding is performed. Then, the re-encoded compressed image data are spliced with the compressed image data which have not undergone the decoding and re-encoding processes (compressed image data other than the portion for which the partial decoding is performed).
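The flow of FIG. 1 can be sketched as follows. This is only an illustrative model under stated assumptions: frames are represented by labels rather than decoded image data, the function name `splice_with_partial_reencode` and the parameter `margin` (the number of frames near each editing point that are decoded and re-encoded) are hypothetical, and the reference pictures that a real decoder would additionally need are ignored.

```python
from typing import List, Tuple

# (frame label, "original" or "reencoded"); a stand-in for real compressed data.
Frame = Tuple[str, str]


def splice_with_partial_reencode(
    material1: List[str],
    material2: List[str],
    out_point: int,
    in_point: int,
    margin: int,
) -> List[Frame]:
    """Join material1[:out_point] to material2[in_point:].

    Only `margin` frames on each side of the editing point are marked as
    decoded and re-encoded; all other frames are passed through untouched.
    """
    start = max(0, out_point - margin)            # first re-encoded frame of material 1
    end = min(len(material2), in_point + margin)  # one past last re-encoded frame of material 2

    head = [(f, "original") for f in material1[:start]]
    middle = [
        (f, "reencoded")
        for f in material1[start:out_point] + material2[in_point:end]
    ]
    tail = [(f, "original") for f in material2[end:]]
    return head + middle + tail


material1 = [f"a{i}" for i in range(6)]
material2 = [f"b{i}" for i in range(6)]
edited = splice_with_partial_reencode(material1, material2, out_point=4, in_point=2, margin=2)
```

In this sketch only the four frames around the editing point are marked as re-encoded, which corresponds to the locally limited re-encoding range of FIG. 1.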
The method described above with reference to FIG. 1 is advantageous in that deterioration of the picture quality by re-encoding can be suppressed locally and the editing processing time can be reduced significantly when compared with those of an alternative method wherein all image data of compressed editing materials are decoded and then the image signals are connected to each other at the editing points, whereafter all of the image signals are re-encoded to obtain edited compressed video data.
However, if such a method as described above with reference to FIG. 1 is used to perform editing and re-encoding, then this gives rise to a problem that a picture cannot be referred to at a joint between a portion for which re-encoding is performed and another portion for which no re-encoding is performed.
The following method is known as a countermeasure for the problem described. In particular, where compression is performed using a method (Long GOP) which involves predictive encoding between frames, in order to implement editing comparatively simply, the interframe prediction is limited so as to adopt a Closed GOP structure such that a picture is referred to only within a GOP but is not referred to across GOPs.
A case wherein a limitation is applied to interframe prediction is described with reference to FIG. 2. In order to indicate a relationship between interframe prediction and editing, FIG. 2 illustrates, in a display order, the pictures of data of the compressed material image 1 and data of the compressed material image 2 which are an object of editing, of the partially re-encoded compressed image data in the proximity of the editing points after the editing, and of the compressed image data of the portion for which re-encoding is not performed. An arrow mark in FIG. 2 indicates a referencing direction of a picture (this similarly applies also to the other figures). In FIG. 2, 15 pictures of BBIBBPBBPBBPBBP in the display order form one GOP, and referencing to a picture is performed only within the GOP. This method inhibits prediction across GOPs and thereby eliminates the dependence of compressed data on prediction between GOPs, so that compressed data can be re-spliced in a unit of a GOP (that is, the range within which re-encoding is to be performed can be determined in a unit of a GOP).
In particular, the range for re-encoding is determined in a unit of one GOP including an editing point for data of the compressed material image 1 and data of the compressed material image 2 which are an object of editing, and the data of the compressed material image 1 and the data of the compressed material image 2 within the re-encoding ranges determined in a unit of one GOP are decoded to produce signals of the non-compressed material image 1 and the non-compressed material image 2. Then, the signal of the non-compressed material image 1 and the signal of the non-compressed material image 2 are spliced to each other at the cut editing point, and the material image 1 and the material image 2 spliced together in this manner are partly re-encoded to produce compressed image data. Then, the compressed image data are spliced with the compressed image data of the portions which have not been re-encoded thereby to produce compressed edited image data.
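The determination of the re-encoding range in a unit of one GOP can be illustrated as below. This is a minimal sketch which assumes a fixed GOP length of 15 pictures as in FIG. 2 and frame numbers counted from 0 in the display order; the function name `gop_range` is hypothetical.

```python
from typing import Tuple


def gop_range(edit_frame: int, gop_length: int = 15) -> Tuple[int, int]:
    """Return the half-open range [start, end) of the GOP containing edit_frame.

    Assumes every GOP has exactly gop_length pictures and that the first
    GOP starts at frame 0 in the display order.
    """
    if edit_frame < 0 or gop_length <= 0:
        raise ValueError("edit_frame must be >= 0 and gop_length must be > 0")
    start = (edit_frame // gop_length) * gop_length
    return start, start + gop_length


# An editing point at frame 20 falls in the second GOP, frames 15 to 29.
reencode_range = gop_range(20)
```

Because a Closed GOP contains no reference to a picture outside itself, decoding only the frames in this range is sufficient in this simplified model.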
Actually, encoded data are arrayed in a coding order as illustrated in FIG. 3, and splicing of compressed image data is performed in the coding order. Referring to FIG. 3, the compressed image data produced by partially re-encoding the material image 1 and the material image 2 spliced together and the compressed image data which have not been re-encoded are spliced at a B13 picture which is the last picture in the coding order in the data of the compressed material image 1 in the portion which has not been re-encoded and is the fourteenth picture in the display order and an I2 picture which is the first picture in the coding order in the compressed image data produced by the re-encoding and is the third picture in the display order. Further, a B12 picture which is the last picture in the coding order in the compressed image data produced by the re-encoding and is the thirteenth picture in the display order and the I2 picture which is the first picture in the coding order in the data of the compressed material image 2 in the portion which has not been re-encoded and is the third picture in the display order are spliced to each other. In other words, the compressed image data produced by re-encoding of the material image 1 and the material image 2 spliced together and the compressed image data in the portion which has not been re-encoded are connected at GOP changeover portions to produce compressed edited image data.
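The relationship between the display order and the coding order can be sketched as follows. The sketch assumes the usual reordering rule in which each I picture or P picture is placed ahead of the B pictures that precede it in the display order; picture labels such as `I2` combine the picture type with a display position counted from 0, and the function name `display_to_coding_order` is hypothetical.

```python
from typing import List


def display_to_coding_order(display: List[str]) -> List[str]:
    """Reorder one GOP from the display order into the coding order.

    Each I or P picture is emitted first, followed by the B pictures
    that were displayed immediately before it.
    """
    coded: List[str] = []
    pending_b: List[str] = []
    for picture in display:
        if picture.startswith("B"):
            pending_b.append(picture)   # wait for the next I or P picture
        else:
            coded.append(picture)       # I or P picture goes first
            coded.extend(pending_b)     # then the B pictures displayed before it
            pending_b = []
    coded.extend(pending_b)             # trailing B pictures, if any
    return coded


# The 15-picture GOP of FIG. 2 in the display order: B0 B1 I2 B3 B4 P5 ...
display_gop = [f"{t}{i}" for i, t in enumerate("BBIBBPBBPBBPBBP")]
coded_gop = display_to_coding_order(display_gop)
```

For the GOP of FIG. 2, this rule places the I2 picture first and the B13 picture last in the coding order, which agrees with the first splicing position described above.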
On the other hand, a GOP structure which does not have the Closed GOP structure, that is, a Long GOP structure where an image is referred to across GOPs, is called Open GOP.
A technique is also available for splicing two bit streams of MPEG encoded pictures having the Open GOP structure while preventing otherwise possible deterioration of the picture quality at the splicing portions. When two bit streams of the Open GOP structure are edited, or more particularly when a bit stream Y is inserted into another bit stream X, a B picture preceding an I picture which forms the first GOP of the bit stream Y (a B picture which is displayed before the I picture) is deleted, and the temporal references of the remaining pictures which form the GOP are changed. Consequently, the B picture prior to the I picture, which would be predicted using a picture which forms the last GOP of the bit stream X, is not displayed, and such deterioration of the picture quality as described above is prevented. One of such techniques is disclosed, for example, in Japanese Patent Laid-Open No. Hei 10-66085 (hereinafter referred to as Patent Document 1).
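The deletion of the leading B pictures and the change of the temporal references can be illustrated as below. This is a simplified sketch and is not taken from Patent Document 1: the `Picture` class and the function name `drop_leading_b_pictures` are hypothetical, the GOP is given in the coding order, and the temporal reference is assumed to be the display position within the GOP counted from 0.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Picture:
    ptype: str               # "I", "P" or "B"
    temporal_reference: int  # display position within the GOP (assumed 0-based)


def drop_leading_b_pictures(gop: List[Picture]) -> List[Picture]:
    """Delete B pictures displayed before the I picture and renumber the rest."""
    i_ref = next(p.temporal_reference for p in gop if p.ptype == "I")
    kept = [
        p for p in gop
        if not (p.ptype == "B" and p.temporal_reference < i_ref)
    ]
    dropped = len(gop) - len(kept)
    # Shift the remaining temporal references so that the I picture is displayed first.
    return [replace(p, temporal_reference=p.temporal_reference - dropped) for p in kept]


# First GOP of bit stream Y in the coding order: I2 B0 B1 P5 B3 B4
first_gop_y = [
    Picture("I", 2), Picture("B", 0), Picture("B", 1),
    Picture("P", 5), Picture("B", 3), Picture("B", 4),
]
trimmed_gop_y = drop_leading_b_pictures(first_gop_y)
```

After the deletion, the I picture has temporal reference 0, so no remaining picture of the GOP is predicted from a picture of the bit stream X.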