1. Field of the Invention
The present invention relates to a video editing apparatus, a video editing method, and a medium for storing a video editing program, and relates more particularly to technology for extracting a plurality of contiguous frames (scenes) from a bitstream encoded according to a Moving Picture Experts Group (MPEG) standard, and producing a new bitstream by combining a plurality of the extracted scenes.
2. Description of Related Art
MPEG is a family of international standards for encoding moving pictures (hereafter referred to simply as "video"). It includes MPEG-1, which is used for video CD and PC video data, for example, and MPEG-2, which is used with DVD and digital broadcast satellite. Other applications for the MPEG standards continue to be found.
More specifically, MPEG has been adopted by the International Organization for Standardization (ISO) as a standard for a video coding method defining bitstream interpretation and decoding techniques. The MPEG-1 standard has been adopted as ISO/IEC 11172, and MPEG-2 as ISO/IEC 13818.
MPEG-1 defines a compression technique for compressing and storing video to a digital storage medium with a 1.5 Mbps transfer rate.
MPEG-2 extends MPEG-1, and defines a compression technique more specifically considering applications with communications and broadcast media, in addition to storage media.
Under MPEG-1, video data consists of a sequence of picture frames, enabling the pictures to be compressed using correlations within each frame (intra-frame coding) and correlations between frames (inter-frame coding). Combining these coding techniques yields three picture types based on the compression technique(s) used: I-pictures, or intra-coded pictures; P-pictures, or predictive-coded pictures based only on temporally preceding pictures; and B-pictures, or bidirectionally predictive-coded pictures.
I-pictures are coded based solely on the data within that picture frame, and thus have no correlation to any other frame. P-pictures are coded with reference (correlation) to a temporally preceding (past) frame. B-pictures are coded with correlation to temporally preceding (past) and/or following (future) frames.
FIG. 13 shows the correlation between pictures in an MPEG-1 bitstream. Each square in FIG. 13 represents one picture (frame).
Each frame is labelled with the picture type and ordinal sequence. I indicates an I-picture, P a P-picture, and B a B-picture. Note that this same designation is used throughout the figures and this specification to indicate the picture type.
The frames are further shown in display order from left to right, and the arrows in FIG. 13 indicate the correlation between frames. For example, from FIG. 13 we know that frame B3 is coded with reference to frames I1 and P4.
Because a specific frame can thus be coded with reference to a temporally following (future) frame, the sequence in which frames are presented (the display order, shown on the top row in FIG. 14) to the viewer and the sequence in which frames are stored on the data storage medium (the coding order or data cumulation order in buffer, shown on the bottom row in FIG. 14) are different in an MPEG-1 bitstream containing B-pictures.
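The relationship between display order and coding order can be sketched as follows. This is a minimal illustration only: the picture sequence is an assumed example rather than the content of FIG. 14, and the function name display_to_coding_order is hypothetical. It assumes that every B-picture is stored after the I- or P-picture that follows it in display order.

```python
def display_to_coding_order(display):
    """Reorder picture labels from display order to coding order.

    Assumption: each B-picture is held back until the next I- or
    P-picture (its forward reference) has been emitted.
    """
    coding = []
    pending_b = []
    for label in display:
        if label.startswith("B"):
            # B-pictures wait for their following reference picture.
            pending_b.append(label)
        else:
            # Emit the I- or P-picture first, then the waiting B-pictures.
            coding.append(label)
            coding.extend(pending_b)
            pending_b.clear()
    # Trailing B-pictures with no following reference are appended as-is.
    coding.extend(pending_b)
    return coding


# Assumed example sequence in display order.
display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(display_to_coding_order(display))
```

In this sketch, B2 and B3 are stored after P4 because both need P4 as a future reference before they can be decoded.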
Generally speaking, the compression efficiency of these picture types is:
I-pictures < P-pictures < B-pictures
and the code size is conversely
I-pictures > P-pictures > B-pictures.
The MPEG-2 scheme can be applied to picture data having a frame structure or a field structure. Video scanning methods include, broadly, non-interlaced scanning and interlaced scanning.
In non-interlaced scanning all pixels in one frame are sampled at the same time. In this case the video is a collection of frames, and thus has a frame structure.
With interlaced scanning every other line in one picture frame is sampled at the same time. The first set of lines sampled at a first time is referred to as the first field, and the second set of lines sampled at a second time is referred to as the second field. Each frame in interlaced scan video thus consists of two fields, and the video has a field structure.
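The division of an interlaced frame into two fields can be illustrated with a short sketch. The function name split_into_fields is hypothetical, and the choice of the even-indexed lines as the first field is an assumption made only for illustration. Which set of lines is sampled first depends on the video source.

```python
def split_into_fields(frame_lines):
    """Split the lines of one frame into two fields.

    Assumption: lines 0, 2, 4, ... form the first field and
    lines 1, 3, 5, ... form the second field.
    """
    first_field = frame_lines[0::2]
    second_field = frame_lines[1::2]
    return first_field, second_field


# A toy frame of six scan lines.
frame = ["line0", "line1", "line2", "line3", "line4", "line5"]
first, second = split_into_fields(frame)
print(first)
print(second)
```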
The picture structure in MPEG-2 video having a frame structure is the same as in MPEG-1. However, picture correlations in field structure video are more complicated. Picture correlations in field structure video are shown in FIG. 15.
In FIG. 15 each square represents one field, and the fields are arranged in display order. As shown in FIG. 15, a P-field can be coded with reference to the most recently decoded I-field, to an I-field and a P-field, or to two P-fields.
However, if the P-field is coded using an I-field as the first field and a P-field as the second field, the P-field can only use the I-field, which is the first field, for prediction. For example, field P2 is coded only with reference to field I1.
A B-field is coded using the two most recently decoded temporally preceding and following I- and P-fields, that is, two temporally preceding and two temporally following fields. For example, field B3 uses preceding fields I1 and P2, and following fields P5 and P6.
The display order and coding order of field structure video is shown in FIG. 16 on the top and bottom rows, respectively.
Two particular tasks to be solved with the related art of the present invention are described next.
First Task to be Solved
When an MPEG video stream compressed using both intra and inter coding is edited by extracting a plurality of consecutive frames (scenes) from the bitstream and then combining a selected subset of the extracted scenes to produce a new frame sequence, the pictures referenced for predictive coding might be lost, resulting in pictures that cannot be reproduced.
The reason for this is explained next with reference to FIG. 17. The arrows in FIG. 17 indicate the correlations between pictures. When specific scenes, that is, pictures B3 to B11, are extracted from this picture sequence, the links to referenced pictures indicated by the Xs are lost. In this example, the correlations between pictures I1 and B3, between I1 and P4, and between I13 and B11, are lost.
While picture B3 is coded with reference to picture I1, picture I1 is not in the extracted sequence from B3 to B11, and picture B3 therefore cannot be reproduced. Pictures P4 and B11 also cannot be reproduced for the same reason.
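The lost links described above can be reproduced with a short sketch. The 13-picture sequence below is an assumed layout consistent with the pictures named in the text, not a copy of FIG. 17, and the function names are hypothetical. The reference model is simplified: a P-picture refers to the preceding I- or P-picture, and a B-picture refers to the nearest preceding and following I- or P-pictures.

```python
def build_references(display):
    """Map each picture label to the labels it is coded with reference to."""
    anchors = [i for i, p in enumerate(display) if p[0] in "IP"]
    refs = {}
    for i, label in enumerate(display):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if label[0] == "I":
            refs[label] = []  # intra-coded: no references
        elif label[0] == "P":
            refs[label] = [display[prev_anchor]] if prev_anchor is not None else []
        else:  # B-picture: past and future anchors
            refs[label] = [display[a] for a in (prev_anchor, next_anchor)
                           if a is not None]
    return refs


def lost_links(display, first, last):
    """Return (referenced, referencing) pairs broken by extracting first..last."""
    refs = build_references(display)
    scene = display[display.index(first):display.index(last) + 1]
    in_scene = set(scene)
    return [(ref, label) for label in scene
            for ref in refs[label] if ref not in in_scene]


# Assumed picture sequence in display order.
display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7",
           "B8", "B9", "P10", "B11", "B12", "I13"]
print(lost_links(display, "B3", "B11"))
```

The sketch reports only links that are lost directly. Pictures inside the scene that refer to an unreproducible picture, such as those predicted from P4, are affected indirectly.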
Second Task to be Solved
FIG. 18(a) shows an idealized decoder, referred to as a system target decoder 2, under the MPEG-1 system, and related peripheral components. Encoded MPEG-1 data is input to buffer 1 at a constant bit rate, and data for one decoded picture is read from buffer 1 at a specific decode timing. Picture data is then output either directly or by way of a reordering buffer 3. Differences in the display order and the coding order are absorbed by the reordering buffer 3.
An MPEG-1 encoder codes video while varying the compression rate to adjust the code size (buffer control). It does so by calculating the buffer capacity needed by the decoder during decoding, so that neither data overflow nor data underflow occurs. Overflow means that the data to be temporarily stored to the decoder buffer exceeds the buffer capacity. Underflow means that the buffer is temporarily depleted because the decoder reads data faster than it is stored to the buffer.
FIG. 18(b) shows the change over time in the amount of data stored temporarily to the buffer (buffer fullness). Each point where the buffer fullness line drops perpendicularly to the x-axis in FIG. 18(b) (i.e., has a slope of −∞) indicates when one picture is read from the buffer 1 by system target decoder 2. The height of the vertical drop in this line is indicative of the code size of one picture. As noted above, the code size depends on the picture type where
I-picture > P-picture > B-picture.
Data is input to the buffer 1 at a constant rate (slope is a constant positive value) in the periods between when the decoder reads picture data from the buffer.
The buffer 1 of decoder 2 will neither overflow nor underflow when decoding buffer-controlled MPEG-1 data. However, if the video bitstream is edited without considering this, the buffer control provided for during encoding will be disrupted, buffer overflow and underflow states will be possible, and the requirements of the MPEG-1 standard will no longer be satisfied.
The second task of the related art is therefore that buffer overflow or underflow states can occur.
How this is possible is further described with reference to FIG. 19. FIG. 19 shows the change in data stored to buffer 1 when scenes 1 and 2 are extracted from a continuous MPEG-1 stream and simply spliced together. FIG. 19(a) shows the change in data before this editing process, and FIG. 19(b) shows the change after editing. If scenes 1 and 2 are simply spliced together such that storing scene 2 data starts from the end of scene 1, the buffer will overflow as indicated by the X in FIG. 19(b).
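The effect of simple splicing on buffer fullness can be approximated with a small simulation. This is a simplified sketch, not the system target decoder model of the standard: time advances in whole picture periods, one picture is removed at each decode time, and data arrives at a constant amount per period. All numbers and the name simulate_cbr_buffer are assumptions chosen for illustration.

```python
def simulate_cbr_buffer(picture_sizes, bits_per_period, capacity, initial_fullness):
    """Return a list of (picture index, "underflow" or "overflow") events."""
    fullness = initial_fullness
    events = []
    for index, size in enumerate(picture_sizes):
        # Decode time: the whole picture must already be in the buffer.
        if size > fullness:
            events.append((index, "underflow"))
        fullness = max(fullness - size, 0)
        # Constant-rate input until the next decode time.
        fullness += bits_per_period
        if fullness > capacity:
            events.append((index, "overflow"))
            fullness = capacity
    return events


# Assumed values: a buffer-controlled stream with large and small pictures.
original = [40, 10, 10, 40, 10, 10]
print(simulate_cbr_buffer(original, 20, 100, 60))

# Scenes containing only small pictures, spliced together without buffer control.
spliced = [10, 10, 10, 10, 10, 10]
print(simulate_cbr_buffer(spliced, 20, 100, 60))
```

With these assumed numbers the original stream produces no events, while the spliced stream removes data more slowly than it arrives and eventually overflows.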
Data can be coded using either a variable or a constant bit rate in the MPEG-2 standard. As in the MPEG-1 standard, decoder 2 buffer overflow and underflow states are prohibited when coding with a constant bit rate (CBR).
Furthermore, data coded using a variable bit rate (VBR) will not result in decoder buffer 1 overflow during decoding.
The vbv_delay value written to the picture header of VBR coded video is set to 0xFFFF (note that the C language convention of using the 0x prefix to indicate hexadecimal code is followed in this specification), and data is input to the buffer under the following conditions.
Condition 1: If the buffer is not full, data is input to the buffer at the highest bit rate Rmax.
Condition 2: If the buffer is full, data input to the buffer pauses until a predetermined amount of data is removed from the buffer.
In other words, buffer overflow states are intrinsically avoided, and it is therefore only necessary to consider preventing data underflow states.
FIG. 20 shows the change in buffer fullness with VBR coded data. If the buffer capacity is B in FIG. 20, data is input to the buffer at highest bit rate Rmax as long as buffer fullness is less than or equal to B. Once the buffer becomes full at time t1, data input to the buffer stops until data is removed from the buffer at time t2, i.e., data input stops from time t1 to time t2.
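Conditions 1 and 2 can be sketched in the same simplified, period-based form. The name simulate_vbr_buffer and all numeric values are assumptions for illustration. Input at the highest rate Rmax is modeled as a fixed amount per picture period, and the stall of condition 2 is modeled by clamping fullness at the buffer capacity.

```python
def simulate_vbr_buffer(picture_sizes, rmax_bits_per_period, capacity, initial_fullness):
    """Return (fullness after each period, indices of pictures that underflow)."""
    fullness = initial_fullness
    trace = []
    underflows = []
    for index, size in enumerate(picture_sizes):
        # Decode time: underflow if the picture has not fully arrived.
        if size > fullness:
            underflows.append(index)
        fullness = max(fullness - size, 0)
        # Condition 1: input at Rmax while the buffer is not full.
        # Condition 2: input pauses once the buffer is full (clamp).
        fullness = min(fullness + rmax_bits_per_period, capacity)
        trace.append(fullness)
    return trace, underflows


# Assumed values: three small pictures followed by two large ones.
trace, underflows = simulate_vbr_buffer([10, 10, 10, 90, 90], 30, 100, 80)
print(trace)
print(underflows)
```

Because fullness is clamped at the capacity, overflow cannot occur in this model, and only underflow needs to be checked.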
Japanese Patent Laid-open Publication (kokai) 10-164592 proposes technology for resolving the above tasks 1 and 2. Kokai 10-164592 teaches a method for extracting a plurality of frames (scenes) from an MPEG video stream, and connecting a plurality of these scenes to produce a new video stream. The present explanation continues below referring to the technology disclosed in Kokai 10-164592 as prior art.
This conventional technology is described below with reference to FIG. 21, a block diagram of a video editor according to the related art.
Referring to FIG. 21, data extractor 12 extracts the frame information for each scene from the bitstream 11. Using this frame information, a control point determining means 13 determines the frame (or group of frames) for which the code size is to change, and code size calculator 15 determines the code size (amount of data) to be allocated to the selected frame (or group of frames).
The bit rate controller 14 then codes the data using the code size thus allocated to this frame (or group) and links the data for the scene to generate a new bitstream 16.
More specifically, data extractor 12 sends the frame composition of each scene (that is, the picture types in the scene) to the control point determining means 13. Using this frame composition information sent from data extractor 12, control point determining means 13 determines whether there is a frame at the beginning or end of the scene that must be re-encoded in order to sustain the picture content at the beginning and end of the scene, and defines any such frame as a variable bit rate frame. That is, if a picture referenced to code a particular frame in the display order is not included in the group of frames constituting the scene, that particular frame is designated a variable bit rate frame to be re-encoded with a different code size. If there is a plurality of consecutive variable bit rate frames, these frames are treated as a variable bit rate frame group.
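The selection rule described above can be sketched as follows. This is an illustrative reading of the description, not the implementation of Kokai 10-164592. The function name, the scene, and the hand-written reference table are all assumptions.

```python
def variable_bit_rate_groups(scene, references):
    """Group consecutive frames whose referenced pictures lie outside the scene.

    scene: picture labels of the scene in display order.
    references: assumed mapping from a label to the labels it refers to.
    """
    in_scene = set(scene)
    groups = []
    current = []
    for label in scene:
        needs_reencode = any(ref not in in_scene
                             for ref in references.get(label, []))
        if needs_reencode:
            current.append(label)  # extend the current consecutive run
        elif current:
            groups.append(current)  # close the run at the first unaffected frame
            current = []
    if current:
        groups.append(current)
    return groups


# Assumed scene and reference table, for illustration only.
scene = ["B3", "P4", "B5", "B6", "P7", "B8", "B9", "P10", "B11"]
references = {
    "B3": ["I1", "P4"], "P4": ["I1"],
    "B5": ["P4", "P7"], "B6": ["P4", "P7"], "P7": ["P4"],
    "B8": ["P7", "P10"], "B9": ["P7", "P10"], "P10": ["P7"],
    "B11": ["P10", "I13"],
}
print(variable_bit_rate_groups(scene, references))
```

In this example B3 and P4 form one variable bit rate frame group at the beginning of the scene, and B11 forms a second group at the end.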
Operation of the code size calculator 15 is described next. The control point determining means 13 sends information about the variable bit rate frame (group) to code size calculator 15. The code size calculator 15 also gets from data extractor 12 such frame information as the bit rate of the bitstream, the buffer size, and the size of the frames in the scene or the frame vbv_delay value.
The original bitstream data is used for all frames other than the selected variable bit rate frame (group), and these frames are referred to as original data frames.
The code size calculator 15 also obtains, from the frame information passed from data extractor 12, the initial buffer fullness, final buffer fullness, and highest and lowest buffer fullness values for the buffer in the original data frame range.
From the initial buffer fullness, final buffer fullness, and highest and lowest buffer fullness values, the code size calculator 15 calculates the range in the newly generated bitstream in which these initial and final buffer fullness values of the original data range are possible. This range is calculated for all scenes.
The code size calculator 15 then determines the code size (target code size) allocated to all variable bit rate frames (group) so that the code size remains within this range.
Operation of the bit rate controller 14 is described next. The bit rate controller 14 re-encodes each of the variable bit rate frames to I-pictures based on the target code size allocated to each variable bit rate frame by code size calculator 15. It is also detected at this time whether coding to an I-picture is possible using the target code size. If not, a number of P-pictures with a difference of 0 is inserted, and the increased code size is added to the target code size of the variable bit rate frame. After thus re-encoding the variable bit rate frames, scene data is relinked to produce a new bitstream.
Problems to be Solved
The related art as described above does not resolve the following problems.
Problem 1
Buffer fullness is analyzed for every picture in a scene in order to calculate the code size of the re-encoded pictures. Depending on the scene length, this may require processing a large amount of data.
Problem 2
It may be necessary to insert some number of zero-difference P-pictures in order to avoid picture degradation, and the frame numbering may therefore differ before and after editing. Frame numbers therefore cannot be indexed during editing, and editing is thus more difficult.
Problem 3
All re-encoded pictures are I-pictures. I-pictures consume a large amount of code. Coding efficiency thus drops.
Problem 4
Code size is evenly allocated to plural re-encoded pictures at the point the scenes are edited. If there are many re-encoded pictures and a small amount of code is allocated, picture degradation propagates and the quality of the entire video sequence drops.
Problem 5
Calculating the occupied buffer capacity around the point where a scene is edited is difficult with VBR coded MPEG data. Buffer control is therefore difficult.
The object of the present invention is therefore to provide technology resolving the first and second tasks described above as well as the above-noted problems 1 to 5.
To resolve the first and second tasks, and problems 1 and 2 above, a video editor according to the invention has a scene information input means for inputting scene information, where a scene is a plurality of consecutive frames extracted from an edit stream; a re-encoding target picture selector for selecting as target pictures for re-encoding the smallest number of pictures that must be re-encoded for the scene to be independently reproducible; a stream structure data generator for generating structure information for the stream range containing a target picture and an intra-coded picture referenced for coding the target picture; a buffer fullness calculating means for calculating, from stream structure data, buffer fullness, or buffer occupancy, at a target picture boundary to a recycled picture not requiring re-encoding; a re-encoding range code allocation calculator for calculating a code allocation to a re-encoding range based on buffer fullness and target picture count, said re-encoding range being one or a plurality of target pictures near an edit point between scenes; a re-encoding target picture target code size calculating means for calculating a code allocation to each target picture based on the re-encoding range code allocation, target picture count, and picture type after re-encoding; and a scene linking means for connecting scenes and producing a new stream.
To resolve problem 3 above, the video editor of the invention further preferably has a re-encoding target picture type determining means for deciding, from stream structure data, the picture type of each target picture after re-encoding; and a picture re-encoding means for re-encoding each target picture based on the target code size and the picture type after re-encoding.
To resolve problem 4 above, the video editor of the invention further preferably has a code allocation verifying means for verifying whether the code allocation is appropriate based on the re-encoding range code allocation and the picture type of each target picture after re-encoding; and a re-encoding range expanding means for extending the re-encoding range if the code allocation is not appropriate.
To further resolve problem 4 above, the video editor of the invention preferably has a re-encoding target picture importance calculating means for calculating an importance rating for each target picture in the re-encoding range.
To resolve problem 5 above, the video editor of the invention further preferably has a buffer fullness analyzing means for analyzing buffer fullness change when the edit stream is variable bit rate coded and calculating buffer fullness is difficult.
Other objects and attainments together with a fuller understanding of the invention will become apparent and appreciated by referring to the following description and claims taken in conjunction with the accompanying drawings.