The present invention relates to a video compression and encoding technology such as an MPEG scheme utilized in a video transmission system and a video database system through the Internet or the like. The present invention particularly relates to a video encoding method and a video encoding apparatus capable of providing a unified decoded video for each scene, which is easy to see without increasing data size, by encoding data in accordance with encoded parameters based on the content of scenes.
The MPEG scheme, which is an international standard for video encoding, is a technique for compressing a video by a combination of motion compensation prediction, discrete cosine transformation and variable length coding, as is well known. The MPEG scheme is described in detail in, for example, Reference 1: “MPEG”, The Institute of Television Engineers edition, Ohmsha, Ltd.
In a conventional video encoding apparatus based on the MPEG scheme, compressed video data is transmitted over a transmission line the transmission rate of which is specified, or recorded on a storage medium the recording capacity of which is limited. Owing to this, processing referred to as rate control is performed, in which encoding parameters such as a frame rate and a quantization width are set and encoding is conducted so that the bit rate of the outputted encoded bit stream becomes a designated value. In conventional rate control, a method of determining the frame rate according to the number of bits generated as a result of encoding a previous frame with a fixed quantization width has often been adopted.
Conventionally, a frame rate is determined based on the difference (margin) between a present buffer capacity and a frame skip threshold preset according to the capacity of a buffer in which an encoded bit stream is temporarily stored. If the buffer capacity is lower than the threshold, data is encoded at a fixed frame rate. If the buffer capacity is higher than the threshold, frame skipping is conducted to decrease the frame rate.
With this method, however, if the number of coded bits generated in a previous frame is large, frame skipping is conducted until the buffer capacity becomes not more than the frame skip threshold. Due to this, the distance between the frame and the next frame becomes too wide, with the result that video disadvantageously becomes unnatural.
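As a rough illustration of the conventional buffer-based frame skipping described above, the following Python sketch simulates which frames are encoded when one frame generates a large number of bits. The function name, the constant per-interval drain model and all numeric values are assumptions made for illustration and are not taken from any particular encoder.

```python
def simulate_frame_skipping(frame_bits, drain_per_interval, skip_threshold):
    """Return the indices of the source frames that are actually encoded.

    frame_bits         : bits generated if frame i is encoded (hypothetical input)
    drain_per_interval : bits leaving the buffer during one frame interval
    skip_threshold     : buffer level above which frames are skipped
    """
    buffer_level = 0
    encoded = []
    for index, bits in enumerate(frame_bits):
        # Encode only while the buffer level is not more than the threshold.
        if buffer_level <= skip_threshold:
            buffer_level += bits
            encoded.append(index)
        # The transmission line drains the buffer every frame interval.
        buffer_level = max(0, buffer_level - drain_per_interval)
    return encoded


# Frame 1 is expensive, so the following frames are skipped until the
# buffer level falls back to the threshold.
bits_per_frame = [100, 900, 100, 100, 100, 100, 100, 100, 100, 100]
encoded_frames = simulate_frame_skipping(bits_per_frame, 100, 200)
print(encoded_frames)
```

In this toy run the encoded frames are 0, 1, 8 and 9, so six consecutive frames are dropped after the expensive frame, which corresponds to the wide frame distance noted above.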
That is, according to the conventional rate control, the frame rate and the quantization width are basically set irrespectively of the content of a video. For that reason, the frame rate becomes too low on a scene in the video on which an object moves actively, and the motion of the object becomes unnatural. Besides, due to an inappropriate quantization width, the picture may be distorted, which disadvantageously makes it difficult to visually recognize the picture.
In the meantime, there is also known a rate control method based on a technique referred to as two-pass encoding. This technique is described in, for example, Reference 2: Japanese Patent Unexamined Application Publication No. 10-336675. As described in Reference 2, a video file is encoded twice: the overall characteristics of the video file are analyzed by the first encoding, the second encoding is conducted by setting appropriate encoding parameters based on the analysis result, and the encoded bit stream obtained as a result of the second encoding is transmitted or recorded. The two-pass encoding, however, has the same problems as those described above, since the encoding parameters are conventionally set basically irrespectively of the contents of a video.
As stated above, in the conventional video encoding apparatus, encoding parameters such as the frame rate and the quantization width are set irrespectively of the contents of a video when conducting rate control. Due to this, the frame rate suddenly decreases on a scene in the video on which an object moves actively and the motion of the object becomes unnatural. Also, due to the inappropriate quantization width, the video may be distorted. Thus, the conventional video encoding apparatus has a disadvantage in that the deterioration of picture quality tends to be conspicuous.
It is, therefore, an object of the present invention to provide a video encoding method and a video encoding apparatus capable of encoding a video with picture quality suited for the contents of the scenes of the video while maintaining an encoding bit rate at a designated value.
If compressed video data is recorded on a storage medium of limited storage capacity or downloaded through the Internet, it is important to efficiently encode the data at a frame rate or with a quantization width suitable for a scene as much as possible on the condition of fixed data size. To this end, since the number of generated bits does not always relate to the content of the scene, it is desired that encoded parameters are determined based on the motion of an object on the scene and the content of the scene so as to obtain a clear video.
The present invention provides a video encoding method and a video encoding apparatus for dividing an input video signal into a plurality of temporally continuous scenes each constituted by at least one frame, calculating statistical feature amounts for each scene, generating encoded parameters for each scene based on the statistical feature amounts, and encoding the input video signal using the encoded parameters.
Here, the statistical feature amounts are calculated by totaling the sizes and the distribution of motion vectors existing in each frame of the input signal for each scene. The encoded parameters include, for example, at least a frame rate and a quantization width.
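The totaling of motion vector sizes and distribution for each scene can be sketched as follows. This is a minimal Python illustration, not the claimed implementation: the choice of mean magnitude, magnitude variance and ratio of moving blocks as the feature amounts, and the function and key names, are assumptions.

```python
import math


def scene_motion_features(scene_frames):
    """Total motion vector statistics over all frames of one scene.

    scene_frames: list of frames, each frame a list of (dx, dy) motion vectors.
    The three features returned are illustrative choices only.
    """
    magnitudes = [math.hypot(dx, dy)
                  for frame in scene_frames
                  for dx, dy in frame]
    if not magnitudes:
        return {"mean_magnitude": 0.0,
                "magnitude_variance": 0.0,
                "moving_ratio": 0.0}
    count = len(magnitudes)
    mean = sum(magnitudes) / count
    variance = sum((m - mean) ** 2 for m in magnitudes) / count
    moving_ratio = sum(1 for m in magnitudes if m > 0) / count
    return {"mean_magnitude": mean,
            "magnitude_variance": variance,
            "moving_ratio": moving_ratio}


# Two frames, each with one moving block and one static block.
example_scene = [[(3, 4), (0, 0)], [(3, 4), (0, 0)]]
features = scene_motion_features(example_scene)
```

A scene boundary detector is assumed to have already grouped the frames; only the per-scene totaling is shown here.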
In addition to totaling, as the statistical feature amounts, the sizes and distribution of the motion vectors existing in each frame for each scene, frames may be classified into types, from the sizes and distribution of the motion vectors, based on the motion of a camera used when obtaining the input video signal and the motion of an object in the video. The scenes may then be classified according to the types of the frames, and the encoded parameters may be generated in view of the classification of the scenes.
If quantization widths in units of macro-blocks are generated as the encoded parameters, the quantization widths of two kinds of macro-blocks in a to-be-encoded frame may be made relatively small compared with the quantization widths of the other macro-blocks: a macro-block whose variance of luminance differs from the variance of luminance of an adjacent macro-block by not less than a predetermined value, and a macro-block in which the edge of an object exists.
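The variance-based part of this macro-block rule can be sketched in Python as follows. The sketch assumes a four-neighbour definition of "adjacent", two fixed quantization widths (a base value and a reduced value) and a precomputed grid of luminance variances; none of these details come from the text, and detection of object edges is omitted.

```python
def block_variance(samples):
    """Variance of the luminance samples of one macro-block (flat list)."""
    count = len(samples)
    mean = sum(samples) / count
    return sum((v - mean) ** 2 for v in samples) / count


def macroblock_quant_widths(variances, base_width, reduced_width, threshold):
    """Assign a quantization width to every macro-block of a frame.

    variances: 2-D list (rows x columns) of per-macro-block luminance variance.
    A block gets reduced_width when its variance differs from that of any
    horizontally or vertically adjacent block by at least threshold.
    """
    rows, cols = len(variances), len(variances[0])
    widths = [[base_width] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if abs(variances[r][c] - variances[nr][nc]) >= threshold:
                        widths[r][c] = reduced_width
                        break
    return widths


# A textured block in the middle of a flat area.
variance_grid = [[10, 10, 10],
                 [10, 500, 10],
                 [10, 10, 10]]
width_grid = macroblock_quant_widths(variance_grid, 16, 8, 100)
```

Here the centre block and its four neighbours receive the smaller width, while the corner blocks keep the base width.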
In this way, according to the present invention, the encoded parameters used for encoding the input video signal are generated for each scene based on the statistical feature amounts calculated for each scene of the input video signal, thereby making it possible to prevent the frame rate from decreasing when the motion of the object or that of the camera is active and the visual quality of a decoded video from deteriorating.
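One hypothetical way to turn a per-scene feature amount into encoded parameters is shown below. The linear mapping, the parameter ranges and the saturation value are all assumptions chosen for illustration; the text states only that the parameters are generated from the feature amounts so that the frame rate does not drop on active scenes.

```python
def scene_parameters(mean_magnitude, min_fps=10.0, max_fps=30.0,
                     min_width=4, max_width=24, saturation=16.0):
    """Map a scene's mean motion vector magnitude to a frame rate and a
    quantization width (illustrative linear mapping, not from the source).

    Active scenes get a higher frame rate and, to compensate for the
    extra frames, a larger quantization width.
    """
    activity = min(max(mean_magnitude / saturation, 0.0), 1.0)
    frame_rate = min_fps + activity * (max_fps - min_fps)
    quant_width = round(min_width + activity * (max_width - min_width))
    return {"frame_rate": frame_rate, "quant_width": quant_width}


static_scene = scene_parameters(0.0)
active_scene = scene_parameters(16.0)
```

Under these assumed defaults a static scene maps to the lowest frame rate and the smallest quantization width, and a highly active scene maps to the highest frame rate and the largest quantization width.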
Further, the video feature amounts based on the motion of the object in the video, the motion of the camera and the like are reflected on the encoded parameters, based on which the frame rate is changed or the quantization width is changed for each macro-block, thereby making it possible to obtain a good decoded video unified for each scene even with the same number of generated bits.
Moreover, the present invention is applicable to a video encoding apparatus for encoding an input video signal of the same video file twice or more. That is, the input video signal is first encoded using first encoded parameters generated for each scene based on the statistical feature amounts calculated for each scene of the input video signal, and it is determined whether the number of generated bits of the code string generated as a result of the first encoding exceeds or falls short of the target number of bits. The first encoded parameters are corrected based on the determination result to provide second encoded parameters, the second encoding is conducted on the input video signal using the second encoded parameters to generate a code string, and the code string is outputted as an encoded output.
As can be seen, the encoded parameters generated as stated above are corrected while always monitoring the number of generated bits, and encoding is repeated twice or more, whereby it is possible to realize encoding capable of obtaining a good decoded video with data size not more than the target number of bits.
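The encode, compare and correct loop described above can be sketched as follows. The encoder is replaced by a toy model in which the number of generated bits is inversely proportional to the quantization width, and the correction rule (scaling every width by the ratio of generated bits to target bits) is an assumption; the text does not specify how the parameters are corrected.

```python
def toy_encode(scene_complexities, quant_widths):
    """Hypothetical stand-in for an encoder: returns the number of
    generated bits, which falls as the quantization width grows."""
    return sum(c / q for c, q in zip(scene_complexities, quant_widths))


def correct_parameters(quant_widths, generated_bits, target_bits):
    """Scale the widths up when too many bits were generated and down
    when too few were generated (assumed proportional rule)."""
    ratio = generated_bits / target_bits
    return [q * ratio for q in quant_widths]


def multi_pass_encode(scene_complexities, first_widths, target_bits, passes=2):
    """Encode, compare with the target, correct, and encode again."""
    widths = list(first_widths)
    bits = toy_encode(scene_complexities, widths)
    for _ in range(passes - 1):
        if bits == target_bits:
            break
        widths = correct_parameters(widths, bits, target_bits)
        bits = toy_encode(scene_complexities, widths)
    return widths, bits


# The first pass generates 1500 bits against a target of 1000.
second_widths, second_bits = multi_pass_encode([8000, 4000], [8, 8], 1000)
```

With a real encoder the relation between quantization width and generated bits is not exactly inverse, so more than two passes or a more careful correction rule may be needed to stay within the target.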
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.