1. Field of the Invention
The present invention relates to an apparatus for video signal conversion, and to a corresponding video signal conversion method, for converting a compressed digital video signal into a compressed digital video signal in a different compression format. In particular, the invention relates to a method and apparatus whereby motion flags which are contained in the compressed digital video signal prior to conversion are effectively utilized for converting the video signal.
2. Description of the Prior Art
In recent years, with the increasing popularity of multimedia, considerable research has been carried out in various directions on video on-demand systems, whereby video images can be viewed from a television receiver or a personal computer whenever desired.
A group which includes the assignees of the present invention has developed and put into practical application a system, for use as a video on-demand system, whereby images which are acquired by a digital video camera are edited and then converted to a compressed format digital video signal, to be distributed to personal computers etc.
In general, a digital video camera records a compressed digital video signal in the DV format, which is a standard that has been established for digital video equipment. The DV format was established as a standard in 1996, for application to video cassette recorders, based on the "Specifications of Consumer-Use Digital VCRs" (HD Digital VCR Conference, 1996), whereby image compression is achieved by a combination of DCT (Discrete Cosine Transform) processing to reduce spatial image redundancy within each frame of a digital video signal and variable-length encoding to reduce code redundancy.
With video data in accordance with the DV format, as shown in diagram (a) of FIG. 12, one macroblock of a video signal frame consists of four luminance signal blocks which are arrayed along the horizontal direction as an elongated rectangular array, with each luminance signal block consisting of an array of 8×8 pixel values, and two color difference signal blocks (CR, CB) which, in a finally displayed picture, each correspond in size and position to the set of four luminance signal blocks, with this arrangement of pixel values within a frame being referred to as the 4:1:1 color component format. Also, as shown in diagram (b) of FIG. 12, 27 of these macroblocks constitute a super block, with a 5×10 set of the super blocks constituting one complete frame of the digital video signal.
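The block counts given above can be checked with a short calculation. The following is a minimal sketch only; the function name and dictionary keys are illustrative, and the comparison with a 720×480 luminance raster (180×480 per color difference component in 4:1:1) is an assumption based on the 525-line, 60-field system and is not stated in the text above.

```python
def dv_frame_counts():
    """Count blocks and samples in one DV frame from the figures in the text."""
    samples_per_block = 8 * 8          # each block is an 8x8 array of pixel values
    luma_blocks_per_mb = 4             # four luminance blocks per macroblock
    mbs_per_super_block = 27           # 27 macroblocks form one super block
    super_blocks_per_frame = 5 * 10    # a 5x10 set of super blocks forms a frame

    macroblocks = mbs_per_super_block * super_blocks_per_frame
    return {
        "macroblocks": macroblocks,
        "luma_samples": macroblocks * luma_blocks_per_mb * samples_per_block,
        # one CR block and one CB block per macroblock
        "chroma_samples_per_component": macroblocks * samples_per_block,
    }


counts = dv_frame_counts()
print(counts)
```

Under the stated raster assumption, the luminance total equals 720 × 480 and each color difference total equals 180 × 480, which is consistent with the 4:1:1 format described above.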
A DV format video camera outputs video information in units of interlaced fields, with a 1/60 second period between successive fields. When the amount of image motion within a frame is small, then each frame is formed by combining two successive fields, so that the frame period is 1/30 second, and DCT processing is applied to each of such frames as an interlaced combination of two fields, with such processing being referred to as interlaced-field mode DCT processing. On the other hand, when the image motion within a frame is large, then the frame is processed as two successive fields, i.e., DCT processing is separately applied to each of the two fields of that frame, with such processing being referred to in the following as progressive-field mode DCT processing. Since it is possible that image motion may occur within only a limited region, the selection of interlaced-field mode or progressive-field mode DCT processing is executed adaptively in units of blocks of a frame. When DCT processing is applied to each of the four luminance signal blocks and two color difference signal blocks of a macroblock, respective motion flags corresponding to these six blocks are inserted into the code which is generated by compressing the digital video signal, with these motion flags respectively indicating for each block whether progressive-field mode or interlaced-field mode DCT processing has been applied to that block. A motion flag takes the logic value "1" if the amount of motion detected for the corresponding block is large, so that progressive-field mode DCT processing has been assigned to the block, and takes the value "0" if the amount of motion detected for the corresponding block is small, so that interlaced-field mode DCT processing has been assigned to the block. These motion flags are subsequently referred to when decoding the compressed DV format digital video signal.
In order to distribute digital video data that has been compressed in accordance with the DV standard, to personal computers, etc., it is necessary to convert the video data to the MPEG-1 or MPEG-2 compressed code format. This conversion is generally executed by decoding the DV standard video data to recover a non-compressed video signal consisting of successive frames, and then applying compression processing in accordance with the MPEG standard to the non-compressed digital video signal.
The MPEG-1 and MPEG-2 compression standards are widely applied to video signals which are to be processed by personal computers. Each of these is a standard whereby spatial image redundancy within each frame is reduced by applying DCT transform processing, then applying variable-length encoding to reduce code redundancy. In addition, inter-frame redundancy is reduced by applying motion compensation. For that reason, the amount of code which is generated by MPEG compression encoding is reduced to 1/6 of the amount that is generated by DV compression encoding, so that the MPEG code can easily be transmitted via a network.
MPEG-1 is described in detail in ISO/IEC 11172-2 "Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s -- Part 2: Video", while MPEG-2 is described in ISO/IEC 13818-2 "Information technology -- Generic coding of moving pictures and associated audio information -- Part 2: Video".
With MPEG format video data, as shown in diagram (c) of FIG. 13, one macroblock is formed of four luminance signal blocks (each formed of 8×8 pixel values) arranged in a square array, and two color difference signal blocks (CR, CB) which correspond in position to the set of four luminance signal blocks, with this being referred to as the 4:2:0 color component format. In terms of respective amounts of data, since each color difference signal block consists of 8×8 pixel values it is equivalent to one luminance signal block; however, in terms of a finally displayed picture (after interpolation of color difference values), each color difference signal block of a macroblock corresponds in size and position to the set of four luminance signal blocks of that macroblock. A set of macroblocks arrayed along the horizontal scanning direction of a frame constitutes one slice, as shown in diagram (b), with a plurality of slices constituting one picture, as shown in diagram (a).
With MPEG encoding, intra-coding (i.e. direct conversion to sets of DCT coefficients) or inter-coding (i.e., motion prediction encoding, and conversion of resultant prediction error values to DCT coefficients) of the digital video signal may be applied.
With the MPEG-1 method, encoding in units of fields is not executed. This enables the processing speed to be high; however, since a period of 1/60 second occurs between the times at which the respective images of two successive fields are captured by a video camera, when these two fields are combined into a single field-interlaced frame and such a frame is directly encoded, deterioration of the resultant reproduced image quality will occur whenever there is rapid motion within the image expressed by a frame.
This problem is reduced with the MPEG-2 encoding method. In that case, progressive-field mode encoding or interlaced-field mode encoding can be adaptively selected. With progressive-field mode encoding applied to an entire frame, the two fields constituting the frame are separated, and the entire contents of each field are separately encoded (by intra-coding or inter-coding). Alternatively, only certain macroblocks of the frame can be adaptively selected to be encoded in progressive-field mode.
However, progressive-field mode processing results in a greater amount of code being generated than is generated with interlaced-field mode encoding, so that the processing speed becomes lower. For that reason, with MPEG-2, progressive-field mode encoding is adaptively applied only to image regions in which there is a large amount of motion, with emphasis being placed on minimizing the amount of processing which must be performed, consistent with satisfactory image quality.
A prior art method of selecting the image regions to which progressive-field mode is to be applied is described for example in Japanese patent HEI 8-46971. With that method, a single non-compressed frame is separated into its two fields, and motion prediction is then applied between respective macroblocks of the first and second field images. Motion vectors are thereby obtained for the respective macroblocks, and depending upon the magnitudes of the corresponding motion vectors, either progressive-field mode or interlaced-field mode encoding is adaptively assigned to the macroblocks.
Furthermore, as described in Japanese patent HEI 7-322268, the motion vectors which are thereby detected between two successive fields of a frame for the purpose of progressive-field or interlaced-field mode encoding determination can be used in calculating inter-frame motion vectors, in order to achieve an improvement in encoding efficiency.
However, with such types of mode selection method, in order to adaptively select progressive-field mode or interlaced-field mode encoding for a frame, it is necessary to perform motion prediction over the entire frame. For that reason, the amount of processing which is required becomes considerable, and the processing time is accordingly increased. If such processing were to be executed by using dedicated hardware, then the required circuit size would be increased.
It is an objective of the present invention to overcome the above problems by providing a video signal conversion apparatus whereby a digital video signal that has been encoded by DV format compression is converted to a digital video signal encoded by MPEG format compression, and whereby motion flags which are contained in the DV format data are used to achieve efficient conversion to the MPEG format code, and further to provide a corresponding video signal conversion method.
It should be understood that, unless otherwise indicated, the term "macroblock" as used in the following description and in the appended claims is intended to signify an MPEG format macroblock.
To achieve the above objectives the present invention provides a video signal conversion apparatus and corresponding method whereby, after decoding of video data which has been encoded in the DV format, the resultant non-compressed digital video signal is encoded as MPEG format data, with motion flag information expressed by motion flags contained in the DV format data being used to select between different MPEG encoding modes, in one or more of the following ways. Firstly, the motion flag information may be used to adaptively select interlaced-field mode DCT processing or progressive-field mode DCT processing in units of macroblocks. That is to say, the apparatus judges, based upon the motion flag information corresponding to a macroblock of an interlaced-field frame of the input non-compressed video signal, whether or not there is a significant amount of motion associated with that macroblock. If it is judged that there is not a significant amount of motion, indicating that the macroblock contents have not changed significantly between the two successive fields of the frame, then interlaced-field mode DCT processing is assigned to that macroblock, i.e. the macroblock is directly DCT-processed. If however it is judged that there is a significant amount of motion, then progressive-field mode DCT processing is assigned for the macroblock, i.e. the two interlaced portions of that macroblock which are contained in the first and second fields of the frame respectively are DCT-processed mutually separately.
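The macroblock-level DCT mode selection just described can be illustrated by the following minimal sketch. It is not taken from the specification: the function names are hypothetical, a macroblock is modelled as a plain list of pixel rows, and it is assumed that even-numbered rows belong to the first field and odd-numbered rows to the second field. The judgement of motion is supplied as an input, and the DCT itself is omitted.

```python
from typing import List, Sequence, Tuple

Row = List[int]


def split_into_fields(macroblock: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate an interlaced macroblock into its two field portions.

    Assumption: rows 0, 2, 4, ... are first-field lines and rows
    1, 3, 5, ... are second-field lines.
    """
    first_field = [list(row) for row in macroblock[0::2]]
    second_field = [list(row) for row in macroblock[1::2]]
    return first_field, second_field


def select_dct_units(macroblock: Sequence[Row], significant_motion: bool):
    """Return (mode name, list of pixel arrays to hand to the DCT stage)."""
    if significant_motion:
        # Progressive-field mode: the two field portions are processed separately.
        return "progressive-field", list(split_into_fields(macroblock))
    # Interlaced-field mode: the macroblock is passed on unchanged.
    return "interlaced-field", [[list(row) for row in macroblock]]


# Example: a 16x16 macroblock whose row i is filled with the value i.
mb = [[i] * 16 for i in range(16)]
mode, units = select_dct_units(mb, significant_motion=True)
print(mode, [len(u) for u in units])
```

In this sketch the only cost of the mode decision is a test of a value already derived from the motion flags, with no motion prediction between the two fields.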
Secondly, the motion flag information may be used to adaptively select interlaced-field mode motion prediction processing or progressive-field mode motion prediction processing in units of macroblocks, i.e. for each macroblock which is to be inter-coded. In this case, if it is judged based on the corresponding motion flag information that there is not a significant amount of motion associated with the macroblock, then interlaced-field mode motion prediction processing is assigned to that macroblock, i.e. the macroblock is directly subjected to motion prediction processing by comparison with successive macroblocks of a reference frame, within a predetermined search range. If however it is judged that there is a significant amount of motion, then progressive-field mode motion prediction processing is assigned for the macroblock. In that case, the portion of the macroblock contained in the first field of the frame is compared with the first field of the reference frame, to obtain a first set of prediction error values and a first motion vector, then the same process is performed for the portion of the macroblock contained in the second field. Thus, two motion vectors and two sets of prediction error values are derived for the macroblock.
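The difference between the two motion prediction modes can be sketched as follows. This is an illustrative example under stated assumptions, not the method of the specification: exhaustive block matching with a sum-of-absolute-differences (SAD) cost is used as a stand-in for the motion search, frames are plain lists of rows, even rows are assumed to form the first field, and the names `sad`, `full_search` and `predict` are hypothetical.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between block and the ref region at (top, left)."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def full_search(block, ref, top, left, search):
    """Exhaustive block matching; returns ((dy, dx), cost) of the best match."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + h > len(ref) or l + w > len(ref[0]):
                continue  # candidate lies outside the reference picture
            cost = sad(block, ref, t, l)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


def predict(block, ref_frame, top, left, search, significant_motion):
    """Return one result (interlaced-field mode) or two (progressive-field mode)."""
    if not significant_motion:
        return [full_search(block, ref_frame, top, left, search)]
    results = []
    for parity in (0, 1):  # 0: first field, 1: second field (assumed row parity)
        half_block = block[parity::2]
        ref_field = ref_frame[parity::2]
        # Row positions are halved because a field holds every second line.
        results.append(full_search(half_block, ref_field, top // 2, left, search))
    return results
```

With `significant_motion` false the sketch yields a single motion vector for the whole macroblock; with it true, one vector is obtained per field portion, matching the two vectors and two sets of prediction errors mentioned above (the prediction errors themselves are not computed here, only the matching cost).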
Thirdly, the motion flag information may be used to adaptively select interlaced-field mode MPEG processing or progressive-field mode MPEG processing in units of pictures, i.e. entire frames or fields. Specifically a frame which is to be encoded is judged, based on the motion flag information that is provided by the entire set of motion flags corresponding to that frame, as to whether the frame contains a significant degree of image motion. If no such degree of motion is found, then the frame is subjected to the usual form of interlaced-field mode MPEG processing. If significant motion is found, then the frame is separated into its first and second constituent fields, which are then successively subjected to independent MPEG encoding.
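A picture-level decision of this kind might be sketched as below. The specification text above does not state how the set of motion flags of a frame is reduced to a single judgement, so the rule used here (the fraction of flags set to 1 compared with a threshold of 0.5) is purely an assumption for illustration, as are the function names and the convention that even rows form the first field.

```python
def frame_has_significant_motion(flags, threshold=0.5):
    """Assumed rule: motion is significant if the share of set flags reaches threshold."""
    flags = list(flags)
    if not flags:
        return False
    return sum(1 for f in flags if f) / len(flags) >= threshold


def form_pictures(frame, flags, threshold=0.5):
    """Return the pictures to encode: one frame, or its two fields in order."""
    if frame_has_significant_motion(flags, threshold):
        return [frame[0::2], frame[1::2]]  # first field, then second field
    return [frame]


frame = [[y] * 4 for y in range(8)]
print(len(form_pictures(frame, [1, 1, 1, 0])))  # two field pictures
print(len(form_pictures(frame, [0, 0, 1, 0])))  # one frame picture
```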
Fourthly, the motion flag information may be used to adaptively select the search range which is used in motion prediction processing of each of respective macroblocks which are to be subjected to inter-coding. For example, changeover between a relatively wide search range within a reference frame and a relatively narrow range can be executed based upon a judgement as to whether the states of the set of motion flags corresponding to that macroblock indicate a significant degree of motion for the macroblock. If the amount of motion is sufficiently small, then accurate motion prediction can be achieved by using the narrow search range, thereby achieving a substantial reduction in the amount of processing which must be executed. If a large amount of motion is indicated for that macroblock by the motion flag data, then the wide search range is selected. It can thereby be ensured that sufficient accuracy of motion prediction is maintained while minimizing the amount of processing that must be executed to perform motion prediction processing.
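The saving obtained from a narrow search range can be estimated with a short calculation. In the sketch below the range values of ±4 and ±16 pixels are hypothetical (the text above does not give figures), and an exhaustive search over every integer displacement is assumed, so that a range of ±r covers (2r + 1)² candidate positions.

```python
def select_search_range(significant_motion, narrow=4, wide=16):
    """Pick the search range (in pixels, +/-) from the motion judgement.

    The default values of narrow and wide are illustrative assumptions.
    """
    return wide if significant_motion else narrow


def candidate_count(search_range):
    """Number of integer displacements examined by an exhaustive search."""
    return (2 * search_range + 1) ** 2


for moving in (False, True):
    r = select_search_range(moving)
    print(moving, r, candidate_count(r))
```

With these assumed values, a macroblock judged to have little motion is compared at 81 positions instead of 1089, roughly a thirteen-fold reduction in matching operations for that macroblock.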
More specifically, the present invention provides a video signal conversion apparatus for converting DV encoded video data to MPEG encoded video data, the DV data including motion flag data which specify for each of respective video data blocks whether interlaced-field mode DCT processing or progressive-field mode DCT processing has been applied in encoding that block, the apparatus comprising:
video decoding means for decoding the DV encoded video data and for extracting the motion flag data from the DV encoded video data to obtain decoded video data formed of a stream of interlaced-field frames, and
video encoding means coupled to receive the decoded video data and the motion flag data, for executing MPEG format encoding of the decoded video data,
wherein the video encoding means includes processing mode selection means responsive to the motion flag data for adaptively selecting a mode of the MPEG format encoding, based upon the motion flag data.
The video encoding means includes DCT processing means, and the processing mode selection means can comprise means for selectively designating progressive-field mode DCT (Discrete Cosine Transform) processing by the DCT processing means for a macroblock extracted from a frame of the decoded video data when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion and designating interlaced-field mode DCT processing for the macroblock when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion. Specifically, the processing mode selection means can comprise:
macroblock extraction means coupled to receive the decoded video data, for extracting successive macroblocks from each of the frames,
interlaced-field mode DCT block extraction means for receiving and directly outputting each macroblock which is extracted by the macroblock extraction means,
progressive-field mode DCT block extraction means for receiving each macroblock which is extracted by the macroblock extraction means, separating each of the macroblocks into two half-macroblocks which are contained in a first field and in a second field of each frame, respectively, and successively outputting the half-macroblocks, and
means controlled by the processing mode selection means for selecting each of the macroblocks of the decoded video data to be transferred directly by the interlaced-field mode DCT block extraction means to the DCT processing means for application of interlaced-field mode DCT processing when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion, or to be transferred as successive half-macroblocks from the progressive-field mode DCT block extraction means to the DCT processing means for application of progressive-field mode DCT processing, when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion.
In addition, the video encoding means of such a video signal conversion apparatus includes motion prediction processing means, and the processing mode selection means can comprise means for selectively applying progressive-field mode motion prediction processing by the motion prediction means to a macroblock extracted from a frame of the decoded video data when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion and for applying interlaced-field mode motion prediction processing to the macroblock when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion.
Such a video encoding means includes video memory means for storing reconstructed video data frames which have been reconstructed from encoded video data, motion search means and prediction error derivation means, and the processing mode selection means can comprise:
interlaced-field mode reference picture extraction means for obtaining from the video memory means a reconstructed frame for use as a reference frame, and for directly outputting the reference frame,
progressive-field mode reference picture extraction means for obtaining from the video memory means a reconstructed frame for use as a reference frame, and for separating the frame into first and second fields, and sequentially outputting the first and second fields,
interlaced-field mode object macroblock extraction means for directly extracting successive macroblocks from a frame of the decoded video data, and for outputting the directly extracted macroblocks,
progressive-field mode object macroblock extraction means for extracting successive macroblocks from a frame of the decoded video data, and for separating each of the macroblocks into a first half-macroblock which is contained in a first field of the frame and a second half-macroblock which is contained in a second field of the frame, and for successively outputting the first and second half-macroblocks, and
prediction mode control means for selecting a directly output reference frame which is produced from the interlaced-field mode reference picture extraction means and a directly extracted macroblock which is produced from the interlaced-field mode object macroblock extraction means, to be supplied to the motion search means and the prediction error derivation means for applying interlaced-field mode motion prediction processing to the macroblock, when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion, and for selecting a successive pair of fields of a reference frame which are produced from the progressive-field mode reference picture extraction means and a successive pair of half-macroblocks of an extracted macroblock which are produced from the progressive-field mode object macroblock extraction means, to be supplied to the motion search means and the prediction error derivation means for applying progressive-field mode motion prediction processing to the macroblock, when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion.
From another aspect, the mode selection means of such a video signal conversion apparatus can be configured to select, for each of successive frames of the decoded video data, interlaced-field mode DCT processing and motion prediction processing when it is judged from the motion flag data that the image contents of the frame exhibit a small amount of motion and progressive-field mode DCT processing and motion prediction processing when it is judged from the motion flag data that the image contents of the frame exhibit a large amount of motion.
In that case, the mode selection means can comprise picture formation means coupled to receive the decoded video data and controlled by the processing mode selection means for outputting each interlaced-field frame of the decoded video data unchanged, to be subjected to interlaced-field mode DCT processing and motion prediction processing, when it is judged from the motion flag data that the interlaced-field frame exhibits a relatively small amount of motion and for outputting each such interlaced-field frame as two consecutive fields, to be subjected to progressive-field mode DCT processing and motion prediction processing, when it is judged from the motion flag data that the interlaced-field frame exhibits a relatively large amount of motion.
According to another aspect, the video encoding means includes motion prediction processing means having motion search means for comparing a macroblock extracted from a frame of the decoded video data with successive macroblocks of a reference frame within a specific search range in the reference frame, and the processing mode selection means comprises search range control means for operating on the motion search means to set the search range as a narrow range when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion and to set the search range as a wide range when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion.
According to a further aspect, the invention provides a video signal conversion apparatus wherein the video encoding means includes means for applying intra-coding and means for applying inter-coding to respective video data macroblocks, and wherein the processing mode selection means comprises means for transferring a macroblock of an interlaced-field frame of the decoded video data to be subjected to intra-coding when it is judged from the motion flag data that the macroblock exhibits a relatively large amount of motion, and for transferring the macroblock to be subjected to inter-coding when it is judged from the motion flag data that the macroblock exhibits a relatively small amount of motion.
With a video signal conversion apparatus according to the present invention, the processing mode selection means can judge whether a macroblock exhibits a large or a small degree of motion based upon the states of respective motion flags of luminance signal blocks of the macroblock. Alternatively, the processing mode selection means can judge that a macroblock exhibits a large amount of motion when at least a predetermined number of respective motion flags of luminance signal blocks of the macroblock each indicate a condition of large amount of motion. As a further alternative, the processing mode selection means can judge that a macroblock exhibits a small amount of motion when the motion flag of at least one of two color difference signal blocks in the macroblock indicates a condition of small amount of motion. And as yet another possible standard for judgement, the processing mode selection means can judge that a macroblock exhibits a large amount of motion when at least a predetermined number of respective motion flags of four luminance signal blocks of the macroblock and also the motion flag of at least one of two color difference signal blocks of the macroblock each indicate a condition of large amount of motion.
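The alternative judgement standards listed above can be written as simple predicates over the six motion flags associated with a macroblock. The sketch below is illustrative only: the `MacroblockFlags` container, the function names and the default "predetermined number" of 2 are assumptions, and the flags are taken to have already been associated with the MPEG macroblock (how DV blocks are mapped onto an MPEG macroblock is not covered here). A flag value of 1 denotes a large amount of motion, as in the DV format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MacroblockFlags:
    """Motion flags assumed to correspond to one macroblock (1 = large motion)."""
    luma: Tuple[int, int, int, int]   # four luminance signal blocks
    chroma: Tuple[int, int]           # two color difference signal blocks (CR, CB)


def large_by_luma_count(f: MacroblockFlags, min_count: int = 2) -> bool:
    """Large motion if at least min_count luminance flags are set."""
    return sum(f.luma) >= min_count


def large_by_chroma(f: MacroblockFlags) -> bool:
    """Small motion if at least one color difference flag is 0, else large."""
    return all(f.chroma)


def large_by_luma_and_chroma(f: MacroblockFlags, min_count: int = 2) -> bool:
    """Large motion if enough luminance flags and at least one chroma flag are set."""
    return sum(f.luma) >= min_count and any(f.chroma)


flags = MacroblockFlags(luma=(1, 1, 0, 0), chroma=(1, 0))
print(large_by_luma_count(flags), large_by_chroma(flags), large_by_luma_and_chroma(flags))
```

Each predicate reads only flag bits that the DV decoder has already extracted, so any of them can replace a motion estimate between the two fields of the frame when selecting the encoding mode.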
The invention further provides a method of converting DV encoded video data to MPEG encoded video data, the DV data consisting of digital video data which have been encoded in a DV format and including motion flag data which specify, for each of respective video data blocks, the DCT processing mode which has been applied in DV encoding of the block, i.e. whether interlaced-field mode DCT processing or progressive-field mode DCT processing has been applied in encoding that block, the method comprising:
decoding the DV encoded video data to obtain decoded video data formed of a stream of interlaced-field frames, and extracting the motion flag data from the DV encoded video data, and
executing MPEG format encoding of the decoded video data, by adaptively selecting a mode of the MPEG format encoding, based upon a judgement of the motion flag data.