1. Field of Application
The present invention relates to a compressed data processing method and compressed data processing apparatus, and to a recording and playback system for compressed data, and in particular to such a method, apparatus and system for application to MPEG-encoded compressed data whereby a stream of pictures expressed by the data can be converted to an output picture stream having a reduced frequency of picture updating.
2. Description of Prior Art
Digital technology is widely applied at present in the fields of computers, broadcasting systems, communication systems, data storage systems, etc. A set of international standards for data compression known as MPEG (Moving Picture Experts Group) has become an important part of such technology. Since the present invention utilizes the MPEG standards, the basic concepts of these will first be outlined. The first MPEG standards for compression of video data were developed by a joint committee known as ISO/IEC JTC1/SC2 of the ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) in 1988, where SC2 signifies “scientific sub-committee 2”, later changed to SC29.
There are two sets of MPEG standards, MPEG-1 and MPEG-2. MPEG-1 (signifying “MPEG phase 1”) is applicable to storage media etc., for transferring data at a rate of approximately 1.5 Mbps. MPEG-1 was developed by applying new technologies to existing types of picture encoding methods, specifically to the JPEG standard which is used for compression-encoding of still pictures, and the H.261 technology (specified by CCITT SG XV standards, now called the ITU-T SG15 standards), developed for compression of pictures in order to transmit the pictures at a low rate of data transfer in such applications as teleconferencing, video telephones, etc. with transmission via an ISDN network. The MPEG-1 standards were first published in 1993, as ISO/IEC 11172. MPEG-2 can be considered as an extension of MPEG-1, and was developed for applications such as data communications, broadcasting, etc., providing features which are not available with MPEG-1 such as an enhanced capability for compression encoding of interlaced-field video signals. The MPEG-2 standards were first published in 1994, as ISO/IEC 13818, H.262. Although embodiments of the invention will be described basically on the assumption of MPEG-1 (referred to in the following simply as MPEG) processing, it will be apparent to a skilled person that the techniques described can be readily adapted to MPEG-2 processing.
FIG. 19 is a general system block diagram showing an example of a basic configuration of an MPEG encoder. The operation will be described first for the case of forward prediction, i.e., deriving encoded data expressing a current input picture based upon the contents of a preceding reference picture, and considering only luminance values. In FIG. 19, data expressing successive ones of a stream of pictures are input to the encoder. Specifically, successive input pixel values of an input picture that is expressed as an array of pixels (for example, one frame of a progressive-scan video signal) are supplied to an adder 2 and a motion compensated prediction section 1. The input picture is pre-processed (in some manner that is not indicated in the drawing) to extract successive 16×16 pixel blocks which are referred to as macroblocks, with the pixel values of the currently extracted macroblock being supplied to the adder 2 and motion compensated prediction section 1. A picture memory 11 holds (as described hereinafter) a set of pixel values expressing a reference picture for use in processing a predictively encoded picture, or may hold a pair of reference pictures which respectively precede and succeed the predictively encoded picture in the case of bidirectional encoding. With predictive encoding, the motion compensated prediction section 1 successively shifts the input macroblock with respect to the reference picture, within a predetermined search range, to determine whether there is a 16×16 array of pixels within the reference picture which has at least a predetermined minimum degree of correlation with the input macroblock. If such a condition is detected, then the amount and direction of displacement between that 16×16 pixel array in the reference picture and the input macroblock is obtained, as a vector value referred to as a motion vector (specifically, a combination of a horizontal and a vertical motion vector).
The respective values of difference between the pixel values (i.e., luminance and chrominance values) of the input macroblock and the corresponding pixels within that 16×16 array of pixels in the reference picture (read out from the picture memory 11 and supplied via the motion compensated prediction section 1) are then derived by the adder 2, and supplied to the DCT transform section 3, with these values being referred to as motion compensated prediction error values in the following. Prediction from a preceding reference picture is referred to as forward prediction, and prediction from a succeeding reference picture is referred to as backward prediction. If no correlated 16×16 block is found within the search range, then the input macroblock is intra-coded within the input picture, i.e., as an intra-coded block, generally referred to as an I-block.
With bidirectional prediction, values for the input macroblock are predicted based on two 16×16 blocks of pixels within a preceding and a succeeding reference picture respectively.
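By way of illustration only, the block search and the derivation of prediction error values described above can be sketched as follows. This is a minimal sketch and not the encoder of FIG. 19: the function names, the use of the sum of absolute differences (SAD) as the correlation measure, the `max_sad` threshold, and the 4×4 block size (chosen for brevity in place of 16×16) are all assumptions introduced for the example.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between block and the same-sized region of ref."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def find_motion_vector(block, ref, top, left, search=2, max_sad=None):
    """Search ref around (top, left); return ((dy, dx), cost) or None if no match."""
    rows, cols = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidate positions that fall outside the reference picture.
            if y0 < 0 or x0 < 0 or y0 + rows > len(ref) or x0 + cols > len(ref[0]):
                continue
            cost = sad(block, ref, y0, x0)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    if best is None or (max_sad is not None and best[1] > max_sad):
        return None  # assumed rule: no sufficiently similar block, so intra-code
    return best


def prediction_error(block, ref, top, left, mv):
    """Difference between the block and the motion-compensated reference region."""
    dy, dx = mv
    return [
        [block[y][x] - ref[top + dy + y][left + dx + x] for x in range(len(block[0]))]
        for y in range(len(block))
    ]


# Hypothetical 8x8 reference picture with distinct pixel values.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
# Input block located at (2, 2) whose content equals the reference region at (3, 1).
block = [[ref[3 + y][1 + x] for x in range(4)] for y in range(4)]

mv, cost = find_motion_vector(block, ref, top=2, left=2, search=2, max_sad=0)
error = prediction_error(block, ref, 2, 2, mv)
unmatched = find_motion_vector([[1000] * 4 for _ in range(4)], ref, 2, 2, max_sad=50)
print(mv, cost, unmatched)
```

In this toy case the search finds an exact match, so every prediction error value is zero, while the block with no similar region is reported as unmatched and would be intra-coded.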
In MPEG, the basic unit for which different types of encoding can be specified is the macroblock. Depending upon the type of picture in which it is located and on decisions made by the encoder, a macroblock may be:
(a) encoded entirely within a picture (i.e., intra-coded), independently of all other pictures,
(b) encoded by forward prediction, i.e., as a set of prediction error values in conjunction with a motion vector, derived using a preceding reference picture,
(c) encoded by backward prediction, i.e., as a set of prediction error values in conjunction with a motion vector, derived using a succeeding reference picture, or
(d) encoded by bidirectional prediction, using both a preceding and a succeeding reference picture.
A picture can be encoded as:
(a) an I-picture, in which case all of the macroblocks are I-macroblocks, i.e., are intra-coded within that picture,
(b) a P-picture, in which case the encoder can selectively apply intra-coding or forward prediction encoding to the macroblocks, or
(c) a B-picture, in which case the encoder can selectively apply intra-coding, forward prediction encoding, backward prediction encoding, or bidirectional prediction to the macroblocks.
To minimize the amount of generated encoded data, the encoder uses an algorithm which is designed to minimize the number of I-macroblocks of the P-pictures and B-pictures.
I-pictures and P-pictures are used as reference pictures, whereas B-pictures are not so used.
Successive ones of the stream of pictures supplied to the MPEG encoder are encoded as I, P or B-pictures, in a fixedly predetermined sequence. As a picture is encoded, the motion vectors derived for macroblocks are supplied from the motion compensated prediction section 1 to the VLC section 5, together with prediction mode information which specifies the macroblock type, i.e., whether that macroblock has been encoded by intra-coding, forward prediction, backward prediction, or bidirectional prediction.
The motion compensated prediction error values derived from the adder 2 for a macroblock of the input picture are supplied to a DCT transform section 3, which processes the macroblock as a set of four 8×8 pixel blocks, sometimes referred to as DCT blocks. 2-dimensional DCT (Discrete Cosine Transform) processing is separately applied to each of these DCT blocks to obtain a corresponding set of DCT coefficients, which are supplied to a quantizer 4. This form of processing is efficient, due to the fact that a video signal contains relatively large amounts of low-frequency components and relatively small amounts of high-frequency components, and the low-frequency components can be expressed by the DCT coefficients as relatively small amounts of data.
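As an illustrative sketch only, the 2-dimensional DCT of a single 8×8 block can be written directly from its definition as shown below. The function name `dct_2d` and the orthonormal scaling are assumptions made for the example, and a practical encoder would use a fast algorithm rather than this direct summation. The flat test block shows the energy compaction referred to above: all of the information ends up in the single DC coefficient.

```python
import math


def dct_2d(block):
    """Direct (slow) orthonormal 2-D DCT-II of a square block."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (
                        block[y][x]
                        * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    )
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A flat 8x8 block (every value 100) contains only a zero-frequency component.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
dc = coeffs[0][0]
largest_ac = max(
    abs(coeffs[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0)
)
print(round(dc, 6), largest_ac < 1e-6)
```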
The quantizer 4 utilizes a 2-dimensional (8×8 value) quantization matrix that is weighted in accordance with human visual characteristics, in conjunction with a quantization scaling value which is applied overall as a scalar multiplier, to obtain a matrix of quantization factors. Each of the DCT coefficients of a DCT block is divided by the corresponding quantization factor, to thereby convert each DCT block to a set of quantized DCT coefficients.
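The division by quantization factors can be sketched as follows, purely as an illustration. The weighting matrix used here is a hypothetical one whose values simply grow with frequency (it is not the default matrix of any MPEG standard), and the use of Python's `round` for the division is likewise an assumption. The inverse multiplication is included only to show that the operation is lossy.

```python
def quantize(coeffs, matrix, scale):
    """Divide each coefficient by (matrix entry * scale) and round to an integer."""
    return [
        [round(c / (m * scale)) for c, m in zip(c_row, m_row)]
        for c_row, m_row in zip(coeffs, matrix)
    ]


def dequantize(levels, matrix, scale):
    """Multiply each quantized level back by its quantization factor."""
    return [
        [lv * m * scale for lv, m in zip(l_row, m_row)]
        for l_row, m_row in zip(levels, matrix)
    ]


# Hypothetical weighting matrix: coarser quantization at higher frequencies.
matrix = [[8 + 2 * (u + v) for v in range(8)] for u in range(8)]
scale = 2  # quantization scaling value (assumed)

# Hypothetical DCT block with a large DC term and a few small AC terms.
coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 800, -60, 25, 10

levels = quantize(coeffs, matrix, scale)
restored = dequantize(levels, matrix, scale)
print(levels[0][0], levels[0][1], levels[1][0], levels[7][7])  # 50 -3 1 0
print(restored[1][0])  # 20, not the original 25
```

A larger scaling value yields smaller quantized levels and more zeros, which is the property that a rate control loop can exploit.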
The quantized DCT coefficients produced from the quantizer 4 are supplied to a VLC section 5, and, in the case of an I-picture or P-picture, are supplied to a dequantizer 8, for use in generating a reference picture to be held in the picture memory 11. That is to say, the resultant dequantized DCT coefficients obtained from the dequantizer 8 are supplied to an inverse DCT transform section 9, and each of the resultant recovered motion compensated prediction error values thereby produced from the inverse DCT transform section 9 is added to the corresponding motion-compensated pixel value, produced from the motion compensated prediction section 1, to thereby recover each of the pixel values of that I-picture or P-picture, which are then stored in the picture memory 11 as a reference picture.
The VLC section 5 applies DPCM (differential pulse code modulation) to the DCT coefficient of a DCT block, which expresses the DC component of the luminance values of that block, while the DCT coefficients expressing the AC components of that DCT block are subjected to zig-zag scanning to enhance the probability of obtaining consecutive sequences (“runs”) of zero values, and run-length encoding whereby each of such runs of consecutive zero values can be expressed by a single value, thereby achieving highly efficient encoding. Entropy encoding (typically, Huffman encoding) is then applied, and the resultant variable-length encoded (VLE) data are supplied to a buffer 6, and are produced from that buffer at a constant data rate. The buffer 6 includes a function for detecting the respective amounts of data expressing each of successive macroblocks, and supplies that information to a code amount control section 7. The code amount control section 7 determines the difference between a target amount of code and the actual amount of code used to encode each macroblock, and generates a corresponding control value which is fed back to the quantizer 4, to adjust the quantization scale value that is used by the quantizer 4, such as to ensure that the rate of supplying data to the buffer 6 will not result in buffer underflow or overflow. It can thus be understood that the amounts of data used to encode respective pixels are not constant, but vary substantially as a result of the various encoding operations described above, so that the measures described above are necessary to ensure that underflow or overflow will not occur in the output buffer of the MPEG encoder or in the input buffer of the MPEG decoder.
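The zig-zag scanning and run-length encoding of the AC coefficients can be illustrated by the following sketch. The function names and the representation of each run as a (run, level) pair are assumptions made for the example; the DPCM of the DC coefficient, the entropy encoding and the buffer control are not modeled.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(y, s - y) for y in range(n) if 0 <= s - y < n]
        # Even diagonals are scanned from bottom-left up to top-right,
        # odd diagonals from top-right down to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_length_encode(block):
    """Split a quantized block into its DC value and (zero_run, level) AC pairs."""
    scan = [block[y][x] for y, x in zigzag_order(len(block))]
    dc, ac = scan[0], scan[1:]
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    # Trailing zeros are not emitted; an end-of-block code would imply them.
    return dc, pairs


# Hypothetical quantized block: non-zero values cluster at low frequencies.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 50, -3, 1, 2

first_positions = zigzag_order()[:6]
dc, pairs = run_length_encode(block)
print(first_positions)
print(dc, pairs)  # 50 [(0, -3), (0, 1), (1, 2)]
```

Here the 63 AC coefficients of the block reduce to three pairs, since the scan order groups the zero-valued high-frequency coefficients into one trailing run.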
FIG. 20 is a general system block diagram showing an example of an MPEG decoder for operating on MPEG-compressed video data. In FIG. 20, the input MPEG encoded data are subjected to decoding by a VLD (variable-length decoder) 15, and the resultant data are processed by a dequantizer 16 such as to recover values which are close approximations to the originally derived DCT coefficients, and inverse DCT processing is then applied to these by an inverse DCT section 17. In addition, the motion vector information and prediction mode information for each macroblock are extracted by the VLD 15 from the decoded input data stream, and supplied to a motion compensated prediction section 18. As the data for an I-picture or P-picture are recovered by the decoder, they are successively stored in a picture memory 20, to form a reference picture, whose data are also supplied to the motion compensated prediction section 18. As the recovered motion compensated prediction error value for a pixel of a macroblock is produced from the inverse DCT section 17, then (in the case of forward prediction or backward prediction) it is added to the value of the corresponding pixel from the reference picture that is currently held in the picture memory 20, after motion compensation has been applied to that reference picture by the motion compensated prediction section 18, with the amount of motion compensation being determined by the motion vector for the macroblock that is currently being processed. In that way, successive macroblocks of each of successive P and B-pictures are recovered from the MPEG-encoded compressed data.
An MPEG picture can be encoded as one or more sets of macroblocks, referred to as slices. In the simplest case only a single slice is utilized, i.e., constituting all of the macroblocks of a picture.
The output generated by an MPEG encoder is an ordered continuous stream of bits, consisting of successive bit patterns and code values, with sets of stuffing bits inserted where necessary. A multi-layer configuration is utilized, in which successive layers convey information ranging from indications of the start and end points of the MPEG-encoded data stream down to the sets of quantized encoded DCT coefficient values for the respective blocks of macroblocks of a picture. The highest layer is the video sequence layer, containing bit patterns for indicating the aforementioned start and end points of the MPEG data stream, and containing a succession of sets of information relating to respective GOPs (“group of pictures” units), constituting a GOP layer. The term “group of pictures” refers to a sequence consisting of an I-picture followed by a combination of B-pictures and P-pictures, with a typical GOP arrangement being illustrated in FIG. 3. Here, numeral 36 denotes an I-picture at the start of a GOP set which is formed of 12 successive pictures as shown, in the sequence I, B, B, P, B, B, P, B, B, P, B, B, with the P-pictures designated as 37, 38, 39 respectively. The distance (in picture units) M between each pair of reference pictures (I- or P-pictures) is 3, while the length N of the GOP set is 12. The GOP layer information for each GOP contains sets of information relating to the respective pictures of that GOP, constituting a picture layer. Each set of picture layer information contains information relating to each of the slices of that picture, constituting a slice layer, and the slice layer information for each specific slice contains information relating to all of the macroblocks of that slice, as a macroblock layer. Each portion of the macroblock layer relating to a specific macroblock contains encoded DCT coefficients specifying the luminance and chrominance values of the blocks which constitute the macroblock, either directly or as prediction error values.
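The relationship between the GOP length N, the reference picture distance M and the resulting sequence of picture types can be illustrated by the short sketch below. The function name `gop_pattern` is hypothetical, and the sequence is generated in display order, as in the arrangement described above.

```python
def gop_pattern(n=12, m=3):
    """Picture types of one GOP of length n with reference distance m."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")   # each GOP starts with an I-picture
        elif i % m == 0:
            types.append("P")   # every m-th picture is a reference P-picture
        else:
            types.append("B")   # pictures between references are B-pictures
    return types


pattern = gop_pattern(12, 3)
print(", ".join(pattern))  # I, B, B, P, B, B, P, B, B, P, B, B
```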
However if a macroblock has been judged to be identical to the correspondingly positioned macroblock of a reference picture at the time of encoding, then no information is actually encoded for that macroblock, which is referred to as a skipped macroblock. Specifically, a macroblock is indicated as being “skipped”, in the MPEG data, by omitting to specify an incremental address value (or any other information) for that macroblock. As a result, referring to the decoder example of FIG. 20, at the time when decoding of such a skipped macroblock is executed, the chrominance and luminance values for the correspondingly positioned macroblock of the reference picture will be read out from the picture memory 20 and transferred unchanged via the motion compensated prediction section 18 and the adder 19 to the output of the decoder.
With such an MPEG system, it is difficult to modify the MPEG-encoded compressed data such as to produce various special effects in the final display picture that is generated from the decoded video data. Examples of such special effects are a “time lapse” effect, i.e., whereby the displayed picture becomes a succession of still pictures rather than a moving picture, so that a form of slow-motion display is achieved, or the “wipe” effect, whereby the displayed picture is gradually shifted off the display screen. To achieve the “time lapse” special effect in the prior art, it has been necessary to use some dedicated form of special apparatus to process the MPEG-encoded compressed video data prior to supplying the data to an MPEG decoder, i.e., an apparatus having a decoder section for decoding the MPEG data stream, a section for applying processing to the resultant decompressed video data such as to produce the desired “time lapse” effect, and an MPEG compression section for then again applying MPEG-encoding to the resultant data. The resultant MPEG-encoded compressed data can thereafter be decoded by a conventional type of MPEG decoder as described above. To achieve the “wipe” special effect, it has been necessary in the prior art to use a special type of MPEG encoder which has been designed to enable that special effect to be obtained, and to subsequently perform decoding of the resultant MPEG-encoded compressed data using a conventional MPEG decoder.
However such prior art methods of achieving these types of special effect in a finally displayed picture have the disadvantages of causing an increase in the overall system size and complexity, with resultant increases in system costs, operational complexity, etc. There is therefore a need for some simple type of apparatus for achieving such special effects, which could be easily incorporated into an existing MPEG system.
Furthermore, there are many cases in which it would be highly advantageous to convert MPEG-encoded compressed data into an even more highly compressed condition. For example, when a number of entertainment program items (such as respective films, cartoons, etc.) are successively stored by a data recording and playback apparatus on a recording medium, in the form of respective sets of MPEG-encoded compressed data, a condition may occur whereby the storage medium has no more available storage space, but it is desired to store other entertainment program items on that storage medium, without entirely deleting some of the previously stored entertainment program items. In such a case, it would be desirable to be able to recover some available storage space by further compressing the data expressing one or more of the previously recorded entertainment program items, e.g., such as to leave at least a minimum amount of the overall contents of such a previously recorded entertainment program item. However in the prior art, there has been no simple and convenient form of apparatus available for achieving such a function.
It is an objective of the present invention to overcome the problems of the prior art set out above, by providing a compressed data processing method and compressed data processing apparatus, to be used in conjunction with a conventional type of MPEG encoder and conventional type of MPEG decoder, whereby MPEG-encoded compressed data conveying a stream of pictures can be operated on in a very simple manner to achieve a reduction of the picture updating frequency of that stream of pictures, thereby enabling special effects such as the aforementioned “time lapse” effect to be readily achieved.
It is a second objective of the present invention to provide a compressed data recording and playback method and a compressed data recording and playback apparatus incorporating a conventional type of MPEG encoder and conventional type of MPEG decoder, whereby an amount of recording space available on a recording medium can be increased through further compression of one or more sets of video data which have previously been recorded on the recording medium in the form of MPEG-encoded compressed data, thereby eliminating the need to entirely delete such previously recorded video data sets.
It is a third objective of the present invention to provide a compressed data processing apparatus for processing a selected part of a stream of MPEG-encoded compressed data to convert the part to a condition whereby a final display picture which is generated from a decoded video signal derived from the selected part will undergo successive displacement in a specified direction.
To achieve the first objective, the invention provides a method of reducing the picture updating frequency of a stream of picture data sets expressing respective compression-encoded pictures, where the term “picture updating frequency” of a stream of compression-encoded pictures is used in the description and claims of this invention with the meaning of “frequency of occurrence of sets of data expressing respectively different pictures” within that stream. More specifically, the invention is applicable to a compression-encoded picture stream which includes picture data sets each containing prediction information expressing a compression-encoded picture as being predictively encoded with respect to a predetermined corresponding other one of the compression-encoded pictures as a reference picture. The method basically consists of preparing and storing beforehand a copy data set, which is a set of data whose contents indicate a compression-encoded picture as being identical to the corresponding reference picture, and processing the stream of picture data sets to insert the copy data set to replace the prediction information in each of periodically occurring ones of the predictively encoded compression-encoded pictures.
The method is designed for application to an MPEG compressed video data stream, i.e., in which each of the reference pictures is an MPEG I-picture or P-picture, and each of the predictively encoded pictures is a P-picture or a B-picture. The method can be implemented such that each of the pictures for which prediction information is replaced is a B-picture, and the copy data set includes motion vector information indicating that an overall amount of picture motion of a B-picture with respect to a corresponding temporally preceding or succeeding reference picture is zero, and motion compensated prediction error information indicating that respective amounts of motion compensated prediction error for all macroblocks of the B-picture are zero. That is to say, all macroblocks of the B-picture are indicated as being skipped macroblocks, so that at the time of decoding the MPEG data stream, that B-picture will be decoded as an identical copy of a corresponding reference picture.
Alternatively, the method can be implemented such that the above processing is applied both to the B-pictures and also to each of the P-pictures of the MPEG data stream, or it can be arranged that a user can selectively specify copy data replacement to be executed either for the B-pictures alone or for both the B-pictures and the P-pictures.
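The effect of the copy data replacement on the decoded picture stream can be illustrated by the following sketch, which works at the level of whole pictures rather than MPEG bit streams. The function names are hypothetical, the pictures are taken in display order, and each copied picture is assumed to reproduce the nearest preceding reference picture (the method equally permits a succeeding reference picture for B-pictures).

```python
def select_copies(gop_types, replace_p=False):
    """Flag each picture whose prediction information is replaced by copy data."""
    return [t == "B" or (replace_p and t == "P") for t in gop_types]


def displayed_sources(gop_types, is_copy):
    """Index of the original picture whose content each decoded picture shows."""
    shown = []
    last_reference = None  # content currently held by the latest I/P reference
    for i, (ptype, copied) in enumerate(zip(gop_types, is_copy)):
        source = last_reference if copied else i
        shown.append(source)
        if ptype in ("I", "P"):
            last_reference = source  # a copied P-picture passes on older content
    return shown


gop = list("IBBPBBPBBPBB")

b_only = displayed_sources(gop, select_copies(gop))
b_and_p = displayed_sources(gop, select_copies(gop, replace_p=True))

print(b_only)   # [0, 0, 0, 3, 3, 3, 6, 6, 6, 9, 9, 9]
print(b_and_p)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With only the B-pictures replaced, the 12-picture GOP of this example shows 4 different pictures; with the P-pictures also replaced, it shows the I-picture alone, so the picture updating frequency falls to one update per GOP.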
The amount of data required to indicate that all macroblocks of an MPEG encoded picture are skipped macroblocks is very small. Hence, a very substantial reduction in MPEG code amount can be easily achieved. This fact can be used for example to apply further compression to items such as video clips etc., which are recorded as MPEG-encoded compressed data on a recording medium, to avoid the need to completely erase such items when it becomes essential to increase the amount of space available on the recording medium for recording other items. By reading out such a previously recorded item and applying the method described above, the item can be re-recorded in a further compressed condition, thereby providing the desired increase in recording space.
Alternatively stated, the method enables the aforementioned “time lapse” slow-motion effect to be achieved in a very simple manner, since for example it enables all of the B- and P-pictures of each MPEG GOP to be converted to a form whereby each of these will be decoded as a picture that is identical to the I-picture of that GOP, at the time of decoding, or whereby each of the B-pictures will be decoded as a copy of a preceding or succeeding I or P reference picture. Thus, the first objective set out above can be achieved. Furthermore if respective streams of MPEG compressed video data expressing items such as films or video clips have been recorded on a recording medium, and it is required to increase the amount of space available on the recording medium for recording other items, the invention enables the MPEG data of a previously recorded item to be read out, to be processed as described above (i.e., to replace the prediction data of all of the B-pictures with copy data, or to replace the prediction data of all of the B-pictures and all of the P-pictures with copy data), and then re-recorded on the recording medium, so that the desired increase in available space can be achieved without the need to completely erase the previously recorded items. Hence, the second objective set out above can be achieved.
The third objective set out above can be very easily achieved, by a modification of the above compressed data processing method. That is to say, within each of one or more GOPs in a selected part of an MPEG data stream, processing is applied to modify the prediction information of each of the predictively encoded pictures within that GOP such as to specify a fixed size and direction of motion vector with respect to a corresponding reference picture, and to specify all-zero values of motion compensated prediction error for each of the macroblocks of these predictively encoded pictures. As a result, a final display picture which is derived by decoding such a processed GOP will be successively displaced across the display screen, in a direction and at a speed which are determined by the magnitude and direction of the fixed motion vector.
By processing a succession of GOPs in that way, and suitably modifying the intervening I-pictures in such a succession, any arbitrary amount, direction and speed of displacement of a finally displayed picture can be achieved, in a very simple manner.
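The successive displacement produced by a fixed motion vector with zero prediction error can be illustrated by the sketch below. It is a simplified model under several assumptions: the GOP is taken to consist of an I-picture followed only by P-pictures, each predicted from its immediate predecessor, the parameter `dx` denotes the displacement of the displayed content per picture (the sign convention of an actual MPEG motion vector is not modeled), and the area vacated by the shift is filled with a hypothetical value of 0.

```python
def shift_picture(picture, dx, fill=0):
    """Return a copy of picture with its content moved dx pixels to the right."""
    width = len(picture[0])
    shifted = []
    for row in picture:
        new_row = [fill] * width
        for x, value in enumerate(row):
            if 0 <= x + dx < width:
                new_row[x + dx] = value  # content leaving the picture is dropped
        shifted.append(new_row)
    return shifted


def decode_wipe(i_picture, num_predicted, dx):
    """Decode an I-picture followed by pictures that only displace their reference."""
    frames = [i_picture]
    for _ in range(num_predicted):
        # Zero prediction error: each picture is purely its shifted reference.
        frames.append(shift_picture(frames[-1], dx))
    return frames


frames = decode_wipe([[1, 2, 3, 4]], num_predicted=3, dx=1)
for frame in frames:
    print(frame[0])
```

Each successive picture in this toy sequence is displaced by a further pixel, so the content moves off the picture at a rate set by the fixed vector.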
A compressed data processing apparatus according to the present invention for reduction of the picture updating frequency of an MPEG data stream can be configured as a combination of:
a stream buffer memory for temporarily holding successive portions of the MPEG compressed video data stream,
a copy data memory, such as a ROM, having stored therein a B-picture copy data set containing motion vector information indicating that an overall amount of motion of an MPEG B-picture with respect to a corresponding preceding reference picture or with respect to a corresponding succeeding reference picture is zero, and information indicating that respective amounts of motion compensated prediction error for all macroblocks of the B-picture are zero,
a picture data detection section, for detecting each occurrence of the condition in which a set of data expressing a B-picture of the compressed video data stream is currently held in the stream buffer memory, and
a data changeover section which functions, when it is detected that a B-picture data set is currently present in the stream buffer memory, to replace all motion vector information and motion compensated prediction error information of the B-picture data set with the B-picture copy data set.
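The cooperation of the four elements listed above can be sketched in software as follows. This is an illustrative model only and not the claimed apparatus: the class and field names are hypothetical, each picture data set is reduced to a picture type and an opaque byte string, the one-byte copy data set is a stand-in for the actual copy data, and the byte counts in the example are invented for demonstration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PictureData:
    picture_type: str        # "I", "P" or "B"
    prediction_data: bytes   # motion vectors and prediction error (opaque here)


B_COPY_DATA = b"\x00"  # hypothetical stand-in for the stored B-picture copy data set


class CopyDataProcessor:
    def __init__(self, copy_data=B_COPY_DATA):
        self.stream_buffer = deque()   # stream buffer memory
        self.copy_data = copy_data     # copy data memory (e.g. a ROM)

    def is_b_picture(self, data):
        """Picture data detection section."""
        return data.picture_type == "B"

    def process(self, stream):
        """Data changeover section: substitute copy data for B-picture contents."""
        for data in stream:
            self.stream_buffer.append(data)
            held = self.stream_buffer.popleft()
            if self.is_b_picture(held):
                held = PictureData(held.picture_type, self.copy_data)
            yield held


stream = [
    PictureData("I", b"i" * 40),
    PictureData("B", b"b" * 10),
    PictureData("B", b"b" * 12),
    PictureData("P", b"p" * 20),
]
output = list(CopyDataProcessor().process(stream))

size_before = sum(len(p.prediction_data) for p in stream)
size_after = sum(len(p.prediction_data) for p in output)
print(size_before, size_after)  # 82 62
```

The I-picture and P-picture data pass through unchanged, so the sequence of picture types seen by a conventional decoder is unaltered.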
Alternatively, such an apparatus can be configured with both P-picture and B-picture copy data sets being stored, with the apparatus being selectively controllable for operation in a mode in which only all of the B-picture prediction information is replaced by the copy data and a mode in which both all of the P-picture prediction information and also all of the B-picture prediction information are replaced by copy data.
An apparatus for achieving the third objective set out above can be basically similarly configured, but with the copy data specifying a fixed non-zero size and direction for a motion vector of a predictively encoded picture, and zero amounts of motion compensated prediction error for each of the macroblocks of the picture, and with the apparatus also including means for operating on successive I-pictures such as to produce appropriate amounts of successive displacement of these pictures (with respect to a display screen) when the MPEG data are decoded and displayed.
The above points will be made more clear with reference to the following description of preferred embodiments of the invention.