1. Field of the Invention
The present invention concerns a data coding method and an apparatus therefor which enable, for example, coded audio and video data to be stored into respective units, each unit being a row of packs that is to be reproduced within a prescribed period of time. In particular, the invention concerns a data coding method and an apparatus therefor which make it possible, while keeping the picture quality stable, to perform encoding in which navigation data such as a data length and a starting address, calculated from a value corresponding to the amount of codes of the coded data, can be depicted before the audio and video data are encoded.
2. Related Art
In recent years, a data compaction system for use on moving picture images has been internationally standardized as the MPEG (Moving Picture Experts Group) system. This MPEG system is known as a system for performing variable compaction of video data. In the MPEG system, there are defined compaction systems which are called “MPEG 1” (MPEG phase 1) and “MPEG 2” (MPEG phase 2).
Concretely, the MPEG system is a combination of several techniques. First, temporal redundancy is reduced by subtracting, from an input picture image signal, a picture image signal that has been decoded by a motion compensation unit.
As the method of prediction, there are three modes, as fundamental modes, i.e., a mode in which prediction is performed from past picture images, a mode in which prediction is performed from future picture images, and a mode in which prediction is performed from both past picture images and future picture images. Also, each of these modes can be used by being switched in units of a macroblock (MB) composed of 16 pixels×16 pixels. The method of prediction is determined according to the picture type (“Picture_Type”) that has been imparted to an input picture image. As the picture types, there are a one-directional between-picture prediction encoded picture image (P-picture), bi-directional between-picture prediction encoded picture image (B-picture), and intra-picture independently encoded picture image (I-picture). In the P-picture type (one-directional between-picture prediction encoded picture image), there are two modes one of which is to encode by performing prediction from past picture images and the other of which is to independently encode a macroblock without performing relevant prediction. In the B-picture (bi-directional between-picture prediction encoded picture image), there are four modes, a first one of which is to perform prediction from future picture images, a second one of which is to perform prediction from past picture images, a third one of which is to perform prediction from both past picture images and future picture images, and a fourth one of which is to encode independently without performing any prediction. In the I-picture (intra-picture independently encoded picture image), all macroblocks are each independently encoded. It is to be noted that the I-picture is called “an intra-picture” and that, therefore, the one-directional between-picture prediction encoded picture image and the bi-directional between-picture prediction encoded picture image can each be referred to as “a non-intra-picture”.
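The relation between picture types and the prediction modes selectable for each macroblock, as described above, can be summarized in a small lookup table. The following is only an illustrative sketch; the names `MBMode`, `ALLOWED_MODES` and `is_mode_allowed` are hypothetical and do not appear in the MPEG standard.

```python
from enum import Enum


class MBMode(Enum):
    """Prediction mode of one macroblock (hypothetical names)."""
    INTRA = "intra"                  # coded independently, no prediction
    FORWARD = "forward"              # predicted from a past picture
    BACKWARD = "backward"            # predicted from a future picture
    BIDIRECTIONAL = "bidirectional"  # predicted from past and future pictures


# Modes that may be chosen per macroblock, keyed by picture type.
ALLOWED_MODES = {
    "I": {MBMode.INTRA},
    "P": {MBMode.INTRA, MBMode.FORWARD},
    "B": {MBMode.INTRA, MBMode.FORWARD, MBMode.BACKWARD, MBMode.BIDIRECTIONAL},
}


def is_mode_allowed(picture_type, mode):
    """Return True if a macroblock of this picture type may use the mode."""
    return mode in ALLOWED_MODES[picture_type]
```

The table reflects the counts stated above: one mode for the I-picture, two for the P-picture, and four for the B-picture.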
In the motion compensation, by performing pattern matching of the movement regions in units of a macroblock, a motion vector is detected with a half pixel precision and prediction is made after shifting of the macroblock to an extent corresponding to the thus-detected motion vector. The motion vector includes horizontal and vertical motion vectors, and this motion vector is transmitted as additional messages for macroblock along with an MC (Motion Compensation) mode that indicates where prediction is made from.
The pictures from an I-picture to the picture that immediately precedes the next I-picture are called a “GOP (Group Of Pictures)”. In a case where pictures are used in accumulation media or the like, approximately 15 pictures are generally used as 1 GOP.
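The grouping into GOPs described above can be sketched as follows. This is a minimal illustration that assumes the pictures are given as a sequence of type letters in coded order; the function name `split_into_gops` is hypothetical.

```python
def split_into_gops(picture_types):
    """Group picture types ("I", "P", "B") into GOPs.

    A new GOP starts at every I-picture and runs up to the picture that
    immediately precedes the next I-picture. Pictures that come before the
    first I-picture, if any, are collected into a leading group.
    """
    gops = []
    for ptype in picture_types:
        if ptype == "I" or not gops:
            gops.append([])
        gops[-1].append(ptype)
    return gops
```

For example, the sequence I B B P B B I B B P would be divided into one group of six pictures and one group of four pictures.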
FIG. 1 illustrates a fundamental construction of a video encoder that is among audio/video encoding apparatuses, to which the MPEG is applied.
In this FIG. 1, an input picture image signal is supplied to an input terminal 101. This input picture image signal is sent to a calculating unit 102 and to a motion compensation and prediction unit 111.
In the calculating unit 102, a difference between a picture image signal, which has been decoded in the motion compensation and prediction unit 111, and the input picture image signal is determined, and a picture image signal corresponding to this difference is sent to a DCT unit 103.
In the DCT unit 103, the differential picture image signal that has been supplied is subjected to orthogonal transformation. Here, the DCT (Discrete Cosine Transform) means an orthogonal transformation in which an integral transformation that uses a cosine function as its integration kernel is made into a discrete transformation over a finite space. In the MPEG system, two-dimensional DCT is performed on 8×8 DCT blocks that have been obtained by dividing the macroblock into four parts. It is to be noted that in general a video signal is composed of a large amount of low frequency band components and a lesser amount of high frequency band components and that, therefore, when DCT is performed, the coefficients are concentrated in the low band. Data that has been obtained by the DCT in the DCT unit 103 is sent to a quantizing unit 104.
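The division of a macroblock into four 8×8 blocks and the two-dimensional DCT of one block can be sketched as below. This is a direct, unoptimized evaluation of the commonly used 8×8 DCT-II definition, given for illustration only; actual encoders use fast algorithms, and the function names are hypothetical.

```python
import math

N = 8  # DCT block size


def split_macroblock(mb):
    """Split a 16x16 macroblock (list of rows) into four 8x8 blocks."""
    return [[row[c:c + N] for row in mb[r:r + N]]
            for r in (0, N) for c in (0, N)]


def dct_2d(block):
    """Two-dimensional 8x8 DCT-II, evaluated directly from its definition."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out
```

For a completely flat block, all of the energy falls into the single lowest-frequency (DC) coefficient, which is the extreme case of the concentration into the low band noted above.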
In the quantizing unit 104, the DCT coefficients from the DCT unit 103 are quantized. In the quantization performed in this quantizing unit 104, each of the 8×8 two-dimensional frequencies is given a weight by a quantizing matrix according to visual characteristics. The weight thus obtained is further multiplied by a scalar quantizing scale, and the DCT coefficient is divided by the resulting value, which serves as the quantizing value. It is to be noted that when a decoder (video decoder) performs inverse quantization of the coded data produced by this video encoder, the data is multiplied by the quantizing value that was used in the video encoder. As a result of this, it is possible to obtain a value that is approximate to the original DCT coefficient. Data that has been obtained by the quantization made in the quantizing unit 104 is sent to a variable length coder (VLC) 105.
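The division by the quantizing value in the encoder and the multiplication by the same value in the decoder can be sketched as follows. This is a simplified illustration: the exact rounding rules, constants and special handling of intra DC coefficients in the MPEG standard are omitted, and the function names and the weight matrix passed in are assumptions of this sketch.

```python
def quantize(coeffs, weight_matrix, scale):
    """Divide each DCT coefficient by (matrix weight x quantizing scale)."""
    return [[round(coeffs[u][v] / (weight_matrix[u][v] * scale))
             for v in range(8)] for u in range(8)]


def dequantize(levels, weight_matrix, scale):
    """Multiply each quantized level by the same quantizing value."""
    return [[levels[u][v] * weight_matrix[u][v] * scale
             for v in range(8)] for u in range(8)]
```

With a weight of 16 and a quantizing scale of 2, the quantizing value is 32. A coefficient of 50 is then quantized to the level 2 and restored as 64, that is, a value approximate to, but not identical with, the original coefficient.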
The VLC 105 performs variable length coding on the quantized data from the quantizing unit 104. In this VLC 105, of the quantized values, the direct current (DC) components are coded using DPCM (differential pulse code modulation), which is one of the prediction coding techniques. On the other hand, the alternating current (AC) components are subjected to so-called “Huffman coding”: a so-called “zigzag scan” is performed from the low band to the high band, each combination of a run length of zeros and the value of the effective (non-zero) coefficient that follows it is counted as one significant event, and codes are allotted to the events such that the higher the probability of occurrence of an event, the shorter its code length. Also, the motion vector and prediction mode messages are supplied to the VLC 105 from the motion compensation and prediction unit 111, and the VLC 105 outputs these messages, as additional data with respect to the macroblock, together with the variable length coded data. Data that has been obtained by the variable length coding performed in the VLC 105 is sent to a buffer memory 106.
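The pre-processing that precedes the allotment of variable length codes, namely DPCM of the DC components and conversion of the zigzag-scanned AC components into (run, value) events, can be sketched as follows. The mapping of events to actual code words, the end-of-block handling and the predictor reset rules of the MPEG standard are outside this sketch, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions in zigzag order, low band first."""
    def key(rc):
        r, c = rc
        d = r + c
        # Odd diagonals run with the row increasing, even ones decreasing.
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_level_events(levels):
    """Convert the AC levels of one block into (zero run, value) events."""
    events, run = [], 0
    for r, c in zigzag_order(len(levels))[1:]:  # position 0 is the DC term
        value = levels[r][c]
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    return events  # trailing zeros produce no event


def dpcm_encode(dc_values, predictor=0):
    """Code each DC value as its difference from the preceding DC value."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs
```

Each event returned by `run_level_events` would then be replaced by a code word whose length is shorter the more frequently that event occurs.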
The buffer memory 106 temporarily stores therein the variable length coded data from the VLC 105. Thereafter, the coded data (the coded bit stream) that has been read out from the buffer memory 106 at a prescribed transfer rate is output from an output terminal 113.
Also, the amount of codes generated in macroblock units that regards the thus-outputted coded data is transmitted to an amount-of-code controlling unit 112 as later described. The amount-of-code controlling unit 112 determines an error amount of code that is the difference between the amount-of-code generated and a target amount of code in macroblock units, and produces an amount-of-code control signal that corresponds to the error amount-of-code and thereby feeds it back to the quantizing unit 104, thereby performing control of the amount-of-code generated. The amount-of-code control signal that is fed back to the quantizing unit 104 in order to perform the amount-of-code control is a signal for controlling the quantizing scale in the quantizing unit 104.
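The feedback described above, in which the error between the generated and target amounts of code adjusts the quantizing scale, can be sketched as a simple proportional rule. The control law, the gain value and the scale limits used here are assumptions made for illustration; the text above does not specify how the amount-of-code controlling unit 112 derives its control signal.

```python
def update_quantizer_scale(scale, generated_bits, target_bits,
                           gain=0.001, min_scale=1, max_scale=31):
    """Return a new quantizing scale from the amount-of-code error.

    A positive error (too many bits generated) raises the scale so that
    quantization becomes coarser; a negative error lowers it.
    The gain and the limits 1..31 are assumed values for this sketch.
    """
    error = generated_bits - target_bits
    new_scale = round(scale + gain * error)
    return max(min_scale, min(max_scale, new_scale))
```

A coarser quantizing scale yields smaller quantized values and hence fewer generated codes, which is how the loop pulls the amount of code generated back toward the target.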
On the other hand, picture image data that has been quantized in the quantizing unit 104 is also sent to an inversely quantizing unit 107.
The inversely quantizing unit 107 performs inverse quantization of the quantized data that has been sent from the quantizing unit 104. DCT coefficient data that has been obtained through this inverse quantization is sent to an inverse DCT unit 108.
The inverse DCT unit 108 performs inverse DCT of the DCT coefficient data from the inversely quantizing unit 107 and thereafter sends the resulting data to a calculating unit 109.
The calculating unit 109 adds a predicted differential picture image from the motion compensation and prediction unit 111 to the output signal of the inverse DCT unit 108. As a result of this, a picture image signal is restored.
The thus-restored picture image signal is temporarily stored in a picture image memory 110 and thereafter is read out and sent to the motion compensation and prediction unit 111.
The picture image signal that has been sent from the picture image memory 110 to the motion compensation and prediction unit 111 is used for the purpose of producing a decoded picture image that serves as a reference for calculating a differential picture image in the calculating unit 102.
The motion compensation and prediction unit 111 detects a movement vector from the input picture image signal and shifts the picture image by the extent corresponding to a size of the thus-detected movement vector and thereafter performs prediction. A predicted differential picture image signal that has been obtained by this prediction is sent to the calculating units 102 and 109. Also, the movement vector that has been detected by the motion compensation and prediction unit 111 is sent to the VLC 105 together with the prediction mode (MC mode) message.
It is to be noted that the picture image on which coding of the differential picture image signal is performed as mentioned above is one of the P-picture and B-picture types, and that, in the case of a picture image of the I-picture type, the input picture image signal is coded as is.
FIG. 2 illustrates a fundamental construction of a video decoder that decodes coded data that has been coded by the video encoder illustrated in FIG. 1.
In this FIG. 2, coded data is supplied to an input terminal 121. The coded data is sent to a variable length decoding unit (VLD) 122. The VLD 122 performs, on the data, variable length decoding that is the inverse processing of the variable length coding in the VLC 105 of FIG. 1. Data that is obtained by the variable length decoding corresponds to the quantized data that is an input to the VLC 105 of FIG. 1, with the movement vector message and the prediction mode message added thereto. The quantized data that has been obtained by the variable length decoding performed in the VLD 122 is sent to an inversely quantizing unit 123.
The inversely quantizing unit 123 performs inverse quantization of the quantized data from the VLD 122. Data that has been obtained by this inverse quantization corresponds to the DCT coefficient data that is an input to the quantizing unit 104 of FIG. 1. DCT coefficient data that has been obtained by inverse quantization performed in the inversely quantizing unit 123 is sent to an inverse DCT 124. Also, the movement vector and prediction mode messages are sent from the inversely quantizing unit 123 to a motion compensation and prediction unit 127.
The inverse DCT 124 performs inverse DCT of the DCT coefficient from the inversely quantizing unit 123. Data that has been obtained by the inverse DCT performed in the inverse DCT unit 124 corresponds to the differential picture image signal that is an input to the DCT unit 103 of FIG. 1. The differential picture image signal that has been obtained by the inverse DCT performed in the inverse DCT unit 124 is sent to a calculating unit 125.
The calculating unit 125 adds the predicted difference picture image from the motion compensation and prediction unit 127 to the differential picture image signal from the inverse DCT unit 124. As a result of this, decoded data, i.e., picture image signal is restored. The thus-restored picture image signal substantially corresponds to the input picture image signal to the input terminal 101 of FIG. 1. This restored picture image signal (decoded data) is output from an output terminal 128 and simultaneously is temporarily stored in a picture image memory 126 and thereafter is sent to the motion compensation and prediction unit 127.
The motion compensation and prediction unit 127 produces, on the basis of the movement vector and prediction mode, a predicted differential picture image from the picture image signal that has been supplied from the picture image memory 126, and sends this predicted differential picture image to the calculating unit 125.
In the MPEG2 it is defined, as previously mentioned, that a transfer-starting time and a reproduction time are set for each of the video data and the audio data with the use of a reference time, so as to enable the both data to be transferred and reproduced in synchronism with each other. With the mere use of only such transfer-starting time and reproduction time messages, no problem would exist with normal reproduction. It is pointed out, however, that it is difficult to perform special reproductions such as fast-forwarding reproduction, reverse-winding reproduction, random reproduction, etc., or reproduction processing that gives the system interactiveness.
Under the above-described circumstances, as disclosed in Japanese Patent Application Laid-Open No. 8-273304, there exists a technique that is arranged to store audio and video data that has been encoded using the MPEG into a video object unit as a row of packs that is to be reproduced within a prescribed period of time, and further to record, at a foremost position of the packs row as navigation data, a reproduction message for making a reproduction of this row of packs and a search message for making a search.
Since the video object unit and navigation data are already disclosed and described in detail in Japanese Patent Application Laid-Open No. 8-273304, a detailed description thereof is omitted. As shown in FIG. 3, however, a plurality of the video object units 85 are grouped to constitute a cell 84. Also, a plurality of the cells 84 are grouped to constitute a video object 83. Further, a plurality of these video objects 83 are grouped to constitute a video object set 82.
The video object unit 85 is defined as a packs row having one navigation pack 86 at a foremost position. Also, within the video object unit 85 there are disposed video packs 88, sub-video packs 90 and audio packs 91 that are determined in the MPEG standard. Also, the video object units 85 have allotted thereto numbers in the sequential order of reproduction, respectively, and the reproduction period of time for reproducing the video object unit 85 corresponds to the reproduction period of time for reproducing video data that is composed of the one or more GOPs that are included in the video object unit 85.
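The hierarchy of FIG. 3 described above, from the video object set 82 down to the packs of a video object unit 85, can be sketched as nested data structures. The class and field names are hypothetical and are chosen only to mirror the terms used in the text; the contents of the packs themselves are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pack:
    kind: str  # "navigation", "video", "sub-video" or "audio"


@dataclass
class VideoObjectUnit:
    number: int        # number in the sequential order of reproduction
    packs: List[Pack]  # packs row; the navigation pack must come first

    def __post_init__(self):
        nav_count = sum(p.kind == "navigation" for p in self.packs)
        if nav_count != 1 or self.packs[0].kind != "navigation":
            raise ValueError(
                "a video object unit needs exactly one navigation pack, "
                "placed at the foremost position")


@dataclass
class Cell:
    units: List[VideoObjectUnit]


@dataclass
class VideoObject:
    cells: List[Cell]


@dataclass
class VideoObjectSet:
    video_objects: List[VideoObject]
```

The check in `VideoObjectUnit` expresses the definition given above, namely a packs row with one navigation pack at the foremost position.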
In the navigation pack 86 there are disposed as navigation data a reproduction control message for reproducing the video object unit 85, search message for making a search, etc. The reproduction control message is navigation data for making presentation in synchronism with the state of reproduction of video data within the video object unit 85, namely for making an alteration of the contents displayed. Namely, the reproduction control message is a message for determining the reproduction conditions in accordance with the state of presentation data, namely real-time control data that has been dispersed and disposed on a data stream. Also, the search message is navigation data for executing search of the video object unit 85. Namely, the search message is a message for performing seamless reproduction as well as forward fast winding/reverse fast winding reproduction, namely real-time control data that has been dispersed and disposed on a data stream.
Especially, in the search message for making a search of the video object unit 85, there is depicted a message for particularizing the foremost address within the cell 84. Namely, in the search message for searching for the video object unit 85, numbering the video object unit 85 including the search message as being the 0-th unit serving as a reference, the numbers (start addresses) of the video object units 85 of, from the 1st (+1) to the 20th (+20), the 60th (+60), the 120th (+120), and the 240th (+240) in this order are depicted as addresses (forward addresses) for performing forward reproduction of the units in the sequential order of reproduction thereof. Similarly, in the search message for searching for the video object unit 85, numbering the video object unit 85 including the search message as being the 0th unit serving as a reference, the start addresses of the video object units 85 of, from the 1st (−1) to the 20th (−20), the 60th (−60), the 120th (−120), and the 240th (−240) in this order are depicted as addresses (backward addresses) for performing reverse reproduction of the units in a direction reverse to that in the sequential order of reproduction thereof.
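The forward and backward addresses described above can be sketched as a lookup relative to the reference unit. The sketch assumes that the start addresses of all video object units within the cell are already available in a list, and that an address falling outside the cell is simply marked as absent; both the function name and this handling of absent entries are assumptions.

```python
# Offsets relative to the reference (0th) unit: 1 to 20, then 60, 120, 240.
SEARCH_OFFSETS = list(range(1, 21)) + [60, 120, 240]


def search_addresses(start_addresses, index):
    """Build forward and backward address tables for the unit at `index`.

    `start_addresses[i]` is the start address of the i-th video object
    unit in the sequential order of reproduction. Entries that would fall
    outside the list are set to None (an assumption of this sketch).
    """
    def lookup(i):
        return start_addresses[i] if 0 <= i < len(start_addresses) else None

    forward = {k: lookup(index + k) for k in SEARCH_OFFSETS}
    backward = {-k: lookup(index - k) for k in SEARCH_OFFSETS}
    return forward, backward
```

Note that the forward table for a given unit refers to units located up to 240 units later in the sequential order of reproduction.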
By the way, in order to depict into the navigation pack, before the MPEG coding is started, navigation data that contains the above-described reproduction message for making a reproduction of the video object unit and the search message for making a search therefor, it is necessary to use a memory having a large storage capacity. In addition, the navigation data must be produced after the coding has been finished, by observing and measuring the coded results (the amount of codes) and calculating the prescribed reproduction data therefrom.
Also, as described in Japanese Patent Application Laid-Open No. 8-273304, consider a case where the above-mentioned video object unit is numbered as the 0th unit in the sequential order of reproduction and, using this video object unit as a reference, it is attempted to depict the addresses of the video object units that are reproduced up to at least the 15th units forward and backward, and to depict the addresses of the 20th, 30th, 60th, 120th, and 240th video object units, each as counted in the sequential order of reproduction. Since the coded data of the MPEG video data is basically prepared by variable length coding, it is impossible to calculate the addresses of the video object units unless all the video coded data has already been prepared, as in so-called two-pass coding. Accordingly, it is impossible to perform real-time coding and recording of navigation data.
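The reason why the addresses cannot be known in advance can be illustrated with a short sketch. It assumes, for simplicity, that the video object units are laid out contiguously, so that the start address of a unit is the sum of the sizes of all units before it; the function names are hypothetical.

```python
def start_addresses(unit_sizes):
    """Start address of each unit, assuming contiguous layout from 0."""
    addresses, position = [], 0
    for size in unit_sizes:
        addresses.append(position)
        position += size
    return addresses


def units_needed_for_forward_address(index, offset):
    """Number of units whose coded size must be known before the start
    address of unit (index + offset) can be calculated."""
    return index + offset
```

Because each size is the outcome of variable length coding, the navigation data of the 0th unit cannot record the address of, for example, the 240th unit until the 240 units preceding it have all been coded and their sizes measured.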