1. Field of the Invention
The present invention relates to an MPEG picture data recording apparatus, an MPEG picture data recording method, an MPEG picture data recording medium, an MPEG picture data generating apparatus, an MPEG picture data reproducing apparatus, and an MPEG picture data reproducing method for realizing a seamless connection of a first MPEG picture data and a second MPEG picture data that are image data encoded by the MPEG encoding system, at the time of connecting the first MPEG picture data to the second MPEG picture data at a connection point specified in the respective MPEG picture data and for reproducing the connected MPEG picture data.
2. Description of the Related Art
MPEG, the conventional technique used in the present invention, is briefly explained below.
Since MPEG is explained in detail in ISO/IEC 11172-2 and ITU-T H.262/ISO/IEC 13818-2, only an outline is given below. MPEG is an abbreviation of the Moving Picture Experts Group, the name of an organization established in 1988 under ISO/IEC JTC1/SC2 (International Organization for Standardization/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 2, now SC29) to study moving picture coding standards. MPEG1 (MPEG phase 1) is a standard for storage media at about 1.5 Mbps. It introduced new techniques on top of the basic techniques of JPEG, which targets the coding of still images, and of H.261 (standardized by CCITT SGXV, now ITU-T SG15), which targets moving picture compression at the low transfer rates of ISDN videoconferencing and videophones. MPEG1 was established as ISO/IEC 11172 in August 1993.
MPEG1 combines several techniques. FIG. 1 shows a conventional MPEG encoder that performs encoding according to the MPEG encoding system; it is briefly explained below.
A differential unit 2 receives the input image directly, together with a predicted image generated by a motion compensation predicting unit 1 from previously decoded pictures. The differential unit 2 subtracts the predicted image from the input image, thereby removing temporal redundancy.
There are three fundamental prediction modes: prediction from past pictures, prediction from future pictures, and prediction from both past and future pictures. These modes can be switched in units of a macroblock (MB) of 16×16 pixels. The direction of prediction is determined by the picture type assigned to each input picture. There are three picture types: the one-directional inter-picture prediction encoded picture (P-picture), the bi-directional inter-picture prediction encoded picture (B-picture), and the intra-picture independently encoded picture (I-picture). A P-picture has two modes: encoding with prediction from past pictures, and encoding a macroblock independently without prediction. A B-picture has four modes: prediction from future pictures, prediction from past pictures, prediction from both past and future pictures, and independent encoding without any prediction. In an I-picture, every macroblock is encoded independently.
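The mode rules per picture type can be summarized as a lookup table. The following is a minimal sketch restricted to the fundamental modes named above; the `Mode` enum and `is_allowed` helper are hypothetical names, and real MPEG syntax has further variants (for example skipped macroblocks and field/frame prediction) that are not modeled here.

```python
from enum import Enum


class Mode(Enum):
    """Fundamental macroblock prediction modes named in the text."""
    FORWARD = "prediction from past pictures"
    BACKWARD = "prediction from future pictures"
    BIDIRECTIONAL = "prediction from past and future pictures"
    INTRA = "independent encoding, no prediction"


# Modes a macroblock may use in each picture type, per the description above.
ALLOWED_MODES = {
    "I": frozenset({Mode.INTRA}),
    "P": frozenset({Mode.FORWARD, Mode.INTRA}),
    "B": frozenset({Mode.FORWARD, Mode.BACKWARD, Mode.BIDIRECTIONAL, Mode.INTRA}),
}


def is_allowed(picture_type: str, mode: Mode) -> bool:
    """Return True if a macroblock in a picture of this type may be coded in this mode."""
    return mode in ALLOWED_MODES[picture_type]
```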
In motion compensation, pattern matching of moving regions is performed in units of a macroblock to detect a motion vector with half-pixel precision, and prediction is made from the macroblock shifted by the detected motion vector. The motion vector has horizontal and vertical components and is transmitted as side information for each macroblock, together with an MC (Motion Compensation) mode that indicates the source of the prediction.
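Pattern matching of this kind is commonly done by block matching. The sketch below is a full search that minimizes the sum of absolute differences (SAD), chosen here as an assumption since the text does not name a matching criterion. It searches integer-pixel positions only; the half-pixel refinement mentioned above, which interpolates the reference picture, is omitted. Function names, the block size `n`, and `search_range` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current n x n block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, search_range=7):
    """Integer-pel full search; returns (dx, dy, best_sad) for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference picture.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

In a real encoder the search runs on each 16×16 macroblock of the luminance picture, and the winning vector is what gets transmitted with the MC mode.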
In general, the pictures from an I-picture up to the picture immediately preceding the next I-picture are called a GOP (Group Of Pictures). For storage media and similar applications, a GOP generally consists of about 15 pictures. (Strictly, a GOP may contain two or more I-pictures; in short, one GOP contains at least one I-picture.)
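Because B-pictures predict from a future anchor picture, the coding order of a GOP differs from its display order. The sketch below assumes a common pattern with N = 15 pictures per GOP and an anchor (I or P) every M = 3 pictures; these values and the function names are illustrative assumptions, not requirements of MPEG.

```python
def gop_display_order(n=15, m=3):
    """Picture types of one GOP in display order: an I-picture, then a P-picture every m pictures."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]


def coding_order(display):
    """Return display indices in coding order: each anchor (I or P) is coded before
    the B-pictures that precede it in display order, since those B-pictures predict from it."""
    order, pending_b = [], []
    for idx, ptype in enumerate(display):
        if ptype == "B":
            pending_b.append(idx)  # must wait until the next anchor picture is coded
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    # The trailing B-pictures really need the next GOP's I-picture; this
    # single-GOP sketch simply appends them at the end.
    order.extend(pending_b)
    return order
```

For the default pattern, the display order is `IBBPBBPBBPBBPBB` and the coding order begins with display indices 0, 3, 1, 2, 6, 4, 5.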
In a DCT unit 3, the supplied differential picture signal is subjected to an orthogonal transformation. The DCT (Discrete Cosine Transform) is an orthogonal transformation obtained by discretizing, over a finite domain, an integral transform whose kernel is a cosine function. In MPEG, a two-dimensional DCT is performed on each 8×8 DCT block obtained by dividing the macroblock into four parts. Because a video signal generally contains a large amount of low-frequency components and relatively few high-frequency components, the DCT coefficients are concentrated in the low-frequency band.
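The following sketch computes an orthonormal two-dimensional DCT-II of an 8×8 block in its direct form and measures how much energy falls into the lowest frequencies. The orthonormal scaling is an assumption chosen so that total energy is preserved; MPEG specifies the inverse DCT accuracy rather than a particular forward implementation, and in practice a fast routine such as `scipy.fft.dctn(block, norm="ortho")` would be used instead.

```python
import math

N = 8  # DCT block size used by MPEG


def _c(k: int) -> float:
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, written in the direct O(N^4) form for clarity."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            s = 0.0
            for x in range(N):      # row
                for y in range(N):  # column
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _c(u) * _c(v) * s
    return out


def low_band_energy_fraction(coeffs, max_index_sum=1):
    """Share of total coefficient energy held in the lowest frequencies (u + v <= max_index_sum)."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2 for u in range(N) for v in range(N) if u + v <= max_index_sum)
    return low / total
```

For a smooth block such as a linear ramp, almost all of the energy lands in the DC coefficient and its two nearest neighbors, which is the concentration in the low band described above. A flat block produces only a DC coefficient.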
In a quantizing unit 4, the discrete-cosine transformed picture data (DCT coefficients) are quantized. Each of the 8×8 two-dimensional frequency components is weighted according to visual characteristics by a quantizing matrix, and the weight is further scaled by a quantizing scale. Using the resulting product as the quantizing value, each DCT coefficient is divided by this value. When inverse quantization is performed in an MPEG decoder, the encoded data is multiplied by the same quantizing value, which yields an approximation of the original DCT coefficient.
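The divide-then-multiply relationship can be sketched as follows. This is a simplified model of the description above: the weighting matrix `WEIGHTS` is hypothetical (it is not the MPEG default matrix), and the sketch omits details of the actual standard such as the normalization by 16, special handling of the intra DC coefficient, and mismatch control.

```python
# Hypothetical weighting matrix: weights grow with frequency to mimic coarser
# quantization of high-frequency detail. It is NOT the MPEG default matrix.
WEIGHTS = [[16 + 2 * (u + v) for v in range(8)] for u in range(8)]


def quantize(coeffs, weights, q_scale):
    """Divide each DCT coefficient by its quantizing value (weight * quantizing scale) and round."""
    return [[round(c / (w * q_scale)) for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(coeffs, weights)]


def dequantize(levels, weights, q_scale):
    """Decoder side: multiply each level by the same quantizing value to approximate the coefficient."""
    return [[lv * w * q_scale for lv, w in zip(l_row, w_row)]
            for l_row, w_row in zip(levels, weights)]
```

The reconstruction error of each coefficient is at most half of its quantizing value, and a larger quantizing scale drives more coefficients to zero, which is what makes it usable as the amount-of-code control knob described next.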
A VLC 5 performs variable length coding on the quantized data. Of the quantized values, the direct current (DC) components are coded using DPCM (differential pulse code modulation), a predictive coding technique. The alternating current (AC) components are zigzag-scanned from the low band to the high band, and so-called Huffman coding is applied: each combination of a zero run length and the following nonzero (effective) coefficient value is counted as one event, and shorter codes are allotted to events with a higher probability of occurrence.
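The event formation that precedes the Huffman table lookup can be sketched as below: a zigzag scan order, (run, level) pairs for the AC coefficients terminated by an end-of-block marker, and DPCM of the DC values. The actual MPEG code tables are not reproduced, the `"EOB"` string is a stand-in for the end-of-block code, and the DPCM starting `predictor` is a caller-supplied assumption.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order: walk the anti-diagonals, alternating direction."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_level_pairs(block):
    """Zigzag-scan the AC coefficients and emit (zero_run, level) events followed by 'EOB'."""
    events, run = [], 0
    for r, c in zigzag_order(len(block))[1:]:  # index 0 is the DC coefficient
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            events.append((run, level))
            run = 0
    events.append("EOB")  # trailing zeros are implied by the end-of-block event
    return events


def dc_dpcm(dc_values, predictor=0):
    """Code each block's DC value as the difference from the previous block's DC value."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs
```

Each `(run, level)` tuple is one of the significant events to which a variable length code is assigned.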
A buffer memory 6 temporarily stores the variable length coded data and outputs it as encoded data at a predetermined transfer rate. The amount of code generated in each macroblock is reported to an amount-of-code controlling unit 21. The amount-of-code controlling unit 21 determines the error amount of code, which is the difference between the generated amount of code and the target amount of code for the macroblock, and feeds back an amount-of-code control signal corresponding to this error to the quantizing unit 4, thereby controlling the generated amount of code by adjusting the quantizing scale.
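A minimal sketch of this feedback, assuming a simple proportional controller. The gain, the clamping range of 1 to 31, and the toy model in which a macroblock produces `complexity / q` bits are all illustrative assumptions; actual encoders use more elaborate rate control.

```python
def next_q_scale(q_scale, generated_bits, target_bits, gain=0.05, q_min=1.0, q_max=31.0):
    """One feedback step of a hypothetical proportional controller.

    error = generated - target (the 'error amount of code'); a positive error raises the
    quantizing scale (fewer bits next time), a negative error lowers it.
    """
    error = generated_bits - target_bits
    q = q_scale * (1 + gain * error / target_bits)
    return min(q_max, max(q_min, q))


def simulate(complexity, target_bits, q_start, steps):
    """Toy loop assuming each macroblock generates complexity / q bits (hypothetical model)."""
    q = q_start
    for _ in range(steps):
        q = next_q_scale(q, complexity / q, target_bits)
    return q
```

With `complexity = 100000` and a target of 5000 bits, the loop settles at a quantizing scale of about 20, where the generated amount of code equals the target.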
The quantized picture data is inversely quantized by an inverse quantizing unit 7 and then inversely discrete-cosine transformed by an inverse DCT unit 8. The result passes through an adder 9, which adds back the motion-compensated prediction, and is temporarily stored in a picture memory 10. The stored data is then used by the motion compensation predicting unit 1 as a decoded reference picture for calculating the differential picture.
FIG. 2 shows an MPEG decoder for decoding MPEG encoded data.
Input encoded data (a stream) is buffered by a buffer 11, and the data from the buffer 11 is input to a VLD 12. The VLD 12 performs variable length decoding to obtain the DC components and AC components. The AC component data is arranged into an 8×8 matrix following the zigzag scan order from the low band to the high band. This data is input to an inverse quantizing unit 13 and inversely quantized using the quantizing matrix. The inversely quantized data is input to an inverse DCT 14 and inversely discrete-cosine transformed, and the result is output as picture data (decoded data). The decoded data is also temporarily stored in a picture memory 16, after which a motion compensation predicting unit 17 uses it as a decoded reference picture for calculating a differential picture.
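Arranging the decoded AC data back into an 8×8 matrix is the inverse of the encoder's zigzag scan. The sketch below assumes the VLD has already turned the code words into a DC value and a list of `(run, level)` events ending in an `"EOB"` marker; that representation and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order: walk the anti-diagonals, alternating direction."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def block_from_run_levels(dc, events, n=8):
    """Rebuild an n x n coefficient matrix from a DC value and (run, level) events ending in 'EOB'."""
    block = [[0] * n for _ in range(n)]
    block[0][0] = dc
    order = zigzag_order(n)
    pos = 1  # next zigzag index to fill; index 0 holds the DC coefficient
    for event in events:
        if event == "EOB":
            break  # all remaining coefficients stay zero
        run, level = event
        pos += run  # skip over the run of zero coefficients
        r, c = order[pos]
        block[r][c] = level
        pos += 1
    return block
```

The resulting matrix is what the inverse quantizing unit 13 receives.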
In the case of video, the amount of code in the encoded bit stream varies from picture to picture. This is because MPEG uses information conversion such as DCT, quantization, and Huffman coding, and the amount of code allocated to each picture must be varied appropriately to improve picture quality. Further, because motion compensation prediction is used, in some cases the input picture is encoded as it is, and in other cases the differential picture (the difference from the prediction picture) is encoded. The entropy of the picture to be encoded therefore varies greatly.
In most cases, therefore, the amount of code is distributed according to the entropy of each picture, subject to the buffer limits. A buffer managing unit monitors the relationship between the generated amount of code and the encoding rate, and sets a target amount of code such that the data is accommodated within a predetermined buffer. This value is fed back and input to the amount-of-code controlling unit. The amount-of-code controlling unit restricts the generated amount of code by increasing the quantizing value set in the quantizing unit, and increases the generated amount of code by decreasing the quantizing value.
When variable length data is encoded at a fixed transfer rate (encoding rate), MPEG prescribes a model in which data enters a buffer at a constant rate and, once a predetermined amount has accumulated, each picture is decoded instantaneously at a predetermined time (at intervals of 1/29.97 second for an NTSC video signal); encoding must be performed so that this buffer neither overflows nor underflows. As long as this prescription (the VBV buffer prescription) is observed, the data is transferred at a fixed rate when observed over a long period, even though the rate at which data is consumed from the VBV buffer varies locally. MPEG defines this as a fixed rate.
With a fixed transfer rate, the buffer occupancy is bounded above by the maximum buffer size of the decoder. When the generated amount of code is small, the amount of code must therefore be increased by adding invalid (stuffing) bits so that the buffer does not overflow.
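The fixed-rate VBV model and the need for stuffing can be traced with a few lines of arithmetic. This is a simplified sketch under stated assumptions: data arrives at `bit_rate / picture_rate` bits per picture period, each picture is removed instantaneously, the starting occupancy is given, and occupancy is checked only at picture boundaries. The function names are hypothetical.

```python
def simulate_cbr_vbv(picture_bits, bit_rate, picture_rate, buffer_size, initial_occupancy):
    """Trace VBV occupancy just before each picture is removed; raise ValueError on overflow/underflow."""
    per_picture = bit_rate / picture_rate  # bits arriving during one picture period
    occupancy = initial_occupancy
    trace = []
    for i, bits in enumerate(picture_bits):
        if occupancy > buffer_size:
            raise ValueError(f"overflow before picture {i}")
        if bits > occupancy:
            raise ValueError(f"underflow at picture {i}")
        trace.append(occupancy)
        occupancy = occupancy - bits + per_picture  # instant removal, then constant-rate refill
    return trace


def pad_for_cbr(picture_bits, bit_rate, picture_rate, buffer_size, initial_occupancy):
    """Enlarge pictures that are too small with stuffing (invalid) bits so the buffer cannot overflow."""
    per_picture = bit_rate / picture_rate
    occupancy = initial_occupancy
    padded = []
    for bits in picture_bits:
        excess = occupancy - bits + per_picture - buffer_size
        if excess > 0:
            bits += excess  # stuffing keeps the next refill within the buffer
        padded.append(bits)
        occupancy = occupancy - bits + per_picture
    return padded
```

For example, with 3000 bits per picture period, an 8000-bit buffer, and pictures of 4000, 2000, 500, and 3000 bits, the 500-bit picture would let the buffer overflow; padding it to 1000 bits keeps the occupancy within the 8000-bit limit.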
When data is transferred at a variable transfer rate, the definition for the fixed transfer rate is extended: when the buffer occupancy reaches the upper limit, the decoder stops reading data into the buffer, so that overflow cannot occur in principle. FIG. 3 shows the transition of the buffer occupancy. Because reading stops even when the generated amount of code is very small, invalid bits need not be added as they must be at a fixed transfer rate. Accordingly, encoding need only avoid underflow.
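In the variable-rate case the refill is simply cut off at the buffer size, so only underflow needs checking. A minimal sketch, assuming data arrives at the maximum rate whenever the buffer is not full and that the caller supplies the starting occupancy; the function name is hypothetical.

```python
def simulate_vbr_vbv(picture_bits, max_bit_rate, picture_rate, buffer_size, initial_occupancy):
    """Trace VBV occupancy before each removal when input stops at a full buffer.

    Overflow cannot occur by construction, so only underflow raises ValueError.
    """
    per_picture = max_bit_rate / picture_rate  # most bits that can arrive in one picture period
    occupancy = initial_occupancy
    trace = []
    for i, bits in enumerate(picture_bits):
        if bits > occupancy:
            raise ValueError(f"underflow at picture {i}")
        trace.append(occupancy)
        # Reading stops once the buffer is full, so occupancy is capped, not padded.
        occupancy = min(buffer_size, occupancy - bits + per_picture)
    return trace
```

Very small pictures simply leave the buffer full; no stuffing is required.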
MPEG also prescribes a system in which bit streams encoded by MPEG video or audio are multiplexed into a single bit stream and reproduced while maintaining synchronization. The contents of this system prescription fall broadly into the following five points.
1) Synchronous reproduction of a plurality of encoded bit streams
2) Multiplexing of a plurality of encoded bit streams into a single bit stream
3) Initialization of a buffer at the time of starting a reproduction
4) Management of continuous buffers
5) Determination of decoding and reproduction times
Multiplexing in MPEG requires the information to be packetized. In packet-based multiplexing, when video and audio data are multiplexed, for example, each stream is divided into units of suitable length called packets, a header containing additional information is added to each packet, and the video and audio packets are transmitted in time division by switching between them as appropriate. The header includes information identifying the data as video or audio, and time information for synchronization. The packet length depends on the transmission medium or application, ranging from 53 bytes for ATM to as long as 4 kbytes for optical disks. MPEG allows the packet length to be set freely and to vary.
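The packetize-then-interleave idea can be sketched as follows. The `Packet` header fields are hypothetical stand-ins for the real PES header, and the round-robin switching is an assumption made for illustration; a real multiplexer schedules packets according to time stamps and the STD buffer state.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    stream: str     # identifies video or audio data (hypothetical header field)
    seq: int        # packet index within its stream (hypothetical header field)
    payload: bytes


def packetize(stream: str, data: bytes, max_payload: int):
    """Divide one elementary stream into packets of at most max_payload bytes, each with a header."""
    return [Packet(stream, i, data[off:off + max_payload])
            for i, off in enumerate(range(0, len(data), max_payload))]


def multiplex(*packet_lists):
    """Time-division multiplexing by simple round-robin switching between streams."""
    out = []
    iters = [iter(p) for p in packet_lists]
    while iters:
        for it in list(iters):
            pkt = next(it, None)
            if pkt is None:
                iters.remove(it)  # this stream is exhausted
            else:
                out.append(pkt)
    return out
```

A demultiplexer reverses the process by selecting packets whose header identifies the wanted stream and concatenating their payloads.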
Packets are grouped into packs, each pack consisting of several packets. The header of each pack carries a pack start code and an SCR (System Clock Reference), and the header of each packet carries a stream id and time stamps. The time stamps carry time information for synchronizing the audio and video data and are of two kinds, the DTS (Decoding Time Stamp) and the PTS (Presentation Time Stamp). The PCR (Program Clock Reference) is described with a time precision of 27 MHz and is the information used to lock the reference clock of the decoder. The DTS indicates the decoding start time of the first access unit in the packet data (one picture for video, or, for example, 1152 samples for audio), and the PTS indicates its display (reproduction) start time. As shown in FIG. 4, the audio decoder, the video decoder, and the other decoders continuously monitor a common reference clock locked to the PCR, and perform decoding or display when the clock coincides with the DTS or PTS. The multiplexed data is buffered by each decoder. A virtual decoder for performing such synchronized display is called an STD (System Target Decoder), and multiplexing must be performed so that the STD buffers neither overflow nor underflow.
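The clock arithmetic behind these fields can be sketched as follows. In ISO/IEC 13818-1 the PTS and DTS are counted in 90 kHz units, and the 27 MHz SCR/PCR is carried as a 90 kHz base plus an extension from 0 to 299; both facts come from that standard rather than from the text above. The function names are hypothetical, and the sketch ignores picture reordering and the 33-bit wraparound of the time stamps.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # SCR/PCR time precision stated in the text
TIMESTAMP_HZ = 90_000         # PTS/DTS unit in ISO/IEC 13818-1
CLOCK_RATIO = SYSTEM_CLOCK_HZ // TIMESTAMP_HZ            # 300
FRAME_TICKS_2997 = TIMESTAMP_HZ * 1001 // 30000          # 3003 ticks per 29.97 Hz picture


def split_clock_reference(ticks_27mhz: int):
    """Split a 27 MHz clock value into the 90 kHz base and the 0..299 extension."""
    return ticks_27mhz // CLOCK_RATIO, ticks_27mhz % CLOCK_RATIO


def frame_pts(frame_index: int, first_pts: int = 0) -> int:
    """PTS of the n-th picture of 29.97 Hz video (display order, no wraparound)."""
    return first_pts + frame_index * FRAME_TICKS_2997


def is_due(system_clock_27mhz: int, stamp_90khz: int) -> bool:
    """A decoder acts on an access unit once its clock, locked to the SCR/PCR, reaches the DTS or PTS."""
    return system_clock_27mhz // CLOCK_RATIO >= stamp_90khz
```

Thirty pictures at 29.97 Hz span 90090 ticks of the 90 kHz clock, that is, 1.001 seconds.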
MPEG defines two broad types of stream, the TS (Transport Stream) and the PS (Program Stream). Both are composed of PES (Packetized Elementary Stream) packets and packets containing other necessary information. The PES is prescribed as an intermediate stream that enables conversion between the two. In addition to MPEG-encoded video and audio data, a PES can also carry private streams in packetized form.
The PS can multiplex the video and audio data of a program that shares a common reference time. Its packet layer is the PES, and as shown in FIG. 5 this structure is shared with the TS, described later, which provides mutual compatibility between the two. In the STD model of the PS, streams are switched according to the stream id in the PES packet.
Like the PS, the TS can multiplex the video and audio data of a program sharing a common reference time; in addition, it can multiplex multiple programs with different reference times for communications and broadcasting. The TS consists of fixed-length 188-byte packets, a length chosen with the ATM cell length and error correction coding in mind, so the TS can be used in error-prone systems. The structure of the TS packet itself is not complex, but because the stream carries multiple programs, its use is complex. Unlike the PS, in which the PES packet is the carrying unit, the TS packet sits above the PES in the structure yet is usually shorter than a PES packet, so each PES packet is divided and carried in several TS packets. In the STD model of the TS, streams are switched according to the PID (packet ID) in the TS packet.
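Switching streams by PID starts from the fixed 4-byte TS packet header. The bit layout below (sync byte 0x47, a 13-bit PID, the payload unit start indicator, the adaptation field control, and the 4-bit continuity counter) follows ISO/IEC 13818-1; the `TsHeader` type and helper names are hypothetical.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    payload_unit_start: bool        # set when a PES packet or PSI section begins here
    pid: int                        # 13-bit packet ID used to select the stream
    adaptation_field_control: int   # 2 bits: payload and/or adaptation field present
    continuity_counter: int         # 4-bit per-PID counter for loss detection


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte header of a 188-byte transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    return TsHeader(
        payload_unit_start=bool(packet[1] & 0x40),
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        adaptation_field_control=(packet[3] >> 4) & 0x3,
        continuity_counter=packet[3] & 0x0F,
    )


def filter_pid(packets, pid):
    """STD-style stream selection: keep only the packets whose PID matches."""
    return [p for p in packets if parse_ts_header(p).pid == pid]
```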
The MPEG TS indicates the PID of the packets that carry information on each multiplexed program. This is explained with reference to FIG. 6. First, the packets with PID=0 are searched for among the TS packets. These are information packets called the PAT (Program Association Table), which list, for each program number PR, the corresponding PID. Next, the packets with the PID corresponding to the target PR are searched for. These are information packets called the PMT (Program Map Table), which describe the PIDs of the video packets and audio packets of the program corresponding to that PR.
The PAT and the PMT are together called PSI (Program Specific Information). They form an information system that makes it possible to access (tune to) the channel of a target program.
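The two-step PAT to PMT search can be sketched as follows. To stay short, the sketch assumes the PSI sections have already been parsed into dictionaries and arrive as `(pid, table)` pairs; this representation, the `"video"`/`"audio"` keys, and the function name are hypothetical, and real PSI section parsing (section headers, CRC, stream types) is omitted.

```python
PAT_PID = 0  # the PAT is always carried on PID 0


def find_program_pids(packets, program_number):
    """Follow PAT -> PMT to find the video and audio PIDs of one program.

    packets: iterable of (pid, table) pairs, where table is an already-parsed dict:
             the PAT maps program_number -> PMT PID, and a PMT holds 'video'/'audio' PIDs.
    """
    packets = list(packets)
    pat = next(table for pid, table in packets if pid == PAT_PID)    # step 1: find the PAT
    pmt_pid = pat[program_number]                                    # PID carrying this program's PMT
    pmt = next(table for pid, table in packets if pid == pmt_pid)    # step 2: find that PMT
    return pmt["video"], pmt["audio"]
```

Once the two PIDs are known, the receiver can filter the transport stream by PID to extract the program's video and audio packets.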
Japanese Patent Application Laid-open Publication No. 11-74799 discloses a method of encoding that takes continuity into account. In this conventional method, when compressed data such as MPEG picture data recorded on a recording medium is edited, the generated amount of code is controlled so that the VBV buffer occupancy is always constant at editing points, and each GOP is encoded as a closed GOP, in order to preserve the continuity of the MPEG picture data.
Further, Japanese Patent Application Laid-open Publication No. 11-187354 discloses a method in which no constraint is imposed on the encoded data. Instead, information indicating the data extracted as editing elements and information on the sequence in which they are reproduced are described in a partial section of the data, so that pictures can be edited on a single recording medium without changing the recorded data.
However, with the above conventional systems, simply connecting MPEG picture data produces an inconsistency in the VBV buffer occupancy at the connection, resulting in an overflow or underflow of the data. In fixed rate encoding, a VBV value is described for each picture, so the starting VBV value at the start point of an additional recording can be calculated by examining the picture bit stream; however, this requires partially decoding the MPEG compressed data. In variable rate encoding, no usable VBV value is described in the syntax at all, so the starting value must be calculated by examining the generated amount of code of each picture from the headers of the compressed data. This requires additional circuitry and calculation time.
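The two ways of obtaining the starting VBV value can be sketched as follows. For fixed rate streams, the `vbv_delay` field in each MPEG-2 picture header (in 90 kHz units) gives the occupancy just before that picture is decoded as approximately `vbv_delay × R / 90000`. For variable rate streams, the occupancy has to be re-traced from the size of every earlier picture, here using the same simplified capped-refill model as a variable-rate VBV trace. Function names and the refill model are assumptions for illustration.

```python
def occupancy_from_vbv_delay(vbv_delay_90khz: int, bit_rate: float) -> float:
    """Fixed rate: approximate bits in the VBV buffer just before the picture is decoded."""
    return vbv_delay_90khz * bit_rate / 90_000


def occupancy_at_picture(picture_bits, k, max_bit_rate, picture_rate,
                         buffer_size, initial_occupancy):
    """Variable rate: re-trace occupancy before picture k from the sizes of pictures 0..k-1.

    Assumes data arrives at max_bit_rate until the buffer is full (simplified model).
    """
    per_picture = max_bit_rate / picture_rate
    occupancy = initial_occupancy
    for bits in picture_bits[:k]:
        occupancy = min(buffer_size, occupancy - bits + per_picture)
    return occupancy
```

The first function needs only one header field but still requires parsing into the picture layer; the second needs the size of every preceding picture, which is the extra work the text describes.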
Under the method of Japanese Patent Application Laid-open Publication No. 11-74799, encoding constraints are imposed to preserve the continuity of the MPEG picture data: the generated amount of code is controlled so that the VBV buffer occupancy is always constant for each GOP, and each GOP is encoded as a closed GOP. This is disadvantageous in terms of encoding efficiency.
Further, under the method of Japanese Patent Application Laid-open Publication No. 11-187354, reproduction is displayed as if the editing had been performed, but the continuity at the editing point is not complete. As a result, the picture may temporarily freeze because the data buffer for the MPEG picture data is initialized.