Recently, the multi-media era has come in which sound, pictures and other pixel values are integrated into one media, and conventional information media as communication tools like newspapers, magazines, TV, radio and telephone are regarded as the targets of multi-media. Generally, multi-media is a form of simultaneous representation of not only characters but also graphics, sound, and especially pictures. In order to handle the above-described conventional information media as multi-media, it is a requisite to represent the information digitally.
However, it is unrealistic to directly process a huge amount of information digitally using the above-described conventional information media because, when calculating the data amount of each information medium described above as digital data amount, data amount per character is 1 to 2 bytes while that of sound per second is not less than 64 Kbits (telephone speech quality) and that of moving pictures per second is not less than 100 Mbits (present TV receiving quality). For example, a TV telephone has already become commercially practical thanks to Integrated Services Digital Network (ISDN) with a transmission speed of 64 kbps to 1.5 Mbps, but it is impossible to transmit moving pictures of TV camera as they are using ISDN.
That is why information compression techniques are necessary. For example, the moving picture compression standards H.261 and H.263, which are recommended by the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T), are used for TV telephones. Also, with the information compression technique of the MPEG-1 standard, it becomes possible to store image information, together with sound information, on a normal music CD (Compact Disc).
Here, Moving Picture Experts Group (MPEG) is an international standard for digitally compressing moving picture signals, and has been standardized by the ISO/IEC (the International Organization for Standardization/International Electrotechnical Commission). MPEG-1 is the standard for compressing moving picture signals down to 1.5 Mbps, that is, for compressing TV signal information to about one hundredth. Also, the quality which satisfies the MPEG-1 standard is a medium level which can be realized at a transmission rate of about 1.5 Mbps. MPEG-2 was thus standardized in order to satisfy the need for higher picture quality, and it compresses moving picture signals to 2 to 15 Mbps. At present, the working group (ISO/IEC JTC 1/SC 29/WG 11), which standardized MPEG-1 and MPEG-2, has standardized MPEG-4 with a higher compression rate. The MPEG-4 standard (i) achieves a compression rate higher than those of the MPEG-1 standard and the MPEG-2 standard, (ii) enables coding, decoding and performing operations on an object-by-object basis, and (iii) realizes new functions necessary in this multimedia era. The initial object of the MPEG-4 standard was to standardize a coding method for pictures with low bit rates, but the object was extended to a general-purpose coding method that also covers interlace pictures with high bit rates. After that, ISO/IEC and ITU-T jointly standardized MPEG-4 AVC (Advanced Video Coding) as a next-generation picture coding method with a high compression rate. This is expected to be used for next-generation optical disc related apparatuses or in broadcasting for mobile terminals.
Generally, in coding moving pictures, information amount is compressed by reducing temporal and spatial redundancies. In inter picture prediction coding, which aims to reduce temporal redundancies, motion estimation and prediction picture generation are performed on a block-by-block basis with reference to a forward picture or a backward picture, and coding is performed on the differential value between the obtained prediction picture and the picture to be coded. Here, “picture” is a term representing a single image. In a progressive picture, a picture means a frame, but in an interlace picture, it means a frame or a field. An “interlace picture” described here means a frame composed of two fields with a slight time lag. In the coding and decoding processes of interlace pictures, it is possible to process a frame as it is, to process it as two fields, or to select frame-based or field-based processing for each block in the frame.
A picture on which intra prediction coding is performed without referring to any reference picture is called an Intra Coded Picture (I picture). Also, a picture on which inter prediction coding is performed referring to only one picture is called a Predictive Coded Picture (P picture). Also, a picture on which inter prediction coding is performed referring to two reference pictures simultaneously is called a Bi-predictive Coded Picture (B picture). A B picture can refer to two pictures selected as an arbitrary combination of a forward picture and a backward picture in display time. Such two reference pictures can be specified on a block-by-block basis, the block being a basic unit of coding and decoding. Those reference pictures are distinguished from each other as follows: the reference picture described earlier in the coded bit stream is called the first reference picture, and the other reference picture described later is called the second reference picture. Note that such reference pictures must have already been coded or decoded in order to code or decode P pictures and B pictures.
Motion compensation inter picture prediction coding is used for coding of P pictures and B pictures. Motion compensation inter picture prediction coding is an inter picture prediction coding method in which motion compensation is applied. Motion compensation is a method for improving prediction precision and reducing data amount by estimating the motion amount (called motion vector hereafter) of each block of a picture and by performing prediction coding considering the motion vector. For example, data amount is reduced by estimating motion vectors of pictures to be coded and by coding each prediction residual between each prediction value, which is shifted by the amount of each motion vector, and each current picture to be coded. In the case of this method, since motion vector information is needed in decoding, motion vectors are also coded, and recorded or transmitted. Motion vectors are estimated on a macro block by macro block basis. More specifically, motion vectors are estimated by fixing the macro block of a picture to be coded (the standard block), moving the macro block of a reference picture within the search range, and finding the location of the reference block which is closest to the standard block.
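The block matching search described above can be sketched as follows. This is a minimal illustration only: the sum of absolute differences (SAD) is assumed as the measure of closeness, the pictures are plain lists of luminance rows, and the function names, the 4x4 block size and the search range of 2 are hypothetical choices for the example, not values taken from any standard.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def estimate_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Full search: keep the block of `cur` at (bx, by) fixed, move a candidate
    block over `ref` within +/- `search` samples, and return the displacement
    with the smallest SAD as ((mvx, mvy), cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


# Synthetic 8x8 reference picture, and a current picture whose content is the
# reference shifted by 2 samples horizontally and 1 sample vertically.
ref = [[x * x + 17 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(estimate_motion_vector(cur, ref, bx=2, by=2))  # ((2, 1), 0)
```

A cost of zero means the prediction residual for that block vanishes, so only the motion vector would need to be coded.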
FIGS. 1A and 1B are structural diagrams of conventional MPEG-2 streams respectively.
As shown in FIG. 1B, an MPEG-2 stream has a hierarchical structure as described below. A stream is composed of Groups of Pictures (each called a GOP hereafter). The use of a GOP as a basic unit in coding processing enables editing a moving picture or performing a random access. A GOP is made up of an I picture, P pictures and B pictures. A stream, a GOP and a picture each further include a synchronous signal (sync) indicating a border of units and a header indicating the data common in the units, the units here being a stream, a GOP and a picture respectively.
FIGS. 2A and 2B respectively show examples indicating how to perform inter picture prediction coding which is used in MPEG-2. The diagonally-shaded pictures in the figure are those pictures to be referred to by other pictures. As shown in FIG. 2A, in prediction coding in MPEG-2, P pictures (P0, P6, P9, P12 and P15) can refer to only a single picture selected as an immediately forward I picture or P picture in display time. Also, B pictures (B1, B2, B4, B5, B7, B8, B10, B11, B13, B14, B16, B17, B19, and B20) can refer to two pictures selected as a combination of an immediately forward I picture or P picture and an immediately backward I picture or P picture. Further, the order of pictures to be placed in a stream is determined. I pictures and P pictures are placed in the order of display time, and each B picture is placed after the first I picture or P picture that follows it in display time. As a structural example of a GOP, as shown in FIG. 2B, pictures from I3 to B14 are grouped into a single GOP.
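The placement rule described above can be illustrated with a short sketch that converts a display-order list into stream order. The function name and the convention of identifying the picture type by the first letter of its label are assumptions made for this example only.

```python
def display_to_stream_order(display_order):
    """Reorder pictures from display order to stream order: I and P pictures
    keep their display order, and each B picture is emitted after the first
    I or P picture that follows it in display time."""
    stream, pending_b = [], []
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)      # hold until the next I/P picture
        else:
            stream.append(pic)         # I or P picture
            stream.extend(pending_b)   # B pictures that precede it in display
            pending_b = []
    # B pictures with no following I/P picture in this list would wait for the
    # next one in a real stream; here they are simply appended.
    stream.extend(pending_b)
    return stream


display = ["B1", "B2", "I3", "B4", "B5", "P6", "B7", "B8", "P9"]
print(display_to_stream_order(display))
# ['I3', 'B1', 'B2', 'P6', 'B4', 'B5', 'P9', 'B7', 'B8']
```

The reordering ensures that both reference pictures of a B picture have been decoded before the B picture itself arrives.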
FIGS. 3A and 3B show the decoding order, the display order and the delay amounts which occur between decoding time and display time in a GOP structure used in an MPEG-2 stream.
Here, the MPEG-2 stream has a fixed frame rate, and the B pictures are decoded and displayed simultaneously. In an MPEG-2 stream, as shown in FIGS. 3A and 3B, the delay amount, which is the time lag from the decoding time of the top picture of the GOP to the display time of the top picture, is equivalent to one frame or two fields at maximum. This delay amount will be called frame delay hereafter, and the length of a frame delay will be counted on a frame-by-frame basis (one frame corresponds to two fields). Optical disc apparatuses such as Digital Versatile Disc (DVD) apparatuses employ the MPEG-2 standard, in which it is defined that frame delays are fixed at one. Note that delay amounts may change at the time of pull-down, such as when displaying, at 60 Hz, streams that have been coded at 24 Hz. Since the delay amounts in such cases can be determined based on the case of displaying the coded streams according to their frame rate, the case of displaying the coded streams according to the frame rate will be described below.
FIG. 4 is a structural diagram of an MPEG-4 AVC stream. There is no concept equivalent to a GOP in the MPEG-4 AVC. However, since it is possible to construct a randomly-accessible access unit equivalent to a GOP by segmenting data in a unit of a special picture which can be decoded without depending on other pictures, the unit will be called RAU (Random Access Unit) hereafter.
There are two types of I pictures in MPEG-4 AVC: Instantaneous Decoding Refresh (IDR) pictures and the rest. An IDR picture is an I picture for which all the pictures placed after it in the decoding order can be decoded without referring to pictures placed before it in the decoding order. An IDR picture corresponds to the top I picture of an MPEG-2 closed GOP. In the case of an I picture which is not an IDR picture, a picture placed after the I picture in the decoding order may refer to a picture placed before the I picture in the decoding order. Also, it is possible to form a structure like an open GOP in MPEG-2 by placing an I picture that is not an IDR picture at the top of a random access unit RAU and restricting the predictive structure of pictures in the random access unit RAU.
FIG. 5 is an example of a prediction structure of pictures in an MPEG-4 AVC stream.
Since the MPEG-4 AVC allows flexible prediction structures, for example, picture P2 can refer to picture I8. In the example of FIG. 5, since display is started after picture I8 and picture P2 are decoded first, the frame delay becomes two. Since prediction structures are flexible in this way, frame delays are not limited to one at maximum like in the case of MPEG-2. This means that frame delays are variable depending on prediction structures. Therefore, it is impossible to perform playback on condition that frame delays are fixed at one.
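The dependence of the frame delay on the prediction structure can be made concrete with a small calculation. The sketch below assumes a simplified timing model that is not taken from any standard: one picture is decoded per frame period, a picture may be displayed in the same period in which it is decoded, and display then proceeds at one picture per period without interruption. The function name and the second picture sequence are hypothetical; the second sequence is not the structure of FIG. 5.

```python
def frame_delay(decoding_order, display_order):
    """Smallest number of frame periods by which display must lag decoding so
    that every picture has been decoded by the time it is to be displayed."""
    display_index = {pic: j for j, pic in enumerate(display_order)}
    # A picture decoded in period i and displayed as the j-th picture needs a
    # delay of at least i - j periods; the largest such value governs.
    return max(i - display_index[pic] for i, pic in enumerate(decoding_order))


# MPEG-2 style structure: B pictures follow the I/P picture they precede in display.
mpeg2_decode = ["I3", "B1", "B2", "P6", "B4", "B5"]
mpeg2_display = ["B1", "B2", "I3", "B4", "B5", "P6"]
print(frame_delay(mpeg2_decode, mpeg2_display))  # 1

# Hypothetical flexible structure: two pictures are decoded before the first
# displayed picture becomes available.
avc_decode = ["I4", "P2", "B0", "B1", "B3"]
avc_display = ["B0", "B1", "P2", "B3", "I4"]
print(frame_delay(avc_decode, avc_display))  # 2
```

Under this model, the delay of one for the MPEG-2 style structure and the delay of two for the flexible structure follow directly from the ordering of pictures.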
Package media such as DVDs have special playback functions such as (i) playback where particular parts of the same stream are selectively played back or where different streams are played back continuously and (ii) multi-angle playback where playback is performed while changing between streams with different angles. The basic unit for using such functions is a GOP in MPEG-2 and a random access unit RAU in MPEG-4 AVC.
FIGS. 6A to 6C show an example of changing streams to be played back in MPEG-2. FIGS. 6A to 6C respectively show the GOPs included in Streams 1, 2 and 3. Here, streams to be played back are changed from Stream 1 to Stream 2 by decoding GOP2-1 next to GOP1-1. This makes it possible to perform playback at a fixed frame rate without allowing the occurrence of a gap at the time of display because the frame delay amounts are one in both GOP1-1 and GOP2-1. Likewise, it is possible to change from Stream 1 to Stream 3 by decoding GOP3-1 next to GOP1-1.
Conventionally, various techniques relating to moving picture coding, multiplexing, decoding and demultiplexing like those described above have been proposed. (For example, refer to Japanese Laid-Open Patent Application No. 2003-18549 publication.) FIG. 7 is a flow chart showing the operation of a conventional multiplexing apparatus for coding and multiplexing moving picture data.
First, in Step 101 and Step 102, the multiplexing apparatus codes one or more streams. Next, in Step 103, it generates management information and then goes to Step 104. Management information includes information for accessing the stream generated in Step 101, information indicating data to be played back at the time of special playback such as multi-angle playback, and the like. After that, in Step 104, it multiplexes the management information with stream data and outputs the multiplexed data.
FIG. 8 is a block diagram showing the structure of a conventional multiplexing apparatus.
The multiplexing apparatus 800 includes a coding unit 11, a memory 12, a management information generation unit 13 and a multiplexing unit 14.
The coding unit 11 codes the inputted moving picture data Vin and stores the coded data string into the memory 12.
The management information generation unit 13 reads out the coded data from the memory 12 as read out data strOut 1, generates management information base and outputs the management information base to the multiplexing unit 14. Note that the management information base does not include the information concerning frame delays.
The multiplexing unit 14 multiplexes (i) the management information base, (ii) read out data strOut 2 which has been read out from the memory 12, and (iii) addition information adInf such as setting information that is set by a user and that is obtained separately from the stream, and then outputs the multiplexed data MuxDat. Here, addition information adInf may be omitted if it is not necessary. Also, the read out data strOut 2 may be packetized using a scheme such as MPEG-2 Transport Streams (TSs) or Program Streams (PSs), or another scheme predetermined by the application, and then multiplexed. For example, in the Blu-ray Disc (BD) standard, the read out data strOut 2 is multiplexed using a scheme in which a 4-byte header is added to each MPEG-2 TS packet to form a packet called a Source Packet, and then stored.
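The Source Packet scheme mentioned above can be sketched as a simple byte-level transformation. An MPEG-2 TS packet is 188 bytes long and starts with the sync byte 0x47, so each Source Packet occupies 192 bytes. The function name is hypothetical, and the header is filled with placeholder zero bytes here; the actual header fields are defined by the BD standard and are not reproduced in this sketch.

```python
TS_PACKET_SIZE = 188   # size of an MPEG-2 TS packet in bytes
TS_SYNC_BYTE = 0x47    # first byte of every MPEG-2 TS packet
HEADER_SIZE = 4        # extra header prepended to form a Source Packet


def to_source_packets(ts_data, header=bytes(HEADER_SIZE)):
    """Prepend a 4-byte header to each 188-byte TS packet in `ts_data`.
    `header` is a placeholder; real header contents are not modelled."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be exactly 4 bytes")
    if len(ts_data) % TS_PACKET_SIZE != 0:
        raise ValueError("input is not a whole number of TS packets")
    out = bytearray()
    for offset in range(0, len(ts_data), TS_PACKET_SIZE):
        packet = ts_data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {offset}")
        out += header + packet
    return bytes(out)


# Two dummy TS packets: a sync byte followed by 187 zero bytes each.
ts = (bytes([TS_SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)) * 2
source_packets = to_source_packets(ts)
print(len(source_packets))  # 384
```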
FIG. 9A shows the structural example of the multiplexed data outputted from the multiplexing apparatus 800.
As shown in FIG. 9A, management information and one or more coded streams are stored in the multiplexed data. Further, handling each stream as one or more clips makes it possible to realize various playback methods such as digest playback and multi-angle playback. Here, a clip is one picture or a sequence of pictures in one random access unit RAU, or a sequence of random access units RAUs of the same stream, and the clip and the stream may be the same. FIGS. 9B and 9C show playback examples. In particular, FIG. 9B shows an example of multi-angle playback. In the case where Stream 1 and Stream N respectively store video at different angles, it is possible to play back Clip N-2 of Stream N by changing angles after Clip 1-1 of Stream 1 and to return to the playback of Stream 1 after completing the playback of Clip N-2. FIG. 9C shows an example of digest playback. It is possible, for example, to play back typical scenes by selectively playing back Clip 1-1 and Clip 1-M in Stream 1.
FIG. 10 is a flow chart showing the operation of a conventional demultiplexing apparatus for demultiplexing the multiplexed data to obtain the coded data and playing back the coded data.
First, in Step 201, the demultiplexing apparatus demultiplexes the multiplexed data to obtain management information, obtains the information concerning the one or more clips to be played back, and then goes to Step 204. The information concerning clips includes the start time or end time of the clips, access information used for accessing the coded data in the clips, and the like. In Step 204 and Step 205, the demultiplexing apparatus decodes and displays the pictures in the clips up to the last pictures in the clips. Here, in the case where an instruction indicating the completion of playback is made by user operation or the like, the playback is completed at the time when the instruction becomes valid.
FIG. 11 is a block diagram showing the structure of a conventional demultiplexing apparatus 900.
The demultiplexing apparatus 900 includes a management information demultiplexing unit 21, a clip information analysis unit 22, a decoding unit 24 and a display unit 26.
The management information demultiplexing unit 21 reads out multiplexed data MuxDat from a multiplexed data recording medium such as an optical disc, analyzes the management information, and determines clips to be played back according to the user instruction or a predetermined method. After that the management information demultiplexing unit 21 outputs, to the clip information analysis unit 22, the clip information Clip that is the information concerning the determined clips.
The clip information analysis unit 22 outputs, to the decoding unit 24, access information acs used for accessing the pictures that constitute the clips. The decoding unit 24, in turn, reads out the video data Vdat from the multiplexed data recording medium based on the access information acs, decodes the read-out data, and outputs the decoding result decOut to the display unit 26. The display unit 26 displays the decoding results in the display order.
The MPEG-4 AVC allows flexible prediction structures, and thus frame delays of clips are variable. Since a conventional demultiplexing apparatus changes clips without considering frame delays of clips, a gap in a display interval of pictures occurs at the time of changing clips with a different frame delay.
FIGS. 12A to 12C show an example of changing from a clip with one-frame delay to a clip with two-frame delay.
FIG. 12A shows the random access unit RAU1-1 of Stream 1 with one-frame delay, while FIG. 12B shows the random access unit RAU2-1 of Stream 2 with two-frame delay. Here, FIG. 12C shows the timing of decoding and displaying at the time of playing back the RAU2-1 next to the RAU1-1.
Since the frame delay of RAU1-1 is one, at the time when picture P15, which is the last in the decoding order of RAU1-1, is displayed, picture I8, which is the top picture of RAU2-1, is decoded. However, since the frame delay of RAU2-1 is two, at the time when picture P2, which is the second in the decoding order, is decoded, display of the pictures in RAU2-1 has yet to be started. Therefore, there is no picture to be displayed at the time when picture P2 is decoded. Consequently, a gap in the display interval occurs between picture P15 and picture B0.
Likewise, in the case of playing back the random access unit RAU1-1 after the random access unit RAU2-1, a gap occurs in the decoding interval if pictures are to be displayed continuously. In other words, there occurs an overlap in the display interval. Hereinafter, a gap in the display interval means a discontinuity at a connection, which occurs both in the case where the frame delay amount at the connection increases and in the case where it decreases.
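Both kinds of discontinuity can be reproduced with a small timeline sketch. It assumes a simplified model, not taken from any standard, in which the decoder decodes one picture per frame period without pausing across the connection, and each clip starts to be displayed its own frame delay after its first picture is decoded. The function names and the picture count of six per random access unit are hypothetical values chosen for the example.

```python
def display_schedule(clips):
    """Map each frame period to the pictures due for display in it.
    `clips` is a list of (name, number_of_pictures, frame_delay) tuples,
    decoded back to back at one picture per frame period."""
    schedule = {}
    decode_start = 0
    for name, num_pictures, delay in clips:
        display_start = decode_start + delay
        for k in range(num_pictures):
            schedule.setdefault(display_start + k, []).append(f"{name}[{k}]")
        decode_start += num_pictures   # the next clip is decoded immediately after
    return schedule


def find_discontinuities(schedule):
    """Return (gaps, overlaps): periods with no picture to display, and
    periods in which more than one picture is due for display."""
    first, last = min(schedule), max(schedule)
    gaps = [t for t in range(first, last + 1) if t not in schedule]
    overlaps = sorted(t for t, pics in schedule.items() if len(pics) > 1)
    return gaps, overlaps


# Frame delay increases from one to two: one period has nothing to display.
print(find_discontinuities(display_schedule([("RAU1-1", 6, 1), ("RAU2-1", 6, 2)])))
# ([7], [])

# Frame delay decreases from two to one: two pictures compete for one period.
print(find_discontinuities(display_schedule([("RAU2-1", 6, 2), ("RAU1-1", 6, 1)])))
# ([], [7])
```

In this model the size of the discontinuity equals the difference between the two frame delays, which is why no discontinuity arises when both clips have the same frame delay.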
As described up to this point, conventional multiplexing and demultiplexing apparatuses have a problem of making a user who watches the moving picture feel uncomfortable, because the conventional demultiplexing apparatus cannot maintain a fixed frame rate when displaying the pictures placed at the part at which clips with different frame delays are changed.
The present invention is conceived in order to solve the above-described problem. An object of the present invention is to provide a multiplexing apparatus for multiplexing the coded stream with other information so as to generate multiplexed data and a demultiplexing apparatus for demultiplexing the multiplexed data to play back the coded stream so that they do not make the user feel uncomfortable even at the time of performing any special playback such as multi-angle playback.