1. Technical Field
The present invention relates to a stream generation apparatus which generates a stream including coded pictures, a stream generation method, a picture coding apparatus, a picture coding method, a recording medium and a program thereof.
2. Background Art
In the age of multimedia, which integrally handles audio, video and other pixel values, existing information media, specifically newspapers, magazines, television, radio, telephones and the like through which information is conveyed to people, have recently come to be included in the scope of multimedia. Generally, multimedia refers to something that is represented by associating not only characters but also graphics, sound, and especially images and the like. In order to include the aforementioned existing information media in the scope of multimedia, however, it is a prerequisite to represent such information in digital form.
However, when the amount of information carried by each of these information media is estimated as an amount of digital information, text requires 1 to 2 bytes per character, sound requires 64 Kbits per second (telephone quality), and moving pictures require 100 Mbits per second or more (current television reception quality). It is therefore not realistic for the information media to handle such an enormous amount of information directly in digital form. For example, although video phones are already in actual use over the Integrated Services Digital Network (ISDN), which offers a transmission speed of 64 Kbit/s to 1.5 Mbit/s, it is impossible to transmit television images and images taken by cameras directly through ISDN.
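The gap between the raw bit rate of moving pictures and the capacity of ISDN can be checked with rough arithmetic. The following is a minimal sketch; the frame size, frame rate and bits per pixel are illustrative assumptions that are not given in the text, and the function name raw_video_bitrate is hypothetical.

```python
# Rough estimate of an uncompressed video bit rate versus ISDN capacity.
# Frame size, frame rate and bits per pixel are assumed values for illustration.

def raw_video_bitrate(width, height, fps, bits_per_pixel):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

# Assumed standard-definition format with 12 bits per pixel on average.
video_bps = raw_video_bitrate(720, 480, 30, 12)

isdn_max_bps = 1_500_000   # 1.5 Mbit/s, the upper ISDN rate mentioned in the text
telephone_bps = 64_000     # 64 Kbit/s, telephone-quality sound

print(f"raw video: {video_bps / 1e6:.1f} Mbit/s")
print(f"ratio to 1.5 Mbit/s ISDN: {video_bps / isdn_max_bps:.0f}x")
print(f"ratio to telephone sound: {video_bps / telephone_bps:.0f}x")
```

Under these assumptions the raw rate exceeds 100 Mbit/s, which is consistent with the estimate given above for television-quality moving pictures.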
Accordingly, information compression techniques have become required, and for example, in the case of the video phone, the H.261 and H.263 standards for moving picture compression technology, internationally standardized by the International Telecommunication Union—Telecommunication Standardization Sector (ITU-T), are being employed. Moreover, with MPEG-1 standard information compression techniques, it has also become possible to store video information onto general music compact discs (CD) together with audio information.
Here, Moving Picture Experts Group (MPEG) refers to an international standard for moving picture signal compression standardized by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). MPEG-1 is a standard for compressing moving picture signals down to 1.5 Mbps, in other words, compressing the information of television signals to approximately one hundredth. Moreover, since the target picture quality within the scope of the MPEG-1 standard is limited to a medium quality that can be realized at a transmission speed of about 1.5 Mbps, MPEG-2, which was standardized to satisfy demands for further improved picture quality, realizes television broadcasting quality with moving picture signals compressed to 2 to 15 Mbps. Furthermore, MPEG-4, which exceeds the compression ratios of MPEG-1 and MPEG-2, enables coding, decoding and operation on a per-object basis, and realizes the new functions required for the multimedia age, has been standardized by the working group (ISO/IEC JTC1/SC29/WG11) that promoted the standardization of MPEG-1 and MPEG-2. MPEG-4 was initially aimed at standardizing a low bit rate coding method. However, it has since been expanded into a more versatile coding standard that also covers high bit rate coding of interlaced pictures. Subsequently, MPEG-4 Advanced Video Coding (AVC) was standardized through cooperation between ISO/IEC and ITU-T as a next-generation picture coding method with a higher compression ratio. It is expected to be used in next-generation optical disc devices and in broadcasting directed to cell phone terminals.
Generally, in the coding of a moving picture, the amount of information is compressed by reducing redundancy in the temporal and spatial directions. Accordingly, in inter-picture prediction coding, which aims at reducing temporal redundancy, motion estimation and generation of a predictive picture are performed on a block-by-block basis by referring to a preceding or following picture, and the difference between the obtained predictive picture and the picture to be coded is coded. Here, a picture denotes one screen: it denotes a frame in a progressive picture, and a frame or a field in an interlaced picture. An interlaced picture is a picture whose frame is made up of two fields that differ temporally from each other. In the coding and decoding of an interlaced picture, one frame may be processed as a frame, processed as two fields, or processed with a frame structure or a field structure selected on a block-by-block basis within the frame.
An I picture is a picture that is intra coded without referring to a reference picture. A P picture is a picture that is inter-picture prediction coded by referring to only one picture. A B picture is a picture that can be inter-picture prediction coded by referring to two pictures at the same time. The B picture can refer to any pair of pictures that are displayed before or after the B picture. A reference picture can be specified for each block, which is the basic unit for coding and decoding. Of the two reference pictures, the one described earlier in the coded bit stream is called the first reference picture, and the one described later is called the second reference picture. Note that, as a condition for coding and decoding these pictures, a picture to be referred to must already have been coded and decoded.
FIG. 1 is a drawing showing a structure of a stream of the conventional MPEG-2. As shown in FIG. 1, the stream of MPEG-2 has the following hierarchical structure. The stream is made up of one or more Groups of Pictures (GOPs), and editing of and random access to a moving picture are made possible by using the GOP as a basic unit for coding. Each GOP is made up of one or more pictures. Each picture is an I picture, a P picture or a B picture. Each stream, GOP and picture further includes a synchronous code (sync), which indicates a breakpoint of each unit, and a header, which is data common to the unit.
FIG. 2A and FIG. 2B are drawings showing an example of a predictive structure among pictures used in MPEG-2. In the drawings, pictures shown as diagonally shaded areas are pictures that are referred to by other pictures. As shown in FIG. 2A, in MPEG-2, a P picture (P0, P6, P9, P12, P15) can be prediction coded by referring to the I picture or P picture that is displayed immediately before the P picture. Further, a B picture (B1, B2, B4, B5, B7, B8, B10, B11, B13, B14, B16, B17, B19, B20) can be prediction coded by referring to the I pictures or P pictures that are displayed immediately before and after the B picture. Furthermore, the arranging order in a stream is determined as follows: the I pictures and P pictures are arranged in displaying order, and each B picture is arranged immediately after the I picture or P picture that is displayed immediately after the B picture. As a GOP structure, for example, as shown in FIG. 2B, the pictures from I3 to B14 can be included in one GOP.
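The arranging rule described above can be sketched as follows. This is only an illustrative sketch of the reordering rule, not an encoder implementation; the function name display_to_stream_order and the short picture sequence are assumptions made for the example.

```python
# Sketch of the MPEG-2 arranging rule: I and P pictures keep their displaying
# order, and each B picture is placed immediately after the I or P picture
# that is displayed immediately after it.

def display_to_stream_order(display_order):
    """Convert a list of picture names in displaying order to stream order."""
    stream = []
    pending_b = []  # B pictures waiting for the next I or P picture
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:
            # An I or P picture is emitted first, then the B pictures
            # that are displayed before it.
            stream.append(pic)
            stream.extend(pending_b)
            pending_b = []
    # B pictures with no following I or P picture in this list are appended
    # unchanged; in a real stream they would follow a later anchor picture.
    stream.extend(pending_b)
    return stream

display = ["B1", "B2", "I3", "B4", "B5", "P6", "B7", "B8", "P9"]
print(display_to_stream_order(display))
# -> ['I3', 'B1', 'B2', 'P6', 'B4', 'B5', 'P9', 'B7', 'B8']
```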
FIG. 3 is a drawing showing a structure of a stream of MPEG-4 AVC. In MPEG-4 AVC, there is no concept equivalent to the GOP. Therefore, in the case where the arrangement method of the parameter sets described later and the predictive structure of pictures are not constrained, it is necessary at the time of random access to analyze the picture data sequentially and search for a picture from which decoding can be started. However, by dividing the data in units that start with a special picture which can be decoded without depending on other pictures, it is possible to construct a randomly accessible unit equivalent to the GOP. Such units are called random access units (RAUs), and a stream made up of RAUs is called a stream having a random access structure.
Here, the access unit (hereafter referred to as AU), which is a basic unit for dealing with a stream, is explained. An AU is a unit used for storing the coded data of one picture, including parameter sets and slice data. The parameter sets are divided into a picture parameter set (PPS), which is data corresponding to a header of each picture, and a sequence parameter set (SPS), which corresponds to a header of a unit of a GOP or larger in MPEG-2. The SPS includes the maximum number of pictures available for reference, the picture size and the like. The PPS includes the variable length coding method, an initial value of the quantization step, and the number of reference pictures. An identifier indicating which PPS and SPS are referred to is attached to each picture.
In MPEG-4 AVC, there are two types of I pictures: an Instantaneous Decoder Refresh (IDR) picture, and an I picture which is not an IDR picture. The IDR picture is an I picture from which decoding can proceed without referring to any picture preceding the IDR picture in decoding order, that is, a picture at which the conditions necessary for decoding are reset. It is equivalent to the leading I picture of a closed GOP in MPEG-2. For an I picture which is not an IDR picture, a picture which follows the I picture in decoding order may refer to a picture which precedes the I picture in decoding order. Here, the IDR picture and the I picture denote pictures made up of only I slices. The P picture denotes a picture made up of P slices or I slices. The B picture denotes a picture made up of B slices, P slices or I slices. Note that the slices of an IDR picture and the slices of a non-IDR picture are stored in different types of NAL units.
The AU of MPEG-4 AVC can include, in addition to data necessary for decoding a picture, supplemental information called Supplemental Enhancement Information (SEI) which is unnecessary for decoding a picture, boundary information of AU and the like. The data such as parameter set, slice data and SEI are all stored in a Network Abstraction Layer (NAL) unit (NALU). The NAL unit is made up of a header and a payload, and the header includes a field which indicates a type of data stored in the payload (hereafter referred to as NAL unit type). The value of the NAL unit type is defined for each type of data such as a slice and SEI. By referring to the NAL unit type, the type of data stored in the NAL unit can be specified. The NAL unit of SEI can store one or more SEI messages. The SEI message is also made up of a header and a payload and a type of information stored in the payload is identified by a type of SEI message indicated in the header.
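Identification of the stored data by the NAL unit type can be sketched as follows. The sketch assumes the one-byte NAL unit header layout of the H.264 (MPEG-4 AVC) specification, in which the lower 5 bits carry the NAL unit type; this layout and the listed type values are taken from that specification rather than from the description above, and the function name parse_nal_header is hypothetical.

```python
# Sketch of reading the NAL unit type from the first byte of a NAL unit.
# Assumed layout (H.264 one-byte header):
#   1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type.

NAL_UNIT_TYPE_NAMES = {
    1: "slice of a non-IDR picture",
    5: "slice of an IDR picture",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AU delimiter",
}

def parse_nal_header(first_byte):
    """Split the header byte into its fields and name the payload type."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": forbidden_zero_bit,
        "nal_ref_idc": nal_ref_idc,
        "nal_unit_type": nal_unit_type,
        # Types outside the small table above are reported as "other".
        "name": NAL_UNIT_TYPE_NAMES.get(nal_unit_type, "other"),
    }

for byte in (0x67, 0x68, 0x06, 0x65, 0x41):
    info = parse_nal_header(byte)
    print(f"0x{byte:02X}: type {info['nal_unit_type']} ({info['name']})")
```

Because IDR and non-IDR slices carry different type values, a reader of the stream can locate an IDR picture from the header alone, without parsing the slice data.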
FIG. 4 is a drawing showing an example of a predictive structure of MPEG-4 AVC. In MPEG-4 AVC, an AU of a P picture can refer to an AU of a B picture. As shown in FIG. 4, the AU of the P picture (P7) can refer to the AU of the B picture (B2). In this case, in order to perform high-speed playback by displaying only the AUs of I pictures and P pictures, I0, B2, P4 and P7 have to be decoded. Thus, when trick-play such as jump-in playback, variable-speed playback or reverse playback is performed, the AUs that need to be decoded cannot be determined in advance, so that all AUs end up having to be decoded. However, by storing in the stream supplemental information indicating the AUs that need to be decoded for trick-play, the AUs to be decoded can be determined by referring to the supplemental information. Such supplemental information is called trick-play information. Further, if a constraint is set in advance on the predictive structure, for example that the AUs of P pictures do not refer to an AU of a B picture, it is possible to decode and display only the AUs of the I pictures and P pictures. Furthermore, the AUs of I pictures and P pictures can be decoded and displayed sequentially if their decoding order is the same as their displaying order.
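The determination of the AUs that need to be decoded can be sketched as a traversal of the reference relationships. The following is only an illustrative sketch: the reference table is an assumption made for the example, apart from the fact that P7 refers to B2 as in FIG. 4, and the function name aus_to_decode is hypothetical.

```python
# Sketch: find every AU that must be decoded in order to display a chosen
# set of AUs, by following reference relationships transitively.

def aus_to_decode(targets, references):
    """Return the set of AUs needed to decode all AUs in `targets`."""
    needed = set()
    stack = list(targets)
    while stack:
        au = stack.pop()
        if au in needed:
            continue
        needed.add(au)
        # Every AU referred to by this AU must be decoded as well.
        stack.extend(references.get(au, ()))
    return needed

# Assumed reference table; only "P7 refers to B2" comes from the description.
references = {
    "I0": [],
    "P4": ["I0"],
    "B2": ["I0", "P4"],
    "P7": ["P4", "B2"],
}

# Displaying only I and P pictures still requires decoding B2.
print(sorted(aus_to_decode(["I0", "P4", "P7"], references)))
# -> ['B2', 'I0', 'P4', 'P7']

# With the constraint that P pictures do not refer to B pictures,
# the I and P pictures alone are sufficient.
constrained = {"I0": [], "P4": ["I0"], "B2": ["I0", "P4"], "P7": ["P4"]}
print(sorted(aus_to_decode(["I0", "P4", "P7"], constrained)))
# -> ['I0', 'P4', 'P7']
```

Without knowledge of the reference relationships a player cannot perform this traversal in advance, which is why the trick-play information or a constraint on the predictive structure is useful.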
FIG. 5 is a block diagram showing a structure of a conventional multiplexer.
A multiplexer 17 receives video data, codes the inputted video data into a stream of MPEG-4 AVC, generates database information about the coded data, and multiplexes and records the coded data and the database information. It includes a stream attribute determination unit 11, a coding unit 12, a database information generation unit 13 having a general database information generation unit 14, a multiplexing unit 15 and a recording unit 16.
The stream attribute determination unit 11 determines the coding parameters for MPEG-4 AVC coding and the constraints relating to trick-play, and outputs them to the coding unit 12 as attribute information TYPE. Here, the constraints relating to trick-play include information about whether or not to apply a constraint for constructing random access units to the stream of MPEG-4 AVC, whether or not to include information indicating the AUs to be decoded when variable-speed playback or reverse playback is performed, and whether or not to place a constraint on the predictive structure among AUs. The coding unit 12, based on the attribute information TYPE, codes the inputted video data into a stream of MPEG-4 AVC, and outputs the access information of the stream to the general database information generation unit 14 while outputting the coded data to the multiplexing unit 15. Here, the access information is information on each access basis, which is a basic unit for accessing a stream, and includes the start address, size, display time and the like of the leading AU in the access basis. The stream attribute determination unit 11 further outputs information necessary for generating database information, such as the compression method and the resolution, to the general database information generation unit 14 as general database information. The database information generation unit 13 generates database information, and is made up solely of the general database information generation unit 14. The general database information generation unit 14 generates, from the access information and the general database information, table data to be referred to when accessing a stream and table data in which attribute information such as the compression method is stored, and outputs the generated table data to the multiplexing unit 15 as database information INFO.
The multiplexing unit 15 generates multiplexed data by multiplexing the coded data and the database information INFO, and outputs the multiplexed data to the recording unit 16. The recording unit 16 records the multiplexed data inputted from the multiplexing unit 15 onto a recording medium such as an optical disc, a hard disk or a memory.
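The table data referred to when accessing a stream can be pictured as follows. This is a minimal sketch under stated assumptions: the entry fields follow the access information described above (start address, size and display time of the leading AU), while the class name AccessEntry, the function name find_access_basis, the time unit and the sample values are hypothetical.

```python
# Sketch of an access table and a lookup for jump-in playback.
# Field names and sample values are assumptions for illustration.
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class AccessEntry:
    display_time: float   # display time of the leading AU, in seconds (assumed unit)
    start_address: int    # byte offset of the leading AU in the stream
    size: int             # size of the leading AU in bytes

def find_access_basis(table, target_time):
    """Return the last entry whose display time is not after target_time.

    `table` must be sorted by display_time in ascending order.
    """
    times = [entry.display_time for entry in table]
    index = bisect_right(times, target_time) - 1
    if index < 0:
        raise ValueError("target time precedes the first access basis")
    return table[index]

table = [
    AccessEntry(display_time=0.0, start_address=0, size=40_000),
    AccessEntry(display_time=0.5, start_address=180_000, size=38_000),
    AccessEntry(display_time=1.0, start_address=350_000, size=41_000),
]

entry = find_access_basis(table, 0.7)
print(entry.start_address)  # -> 180000
```

A binary search is used here only because the entries are assumed to be sorted by display time; any lookup that returns the access basis containing the requested time would serve the same purpose.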
FIG. 6 is a block diagram showing a structure of a conventional demultiplexer.
A demultiplexer 27, in accordance with an externally inputted command instructing trick-play, separates the AU data of MPEG-4 AVC from an optical disc on which a stream of MPEG-4 AVC is recorded together with the database information, and decodes and displays the separated data. It includes a database information analyzing unit 21, a decoding/displaying AU determination unit 23, an AU separation unit 24, a decoding unit 25, and a displaying unit 26.
The database information analyzing unit 21 is made up solely of a general database information analyzing unit 22. A trick-play instruction signal instructing trick-play such as variable-speed playback, reverse playback or jump-in playback is inputted to the general database information analyzing unit 22. When the trick-play instruction signal is inputted, the general database information analyzing unit 22 analyzes the inputted signal, obtains access information ACS from the database information of the multiplexed data, obtains access destination information including the address information of the access basis that contains the AU to be decoded or displayed, and notifies the AU separation unit 24 of the access destination information. The AU separation unit 24 analyzes the AUs which make up the access basis, obtains the trick-play information TRK about the AUs to be decoded and displayed, and outputs the obtained information to the decoding/displaying AU determination unit 23. The decoding/displaying AU determination unit 23 determines the AUs to be decoded and displayed based on a predetermined rule, and notifies the AU separation unit 24 of the identification information of the AUs to be decoded and the displaying unit 26 of the identification information of the AUs to be displayed. The AU separation unit 24 separates the data of the AUs to be decoded based on the access destination information, and outputs the separated data to the decoding unit 25. The decoding unit 25 decodes the inputted AU data, and outputs the decoded data to the displaying unit 26. Finally, the displaying unit 26 selects the AUs indicated for display in the display AU information, and displays the selected AUs. (Refer to Japanese Laid-Open Patent Publication No. 2003-18549).