1. Technical Field
The present invention relates to a picture coding apparatus which codes a moving picture on a picture-by-picture basis and to a picture decoding apparatus which decodes the coded moving picture coded by the picture coding apparatus, in particular to a picture coding apparatus and picture decoding apparatus corresponding to a trick-play such as high-speed playback (variable-speed playback).
2. Background Art
In the age of multimedia which integrally handles audio, video and other pixel values, existing information media, specifically, newspaper, magazine, television, radio, telephone and the like through which information is conveyed to people, have recently come to be included in the scope of multimedia. Generally, multimedia refers to something that is represented by associating not only characters, but also graphics, sound, and especially images and the like, together, but in order to include the aforementioned existing information media in the scope of multimedia, it becomes a prerequisite to represent such information in a digital form.
However, when the amount of information carried by each of these information media is estimated as an amount of digital information, text requires only 1 to 2 bytes per character, whereas sound requires 64 Kbits per second (telephone quality) and moving pictures require 100 Mbits per second or more (current television receiving quality). It is therefore not realistic for the information media to handle such an enormous amount of information as it is in digital form. For example, although video phones are already in actual use via the Integrated Services Digital Network (ISDN), which offers a transmission speed of 64 Kbit/s to 1.5 Mbit/s, it is impossible to transmit television images and images taken by cameras directly through ISDN.
Accordingly, information compression techniques have become required, and for example, in the case of the video phone, the H.261 and H.263 standards for moving picture compression technology, internationally standardized by the International Telecommunication Union—Telecommunication Standardization Sector (ITU-T), are being employed. Moreover, with MPEG-1 standard information compression techniques, it has also become possible to store video information onto general music compact discs (CD) together with audio information.
Here, Moving Picture Experts Group (MPEG) is an international standard for moving picture signal compression standardized by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). MPEG-1 is a standard for compressing moving picture signals down to 1.5 Mbps, in other words, compressing television signals to approximately one hundredth of their original size. Moreover, since the target picture quality within the scope of the MPEG-1 standard is limited to a medium degree of quality which can be realized by a transmission speed of primarily about 1.5 Mbps, MPEG-2, which was standardized to satisfy demands for further improved picture quality, realizes television broadcasting quality with moving picture signals compressed to 2 to 15 Mbps. Furthermore, MPEG-4, which exceeds the compression ratios of MPEG-1 and MPEG-2, enables coding, decoding and operation on a per-object basis, and realizes the new functions required for the multimedia age, has been standardized by the work group (ISO/IEC JTC1/SC29/WG11) that promoted the standardization of MPEG-1 and MPEG-2. MPEG-4 was initially aimed at standardizing a low bit rate coding method. However, it has since been expanded to the standardization of a more versatile coding method that further includes high bit rate coding for interlaced pictures. Subsequently, MPEG-4 Advanced Video Coding (AVC) was standardized as a next generation picture coding method with a higher compression ratio through the cooperation of ISO/IEC and ITU-T. It is expected to be used for next generation optical disc related devices and for broadcasts directed to cell phone terminals.
Generally, in the coding of a moving picture, the amount of information is compressed by reducing redundancy in the temporal and spatial directions. Accordingly, in inter-picture prediction coding, which aims at reducing the temporal redundancy, motion estimation and generation of a predictive picture are performed on a block-by-block basis by referring to a preceding or following picture, and the difference value between the obtained predictive picture and the picture to be coded is coded. Here, a picture indicates a screen: it indicates a frame in a progressive picture, and it indicates a frame or a field in an interlaced picture. An interlaced picture is a picture whose frame is made up of two fields which differ temporally from each other. In the coding and decoding of an interlaced picture, it is allowed to process one frame as a frame, to process it as two fields, or to process it as a frame structure or as a field structure on a block-by-block basis in the frame.
An I picture is a picture that is intra-picture prediction coded without referring to a reference picture. A P picture is a picture that is inter-picture prediction coded by referring to only one picture. Further, a B picture is a picture that can be inter-picture prediction coded by referring to two pictures at the same time. The B picture can refer to two pictures as a pair of any pictures which are displayed before or after the B picture. A reference picture can be specified for each block, which is a basic unit for coding and decoding. The reference picture which is described earlier in a coded bit stream is distinguished as a first reference picture, and the reference picture which is described later is distinguished as a second reference picture. Note that, as a condition for coding and decoding these pictures, it is necessary that a picture to be referenced has already been coded and decoded.
FIG. 1 is a drawing showing a structure of a stream of the conventional MPEG-2. As shown in FIG. 1, the stream of the MPEG-2 has a hierarchical structure as described in the following. The stream is made up of more than one Group of Pictures (GOP), and editing and random access of a moving picture are made possible by using the GOP as a basic unit for coding. Each GOP is made up of more than one picture. Each picture is one of an I picture, a P picture or a B picture. Each stream, GOP and picture is further made up of a synchronous code (sync) which indicates a breakpoint of each unit and a header which is common data in the unit.
FIG. 2A and FIG. 2B are drawings showing an example of a predictive structure among pictures used in MPEG-2.
In the drawings, pictures shown as diagonally shaded areas are pictures to be referenced by other pictures. As shown in FIG. 2A, in MPEG-2, a P picture (P0, P6, P9, P12, P15) can be prediction coded by referring to an I picture or P picture that is displayed immediately before said P picture. Further, a B picture (B1, B2, B4, B5, B7, B8, B10, B11, B13, B14, B16, B17, B19, B20) can be prediction coded by referring to an I picture or P picture that is displayed immediately before said B picture and an I picture or P picture that is displayed immediately after said B picture. Furthermore, the arranging order in a stream has been determined as follows: the I pictures and P pictures are arranged in displaying order; and each of the B pictures is arranged immediately after the I picture or P picture that is displayed immediately after said B picture. As a GOP structure, for example, as shown in FIG. 2B, pictures from I3 to B14 can be included in one GOP.
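The arranging rule above can be traced with a minimal sketch. It assumes pictures are given as labels in displaying order and that every label starting with "B" is a B picture; the function name stream_order is hypothetical and is not part of MPEG-2.

```python
def stream_order(display):
    """Reorder picture labels from displaying order into stream order.

    Rule from the text: I and P pictures keep their displaying order, and
    each B picture is placed immediately after the I or P picture that is
    displayed immediately after it.
    """
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)   # hold until the next I or P picture arrives
        else:
            out.append(pic)         # the I or P picture is written first
            out.extend(pending_b)   # then the B pictures displayed before it
            pending_b.clear()
    out.extend(pending_b)           # trailing B pictures with no later I or P picture
    return out


display = ["P0", "B1", "B2", "I3", "B4", "B5", "P6"]
print(stream_order(display))
# prints ['P0', 'I3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In this ordering every picture that a B picture refers to has already appeared in the stream, which matches the condition that a referenced picture must be coded and decoded first.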
FIG. 3 is a drawing showing a structure of a stream of MPEG-4 AVC. In MPEG-4 AVC, there is no concept equivalent to the GOP. However, by separating data into special picture units by which each picture is decoded without depending on other pictures, it is possible to construct a unit which can be randomly accessed and is equivalent to the GOP. Such separated units are called random access units (RAU).
Next, the access unit (hereafter referred to as AU), which is a basic unit for dealing with a stream, is explained. An AU is a unit used for storing the coded data of one picture, including parameter sets (PS) and slice data. The parameter sets (PS) are divided into a picture parameter set (hereafter simply referred to as PPS), which is data corresponding to a header of each picture, and a sequence parameter set (hereafter simply referred to as SPS), which corresponds to a header of a unit of GOP or higher in MPEG-2. Note that the PPS and SPS are initialization information necessary for initializing the respective decoding.
The SPS includes a profile, a maximum number of pictures available for reference, a picture size and so on as common reference information for decoding all coded pictures in the random access unit (RAU). The PPS includes, for each coded picture in the random access unit (RAU), a type of a variable length coding method, an initial value of the quantization step, a number of reference pictures and so on as reference information for decoding the picture. Further, the SPS and PPS can each include a quantization matrix, so that the quantization matrix set in the SPS can be overwritten with the one in the PPS if necessary. An identifier indicating which PPS and SPS to reference is added to each picture. Also, slice data includes a frame number FN, which is an identification number for identifying a picture. Here, the PPS to be referenced by each picture can be updated on a picture-by-picture basis, while the SPS can be updated only in the IDR picture that is explained later.
In MPEG-4 AVC, there are two types of I pictures: an Instantaneous Decoder Refresh (IDR) picture, and an I picture which is not an IDR picture. The IDR picture is an I picture for which every picture following it in decoding order can be decoded without referring to a picture preceding the IDR picture in decoding order, and it is equivalent to the leading I picture of a closed GOP of MPEG-2. For an I picture which is not an IDR picture, a picture which follows said I picture in decoding order may refer to a picture which precedes said I picture in decoding order. A structure such as an open GOP of MPEG-2 can be constructed by positioning an I picture that is not an IDR picture in the first access unit of the random access unit RAU and restricting the predictive structure of pictures in the random access unit RAU.
The AU of MPEG-4 AVC can include, in addition to data necessary for decoding a picture, supplemental information called Supplemental Enhancement Information (SEI) which is unnecessary for decoding a picture, boundary information of AU and the like. The data such as parameter set, slice data and SEI are all stored in a Network Abstraction Layer (NAL) unit (NALU). The NAL unit is made up of a header and a payload, and the header includes a field which indicates a type of data stored in the payload (hereafter referred to as NAL unit type). The value of the NAL unit type is defined for each type of data such as a slice and SEI. By referring to the NAL unit type, the type of data stored in the NAL unit can be specified.
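As an illustration of reading the NAL unit type, the following sketch parses the one-byte NAL unit header. The bit layout (a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type) and the listed type values are taken from the MPEG-4 AVC (H.264) specification rather than from the description above; the function and table names are hypothetical.

```python
# Subset of NAL unit type values defined by MPEG-4 AVC (H.264).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": first_byte & 0x1F,
    }


def nal_type_name(first_byte):
    """Return a readable name for the data stored in the payload."""
    nal_unit_type = parse_nal_header(first_byte)["nal_unit_type"]
    return NAL_TYPE_NAMES.get(nal_unit_type, "other")


for byte in (0x67, 0x68, 0x65, 0x06):
    print(hex(byte), nal_type_name(byte))
```

A decoder can use such a lookup to route PPS and SPS NAL units to parameter set handling and slice NAL units to slice decoding.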
The NAL unit of SEI can store one or more SEI messages. The SEI message is also made up of a header and a payload and a type of information stored in the payload is identified by a type of SEI message indicated in the header.
The first AU located at the head of the random access unit RAU includes a NAL unit of the SPS referenced by all AUs of the random access unit RAU and a NAL unit of the PPS referenced by the first AU. Further, the NAL unit of the PPS necessary for decoding each AU of the random access unit RAU is guaranteed to be included either in the current AU or in an AU of the random access unit RAU that precedes the current AU in decoding order.
Here, a NAL unit itself contains no information for identifying a NAL unit boundary, so boundary information can be added to the head of each NAL unit. When a stream of MPEG-4 AVC is used in an MPEG-2 Transport Stream (TS) or Program Stream (PS), a start code prefix made up of the 3 bytes 0x000001 is added to the head of the NAL unit. Further, in the MPEG-2 TS and PS, it is determined that a NAL unit called an Access Unit Delimiter, which shows an AU boundary, should be inserted at the head of the AU.
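The role of the start code prefix can be sketched as follows. This is a simplified illustration, not a complete byte stream parser: it assumes that the 3-byte pattern 0x000001 never occurs inside a NAL unit (MPEG-4 AVC ensures this with emulation prevention bytes) and that zero bytes directly in front of a prefix are padding. The function name split_nal_units is hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def split_nal_units(stream):
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, start)
        end = len(stream) if nxt == -1 else nxt
        # Drop zero padding that belongs to the next (4-byte) start code.
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


# Hypothetical stream: an SPS, a PPS and an IDR slice with dummy payloads.
stream = (b"\x00\x00\x00\x01\x67\xaa"
          b"\x00\x00\x01\x68\xbb"
          b"\x00\x00\x01\x65\xcc")
for unit in split_nal_units(stream):
    print(unit.hex())
```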
Various conventional techniques relating to such video coding and decoding have been proposed (e.g. refer to Japanese Laid-Open Patent Publication No. 2003-18549). FIG. 4 is a block diagram showing a picture coding apparatus which realizes a conventional picture coding method.
A picture coding apparatus 191 compresses and codes inputted video picture data Vin, and outputs an AVC stream st that is a coded stream of the MPEG-4 AVC. It includes a slice coding unit 11, a memory 12, an SPS generation unit 13, a new PPS judgement unit 14, a PPS generation unit 16 and an AU determination unit 17.
The video data Vin is inputted to the slice coding unit 11. The slice coding unit 11 codes slice data for one AU, stores slice data Sin that is the result of coding into the memory 12, and outputs SPS information SPSin necessary for decoding the picture to the SPS generation unit 13, while outputting PPS information PPSin necessary for decoding the AU to the new PPS judgement unit 14.
The SPS generation unit 13 generates a SPS based on the SPS information SPSin, and outputs SPSnal including the SPS to the AU determination unit 17.
The new PPS judgement unit 14 holds the PPS information PPSin for each AU in order, starting from the first AU in the random access unit RAU, and compares the inputted PPS information PPSin with the held PPS information PPSin. When the inputted PPS information PPSin is new, the new PPS judgement unit 14 sets a new PPS flag flg, which indicates that the inputted PPS information PPSin is new, to 1, and outputs the PPS information PPSin to the PPS generation unit 16 as PPS information PPSout. On the other hand, when the inputted PPS information PPSin is included in the held PPS information PPSin, the new PPS judgement unit 14 sets the new PPS flag to 0.
The PPS generation unit 16 generates a PPS based on the inputted PPS information PPSout when the new PPS flag is 1, and outputs the data PPSnal including the PPS to the AU determination unit 17.
The AU determination unit 17 generates NAL units of the SPS and PPS respectively based on the data SPSnal and data PPSnal, and generates a NAL unit of slice data by obtaining the slice data Snal from the memory 12. The AU determination unit 17 then determines AU data by arranging the generated NAL units in a predetermined order, constructs an AVC stream st, and outputs the AVC stream st.
FIG. 5 is a flowchart showing an operation of the picture coding apparatus 191. In step S101, the picture coding apparatus 191 codes slice data for one picture, and generates an SPS in step S102. Here, the generation of the SPS may be performed only in the first AU of the random access unit RAU. Following that, the picture coding apparatus 191 judges whether or not the PPS information (PPS) of the AU is new in the random access unit RAU in the step S103. If the PPS information is new (Yes in step S111), the picture coding apparatus 191 determines to store the PPS into the AU, and the operation moves on to step S106 from step S111. If the PPS information is not new (No in step S111), the operation moves on to step S107. In step S106, the picture coding apparatus 191 generates the PPS. In step S107, the picture coding apparatus 191 generates data for one AU and outputs the generated data; when it has been judged in step S111 that the PPS information is new and that the PPS is to be stored into the AU, the PPS generated in step S106 is included in the AU.
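The new PPS judgement of the conventional apparatus can be sketched as below. This is an illustration under stated assumptions, not the apparatus itself: PPS information is modeled as any comparable value, one value is supplied per AU of a single random access unit, and the function name new_pps_flags is hypothetical.

```python
def new_pps_flags(pps_info_per_au):
    """Return the new PPS flag (1 or 0) for each AU of one random access unit.

    The flag is 1 only when the PPS information of the AU has not appeared
    in any earlier AU of the same random access unit, in which case the
    conventional apparatus generates a PPS and stores it in that AU.
    """
    held = []    # PPS information held since the first AU of the RAU
    flags = []
    for info in pps_info_per_au:
        if info in held:
            flags.append(0)      # already held: no PPS is stored in this AU
        else:
            held.append(info)
            flags.append(1)      # new: a PPS is generated and stored
    return flags


# Hypothetical PPS information for five consecutive AUs.
print(new_pps_flags(["A", "A", "B", "A", "B"]))
# prints [1, 0, 1, 0, 0]
```

Because a PPS is stored only in the AU where it first appears, a later AU that references it carries no copy of its own.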
FIG. 6 is a block diagram showing a picture decoding apparatus which realizes a conventional picture decoding method.
The picture decoding apparatus 291 separates and decodes an AU from the inputted AVC stream st, and outputs decoded data Dout which is a decoded picture. It includes an AU boundary detection unit 22, a PPS obtainment unit 23, a PPS memory 24, a decoding information obtainment unit 25 and a decoding unit 26.
The AU boundary detection unit 22 detects a boundary of an AU and separates the AU data. When a NAL unit of the PPS is included in the AU data, it outputs the NAL unit of the PPS PPSnal to the PPS obtainment unit 23, and outputs other NAL units Dnal to the decoding information obtainment unit 25.
The PPS obtainment unit 23 analyzes the NAL unit PPSnal, and causes the PPS memory 24 to hold the analysis result as an analysis result signal PPSst. The decoding information obtainment unit 25 analyzes the NAL unit Dnal and obtains the SPS, slice data and the like, while obtaining data PPSref including the PPS referenced by the AU from the PPS memory 24, and outputs the slice data and the SPS and PPS necessary for decoding the slice data to the decoding unit 26 as pre-decoded data Din.
The decoding unit 26 decodes the slice data based on the pre-decoded data Din, and outputs the decoded data Dout.
The random access unit RAU is a data structure which indicates that decoding can be performed from the first AU, and it is necessary for realizing trick-play such as jump-in playback, variable-speed playback and reverse playback, or for realizing skip playback on a random access unit-by-unit basis in a storage device having an optical disc or a hard disc.
However, in a random access unit RAU in a stream of the conventional MPEG-4 AVC, a PPS necessary for decoding an AU cannot always be obtained in the case where high-speed playback is performed by selecting, decoding and displaying only specific AUs such as AUs of I pictures or P pictures.
FIG. 7A and FIG. 7B show a structural example of a random access unit RAU.
As shown in FIG. 7A, the random access unit RAU is made up of fifteen AUs from AU1 to AU15. At the time of high-speed playback, the five AUs AU1, AU4, AU7, AU10 and AU13 are decoded and displayed. Here, AU1 to AU8 refer to PPS#1 as a PPS, and AU9 to AU15 refer to PPS#2. The PPS#1 and PPS#2 are respectively stored in AU1 and AU9. As shown in FIG. 7B, the AUs to be decoded at the time of high-speed playback do not include AU9, so the PPS#2 cannot be obtained at the time of high-speed playback and AU10 and AU13 cannot be decoded.
Thus, when AUs in the random access unit RAU are selectively decoded and displayed, the necessary PPS may not be obtainable if only predetermined AUs are decoded, as is done in MPEG-2. Therefore, there is a problem that all AUs in the random access unit RAU need to be analyzed in order to obtain the PPS.
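The situation of FIG. 7A and FIG. 7B can be reproduced with a small simulation. It is a sketch under simplifying assumptions: an AU counts as decodable when its PPS has already been placed in the PPS memory by an AU that was actually analyzed, and everything else needed for decoding is ignored. The function name decodable_aus and the dictionary layout are hypothetical.

```python
def decodable_aus(pps_stored_in, pps_needed_by, selected):
    """Split the selected AUs into decodable and non-decodable ones."""
    pps_memory = set()          # PPSs obtained from the AUs analyzed so far
    ok, failed = [], []
    for au in sorted(selected):
        if au in pps_stored_in:
            pps_memory.add(pps_stored_in[au])   # this AU carries a PPS NAL unit
        if pps_needed_by[au] in pps_memory:
            ok.append(au)
        else:
            failed.append(au)                   # referenced PPS was never obtained
    return ok, failed


# FIG. 7A: PPS#1 is stored in AU1, PPS#2 in AU9.
pps_stored_in = {1: "PPS#1", 9: "PPS#2"}
# AU1 to AU8 refer to PPS#1, AU9 to AU15 refer to PPS#2.
pps_needed_by = {au: ("PPS#1" if au <= 8 else "PPS#2") for au in range(1, 16)}

print(decodable_aus(pps_stored_in, pps_needed_by, range(1, 16)))
print(decodable_aus(pps_stored_in, pps_needed_by, [1, 4, 7, 10, 13]))
# second call prints ([1, 4, 7], [10, 13])
```

With normal playback every AU is analyzed and all fifteen AUs are decodable, whereas the high-speed selection skips AU9 and leaves AU10 and AU13 without PPS#2.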
In order to solve the problem, an object of the present invention is to provide a picture coding apparatus which generates a stream so as to decode a picture by obtaining an appropriate picture parameter set necessary for the decoding, and a picture decoding apparatus which decodes the generated stream.