1. Field of Invention
The present invention relates to a moving image encoding method, an image encoding apparatus, and data for encoding a moving image while switching between variable-length encoding schemes.
2. Description of the Related Art
The following describes a DVD-Video disc (hereinafter simply referred to as a “DVD”) of a conventional technology.
FIG. 1 is a diagram showing the structure of a DVD. As shown in the bottom of FIG. 1, the DVD disc includes a logical address space in between the lead-in area and the lead-out area. In the logical address space, volume information of the file system is stored at the top, and application data such as video and audio is stored in the subsequent areas.
The file system, which complies with ISO9660 and the Universal Disc Format (UDF), is a mechanism for representing data on a disc in units called directories and files. Even on a personal computer (PC) for everyday use, data stored on the hard disk in the form of directories and files is represented on the computer via a file system such as FAT or NTFS, which enhances usability.
Both UDF and ISO9660 (which are sometimes referred to collectively as “UDF Bridge”) are used in DVDs, and data can be read out by the file system driver of either UDF or ISO9660. In the case of DVD-RAM/R/RW, which are rewritable DVD discs, data reading, writing, and deletion are physically possible via these file systems.
Data stored on a DVD can be viewed, via the UDF bridge, as directories or files as shown in the upper left of FIG. 1. Immediately below the root directory (“ROOT” in FIG. 1), a directory called “VIDEO_TS” is placed, where application data of the DVD is stored. The application data is stored as plural files. The following are some of the major files:
    VIDEO_TS.IFO    disc reproduction control information file
    VTS_01_0.IFO    video title set #1 reproduction control information file
    VTS_01_0.VOB    video title set #1 stream file
    . . .
There are two types of extensions specified. “IFO” indicates that the corresponding file stores reproduction control information. “VOB” indicates that the corresponding file stores an MPEG stream, that is, AV data. The reproduction control information includes information for realizing the interactivity employed for the DVD (a technique for dynamically changing the state of reproduction according to a user operation) as well as information, such as metadata, which is attached to a title or an AV stream. The reproduction control information of the DVD is generally called navigation information.
The reproduction control information files include “VIDEO_TS.IFO” intended for the management of the entire disc, and “VTS_01_0.IFO” being the reproduction control information of an individual video title set (a single DVD disc can store plural titles, that is, different movies or different versions of a movie). “01” in the body of the filename indicates the number of the video title set. In the case where the number of a video title set is #2, for example, the filename is “VTS_02_0.IFO”.
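The naming rule described above can be illustrated with a minimal sketch. The function name `vts_ifo_filename` is hypothetical, and the sketch assumes underscore-separated names and a two-digit title set number in the range 1 to 99.

```python
def vts_ifo_filename(title_set_number: int) -> str:
    """Build the reproduction control information filename of a video title set.

    Assumption: the title set number occupies exactly two digits (1-99).
    """
    if not 1 <= title_set_number <= 99:
        raise ValueError("title set number must be between 1 and 99")
    # The two-digit field in the body of the filename is the title set number.
    return f"VTS_{title_set_number:02d}_0.IFO"


print(vts_ifo_filename(2))  # filename for video title set #2
```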
The upper right of FIG. 1 shows a DVD navigation space in the application layer of the DVD, i.e., a logical structure space where the above-described reproduction control information is shown. Information in “VIDEO_TS.IFO” is shown in the DVD navigation space as Video Manager Information (VMGI). Reproduction control information in “VTS_01_0.IFO”, which exists for each video title set, is shown in the DVD navigation space as Video Title Set Information (VTSI).
VTSI describes Program Chain Information (PGCI) which is information about a reproduction sequence called a Program Chain (PGC). The PGCI is made up of a group of cells and a kind of programming information called a command. Each cell represents part or all of the segments in a VOB (which is an abbreviation of Video Object and which includes an MPEG stream). The reproduction of a cell means to reproduce the segments in the VOB that are specified by such cell.
A command, which is processed by a DVD-capable virtual machine, is similar to Java (registered trademark) Script executed on a browser. However, a DVD command is different from a Java (registered trademark) Script in that, while Java (registered trademark) Script performs window and browser controls (e.g., opens a new browser window) in addition to logical operations, a DVD command performs only the reproduction control of AV titles, such as the specification of a chapter to be reproduced, in addition to logical operations.
Each cell includes, as its internal information, the start address and end address (logical storage address on the disc) of a VOB stored on the disc. A player reads out data using such information described in the cell about the start address and end address of the VOB, and reproduces the read data.
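The way a player uses the cell information can be sketched as follows. This is an illustrative sketch only: the names `Cell`, `read_cell`, and `play_program_chain` are hypothetical, and it assumes that the addresses count 2048-byte logical sectors and that the end address is inclusive.

```python
from dataclasses import dataclass
from typing import Iterable

SECTOR_SIZE = 2048  # assumption: addresses are counted in 2048-byte sectors


@dataclass(frozen=True)
class Cell:
    start_address: int  # first logical sector of the VOB segment
    end_address: int    # last logical sector (inclusive, by assumption)


def read_cell(disc_image: bytes, cell: Cell) -> bytes:
    """Read the VOB data designated by one cell from a disc image."""
    if cell.end_address < cell.start_address:
        raise ValueError("end address precedes start address")
    begin = cell.start_address * SECTOR_SIZE
    end = (cell.end_address + 1) * SECTOR_SIZE
    return disc_image[begin:end]


def play_program_chain(disc_image: bytes, cells: Iterable[Cell]) -> bytes:
    """Concatenate the data of the cells in their reproduction order."""
    return b"".join(read_cell(disc_image, cell) for cell in cells)
```

In this sketch, the reproduction sequence of a PGC reduces to reading each cell's address range in turn; command processing is omitted.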
FIG. 2 is a schematic diagram for describing the navigation information embedded in the AV stream. Interactivity, which is characteristic of a DVD, is not realized only by the navigation information stored in the above-described “VIDEO_TS.IFO” and “VTS_01_0.IFO”; several pieces of important information are multiplexed in the VOB together with video data and audio data, using dedicated carriers called navigation packs (hereinafter referred to as navi pack(s) or NV_PCK).
Here, a description is given of a menu as a simple example of interactivity. Several buttons appear on the menu screen. For each of such buttons, a process to be performed when such button is selected and activated is defined. One button is selected on the menu (the fact that the button is selected is indicated to the user by a semitransparent color overlaid on such button as a highlight). The user can shift to any of the buttons located above, below, right or left of the currently selected button, using the Up/Down/Right/Left key on the remote control. Using these keys, the user moves the highlight to the button the user wishes to select and activate, and then confirms the selection (presses the Determination key). Accordingly, a program of the corresponding command is activated. In general, the reproduction of the corresponding title or chapter is activated by the command.
The upper left of FIG. 2 shows an overview of the control information stored in NV_PCK.
NV_PCK includes highlight color information and button information of each button. The highlight color information describes color palette information, which specifies a semitransparent color of a highlight to be overlaid. Each button information describes: rectangular area information that is information about the position of each button; shift information indicating a move from one button to another button (specification of a destination button corresponding to a user selection of the Up/Down/Right/Left key); and button command information (a command to be executed when such button is selected).
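The button information described above can be modeled with a small sketch. The class and function names (`Button`, `shift`, `activate`) and the string form of the command are hypothetical; the actual NV_PCK layout is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Button:
    # Rectangular area information: (left, top, right, bottom) in pixels.
    rect: Tuple[int, int, int, int]
    # Shift information: key name -> number of the destination button.
    neighbors: Dict[str, int] = field(default_factory=dict)
    # Button command information (a plain string stands in for a command).
    command: str = ""


def shift(buttons: Dict[int, Button], selected: int, key: str) -> int:
    """Return the button selected after an Up/Down/Right/Left key press.

    If no destination is defined for the key, the selection stays put.
    """
    return buttons[selected].neighbors.get(key, selected)


def activate(buttons: Dict[int, Button], selected: int) -> str:
    """Return the command to run when the selected button is activated."""
    return buttons[selected].command


menu = {
    1: Button((0, 0, 10, 10), {"Right": 2}, "play title 1"),
    2: Button((20, 0, 30, 10), {"Left": 1}, "play chapter 3"),
}
current = shift(menu, 1, "Right")  # highlight moves from button 1 to button 2
```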
As shown in the upper right center of FIG. 2, a highlight on the menu is generated as an overlay image. The overlay image is an image generated by giving a color specified by the color palette information to the rectangular area information in the button information. Such overlay image is displayed on the screen, superimposed on the background image shown in the right of FIG. 2.
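The generation of the highlight can be pictured as a simple alpha blend over the button rectangle. The following is only a sketch under stated assumptions: images are lists of rows of RGB tuples, the rectangle is (left, top, right, bottom) with exclusive right and bottom edges, and the blending formula and the name `overlay_highlight` are illustrative rather than taken from the DVD specification.

```python
def overlay_highlight(background, rect, color, alpha):
    """Blend a semitransparent color over the rectangle of a button.

    background: list of rows, each a list of (R, G, B) tuples.
    rect: (left, top, right, bottom); right and bottom are exclusive.
    alpha: opacity of the highlight, from 0.0 to 1.0.
    Returns a new image; the background itself is left unchanged.
    """
    left, top, right, bottom = rect
    result = [row[:] for row in background]
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = tuple(
                round(alpha * c + (1 - alpha) * b)
                for c, b in zip(color, result[y][x])
            )
    return result
```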
The menu of the DVD is realized in the above-described manner. The reason that a part of the navigation data is embedded in the stream using NV_PCK is to allow the menu information to be dynamically updated in synchronization with the stream (e.g., to allow the menu to be displayed only for five to ten minutes in the middle of movie reproduction), and to realize the menu of the DVD without any problems even for an application which is likely to have a problem of synchronization timing. Another major reason is to improve user operability by, for example, storing, in NV_PCK, information for supporting special reproduction, so as to smoothly decode and reproduce AV data even when a DVD is reproduced in a special manner such as fast-forward reproduction and rewind reproduction.
FIG. 3 is a conceptual diagram showing a VOB being a DVD stream. As shown in the drawing, data such as video, audio, and subtitles (as shown in A) are each packetized and packed (as shown in B), based on the MPEG system standard (ISO/IEC13818-1), and multiplexed to be a single MPEG program stream (as shown in C). NV_PCK including a button command for realizing interactivity as described above is multiplexed together.
Multiplexing in the MPEG system is characterized in that, while each piece of data to be multiplexed forms a bit string based on its decoding order, the data to be multiplexed, i.e., video data, audio data, and subtitle data, do not necessarily form a bit string in order of reproduction. This is attributable to the fact that a decoder model for a multiplexed MPEG system stream (generally referred to as a System Target Decoder or an STD, shown in D in FIG. 3) has decoder buffers corresponding to the respective elementary streams obtained by demultiplexing the multiplexed data, and such demultiplexed data are temporarily stored in the respective decoder buffers until the time of decoding. The size of the decoder buffers specified by the DVD-Video standard differs on an elementary stream basis. The size of the buffer for video data is 232 KB, the size of the buffer for audio data is 4 KB, and the size of the buffer for subtitle data is 52 KB.
In other words, the subtitle data that is multiplexed together with the video data is not necessarily decoded or reproduced at the same timing.
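The role of the per-stream decoder buffers can be sketched as follows. The buffer sizes are those stated above; everything else is an assumption made for illustration, including the names `DecoderBuffer` and `demultiplex`, the interpretation of 1 KB as 1024 bytes, and the representation of a packet as a (stream, decode time, payload) tuple.

```python
KB = 1024  # assumption: 1 KB = 1024 bytes

BUFFER_SIZE = {"video": 232 * KB, "audio": 4 * KB, "subtitle": 52 * KB}


class DecoderBuffer:
    """Holds demultiplexed data of one elementary stream until decoding."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.units = []  # list of (decode_time, payload)

    def used(self):
        return sum(len(payload) for _, payload in self.units)

    def push(self, decode_time, payload):
        if self.used() + len(payload) > self.capacity:
            raise OverflowError("decoder buffer overflow")
        self.units.append((decode_time, payload))

    def pop_due(self, now):
        """Remove and return the payloads whose decode time has come."""
        due = [p for t, p in self.units if t <= now]
        self.units = [(t, p) for t, p in self.units if t > now]
        return due


def demultiplex(packets):
    """Distribute multiplexed packets to the buffer of each stream."""
    buffers = {name: DecoderBuffer(size) for name, size in BUFFER_SIZE.items()}
    for stream, decode_time, payload in packets:
        buffers[stream].push(decode_time, payload)
    return buffers
```

In this sketch, a subtitle packet multiplexed next to a video packet stays in its own buffer until its own decode time, which is why adjacent data in the stream are not necessarily decoded at the same timing.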
Meanwhile, there is the Blu-ray Disc (BD) standard as a next-generation DVD standard.
While a DVD is intended for the package distribution of video with standard image quality (standard-definition image quality) as well as the recording of analog broadcasting (the DVD Video Recording format), a BD is capable of recording digital broadcasting with high-definition image quality as it is (the Blu-ray Disc Rewritable format; hereinafter referred to as the BD-RE).
However, since the BD-RE format widely supports the recording of digital broadcasting, information that supports special reproduction or the like is not optimized. Considering that high-definition video will be distributed in the future by means of package distribution at a rate higher than that for digital broadcasting (the BD-ROM format), there will be the need for a mechanism that does not stress a user even at the time of special reproduction.
One of the schemes employed for the encoding of a moving image on a BD is MPEG-4 AVC (Advanced Video Coding). MPEG-4 AVC is a next-generation encoding scheme with a high compression ratio which has been jointly developed by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) JTC1/SC29/WG11 and ITU-T (International Telecommunication Union-Telecommunication Standardization Sector).
In general, in encoding of a moving image, the amount of information is compressed by reducing redundancies in temporal and spatial directions. Therefore, in inter-picture predictive encoding aiming at reducing temporal redundancies, motion estimation and generation of a predictive image are carried out on a block-by-block basis with reference to forward or backward picture(s), and encoding is then performed on the difference value between the obtained predictive image and an image in the current picture to be encoded. Here, “picture” is a term denoting one screenful of image. In the case of a progressive image, a picture means a frame, whereas it means a frame or a field in the case of an interlaced image. Here, “interlaced image” is an image of a frame composed of two fields which are separated in capture time. In encoding and decoding of an interlaced image, it is possible to handle one frame as a frame as it is, as two fields, or in a frame structure or a field structure on a per-block basis within the frame.
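The inter-picture predictive encoding described above can be illustrated with a minimal sketch. It assumes pictures held as lists of rows of integer sample values, an integer-pel motion vector given as (dy, dx), and a block that stays inside the reference picture; the function names are hypothetical, and motion estimation itself (the search for the motion vector) is omitted.

```python
def predict_block(reference, top, left, size, motion_vector):
    """Fetch the predictive block from the reference picture.

    The block at (top, left) is displaced by the motion vector (dy, dx).
    """
    dy, dx = motion_vector
    return [
        [reference[top + dy + y][left + dx + x] for x in range(size)]
        for y in range(size)
    ]


def residual(current_block, predicted):
    """Difference between the current block and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, predicted)
    ]


def reconstruct(predicted, res):
    """Decoder side: add the residual back to the prediction."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, res)
    ]
```

Only the residual (together with the motion vector) needs to be encoded; when the prediction is good, the residual values are small, which is the source of the compression gain.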
A picture to be encoded using intra-picture prediction without reference to any reference images shall be referred to as an I picture. A picture to be encoded using inter-picture prediction with reference to only one reference picture shall be referred to as a P picture. A picture to be encoded using inter-picture prediction with reference to two reference pictures at the same time shall be referred to as a B picture. It is possible for a B picture to refer to two pictures which can be arbitrarily combined from forward/backward pictures in display time. Reference images (reference pictures) can be designated for each block serving as a basic unit of encoding and decoding. Distinction shall be made between such reference pictures by calling a reference picture to be described earlier in an encoded bitstream a first reference picture, and by calling a reference picture to be described later in the bitstream a second reference picture. Note that as a condition for encoding and decoding these types of pictures, pictures used for reference need to be already encoded and decoded.
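The condition that reference pictures must already be decoded can be checked with a short sketch. As a simplification, references are listed per picture rather than per block, and the names `MAX_REFS` and `validate_decoding_order` are hypothetical.

```python
# Maximum number of reference pictures used at the same time, per picture type.
MAX_REFS = {"I": 0, "P": 1, "B": 2}


def validate_decoding_order(pictures):
    """Check a list of (picture_id, picture_type, reference_ids) tuples.

    The list is in decoding order. Returns True only if every picture
    refers to no more pictures than its type allows and every referenced
    picture appears earlier in the list.
    """
    decoded = set()
    for picture_id, picture_type, refs in pictures:
        if len(refs) > MAX_REFS[picture_type]:
            return False
        if any(ref not in decoded for ref in refs):
            return False
        decoded.add(picture_id)
    return True
```

For example, with picture identifiers taken as display order, the sequence I0, P3, B1 is valid in decoding order because B1 refers to I0 and P3, both of which precede it in the stream.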
A residual signal, which is obtained by subtracting, from an image to be encoded, a prediction signal obtained through intra-picture prediction or inter-picture prediction, is frequency-transformed, quantized, and then variable-length encoded to be outputted as an encoded stream. MPEG-4 AVC supports two kinds of variable-length encoding schemes, which can be switched on a picture-by-picture basis: Context-Adaptive Variable-Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). A context-adaptive scheme adaptively selects an efficient encoding scheme according to the surrounding conditions.
FIG. 4 shows an example of variable-length encoding schemes to be applied to pictures that make up a randomly accessible unit in an MPEG-4 AVC stream. MPEG-4 AVC has no concept of a group of pictures (GOP) of the MPEG-2 Video standard. However, since it is possible to construct a randomly accessible unit corresponding to a GOP by dividing data into special picture units that can be decoded independently of other pictures, such a unit is referred to as a random access unit (RAU) here. As shown in FIG. 4, whether to apply CABAC or CAVLC as a variable-length encoding scheme is switched on a picture-by-picture basis.
Next, referring to FIGS. 5A to 5C, descriptions are given of the respective variable-length decoding processes of CABAC and CAVLC, which are different in processes at the time of variable-length decoding. FIG. 5A is a block diagram showing an image decoding apparatus that performs Context-Adaptive Binary Arithmetic Decoding (CABAD), which is decoding processing for data variable-length encoded through CABAC and that performs Context-Adaptive Variable-length decoding (CAVLD), which is decoding processing for data encoded through CAVLC.
Image decoding processing with CABAD is performed in the following manner: first, encoded data Vin applied with CABAC is inputted to a stream buffer 5001; next, an arithmetic decoding unit 5002 reads encoded data Vr from the stream buffer to perform arithmetic decoding on it, and inputs binary data Bin1 to a binary data buffer 5003; a binary data decoding processing unit 5004 obtains binary data Bin2 from the binary data buffer 5003 to decode such binary data, and inputs the resulting decoded data Din1 to a pixel reconstruction unit 5005; and the pixel reconstruction unit 5005 performs inverse-quantization, inverse-transformation, motion compensation, and the like on the decoded binary data Din1 so as to reconstruct pixels, and outputs decoded data Vout.
FIG. 5B is a flowchart illustrating operations to be performed from when the decoding of encoded data applied with CABAC starts to when pixel reconstruction processing is performed. First, in Step S5001, the encoded data Vin applied with CABAC is arithmetic-decoded, so as to generate binary data. Next, in Step S5002, it is determined whether or not binary data equivalent to a predetermined data unit, such as one or more pictures, is ready. In the case where such binary data is ready, the process proceeds to Step S5003, whereas in the case where such binary data is not ready, the process of Step S5002 is repeated. The reason for buffering the binary data here is that CABAC sometimes results in a significant increase in the number of bits of binary data per picture or per macroblock, which consequently results in a significant increase also in processing load in arithmetic decoding. Thus, in order to achieve continuous reproduction without interruption even in the worst case scenario, it is necessary to perform a certain amount of arithmetic decoding processing in advance. In Step S5003, binary data is decoded, and in Step S5004, the pixel reconstruction processing is performed. In CABAD, as described above, a delay occurs at the start of decoding since it is not possible to start the pixel reconstruction processing until binary data equivalent to a predetermined data unit is ready in Step S5001 and Step S5002.
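The two-stage structure of CABAD and the resulting start-up delay can be sketched as follows. This is a structural sketch only: the actual arithmetic decoding, binary data decoding, and pixel reconstruction are passed in as placeholder callables, and the function name `cabad_decode`, the chunked input, and the measurement of the data unit as a number of binary symbols are all assumptions made for illustration.

```python
def cabad_decode(encoded_chunks, unit_size,
                 arithmetic_decode, decode_binary, reconstruct_pixels):
    """Run the FIG. 5B flow over a sequence of encoded chunks.

    Pixel reconstruction starts only after `unit_size` binary symbols
    have accumulated in the binary data buffer.
    """
    binary_buffer = []  # plays the role of the binary data buffer 5003
    output = []
    for chunk in encoded_chunks:
        # Step S5001: arithmetic decoding produces binary data.
        binary_buffer.extend(arithmetic_decode(chunk))
        # Step S5002: proceed only once a whole data unit is ready.
        while len(binary_buffer) >= unit_size:
            unit = binary_buffer[:unit_size]
            binary_buffer = binary_buffer[unit_size:]
            # Step S5003: decode binary data; Step S5004: reconstruct pixels.
            output.append(reconstruct_pixels(decode_binary(unit)))
    return output
```

With stand-in callables, the first output appears only after enough chunks have been arithmetic-decoded to fill one unit, which corresponds to the delay at the start of decoding noted above.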
Image decoding processing with CAVLD is performed in the following manner: first, encoded data Vin applied with CAVLC is inputted to the stream buffer 5001; next, a CAVLD unit 5006 performs variable-length decoding on it, and inputs the resulting decoded VLD data Din2 into the pixel reconstruction unit 5005; the pixel reconstruction unit 5005 performs inverse-quantization, inverse-transformation, motion compensation, and the like so as to reconstruct pixels, and outputs decoded data Vout. FIG. 5C is a flowchart illustrating operations to be performed from when the decoding of encoded data applied with CAVLC starts to when pixel reconstruction processing is performed. First, in Step S5005, CAVLD is performed. Then, in Step S5004, the pixel reconstruction processing is performed. As described above, unlike CABAD, it is not necessary in CAVLD to wait for data equivalent to a predetermined data unit to be ready before the start of the pixel reconstruction processing or to have an intermediate buffer for variable-length decoding processing, such as the binary data buffer 5003.
FIG. 6 is a flowchart illustrating operations performed by a conventional decoding apparatus that decodes a stream in which variable-length encoding schemes are switched in the middle of the stream as in an example case of FIG. 4. First, in Step S5101, information indicating a variable-length encoding scheme applied to a picture is obtained, and the process proceeds to Step S5102. In Step S5102, it is determined whether or not the variable-length encoding scheme for the current picture is switched from that for the previous picture in decoding order. Methods of buffer management at the time of variable-length decoding processing are different between CABAD and CAVLD. Thus, in the case where the variable-length encoding schemes have been switched, the process proceeds to Step S5103 to perform a process of switching between buffer management methods, whereas in the case where the variable-length encoding scheme has not been switched, the process proceeds to Step S5104. In Step S5104, it is determined whether or not the variable-length encoding scheme is CAVLC. In the case where the variable-length encoding scheme is CAVLC, the process proceeds to Step S5105 to perform CAVLD processing, whereas in the case where the variable-length encoding scheme is CABAC, the process proceeds to Step S5106. In Step S5106, it is determined whether or not the variable-length encoding scheme for the current picture is switched from that for the previous picture in decoding order. In the case where the variable-length encoding schemes have been switched, the process proceeds to Step S5107, where arithmetic decoding is repeated until binary data equivalent to a predetermined data unit is ready, and such binary data is decoded, as shown in Step S5001 and Step S5002 in FIG. 5B. In the case where it is determined in Step S5106 that the variable-length encoding scheme has not been switched, the process proceeds to Step S5108 to perform regular CABAD processing. 
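The branching of FIG. 6, up to and including the final pixel reconstruction step (S5109), can be traced with a short sketch. The function name `trace_decoding` is hypothetical and the decoding itself is not performed; only the sequence of steps taken for each picture is recorded. The sketch assumes that the first picture of a stream is not treated as a switch in Step S5102, but that a stream starting with CABAC still takes the buffering path of Step S5107.

```python
def trace_decoding(schemes):
    """Return, per picture, the list of FIG. 6 steps that would be taken.

    schemes: variable-length encoding scheme of each picture in decoding
    order, each either "CAVLC" or "CABAC" (obtained in Step S5101).
    """
    trace = []
    previous = None
    for scheme in schemes:
        if scheme not in ("CAVLC", "CABAC"):
            raise ValueError(f"unknown scheme: {scheme}")
        steps = []
        # Step S5102: has the scheme changed from the previous picture?
        switched = previous is not None and scheme != previous
        if switched:
            steps.append("S5103")  # switch the buffer management method
        if scheme == "CAVLC":      # Step S5104
            steps.append("S5105")  # CAVLD processing
        elif switched or previous is None:  # Step S5106
            steps.append("S5107")  # buffer binary data first, then decode
        else:
            steps.append("S5108")  # regular CABAD processing
        steps.append("S5109")      # pixel reconstruction
        trace.append(steps)
        previous = scheme
    return trace
```

Tracing a stream such as CAVLC, CABAC, CABAC, CAVLC shows that every switch costs an extra buffer management step, and that each switch to CABAC additionally incurs the buffering delay.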
The regular CABAD processing here refers to processing that does not involve the buffering of binary data that is needed at the time when CAVLC is switched to CABAC or when the decoding of a stream applied with CABAC starts. Finally, in Step S5109, pixel reconstruction processing is performed.
Patent Document 1: Japanese Laid-Open Patent Application No. 2000-228656.
Non-patent Document 1: Proposed SMPTE Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, Final Committee Draft 1 Revision 6, Jul. 13, 2005.