The following describes a DVD-Video disc (hereafter simply referred to as a “DVD”) as a conventional technique.
FIG. 1 shows a structure of a DVD. As shown in the lower part of FIG. 1, a logical address space is provided between a lead-in area and a lead-out area on a DVD disc. Volume information of a file system is stored at the start of the logical address space, followed by application data such as video and audio.
The file system is defined by ISO 9660 or Universal Disk Format (UDF), and is a mechanism for representing data on a disc in units called directories and files. A personal computer (PC) in everyday use likewise presents data stored on a hard disk in the form of directories and files, through a file system such as FAT or NTFS. This enhances usability.
DVDs use both UDF and ISO 9660 (a combination known as “UDF Bridge”), so that data can be read by a file system driver for either UDF or ISO 9660. In the case of DVD-RAM/R/RW, which are rewritable DVD discs, data can be physically read, written, and deleted through these file systems.
Data recorded on the DVD can be viewed, through UDF Bridge, as directories and files as shown in the upper left part of FIG. 1. A directory called “VIDEO_TS” is placed immediately below a root directory (“ROOT” in FIG. 1). The application data of the DVD is stored in this VIDEO_TS directory. The application data is stored as a plurality of files. The plurality of files mainly include the following:
VIDEO_TS.IFO    disc playback control information file
VTS_01_0.IFO    video title set #1 playback control information file
VTS_01_0.VOB    video title set #1 stream file
. . .
Two types of extensions are specified. “IFO” indicates a file storing playback control information, and “VOB” indicates a file storing an MPEG stream that is AV data. Playback control information includes information for realizing interactivity (a technique of dynamically changing playback in accordance with a user operation) employed for DVDs, information such as metadata that is attached to a title or an AV stream, and the like. In DVDs, playback control information is also referred to as navigation information.
Playback control information files include “VIDEO_TS.IFO” for managing the entire disc, and “VTS_01_0.IFO” that is playback control information for an individual video title set (in DVDs, a plurality of titles, e.g., different movies or different versions of a movie, can be recorded on a single disc). “01” in the body of the file name “VTS_01_0.IFO” indicates a number of the video title set. For example, a playback control information file for video title set #2 is “VTS_02_0.IFO”.
The upper right part of FIG. 1 shows a DVD navigation space in an application layer of the DVD. This is a logical structure space where the aforementioned playback control information is developed. In the DVD navigation space, the information stored in “VIDEO_TS.IFO” is developed as video manager information (VMGI), and the playback control information that exists for each individual video title set, such as “VTS_01_0.IFO”, is developed as video title set information (VTSI).
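The naming rule described above can be sketched as follows. This is only an illustration: the helper name `vts_ifo_name` is hypothetical, and the two-digit, zero-padded number with underscore separators is inferred from the examples for video title sets #1 and #2.

```python
def vts_ifo_name(title_set_number: int) -> str:
    """Build the playback control file name for one video title set."""
    # Assumption: the number always fits in the two-digit field of the name.
    if not 1 <= title_set_number <= 99:
        raise ValueError("title set number is assumed to fit in two digits")
    return f"VTS_{title_set_number:02d}_0.IFO"


print(vts_ifo_name(1))
print(vts_ifo_name(2))
```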
The VTSI includes program chain information (PGCI). The PGCI is information about a playback sequence called a program chain (PGC). The PGCI is mainly composed of a group of cells and a kind of programming information called commands. A cell itself corresponds to all or part of a video object (which is an MPEG stream and is abbreviated as VOB). Playing a cell is an equivalent of playing a section, in a VOB, that is designated by the cell.
Commands are processed by a DVD virtual machine, and are similar to, for example, Java (registered trademark) Script executed on a browser. However, DVD commands differ from Java (registered trademark) Script in the following point. Java (registered trademark) Script controls windows and browsers (e.g. open a window of a new browser), in addition to performing logical operations. On the other hand, DVD commands only control playback of AV titles, such as by designating a chapter to be played, in addition to performing logical operations.
A cell includes start and end addresses of a corresponding section in a VOB recorded on the disc (logical storage addresses on the disc), as its internal information. A player reads data using the start and end addresses written in the cell with regard to the VOB, and plays the read data.
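As a rough sketch of how a player might use cells, the following models a disc as a byte string and a program chain as an ordered list of cells. The names `Cell` and `play_program_chain` are hypothetical, and treating the end address as inclusive is an assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    start: int  # logical start address of the section in the VOB
    end: int    # logical end address (assumed inclusive in this sketch)


def play_program_chain(disc: bytes, cells: List[Cell]) -> bytes:
    """Read the section designated by each cell, in program chain order."""
    played = bytearray()
    for cell in cells:
        played += disc[cell.start:cell.end + 1]
    return bytes(played)


disc = bytes(range(16))          # stand-in for VOB data on the disc
pgc = [Cell(0, 3), Cell(8, 11)]  # two cells designating two sections
data = play_program_chain(disc, pgc)
print(list(data))
```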
FIG. 2 is a schematic view for explaining navigation information that is embedded in an AV stream. Interactivity, a characteristic feature of DVDs, is realized not only by the navigation information stored in the aforementioned “VIDEO_TS.IFO” and “VTS_01_0.IFO”. Other important navigation information is multiplexed in a VOB together with video and audio data, by using a dedicated carrier called a navigation pack (hereafter referred to as a navi pack or NV_PCK).
A menu is explained below as a simple example of interactivity. A menu screen has several buttons. For each of the buttons, a process to be performed when the button is selected and executed is defined. One button is in a selected state on the menu (the selected button is highlighted by being overlaid with a semitransparent color, to indicate the selected state of the button to a user). The user can move the highlight to any of buttons located above, below, right, and left of the currently selected button, by using Up/Down/Right/Left keys on a remote control. When the user moves the highlight to a button which the user wants to select for execution using the Up/Down/Right/Left keys on the remote control and determines the selection (e.g. by pressing an Enter key), a program of a command corresponding to the selected button is executed. Typically, playback of a title or a chapter corresponding to the selected button is executed by the command.
The upper left part of FIG. 2 roughly shows control information stored in the NV_PCK.
The NV_PCK includes highlight color information, button information for each individual button, and the like. The highlight color information includes color palette information, which specifies the semitransparent highlight color to be overlay-displayed. The button information for each individual button includes rectangular area information showing a position of the button, highlight movement information about movements from the button to other buttons (designating move destination buttons corresponding to the user's operations of the Up/Down/Right/Left keys), and button command information (a command to be executed when the selection of the button is determined).
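The button information described above can be modeled roughly as follows. This is a simplified sketch, not the actual NV_PCK layout: the class names, field names, key strings, and command strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Button:
    rect: Tuple[int, int, int, int]  # rectangular area: x, y, width, height
    moves: Dict[str, int]            # highlight movement: key -> destination button
    command: str                     # button command run when selection is determined


class Menu:
    def __init__(self, buttons: Dict[int, Button], selected: int) -> None:
        self.buttons = buttons
        self.selected = selected

    def press(self, key: str) -> Optional[str]:
        """Move the highlight, or return the command when Enter is pressed."""
        button = self.buttons[self.selected]
        if key == "Enter":
            return button.command
        # Keys without a defined destination leave the highlight where it is.
        self.selected = button.moves.get(key, self.selected)
        return None


menu = Menu(
    {
        1: Button((40, 100, 200, 40), {"Down": 2}, "play title 1"),
        2: Button((40, 160, 200, 40), {"Up": 1}, "play chapter 3"),
    },
    selected=1,
)
menu.press("Down")
executed = menu.press("Enter")
print(menu.selected, executed)
```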
The highlight on the menu is generated as an overlay image, as shown in the upper middle right part of FIG. 2. This overlay image is obtained by giving the color specified by the color palette information to the area shown by the rectangular area information of the button information. The overlay image is superimposed on a background image shown in the right part of FIG. 2, and the resulting image is displayed on the screen.
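The overlay step can be illustrated with a small sketch. It assumes a single-channel image stored as nested lists and a simple linear blend with a mixing ratio `alpha`. The text only states that the highlight color is semitransparent, so the blend formula and the function name `overlay_highlight` are assumptions.

```python
from typing import List, Tuple


def overlay_highlight(
    background: List[List[int]],
    rect: Tuple[int, int, int, int],
    color: int,
    alpha: float,
) -> List[List[int]]:
    """Blend a semitransparent color over the button rectangle."""
    x, y, width, height = rect
    out = [row[:] for row in background]  # leave the background image untouched
    for r in range(y, y + height):
        for c in range(x, x + width):
            out[r][c] = round((1 - alpha) * background[r][c] + alpha * color)
    return out


background = [[100] * 4 for _ in range(3)]
result = overlay_highlight(background, rect=(1, 0, 2, 2), color=200, alpha=0.5)
for row in result:
    print(row)
```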
In this way, menus in DVDs are realized. A main reason why part of navigation data is embedded in the stream using the NV_PCK is to allow menu information to be dynamically updated in synchronization with the stream (e.g. displaying a menu only for five to ten minutes during movie playback), so that even an application that has a difficult synchronization timing can be appropriately realized. Another main reason is to improve user operability by, for example, storing information for supporting special playback in the NV_PCK so as to enable AV data to be smoothly decoded and played even when a DVD is played in a special mode such as fast-forward or rewind.
FIG. 3 is a conceptual view showing a VOB that is a stream on the DVD. As shown in FIG. 3, data such as video, audio, and subtitles (shown in the level A in FIG. 3) are packetized and packed (shown in the level B in FIG. 3) and multiplexed with each other to form one MPEG program stream (shown in the level C in FIG. 3), based on the MPEG system standard (International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 13818-1). An NV_PCK that carries a button command for realizing interactivity is multiplexed in the MPEG program stream too, as described above.
In the MPEG system, multiplexing has the following characteristic. While individual data to be multiplexed, i.e., video data, audio data, or subtitle data, is arranged in a bit string based on a decoding order, these different types of data on the whole, i.e., video data, audio data, and subtitle data altogether, are not necessarily arranged in a bit string based on a playback order. This is because a decoder model for a multiplexed MPEG system stream (generally called a System Target Decoder or an STD (shown in the level D in FIG. 3)) has decoder buffers corresponding to individual elementary streams obtained by demultiplexing, and temporarily stores demultiplexed data in the corresponding decoder buffers until decoding. For instance, decoder buffers defined by DVD-Video have different sizes depending on individual elementary streams, such that a buffer size is 232 KB for video, 4 KB for audio, and 52 KB for subtitles.
In other words, subtitle data multiplexed adjacent to video data is not necessarily decoded and played at the same time as the video data.
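The following sketch illustrates this characteristic under simplifying assumptions. The pack sequence and its decode times are invented for the example, and only the buffer sizes come from the description above. After demultiplexing, each elementary stream is in decoding order, even though neighboring packs in the multiplex can have very different decode times.

```python
from collections import deque

# Decoder buffer sizes stated for DVD-Video, in KB.
BUFFER_SIZE_KB = {"video": 232, "audio": 4, "subtitle": 52}

# Hypothetical multiplexed stream: (elementary stream, decode time) per pack.
packs = [
    ("video", 0), ("subtitle", 5), ("video", 1),
    ("audio", 0), ("video", 2), ("audio", 1),
]

# Demultiplex each pack into the decoder buffer of its elementary stream.
buffers = {name: deque() for name in BUFFER_SIZE_KB}
for stream, decode_time in packs:
    buffers[stream].append(decode_time)

# Within each stream the packs are in decoding order...
for name, buffered in buffers.items():
    print(name, list(buffered))

# ...but adjacent packs in the multiplex need not share a decode time.
first_gap = abs(packs[0][1] - packs[1][1])
print("gap between first two packs:", first_gap)
```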
There is also a next-generation DVD standard called Blu-ray Disc (BD).
While DVDs are intended for package distribution of video (DVD-Video format) of a standard image quality (Standard Definition quality) and recording of analog broadcasting (DVD Video Recording format), BDs are capable of recording digital broadcasting of a high definition image quality (High Definition quality) as it is (Blu-ray Disc Rewritable format, hereafter referred to as BD-RE).
However, since BD-RE is widely intended for recording of digital broadcasting, special playback supporting information and the like have not been optimized. For future package distribution of high-resolution video at a higher rate than digital broadcasting (BD-ROM format), a mechanism that does not cause any stress on the user even during special playback is needed.
MPEG-4 Advanced Video Coding (AVC) has been employed as one of the moving image coding methods in BDs. MPEG-4 AVC is a next-generation coding standard with a high compression rate, which was jointly developed by ISO/IEC JTC1/SC29/WG11 and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
In general, when coding a moving image, information is compressed by reducing redundancies in a temporal direction and a spatial direction. In inter-picture prediction coding that aims to reduce temporal redundancies, motion detection and generation of a predictive image are performed in units of blocks with reference to a picture which precedes and/or follows a coding target picture, and a difference between the coding target picture and the generated predictive image is coded. The term “picture” used here denotes an image of one screen. In detail, a picture denotes a frame in a progressive format, and a frame or a field in an interlaced format. In the interlaced format, one frame is made up of two fields of different times. An interlaced image can be coded and decoded by processing one frame as the frame itself, processing one frame as two fields, or processing each block of a frame as a frame structure or a field structure.
An I picture is an intra-picture prediction coded picture that has no reference image. A P picture is an inter-picture prediction coded picture that references only one picture. A B picture is an inter-picture prediction coded picture that references two pictures simultaneously. A B picture can reference any combination of two pictures that precede and/or follow the B picture in terms of display time. A reference image (reference picture) can be designated for each block that is a basic unit of coding/decoding. Here, a reference picture that is written first in a coded bit stream and a reference picture that is written later in the coded bit stream are distinguished from each other as a first reference picture and a second reference picture, respectively. Note here that, to code/decode a P picture or a B picture, its reference picture needs to have been already coded/decoded.
A residual signal that is obtained by subtracting a predictive signal generated by intra-picture prediction or inter-picture prediction from a coding target image is frequency-transformed and quantized, and then variable length coded and outputted as a coded stream. MPEG-4 AVC has two variable length coding methods, namely, context-adaptive variable length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC), that can be switched in units of pictures. The term “context-adaptive” here denotes adaptively selecting an efficient coding method in accordance with circumstances.
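The residual and quantization steps can be illustrated with a deliberately reduced sketch. It omits the frequency transform and the variable length coding entirely and uses a uniform quantizer with an arbitrary step, so it is not the MPEG-4 AVC procedure. The function names and sample values are hypothetical.

```python
from typing import List


def encode_block(target: List[int], prediction: List[int], step: int) -> List[int]:
    """Subtract the predictive signal and quantize the residual."""
    residual = [t - p for t, p in zip(target, prediction)]
    return [round(r / step) for r in residual]


def decode_block(levels: List[int], prediction: List[int], step: int) -> List[int]:
    """Inverse-quantize the residual and add the predictive signal back."""
    return [p + q * step for p, q in zip(prediction, levels)]


target = [52, 55, 61, 66]      # coding target samples
prediction = [50, 50, 60, 60]  # predictive signal
step = 3                       # quantization step (arbitrary)

levels = encode_block(target, prediction, step)
reconstructed = decode_block(levels, prediction, step)
print(levels, reconstructed)
```

Only the quantized residual levels need to be coded, and the decoder recovers an approximation of the target from them and the same prediction.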
The following describes a stream in which different coding methods (or a moving image with different attributes) can exist, and a decoding process by a decoding device that receives such a stream. Two examples are used in this specification. A first example is a case where different variable length coding methods (CAVLC/CABAC) can exist. A second example is a case where a luminance level threshold which is used when performing a transparency process by a luminance key on a picture-in-picture can take different values.
Firstly, the first example of a stream in which different coding methods (or a moving image with different attributes) can exist, that is, the example where different variable length coding methods (CAVLC/CABAC) can exist, is described below. FIG. 4 shows an example of variable length coding that is applied to pictures which constitute a randomly accessible unit in an MPEG-4 AVC stream. In MPEG-4 AVC, there is no concept corresponding to a group of pictures (GOP) of MPEG-2 Video. However, by dividing data by a special unit of pictures that can be decoded independently of other pictures, a randomly accessible unit corresponding to a GOP can be obtained. Such a randomly accessible unit is hereafter called a random access unit (RAU). As shown in FIG. 4, whether CABAC or CAVLC is used for variable length coding is determined on a picture basis.
A variable length decoding process is different between CABAC and CAVLC. The variable length decoding process for each of CABAC and CAVLC is described below, with reference to FIGS. 5A to 5C. FIG. 5A is a block diagram of an image decoding device that performs context-adaptive binary arithmetic decoding (CABAD) as a process of decoding data which is variable length coded according to CABAC, and context-adaptive variable length decoding (CAVLD) as a process of decoding data which is variable length coded according to CAVLC.
An image decoding process according to CABAD is carried out in the following manner. Firstly, coded data Vin generated according to CABAC is inputted to a stream buffer 5001. An arithmetic decoding unit 5002 reads coded data Vr from the stream buffer 5001, performs arithmetic decoding on coded data Vr, and inputs binary data Bin1 to a binary data buffer 5003. A binary data decoding unit 5004 acquires binary data Bin2 from the binary data buffer 5003, decodes binary data Bin2, and inputs binary decoded data Din1 obtained as a result of the decoding to a pixel reconstruction unit 5005. The pixel reconstruction unit 5005 performs processes such as inverse quantization, inverse transformation, and motion compensation on binary decoded data Din1 to reconstruct pixels, and outputs decoded data Vout.
FIG. 5B is a flowchart showing an operation from the start of decoding CABAC coded data to the pixel reconstruction process. Firstly, in Step S5001, CABAC coded data Vin is arithmetic-decoded to generate binary data. Next, in Step S5002, a judgment is performed as to whether or not a predetermined unit of binary data, such as one or more pictures, has been obtained. When the predetermined unit of binary data has been obtained, the operation proceeds to Step S5003. When the predetermined unit of binary data has not been obtained, Step S5001 is repeated. This binary data buffering is performed for the following reason. In CABAC, an amount of code of binary data per picture or per macroblock may increase greatly, which can cause a significant increase in processing load for arithmetic decoding. Therefore, to realize seamless playback even in a worst case, it is necessary to conduct a certain amount of arithmetic decoding in advance. In Step S5003, the binary data is decoded. In Step S5004, the pixel reconstruction process is performed on the decoded binary data.
Thus, in CABAD, the pixel reconstruction process cannot be started until the predetermined unit of binary data is obtained in Steps S5001 and S5002. This causes a delay in decoding start.
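The delay described above can be made visible with a small simulation of the flow of FIG. 5B. This sketch does no real arithmetic decoding. It only logs the order of operations, taking the predetermined unit to be a number of pictures (`unit`, an assumed parameter), and the function name is hypothetical.

```python
from collections import deque
from typing import List


def cabad_event_log(pictures: List[str], unit: int = 2) -> List[str]:
    """Log the order of operations when binary data is buffered before decoding."""
    log: List[str] = []
    binary_buffer: deque = deque()
    for pic in pictures:
        # Step S5001: arithmetic decoding produces binary data.
        binary_buffer.append(pic)
        log.append(f"arithmetic-decode {pic}")
        # Step S5002: has the predetermined unit of binary data been obtained?
        if len(binary_buffer) < unit:
            continue
        # Steps S5003 and S5004: decode the binary data, then reconstruct pixels.
        while binary_buffer:
            log.append(f"reconstruct {binary_buffer.popleft()}")
    # Flush whatever remains at the end of the stream (a simplification).
    while binary_buffer:
        log.append(f"reconstruct {binary_buffer.popleft()}")
    return log


log = cabad_event_log(["p0", "p1", "p2", "p3"], unit=2)
for entry in log:
    print(entry)
```

In this model the first picture is not reconstructed until `unit` pictures have been arithmetic-decoded, which corresponds to the delay in decoding start.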
An image decoding process according to CAVLD is carried out in the following manner. Firstly, coded data Vin generated according to CAVLC is inputted to the stream buffer 5001. Next, a CAVLD unit 5006 performs variable length decoding on coded data Vr, and outputs VLD decoded data Din2 to the pixel reconstruction unit 5005. The pixel reconstruction unit 5005 performs processes such as inverse quantization, inverse transformation, and motion compensation on VLD decoded data Din2 to reconstruct pixels, and outputs decoded data Vout. FIG. 5C is a flowchart showing an operation from the start of decoding CAVLC coded data to the pixel reconstruction process. Firstly, in Step S5005, CAVLD is performed. Next, in Step S5004, the pixel reconstruction process is performed. Thus, CAVLD differs from CABAD in that there is no need to wait until the predetermined unit of data is obtained before starting the pixel reconstruction process, and there is no need to have an intermediate buffer in the variable length decoding process such as the binary data buffer 5003.
FIG. 6 is a flowchart showing an operation of a conventional decoding device that decodes a stream in which the variable length coding method is switched as in the example of FIG. 4. It should be noted that, in this specification, a decoding device and a decoding method are examples of a moving image playback device and a moving image playback method, respectively.
Firstly, in Step S5101, the decoding device acquires information showing a variable length coding method applied to a picture, and proceeds to Step S5102. In Step S5102, the decoding device judges whether or not the variable length coding method of the current picture is different from a variable length coding method of a picture immediately preceding the current picture in a decoding order. Since CABAD and CAVLD use different buffer management methods in variable length decoding, when the variable length coding method is different, the decoding device proceeds to Step S5103 to switch the buffer management, before proceeding to Step S5104. When the variable length coding method is not different, the decoding device proceeds directly to Step S5104.
In Step S5104, the decoding device judges whether or not the variable length coding method of the current picture is CAVLC. When the variable length coding method of the current picture is CAVLC, the decoding device proceeds to Step S5105 to perform CAVLD. When the variable length coding method of the current picture is CABAC, the decoding device proceeds to Step S5106. In Step S5106, the decoding device judges whether or not the variable length coding method of the current picture is different from the variable length coding method of the immediately preceding picture in the decoding order. When the variable length coding method of the current picture is different, the decoding device proceeds to Step S5107. In Step S5107, the decoding device performs arithmetic decoding until the predetermined unit of binary data is obtained as shown in Steps S5001 and S5002 in FIG. 5B, and then decodes the binary data. When the variable length coding method of the current picture is not different in Step S5106, the decoding device proceeds to Step S5108 to perform a normal CABAD process.
The normal CABAD process mentioned here is a CABAD process that omits the binary data buffering which is necessary when CAVLC is switched to CABAC or when decoding of a CABAC coded stream begins. Lastly, in Step S5109, the decoding device performs the pixel reconstruction process.
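The decision flow of FIG. 6 can be summarized in a short sketch. The function name and the action strings are hypothetical. Treating the first picture of a stream as a switch is an assumption, based on the statement that binary data buffering is also needed when decoding of a CABAC coded stream begins.

```python
from typing import List, Optional


def plan_decoding(methods: List[str]) -> List[List[str]]:
    """For each picture, list the actions the conventional decoder takes."""
    plan: List[List[str]] = []
    previous: Optional[str] = None  # no preceding picture at the start
    for current in methods:         # Step S5101: method of the current picture
        switched = current != previous  # judged in Steps S5102 and S5106
        actions: List[str] = []
        if switched:
            actions.append("switch buffer management")          # Step S5103
        if current == "CAVLC":                                  # Step S5104
            actions.append("CAVLD")                             # Step S5105
        elif switched:
            actions.append("CABAD with binary data buffering")  # Step S5107
        else:
            actions.append("normal CABAD")                      # Step S5108
        actions.append("pixel reconstruction")                  # Step S5109
        plan.append(actions)
        previous = current
    return plan


plan = plan_decoding(["CABAC", "CABAC", "CAVLC", "CABAC"])
for actions in plan:
    print(actions)
```

In this model, every switch to CABAC incurs both a buffer management switch and the buffering delay.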
The second example of a stream in which different coding methods (or a moving image with different attributes) can exist, that is, the example where a luminance level threshold (i.e. a moving image attribute) which is used when performing a transparency process by a luminance key on a picture-in-picture can take different values, is described next. Package media such as a BD-ROM provide an application for displaying video other than main video, e.g. the director's commentary video, by overlaying it on the main video. Such an application is referred to as picture-in-picture.
FIG. 7 is a diagram for explaining this picture-in-picture. FIG. 7(a) shows image display planes, where plane 2 is to be overlaid on plane 1. FIGS. 7(b) and 7(c) show images displayed on plane 1 and plane 2, respectively. The display image of plane 2 is overlay-displayed on the display image of plane 1 (FIG. 7(d)). In such a picture-in-picture, the image displayed on plane 1 is the main video, and the image displayed on plane 2 is the video other than the main video.
Simply overlaying the display video of plane 2 on plane 1 causes the image of plane 1 to be completely hidden. To prevent this, a transparency process by a luminance key is applied to the image of plane 2. The following describes the transparency process by the luminance key. In the transparency process, each pixel in an image is displayed transparent or nontransparent depending on a luminance level of the pixel. In detail, the transparency process is as follows.
(1) When the luminance level is in a range of 0 to predetermined threshold YL inclusive, the pixel is displayed completely transparent (with a transparency rate of 1). 
(2) When the luminance level exceeds predetermined threshold YL, the pixel is displayed nontransparent, at the luminance level (with a transparency rate of 0).
In FIG. 7(c), suppose the luminance level is equal to or smaller than predetermined threshold YL in the diagonally shaded area and exceeds predetermined threshold YL in the other area. In such a case, when overlaying the image of plane 2 on plane 1, the diagonally shaded area is displayed transparent while the other area is displayed nontransparent, as shown in FIG. 7(d). Which is to say, in the image of plane 2, only an area (pixel) whose luminance level exceeds threshold YL is overlay-displayed on the image of plane 1. Thus, plane 2 is separated into a foreground and a background according to the luminance level threshold and only the foreground is overlay-displayed on plane 1, thereby realizing a picture-in-picture.
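The two rules of the transparency process can be sketched as follows, modeling each plane as a grid of luminance levels. The function name `luma_key_overlay` and the sample values are hypothetical. Real planes carry color components as well, which this sketch ignores.

```python
from typing import List


def luma_key_overlay(
    plane1: List[List[int]], plane2: List[List[int]], yl: int
) -> List[List[int]]:
    """Overlay plane 2 on plane 1 using a luminance key with threshold YL."""
    return [
        [
            # Rule (1): luminance in 0..YL inclusive is fully transparent,
            # so the pixel of plane 1 shows through.
            # Rule (2): luminance above YL is nontransparent.
            p1 if p2 <= yl else p2
            for p1, p2 in zip(row1, row2)
        ]
        for row1, row2 in zip(plane1, plane2)
    ]


plane1 = [[80, 80, 80], [80, 80, 80]]   # main video
plane2 = [[0, 200, 16], [17, 10, 255]]  # video other than the main video
composite = luma_key_overlay(plane1, plane2, yl=16)
for row in composite:
    print(row)
```

A pixel whose luminance equals the threshold is treated as background, matching the inclusive range in rule (1).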
Patent Reference 1: Japanese Unexamined Patent Application Publication No. 2000-228656
Non Patent Reference 1: Proposed SMPTE Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, Final Committee Draft 1 Revision 6, 2005.7.13.