On a playback-only DVD (Digital Video Disk), coded data which is obtained by performing compressive coding on an audio video signal corresponding to a specific program or the like, is recorded. This audio video signal includes an audio signal and a video signal, and coded audio data obtained by coding the audio signal and coded video data obtained by coding the video signal are recorded on the DVD as the above-described coded data. Further, the coded audio data and the coded video data are respectively packed. That is, these coded data are divided into plural pieces of data corresponding to first data units each having a predetermined data size (e.g., 2048 bytes).
In the following description, the coded audio data corresponding to the first data units are referred to as audio packs, and the coded video data corresponding to the first data units are referred to as video packs.
These audio packs and video packs are multiplexed and recorded on the DVD.
Furthermore, the coded data recorded on the DVD are divided into plural pieces of data corresponding to second data units each including plural pieces of the first data units, and the coded data are managed in the second data units. The coded data corresponding to the second data units are referred to as video objects (VOB).
For example, the coded data corresponding to one program is composed of at least one VOB. In the video standard relating to DVD-ROM, a group comprising at least one VOB is called a video object set (VOBS), and it is recorded on the DVD as one title.
Furthermore, on the DVD, together with the coded data corresponding to each title (program), the following information is recorded as information for managing the title (program): information indicating the position in the recording area of the DVD where the coded data corresponding to this title is recorded (recording position information), video attribute information corresponding to the coded data, and audio attribute information corresponding to the coded data. Each piece of information is recorded in VOB units, as mentioned above.
The recording position information is various kinds of address information in the recording area of the DVD, for example, a header address and an end address of the area where the coded data of the VOBs corresponding to the title is recorded, or an address indicating the search point which has previously been set by the user.
The video attribute information relates to the compressive coding mode, TV system, aspect ratio, display mode, etc.
There are two types of compressive coding modes for DVD: a mode based on the MPEG-1 coding, and a mode based on the MPEG-2 coding. The compressive coding mode information indicates that the coded data of each VOB corresponds to either the former mode or the latter mode.
There are two types of TV systems: a system corresponding to the NTSC system (number of lines: 525, frame frequency: 59.94 Hz), and a system corresponding to the PAL system (number of lines: 625, frame frequency: 50 Hz). The TV system information indicates that the coded data of each VOB corresponds to either the former system or the latter system.
The aspect ratio is the ratio of the image size in the horizontal direction to the image size in the vertical direction, and two ratios, 4:3 and 16:9, are adopted in practice. The aspect ratio information indicates that the coded data of each VOB corresponds to either 4:3 or 16:9.
Further, the display mode is a method of image display based on the video signal obtained from the coded data. For example, it is the pan & scan display mode or the letter box display mode. The display mode information indicates a display mode by which the video signal obtained from the coded data of each VOB is to be displayed.
In the pan & scan display mode, a wide image having an aspect ratio of 16:9 is displayed on a standard screen having an aspect ratio of 4:3, by removing left and right sides of the wide image. In the letter box display mode, a wide image having an aspect ratio of 16:9 is displayed on a standard screen having an aspect ratio of 4:3, by adding regions of a predetermined color to top and bottom sides of the wide image.
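The geometry of these two display modes can be illustrated with a short calculation. The following is only a minimal sketch, assuming square pixels and a centered image; the function names are hypothetical and are not taken from any DVD standard.

```python
from fractions import Fraction

WIDE = Fraction(16, 9)      # aspect ratio of the wide image
STANDARD = Fraction(4, 3)   # aspect ratio of the standard screen


def letterbox_bar_height(screen_width, screen_height, source_aspect=WIDE):
    """Height of each of the top and bottom regions in letter box display.

    Assumes square pixels and a vertically centered image (illustrative only).
    """
    image_height = Fraction(screen_width) / source_aspect
    return (screen_height - image_height) / 2


def pan_scan_removed_fraction(screen_aspect=STANDARD, source_aspect=WIDE):
    """Fraction of the wide image's width that is removed in pan & scan display."""
    return 1 - screen_aspect / source_aspect


print(letterbox_bar_height(640, 480))   # 60
print(pan_scan_removed_fraction())      # 1/4
```

Under these assumptions, a 640x480 screen shows the wide image with a 60-line region above and below it in the letter box display mode, while the pan & scan display mode removes one quarter of the image width in total.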
Meanwhile, there is the MPEG coding as an international standard of a compressive coding method for a video signal (hereinafter also referred to as image data). In the MPEG coding, the process of coding the image data is adaptively switched between intra-frame coding in which the image data is coded using a correlation of pixel values in one frame, and inter-frame coding in which the image data is coded using a correlation of pixel values between frames. In the MPEG coding, the coded data corresponding to continuous plural frames are regarded as one unit, and the image comprising the continuous plural frames is called a group of pictures (GOP).
To be specific, in the MPEG coding, the image data of at least one frame among the plural frames constituting the GOP is subjected to the intra-frame coding while the image data of the remaining frames are subjected to the inter-frame coding.
There are two types of inter-frame coding: forward direction inter-frame predictive coding, and bi-directional inter-frame predictive coding. A frame to be subjected to the forward inter-frame predictive coding is called a P frame, and a frame to be subjected to the bi-directional inter-frame predictive coding is called a B frame. The image data of the P frame is subjected to the predictive coding with reference to the image data of a frame (reference frame) positioned before the P frame. The image data of the B frame is subjected to the predictive coding with reference to the image data of two frames (reference frames) which are positioned close to and before and after the B frame. Usually, when coding a P frame, an I frame or a P frame close to the P frame is used as a reference frame. When coding a B frame, an I frame and a P frame (or two P frames) which are close to the B frame are used as reference frames.
FIG. 17 is a diagram for explaining an example of the structure of the GOP, wherein continuous plural frames F(k−5)˜F(k+12) are associated with coded data D(k−5)˜D(k+12) corresponding to the respective frames. In FIG. 17, k is an arbitrary integer.
One GOP is composed of twelve frames from the B frame F(k−2) to the P frame F(k+9). For example, the P frame F(k+3) is subjected to the inter-frame predictive coding with reference to the I frame F(k). Further, the P frame F(k+6) is subjected to the inter-frame predictive coding with reference to the P frame F(k+3). Further, the B frames F(k+1) and F(k+2) are subjected to the inter-frame predictive coding with reference to the I frame F(k) and the P frame F(k+3).
The coded data corresponding to the respective frames, which are obtained in the above-described coding process, are subjected to a rearrangement process, which changes the arrangement of the coded data from the order in which the images of the respective frames are displayed to the order in which the respective frames are decoded, thereby reducing the capacity of the memory used for decoding. To be specific, as shown in FIG. 17, in the arrangement obtained by subjecting the coded data corresponding to the GOP to the above-mentioned rearrangement process, the coded data D(k) of the I frame F(k) is positioned at the head of the GOP, followed by the coded data D(k−2) of the B frame F(k−2), the coded data D(k−1) of the B frame F(k−1), and the coded data D(k+3) of the P frame F(k+3).
Then, the coded data corresponding to the GOP is recorded on a recording medium or transmitted through a transmission medium, according to the order after the rearrangement process.
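The rearrangement from display order to decoding order can be sketched as follows. This is an illustrative sketch only, assuming that every B frame is decoded immediately after the next I or P frame in display order; the function name and the tuple representation of frames are hypothetical.

```python
def display_to_decode_order(frames):
    """Rearrange frames from display order into decoding order.

    frames: list of (frame_type, display_index) tuples in display order,
            where frame_type is "I", "P" or "B".
    Assumption: each run of B frames is emitted right after the I or P
    frame that follows it in display order.
    """
    decode_order = []
    pending_b = []  # B frames waiting for their later reference frame
    for frame in frames:
        frame_type, _ = frame
        if frame_type == "B":
            pending_b.append(frame)
        else:
            decode_order.append(frame)      # reference frame goes first
            decode_order.extend(pending_b)  # then the B frames that precede it
            pending_b = []
    decode_order.extend(pending_b)  # trailing B frames, if any
    return decode_order


# The twelve-frame GOP described for FIG. 17, with k = 0 chosen for illustration.
k = 0
gop = [("B", k - 2), ("B", k - 1), ("I", k),
       ("B", k + 1), ("B", k + 2), ("P", k + 3),
       ("B", k + 4), ("B", k + 5), ("P", k + 6),
       ("B", k + 7), ("B", k + 8), ("P", k + 9)]
print(display_to_decode_order(gop)[:4])
# [('I', 0), ('B', -2), ('B', -1), ('P', 3)]
```

The first four entries correspond to D(k), D(k−2), D(k−1), and D(k+3), which is the order at the head of the GOP described for FIG. 17.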
By the way, header information of a video stream (coded data obtained by coding a video signal) based on the MPEG standard includes, as video resolution information, information relating to the horizontal and vertical image sizes, frame frequency, and aspect ratio. Further, the header information includes information for recognizing that the video stream corresponds to either an interlace signal or a progressive signal.
In the standard relating to DVD, coded data which corresponds to at least one GOP and is equivalent to a display time longer than 0.4 sec. and shorter than 1.0 sec. is defined as a video object unit (VOBU) (third data unit), and a VOB is composed of a plurality of VOBUs.
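As a worked example of this display-time condition, the following minimal sketch computes the display time of a group of frames and checks it against the stated range. The function names are hypothetical, and the check assumes that the display time is simply the number of frames divided by the frame frequency.

```python
def vobu_display_time(num_frames, frame_frequency_hz):
    """Display time in seconds of num_frames frames (illustrative assumption)."""
    return num_frames / frame_frequency_hz


def is_valid_vobu_duration(num_frames, frame_frequency_hz):
    """True if the display time is longer than 0.4 sec. and shorter than 1.0 sec."""
    return 0.4 < vobu_display_time(num_frames, frame_frequency_hz) < 1.0


print(is_valid_vobu_duration(15, 29.97))  # True  (about 0.50 sec.)
print(is_valid_vobu_duration(6, 29.97))   # False (about 0.20 sec.)
```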
Each VOBU includes a plurality of packs (first data units), and the head position of the VOBU matches the head position of the packs. Further, at the head of the VOBU, a pack called a navigation pack, including information such as playback control information (PCI) and data search information (DSI), is placed.
In the field of television broadcasting, CS (Communication Satellite) broadcasting takes the lead in digitization, and digital broadcasting of a high-vision TV signal will be started subsequent to digital broadcasting of a standard TV signal. Accordingly, it is supposed that a standard TV signal and a high-vision TV signal will coexist in one broadcast sequence, or an interlace signal and a progressive signal will coexist in one broadcast sequence. In this case, the video resolutions of the TV signals which are broadcast in the same broadcast sequence will change with a change of a program.
In such digital TV broadcasting, a video stream and an audio stream are multiplexed according to the MPEG standard to be transmitted as a transport stream.
On the other hand, when recording the video stream and the audio stream on a DVD, the transport stream including the video stream and the audio stream is converted to a program stream to make the DVD have trick play functions, and this program stream is recorded on the DVD. Accordingly, when the audio video stream obtained from the received digital TV broadcast signal is recorded on the DVD, the audio video stream must be converted from the transport stream to the program stream (TS/PS conversion), and techniques relating to such stream conversion have already been developed. For example, Japanese Published Patent Application No. Hei. 11-45512 (HITACHI) discloses a technique of TS/PS conversion, i.e., a technique for converting a transport stream to a program stream.
Hereinafter, a description will be given of the standard for recording the program stream on a recording medium such as an optical disk.
FIG. 18 is a diagram for explaining the format of recorded data 10 based on the recording standard, and illustrates the specific contents of video attribute information (V_ATR) 10d1.
The recorded data 10 is data recorded on a recording medium by a recorder based on the above-described recording standard, and this recorded data 10 is composed of a video manager (VMG) 10a, and three video objects: a video object (VOB(1)) 10a1, a video object (VOB(2)) 10a2, and a video object (VOB(3)) 10a3. Each of the VOB(1) 10a1˜VOB(3) 10a3 includes an audio video stream, and the VMG 10a includes management information for each VOB.
The recorded data 10 corresponds to one TV broadcast program. Further, the recorded data 10 is divided into three VOBs according to the above-mentioned recording standard because the user performed two pause operations while recording the audio video stream of this program. That is, the boundary between the VOB(1) 10a1 and the VOB(2) 10a2 corresponds to the first pause position, and the boundary between the VOB(2) 10a2 and the VOB(3) 10a3 corresponds to the second pause position. In other words, in the recording process based on the DVD recording standard, when recording of the audio video stream is paused, the streams before and after the pause position are recorded as different VOBs on the recording medium.
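The relation between pause operations and VOB boundaries can be sketched as follows. This is a simplified illustration, assuming that the recording is modeled as a sequence of data and pause events; the event representation and the function name are hypothetical.

```python
def split_into_vobs(events):
    """Group recorded data into VOBs, starting a new VOB at each pause.

    events: sequence of ("data", payload_bytes) or ("pause", None) tuples
            (a hypothetical model of the recording operation).
    Returns a list of byte strings, one per VOB.
    """
    vobs = []
    current = bytearray()
    for kind, payload in events:
        if kind == "data":
            current.extend(payload)
        elif kind == "pause":
            if current:  # close the VOB recorded before the pause
                vobs.append(bytes(current))
                current = bytearray()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if current:  # close the last VOB when recording ends
        vobs.append(bytes(current))
    return vobs


events = [("data", b"part1"), ("pause", None),
          ("data", b"part2"), ("pause", None),
          ("data", b"part3")]
print(len(split_into_vobs(events)))  # 3
```

Two pause events yield three VOBs, as in the recorded data 10 of FIG. 18.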
As described above, the VMG 10a is management information for each VOB recorded, and the VMG 10a is composed of video manager information (VMGI) 10b1 and an audio video file information table (AVFIT) 10b2. The VMGI 10b1 includes, as search information, information relating to the time when each VOB was recorded on the recording medium (recording time), and the address of the recording area in the recording medium corresponding to each VOB (recording address).
Further, the AVFIT 10b2 includes audio video file table information (AVFITI) 10c, and as many pieces of video object stream information (VOB_STI) as there are recorded VOBs, i.e., VOB_STI(1) 10c1, VOB_STI(2) 10c2, and VOB_STI(3) 10c3. The AVFITI 10c includes information about the number of the recorded VOBs, and the like. Each of the VOB_STI 10c1˜10c3 includes the attribute information of the corresponding VOB. For example, the VOB_STI(1) 10c1 is composed of video attribute information (V_ATR) 10d1 and audio attribute information (A_ATR) 10d2.
Hereinafter, the video attribute information (V_ATR) 10d1 will be described in detail.
The V_ATR 10d1 includes compression mode information 10e1, horizontal video resolution (H_video resolution) information 10e2, vertical video resolution (V_video resolution) information 10e3, frame frequency information 10e4, TV system information 10e5, and aspect ratio information 10e6.
The compression mode information 10e1 is information for recognizing that the video stream of each VOB is based on either the MPEG-1 coding or the MPEG-2 coding.
In the horizontal video resolution information 10e2, information for identifying the frame size in the horizontal direction corresponding to each VOB is described. To be specific, as the number of pixels in the horizontal direction, any of the following values is described: 352, 480, 544, 704, 720, 1440, 1920.
In the vertical video resolution information 10e3, information for identifying the frame size in the vertical direction corresponding to each VOB is described. To be specific, as the number of scanning lines, any of the following values is described: 240, 480, 576, 720, 1080.
The frame frequency information 10e4 is information for identifying the frame frequency corresponding to each VOB. For example, it shows any of the following frequencies: 24 Hz, 29.97 Hz, 30 Hz, 25 Hz, 50 Hz, 60 Hz.
The TV system information 10e5 is information for identifying that the video signal corresponding to each VOB is either an interlace signal or a progressive signal.
The aspect ratio information 10e6 is information for identifying the aspect ratio of the video signal corresponding to each VOB. For example, it shows the value of the aspect ratio (4:3 or 16:9), or the type of the letter box.
While the V_ATR 10d1 shown in FIG. 18 includes the compression mode information 10e1, the horizontal video resolution information 10e2, the vertical video resolution information 10e3, the frame frequency information 10e4, the TV system information 10e5, and the aspect ratio information 10e6, the V_ATR 10d1 may include caption data information (Line21_switch) 10e7 in addition to the information 10e1˜10e6 as shown in FIG. 19. The caption data information 10e7 is information for identifying whether each of the video signals in the first and second fields includes Line21 data or not. The Line21 data is closed caption data which is superposed on a portion of the video signal corresponding to the 21st line.
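The contents of the V_ATR can be pictured as a simple record. The following is a hedged sketch, not the recorded bit layout: the class name, field names, and string values are hypothetical, and only the permitted resolution values are taken from the description above.

```python
from dataclasses import dataclass

# Permitted values as listed in the description of 10e2 and 10e3.
H_RESOLUTIONS = (352, 480, 544, 704, 720, 1440, 1920)
V_RESOLUTIONS = (240, 480, 576, 720, 1080)


@dataclass(frozen=True)
class VideoAttributes:
    """Illustrative model of V_ATR; field names are assumptions."""
    compression_mode: str      # 10e1: "MPEG-1" or "MPEG-2"
    h_resolution: int          # 10e2: number of pixels in the horizontal direction
    v_resolution: int          # 10e3: number of scanning lines
    frame_frequency_hz: float  # 10e4: e.g. 24, 29.97, 30, 25, 50, 60
    scan_type: str             # 10e5: "interlace" or "progressive"
    aspect_ratio: str          # 10e6: e.g. "4:3" or "16:9"

    def __post_init__(self):
        if self.compression_mode not in ("MPEG-1", "MPEG-2"):
            raise ValueError(f"unknown compression mode: {self.compression_mode}")
        if self.h_resolution not in H_RESOLUTIONS:
            raise ValueError(f"unsupported horizontal resolution: {self.h_resolution}")
        if self.v_resolution not in V_RESOLUTIONS:
            raise ValueError(f"unsupported vertical resolution: {self.v_resolution}")
        if self.scan_type not in ("interlace", "progressive"):
            raise ValueError(f"unknown scan type: {self.scan_type}")


v_atr = VideoAttributes("MPEG-2", 720, 480, 29.97, "interlace", "4:3")
print(v_atr.h_resolution, v_atr.v_resolution)  # 720 480
```

Because one such record describes a whole VOB, it can hold only a single value for each attribute.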
Next, the audio attribute information (A_ATR) 10d2 will be described in detail.
FIG. 20 is a diagram for explaining the format of the above-described recorded data, and illustrates the specific contents of the audio attribute information (A_ATR) 10d2.
The A_ATR 10d2 includes, as information for identifying the attribute of the audio signal corresponding to each VOB, coding mode information 10f1, quantization information 10f2, dynamic range control (DRC) information 10f3, sampling frequency (fs) information 10f4, number-of-audio-channels information 10f5, and audio bit rate information 10f6.
The coding mode information 10f1 is information for identifying the type of the audio stream corresponding to each VOB. For example, it shows that the audio stream corresponds to any of the following modes: Dolby AC3, MPEG-1, MPEG-2, and Linear PCM (Pulse Code Modulation).
The quantization information 10f2 is information for identifying the number of quantized bits (16 bits, 20 bits, 24 bits, etc.) in the case where the audio stream corresponding to each VOB is subjected to Linear PCM.
The dynamic range control information 10f3 is information for identifying whether or not the audio stream corresponding to each VOB includes dynamic range control data in the MPEG-1 or MPEG-2 coding.
The sampling frequency information 10f4 is information for identifying the sampling frequency (48 kHz, 96 kHz, etc.) of the audio stream corresponding to each VOB.
The number-of-audio-channels information 10f5 is information for identifying the number of channels (1ch(mono), 2ch(stereo), 2ch(dual mono), 3ch, 4ch, 5ch, 6ch, 7ch, 8ch, etc.) of the played audio signal obtained from the audio stream corresponding to each VOB.
The audio bit rate information 10f6 is information for identifying the bit rate (64 kbps, 89 kbps, 112 kbps, 126 kbps, 160 kbps, 192 kbps, 224 kbps, 256 kbps, 320 kbps, 384 kbps, 448 kbps, 768 kbps, 1536 kbps, etc.) of the audio stream corresponding to each VOB.
It is possible to recognize the recording time of the audio video stream corresponding to each VOB, the recording address thereof, and the attribute information of each VOB, by reading the VMG 10a from the data 10 recorded on the recording medium (optical disk).
Next, the structure of the VOB in the recorded data 10 will be described with reference to FIGS. 21(a), 21(b), 22(a) and 22(b). FIGS. 21(a) and 22(a) show the detailed structure of the VOB(1) 10a1 in the recorded data 10.
The VOB(1) 10a1 is composed of plural video object units, i.e., VOBU(1) 10g1, VOBU(2) 10g2, VOBU(3) 10g3, VOBU(4) 10g4, . . . , VOBU(n) 10gn.
One VOBU includes an audio video stream which corresponds to at least one GOP and is equivalent to a display time of 0.4˜1.0 sec. For example, the VOBU(1) 10g1 is composed of plural video packs (V_PCK) 10h(1), 10h(2), 10h(3), . . . , 10h(r), and plural audio packs (A_PCK) 10i(1), 10i(2), . . . , 10i(s). Each of the video packs and audio packs has a predetermined data size, and the data size is 2048 bytes in this case.
FIG. 21(b) shows the video packs 10h(1)˜10h(8) associated with the streams of the respective frames constituting the GOP.
The VOBU(1) 10g1 includes a video stream corresponding to one GOP. To be specific, the video stream included in the VOBU(1) 10g1 is composed of I frame coded data Dv1, B frame coded data Dv2 and Dv3, P frame coded data Dv4, B frame coded data Dv5 and Dv6, P frame coded data Dv7, B frame coded data Dv8 and Dv9, and padding data Dpud.
Since each VOBU is composed of the video packs and audio packs each having 2048 bytes, the data size of the VOBU should be an integer multiple of 2048 bytes. So, the padding data Dpud is added to the video stream corresponding to one GOP to make the data size of the video stream included in the VOBU equal to an integer multiple of 2048 bytes.
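The amount of padding data needed follows from simple arithmetic. The sketch below is illustrative only: it assumes that the whole pack is available for stream data and ignores any pack header overhead, and the function names are hypothetical.

```python
PACK_SIZE = 2048  # bytes per pack, as stated in the description


def padding_length(stream_length, pack_size=PACK_SIZE):
    """Number of padding bytes needed to reach the next multiple of pack_size."""
    return (-stream_length) % pack_size


def padded_length(stream_length, pack_size=PACK_SIZE):
    """Total size after padding; always an integer multiple of pack_size."""
    return stream_length + padding_length(stream_length, pack_size)


print(padding_length(5000))  # 1144
print(padded_length(5000))   # 6144
```

For example, a 5000-byte stream needs 1144 bytes of padding to fill three packs of 2048 bytes, while a stream that is already a multiple of 2048 bytes needs none.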
FIG. 22(b) shows the audio packs 10i(1)˜10i(4) constituting the VOBU(1) 10g1, associated with the streams of the respective audio frames.
The VOBU(1) 10g1 includes an audio stream corresponding to one GOP. To be specific, the audio stream included in the VOBU(1) 10g1 is composed of coded data of the audio frames Da1˜Da8 and the padding data Dpud1˜Dpud4 corresponding to the respective audio packs (A_PCK) 10i(1)˜10i(4). That is, the A_PCK 10i(1) includes the coded data of the audio frames Da1 and Da2 and the padding data Dpud1, the A_PCK 10i(2) includes the coded data of the audio frames Da3 and Da4 and the padding data Dpud2, the A_PCK 10i(3) includes the coded data of the audio frames Da5 and Da6 and the padding data Dpud3, and the A_PCK 10i(4) includes the coded data of the audio frames Da7 and Da8 and the padding data Dpud4.
As described above, the audio stream of one A_PCK 10i(s) is composed of the coded data of two audio frames and the padding data to make the data size of the audio stream included in the VOBU equal to an integer multiple of 2048 bytes.
While in FIGS. 21(b) and 22(b) only the structures of the VOB 10a1 and VOBU(1) 10g1 are described, the VOB 10a2 and VOB 10a3 and the VOBU(2) 10g2˜VOBU(n) 10gn also have the same structures as those of the VOB 10a1 and VOBU(1) 10g1, respectively.
As described above, since the VOBU has the data structure which is divided into data units each having a predetermined data size (2048 bytes), address management for the VOBU is simplified, and access to data in VOBU units on the recording medium is facilitated.
In the conventional recording apparatus, however, the audio video stream is usually recorded as one VOB, and its attributes are managed collectively for the whole stream. This results in various drawbacks as follows.
When the conventional recording apparatus records the audio video stream, the stream inputted to the recording apparatus from when the recording is started to when the recording is ended, is recorded as one VOB on the recording medium. However, when the recording operation is paused, the stream inputted before the pause and the stream inputted after the pause are recorded as different VOBs on the recording medium. Further, there are other causes by which the audio video stream is recorded as plural VOBs.
In other words, in the conventional recording apparatus, the audio video stream corresponding to one program is usually recorded as one VOB on the recording medium. When the recording of the stream is paused to prevent a portion of the stream corresponding to a CM (commercial) or the like from being recorded, the audio video stream corresponding to one program is recorded as different VOBs.
By the way, since a digital broadcast signal is usually inputted to the recording apparatus as an MPEG stream (audio video stream corresponding to the MPEG standard), when the digital broadcast signal is received and recorded, the video attribute (e.g., video resolution) changes from scene to scene during the recording, in contrast with recording of an analog broadcast signal. For example, the resolution in the horizontal direction changes from 720 pixels to 352 pixels.
Further, when a standard TV broadcast program and a high-vision TV broadcast program are recorded continuously, the aspect ratio, which is an attribute of the recorded data corresponding to each program, changes from 4:3 to 16:9 at the point where the program changes.
In this case, two audio video streams having different attributes are recorded as one VOB, whereby management of the video attribute of the stream recorded on the recording medium becomes difficult.
Further, in the process of playing the audio video stream recorded on the recording medium, since the video stream (coded data) is decoded with the video resolution as a parameter, management of the video resolution as the video attribute is insufficient. If the change of the video resolution is not reported to the decoder, decoding of the video stream results in failure.
Furthermore, since the recording position of the audio video stream is managed by the address corresponding to each VOBU which is a constituent of the VOB, direct access cannot be made to the position where the video resolution changes, in the stream included in the VOBU. Therefore, random access to the beginning of the program, based on the change of the video resolution, cannot be performed at high speed.
Moreover, although the coded data included in the VOBU is divided into data units (packs) each having a predetermined data size (2048 bytes) and each pack is managed by the address indicating the recording position of its head data, the position corresponding to the video resolution change point in the stream is not always equal to the head position of the pack as the access unit and, therefore, the video resolution change position in the stream cannot be accessed by using the address indicating the recording position of the pack.
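The mismatch between a change point and a pack boundary can be shown numerically. This is a minimal sketch, assuming that positions are expressed as byte offsets from the head of the stream; the function names and the example offset are hypothetical.

```python
PACK_SIZE = 2048  # bytes per pack, as stated in the description


def locate_in_packs(byte_offset, pack_size=PACK_SIZE):
    """Return (pack index, head address of that pack, offset inside the pack)."""
    pack_index, offset_in_pack = divmod(byte_offset, pack_size)
    return pack_index, pack_index * pack_size, offset_in_pack


def is_pack_aligned(byte_offset, pack_size=PACK_SIZE):
    """True only if the position coincides with the head position of a pack."""
    return byte_offset % pack_size == 0


# Hypothetical change point at byte 5000 of the stream.
print(locate_in_packs(5000))  # (2, 4096, 904)
print(is_pack_aligned(5000))  # False
```

In this example the pack address identifies only byte 4096, the head of the third pack, and the change point lies 904 bytes further into that pack.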
Furthermore, when the video signal is coded by the MPEG-2 coding, the coding mode for each frame is decided based on a predetermined rule, and coding is performed so that a GOP comprising at least one I frame, plural P frames, and plural B frames is constituted according to the decided coding mode. Therefore, when the video resolution changes, the video resolution may vary between the frames constituting one GOP, and a stream adapted to the MPEG-2 standard cannot be generated.
Furthermore, in the MPEG coding, the inter-frame predictive coding is carried out, that is, the video signal of the target frame to be processed is coded with reference to the video signal of the reference frame which has already been coded. Therefore, when the video resolution changes, the video resolution varies between the target frame and the reference frame, and the inter-frame predictive coding results in failure.
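The situation in which the video resolution varies between frames of one GOP can be detected by comparing each frame with the preceding one. The following sketch is illustrative only; the frame representation and the function name are hypothetical, and the frames are assumed to be listed in coding order.

```python
def find_resolution_changes(frames):
    """Return the indices of frames whose resolution differs from the previous frame.

    frames: list of (frame_type, width, height) tuples in coding order
            (a hypothetical representation).
    """
    changes = []
    for i in range(1, len(frames)):
        # Compare (width, height) of this frame with that of the previous one.
        if frames[i][1:] != frames[i - 1][1:]:
            changes.append(i)
    return changes


# A GOP in which the horizontal resolution changes from 720 to 352 pixels.
gop = [("I", 720, 480), ("P", 720, 480), ("P", 352, 480), ("B", 352, 480)]
print(find_resolution_changes(gop))  # [2]
```

Here the P frame at index 2 would have to be predicted from a reference frame of a different resolution, which is the failure case described above.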
Also when recording the coded data obtained by coding the TV signal whose aspect ratio changes, the following drawbacks will occur like in the above-described case where the video resolution changes.
That is, when the video aspect ratio changes, management of the video attribute of the recorded audio video stream (coded data) becomes difficult. Further, in the playback process, decoding of the coded data results in failure due to the change of the video aspect ratio. Moreover, quick access cannot be made to the position where the video aspect ratio changes, in the video stream.
Moreover, in the MPEG coding, when the video aspect ratio changes, a stream adapted to the MPEG-2 standard cannot be generated, or the inter-frame predictive coding results in failure, as in the above-described case where the video resolution changes.
With respect to a video signal having plural video attributes, including not only the video resolution and the video aspect ratio but also the coding mode and the like, access cannot be made to the position where a video attribute changes in the recorded video stream.
Further, with respect to an audio video signal including not only a video signal but also an audio signal appended to the video signal, access cannot be made to the position where the audio attribute (coding mode, number of channels, bit rate, etc.) of the audio signal changes, in the recorded data corresponding to this audio video signal.