A method for editing a multiplexed stream of video and audio streams with video frame accuracy and seamlessly reproducing edit points is described in Jpn. Pat. Appln. Laid-Open Publication No. 2000-175152, Jpn. Pat. Appln. Laid-Open Publication No. 2001-544118, and Jpn. Pat. Appln. Laid-Open Publication No. 2002-158974.
FIG. 1 is a block diagram showing a conventional DVR-STD model (DVR MPEG2 transport stream player model) (hereinafter referred to as “player”) 101. The DVR-STD is a conceptual model of the decode processing, used in generating and examining an AV stream that is referred to by two seamlessly connected PlayItems.
As shown in FIG. 1, in the player 101, a TS (Transport Stream) file read out from a readout section (DVR drive) 111 at a bit rate RUD is buffered in a read buffer 112. From the read buffer 112, a source packet is read out to a source depacketizer 113 at a maximum bit rate RMAX.
A pulse oscillator (27 MHz X-tal) 114 generates a 27 MHz pulse. An arrival time clock counter 115 is a binary counter that counts the 27 MHz pulses and supplies the source depacketizer 113 with a count value arrival_time_clock(i) of the arrival time clock counter at a time t(i).
One source packet consists of one transport packet and its arrival_time_stamp. When the arrival_time_stamp of the current source packet is equal to the value of the 30 least significant bits (LSBs) of arrival_time_clock(i), the transport packet of the current source packet is output from the source depacketizer 113. TS_recording_rate is the bit rate of a transport stream (hereinafter referred to as “TS”). The notations n, TBn, MBn, EBn, TBsys, Bsys, Rxn, Rbxn, Rxsys, Dn, Dsys, On, and Pn(k) shown in FIG. 1 are the same as those defined for the T-STD (transport stream system target decoder) specified by ISO/IEC 13818-1 (MPEG2 systems specification).
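The output rule of the source depacketizer can be sketched as follows. This is a minimal illustration, not part of the DVR-STD definition: the function names are hypothetical, and the counter is modeled as an unbounded integer whose 30 least significant bits are compared with arrival_time_stamp.

```python
ATS_BITS = 30                    # width of arrival_time_stamp, per the text
ATS_MASK = (1 << ATS_BITS) - 1   # selects the 30 least significant bits


def should_output(arrival_time_stamp: int, arrival_time_clock: int) -> bool:
    """True when the stamp equals the 30 LSBs of the 27 MHz counter value."""
    return arrival_time_stamp == (arrival_time_clock & ATS_MASK)


def release_ticks(stamps, start_clock=0):
    """Counter value at which each source packet in turn is released.

    Hypothetical helper: instead of stepping the counter tick by tick,
    it jumps ahead by the number of ticks left until the 30 LSBs match.
    """
    clock = start_clock
    released = []
    for ats in stamps:
        clock += (ats - clock) & ATS_MASK  # ticks until the LSBs equal ats
        released.append(clock)
    return released
```

Because only 30 bits are compared, a stamp that is numerically smaller than the current counter value is matched after the counter's low bits wrap around.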
The decoding process in the above conventional player 101 will be described next. First, the decoding process during reproduction of a single DVR MPEG2 TS will be described. During reproduction of a single DVR MPEG2 TS, the timing at which a transport packet is output from an output section 110 so as to be input to TB1, TBn or TBsys of the DVR-STD, which is a decoder 120, is determined by the arrival_time_stamp of the source packet. The specification related to buffering operations of TB1, MB1, EB1, TBn, Bn, TBsys and Bsys is the same as in the case of the T-STD specified by ISO/IEC 13818-1. The specification related to decoding and presentation operations is also the same as in the case of the T-STD specified by ISO/IEC 13818-1.
Next, the decoding process during reproduction of seamlessly connected PlayItems will be described. Here, reproduction of a previous stream TS1 and a current stream TS2 that are referred to by the seamlessly connected PlayItems will be described.
During the shift between a certain AV stream (TS1) and the next AV stream (TS2) seamlessly connected to the AV stream (TS1), the time axis of the TS2 arrival time base is not the same as that of the TS1 arrival time base. Further, the time axis of the TS2 system time base is not the same as that of the TS1 system time base. The presentation of video images needs to be continued seamlessly. An overlap may exist in the presentation time of the audio presentation units.
Next, an input timing of the transport packet read out from the source depacketizer to the DVR-STD will be described.
(1) Before time T1 at which the input of the last video packet of TS1 to TB1 of the DVR-STD has been completed
Before time T1, input timing to buffer TB1, TBn or TBsys of the DVR-STD is determined by arrival_time_stamp of the source packet of TS1.
(2) From time T1 to time T2 at which the input of the last byte of remaining packets of TS1 has been completed
The remaining packets of TS1 must be input to the buffer TBn or TBsys of the DVR-STD at a bit rate (maximum bit rate of TS1) of TS_recording_rate(TS1). TS_recording_rate(TS1) is the value of TS_recording_rate defined by ClipInfo() corresponding to Clip 1. The time at which the last byte of TS1 is input to the buffer is time T2. Therefore, from time T1 to time T2, the arrival_time_stamp of the source packet is ignored.
Assuming that N1 is the number of bytes of the transport packets of TS1 that follow the last video packet of TS1, the time between T1 and T2 (time T2−1=T2−T1) is the time required to complete the input of N1 bytes at a bit rate of TS_recording_rate(TS1), and is represented by the following equation (1).
T2−1 = T2 − T1 = N1 / TS_recording_rate(TS1)  (1)
From time T1 to time T2, values of Rxn and Rxsys shown in FIG. 1 are changed to the value of TS_recording_rate(TS1). Except for the above rule, buffering operation is the same as that of the T-STD.
Since the values of Rxn and Rxsys shown in FIG. 1 are changed to the value of TS_recording_rate(TS1) between time T1 and T2, an additional buffer amount (data amount corresponding to about 1 second) is required in addition to the buffer amount defined by the T-STD so that an audio decoder can process the input data between time T1 and T2.
(3) After Time T2
At time T2, the arrival time clock counter 115 is reset to the value of arrival_time_stamp of the first source packet of TS2. The input timing to the buffer TB1, TBn or TBsys of the DVR-STD is determined by arrival_time_stamp of the source packet of TS2. Rxn and Rxsys are changed to the value defined by T-STD.
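The three input-timing phases (1) to (3) can be sketched as follows. This is a simplified illustration under stated assumptions: TS_recording_rate is taken to be in bytes per second so that equation (1) is dimensionally consistent, T1 is approximated by the stamp-derived input time of the last video packet, the 30-bit wraparound of arrival_time_stamp is ignored, and all function and parameter names are hypothetical.

```python
TS_PACKET_BYTES = 188  # size of one MPEG-2 transport packet


def ts1_input_times(ats_times, last_video_index, ts_recording_rate):
    """Return (input_times, t1, t2) for the packets of TS1.

    ats_times: per-packet input times in seconds derived from arrival_time_stamp
    last_video_index: index of the last video packet of TS1
    ts_recording_rate: assumed to be in bytes per second
    """
    t1 = ats_times[last_video_index]
    times = list(ats_times[: last_video_index + 1])  # phase (1): stamps are used
    remaining = len(ats_times) - last_video_index - 1
    for k in range(1, remaining + 1):                # phase (2): stamps are ignored
        times.append(t1 + k * TS_PACKET_BYTES / ts_recording_rate)
    n1 = remaining * TS_PACKET_BYTES
    t2 = t1 + n1 / ts_recording_rate                 # equation (1)
    return times, t1, t2


def ts2_input_times(ts2_stamps, t2, clock_hz=27_000_000):
    """Phase (3): the counter is reset at T2 to the first stamp of TS2."""
    first = ts2_stamps[0]
    return [t2 + (ats - first) / clock_hz for ats in ts2_stamps]
```

In phase (2) the packets following the last video packet are delivered back to back, so the last of them completes exactly at T2 as given by equation (1).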
Next, video presentation timing will be described. A video presentation unit must be presented seamlessly through its connection point.
Here, it is assumed that
STC1: time axis of the TS1 system time base (STC: System Time Clock)
STC2: time axis of the TS2 system time base (more precisely, STC2 starts from the time at which the first PCR (Program Clock Reference) of TS2 is input to the T-STD).
An offset value between STC1 and STC2 is determined as follows.
Assuming that
PTS1end: PTS on STC1 corresponding to the last video presentation unit of TS1
PTS2start: PTS on STC2 corresponding to the first video presentation unit of TS2
Tpp: presentation period of the last video presentation unit,
the offset value STC_delta between the two system time bases is represented by the following equation (2).
STC_delta = PTS1end + Tpp − PTS2start  (2)
Next, audio presentation timing will be described. An overlap of the presentation timing of the audio presentation units may exist at the connection point of TS1 and TS2, the overlap being from 0 to less than 2 audio frames. The player 101 must select one of the audio samples and re-synchronize the presentation of the audio presentation units with the corrected time base after the connection point.
Next, the control of the system time clock of the DVR-STD carried out by the player 101 at the shift from TS1 to TS2 seamlessly connected to TS1 will be described. Let T5 be the time at which the last audio presentation unit of TS1 is presented. The system time clocks may overlap between time T2 and T5. Between time T2 and T5, the DVR-STD switches the system time clock from the value (STC1) of the old time base to the value (STC2) of the new time base. The value of STC2 can be represented by the following equation (3).
STC2 = STC1 − STC_delta  (3)
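Equations (2) and (3) can be checked together with a short sketch. The values below are illustrative only; they are expressed in 90 kHz ticks, the unit of PTS in MPEG-2 systems, and wraparound of the time stamps is ignored. The function names are hypothetical.

```python
def stc_delta(pts1_end: int, tpp: int, pts2_start: int) -> int:
    """Equation (2): offset between the two system time bases."""
    return pts1_end + tpp - pts2_start


def to_stc2(stc1: int, delta: int) -> int:
    """Equation (3): express an STC1 value on the STC2 time base."""
    return stc1 - delta


# Illustrative values in 90 kHz ticks (3003 ticks is one frame at 29.97 Hz).
PTS1_END, TPP, PTS2_START = 900_000, 3_003, 180_000
delta = stc_delta(PTS1_END, TPP, PTS2_START)

# The instant at which the last TS1 picture ends, seen on STC2,
# coincides with the PTS of the first TS2 picture.
assert to_stc2(PTS1_END + TPP, delta) == PTS2_START
```

The assertion reflects why the offset is defined this way: the first video presentation unit of TS2 starts exactly when the last one of TS1 ends, so video presentation is continuous across the connection point.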
Next, an encoding condition that TS1 and TS2 must meet for the shift from TS1 to TS2 seamlessly connected to TS1 will be described.
It is assumed that
STC11video_end: value of STC on system time base STC1 when the last byte of the last video packet of TS1 reaches TB1 of the DVR-STD
STC22video_start: value of STC on system time base STC2 when the first byte of the first video packet of TS2 reaches TB1 of the DVR-STD
STC21video_end: value obtained by converting the value of STC11video_end to the value on system time base STC2.
In this case, STC21video_end is represented by the following equation (4).
STC21video_end = STC11video_end − STC_delta  (4)
It is necessary to meet the following two conditions in order for the decoder 120 to conform to the DVR-STD.
(Condition 1)
The timing at which the first video packet of TS2 reaches TB1 must meet the following inequality (5).
STC22video_start > STC21video_end + T2−1  (5)
The partial streams of Clip 1 and/or Clip 2 need to be re-encoded and/or re-multiplexed in order to meet the above inequality (5).
(Condition 2)
On the time axis of the system time base obtained by converting STC1 and STC2 to the same time axis as each other, the inputs of the video packets from TS1 and the subsequent inputs of the video packets from TS2 must neither overflow nor underflow the video buffer.
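Condition 1 can be checked mechanically once the quantities in equations (1) and (4) are known. The sketch below is an illustration under stated assumptions: all time values are given in seconds, TS_recording_rate is taken to be in bytes per second, and the function and parameter names are hypothetical. Condition 2 requires a full buffer simulation and is not covered here.

```python
def meets_condition_1(stc1_video_end: float,
                      stc2_video_start: float,
                      stc_delta: float,
                      n1_bytes: int,
                      ts_recording_rate: float) -> bool:
    """Evaluate inequality (5) for a pair of seamlessly connected streams.

    stc1_video_end: arrival of the last video byte of TS1, on STC1 (seconds)
    stc2_video_start: arrival of the first video byte of TS2, on STC2 (seconds)
    stc_delta: offset between the time bases from equation (2) (seconds)
    n1_bytes: bytes of TS1 that follow its last video packet
    ts_recording_rate: assumed to be in bytes per second
    """
    video_end_on_stc2 = stc1_video_end - stc_delta   # equation (4)
    t2_minus_t1 = n1_bytes / ts_recording_rate       # equation (1)
    return stc2_video_start > video_end_on_stc2 + t2_minus_t1  # inequality (5)
```

The inequality is strict, so a first video packet of TS2 that arrives exactly at the converted end time plus T2−T1 does not satisfy the condition.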
However, as described above, the conventional player 101 using the DVR-STD model requires an additional buffer in order to process input data between time T1 and T2. That is, since the remaining packets of TS1 are input to the buffer TBn or TBsys of the DVR-STD at a bit rate (maximum bit rate of TS1) of TS_recording_rate(TS1) between time T1 and T2, an additional buffer capable of holding the data amount corresponding to about 1 second is required in addition to the buffer amount defined by the T-STD.
This buffer capacity is based on the following factor. In an MPEG2 TS, the audio data reproduced in synchronization with the video data at a certain byte position may be multiplexed at a position separated from that video data by a multiplexing phase difference within a predetermined range, and the maximum value of this multiplexing phase difference corresponds to 1 second of data. Therefore, the maximum value of N1 in the above equation (1) corresponds to up to 1 second of audio data. Between time T1 and T2, the arrival_time_stamp of the source packet is ignored and the source packets amounting to N1 bytes are input to the audio buffer at the maximum bit rate of the TS. Therefore, an additional buffer amount (data amount corresponding to about 1 second) is required in addition to the buffer amount defined by the T-STD.
The size of this additional buffer can be calculated as follows. In the case of an audio stream that has been encoded according to Dolby AC-3 at, e.g., 640 kbps, the audio data corresponding to 1 second is 80 kbytes (=640 kbits). As a result, an additional buffer of 80 kbytes is required.
In the case of an audio stream (24-bit samples, 96 kHz sampling frequency, 8 channels) that has been encoded according to the Linear PCM method, the audio data corresponding to 1 second is about 18 Mbits (=24 bits/sample×96,000 samples/sec×8 channels). As a result, an additional buffer of about 2.3 Mbytes is required. Thus, in the case where such multichannel audio data is employed, the size of the additional buffer becomes extremely large.
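The two buffer estimates above follow from the same arithmetic, which can be reproduced as a short sketch. The function name is hypothetical; the one-second window is the maximum multiplexing phase difference stated in the text, and 1 kbit is taken as 1,000 bits.

```python
def extra_buffer_bytes(bit_rate_bps: int, seconds: int = 1) -> int:
    """Bytes needed to hold `seconds` of audio at the given bit rate."""
    return bit_rate_bps * seconds // 8


# Dolby AC-3 at 640 kbps
ac3_bytes = extra_buffer_bytes(640_000)

# Linear PCM: 24 bits per sample, 96,000 samples per second, 8 channels
lpcm_bps = 24 * 96_000 * 8
lpcm_bytes = extra_buffer_bytes(lpcm_bps)

print(ac3_bytes)    # 80000 bytes
print(lpcm_bps)     # 18432000 bits per second
print(lpcm_bytes)   # 2304000 bytes
```

The Linear PCM case needs roughly 29 times the additional buffer of the AC-3 case, which is the growth the passage points out.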