1. Field of the Invention
The present invention relates to an information processing technique and, more particularly, to an information processing technique of processing moving image data.
2. Description of the Related Art
A moving image compression/decompression transmission system for multiplexing and transmitting an audio signal and a video signal generally needs to cause its audio (sound) and video (image) signal processing units to output signals in synchronism.
To enable synchronization between an image and sound, MPEG2 and MPEG1, which are international standard coding methods, use output timing information (information of the output time of a moving image) called a timestamp (to be referred to as a “TS” hereinafter). The TSs include a presentation timestamp (to be referred to as a “PTS” hereinafter) and a decoding timestamp (to be referred to as a “DTS” hereinafter).
An MPEG system reproduces and outputs an audio signal or video signal when the STC (System Time Clock) of its decoder matches a PTS. The STC is defined by MPEG2 or MPEG1 to give reference time information. In either MPEG format, a DTS is provided in correspondence with the difference in the decoding order and reproduction output order, which is generated by sending I and P frames to a coded stream before a B frame. If the PTS matches the DTS, only the PTS is added as a timestamp.
MPEG2 packetizes a stream into variable-length packets, such as PES (Packetized Elementary Stream) packets, and adds a timestamp to the video stream. In general, a stream formed by multiplexing the coded data of an audio signal (to be referred to as an “audio stream” hereinafter) and the coded data of a video signal (to be referred to as a “video stream” hereinafter) is called a system stream. Both the audio and video TSs are added to the system stream.
Note that the TS is added with reference to an access unit for both the audio and video signals. For an audio signal access unit, a TS is added with reference to a syncword (a code arranged for every predetermined word in an audio stream). For a video signal access unit, a TS is added with reference to a picture start code (a code representing a break of pictures in a video stream). The DTS and PTS are added to the header of a PES packet.
FIG. 29 is a block diagram showing the schematic arrangement of a conventional moving image compression/decompression transmission system. The moving image compression/decompression transmission system is roughly divided into transmitting- and receiving-side systems. Referring to FIG. 29, on the transmitting side, an input audio signal 50 is input to an audio coding processing unit 51. The audio coding processing unit 51 codes the input audio signal 50 as an audio stream 52, and outputs the audio stream 52 to a multiplexer 56. An input video signal 53 is input to a video coding processing unit 54. The video coding processing unit 54 codes the input video signal 53 as a video stream 55, and outputs the video stream 55 to the multiplexer 56.
The audio stream 52 and video stream 55 input to the multiplexer 56 are multiplexed into a system stream 61 and output to a transmission path 62.
The multiplexer 56 includes a system time clock (STC) processing unit 57, audio packetization unit 59, video packetization unit 60, and switching unit 1005.
The audio packetization unit 59 forms packets by separating the received audio stream 52 at a predetermined length, and outputs data 1003 to the switching unit 1005. The switching unit 1005 selects each packet and multiplexes it into the system stream 61. At this time, if the portion of the audio stream 52 being packetized includes a syncword, a TS is added to the header portion of the packet. Note that the TS is obtained based on STC data 58 from the system time clock processing unit 57.
The video packetization unit 60 also forms packets by separating the received video stream 55 at a predetermined length, and outputs data 1004 to the switching unit 1005. The switching unit 1005 selects each packet and multiplexes it into the system stream 61. At this time, if the portion of the video stream 55 being packetized includes a picture start code, a TS is added to the header portion of the packet. Note that the TS is obtained based on the STC data 58 from the system time clock processing unit 57.
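The packetization rule described above can be sketched as follows. This is a minimal illustration, not the arrangement of FIG. 29 itself: the function name `packetize`, the two-byte marker, the `read_stc` callback, and the fixed `remaining_delay` are all assumptions made for the example. A fixed remaining delay corresponds to the audio case only.

```python
from itertools import count
from typing import Callable, List, NamedTuple, Optional


class Packet(NamedTuple):
    ts: Optional[int]  # timestamp carried in the packet header, or None
    payload: bytes


def packetize(stream: bytes, payload_len: int, marker: bytes,
              read_stc: Callable[[], int], remaining_delay: int) -> List[Packet]:
    """Cut `stream` into fixed-length payloads and stamp packets holding a marker."""
    packets = []
    for start in range(0, len(stream), payload_len):
        payload = stream[start:start + payload_len]
        # Look len(marker) - 1 bytes past the payload so that a marker whose
        # first byte lies in this packet is found even if it straddles the cut.
        window = stream[start:start + payload_len + len(marker) - 1]
        stc = read_stc()  # STC value sampled when the header is written
        ts = stc + remaining_delay if marker in window else None
        packets.append(Packet(ts, payload))
    return packets


# Hypothetical 2-byte marker and an STC that advances 100 ticks per packet.
stc_source = count(100, 100)
demo = packetize(b"\xAA\xBB12" b"3456" b"7\xAA\xBB8", 4, b"\xAA\xBB",
                 lambda: next(stc_source), remaining_delay=50)
print([p.ts for p in demo])
```

Only the first and third packets contain a marker, so only those two carry a timestamp; the second packet is sent with no TS in its header.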
To enable STC reproduction on the receiving side, the switching unit 1005 periodically selects the STC data 58 and multiplexes it into the system stream 61. Next, on the receiving side, a system stream 63 transmitted via the transmission path 62 is input to a demultiplexer 64 and demultiplexed. The system stream 63 is demultiplexed into an audio stream 67, audio TS 1012, video stream 70, video TS 1013, and STC data 1010 and output.
The demultiplexer 64 includes a switching unit 1006, STC reproduction unit 1011, audio packet analysis unit 65, and video packet analysis unit 66. The switching unit 1006 outputs, from the system stream 63, audio packet data 1007 to the audio packet analysis unit 65, video packet data 1009 to the video packet analysis unit 66, and STC data 1008 to the STC reproduction unit 1011.
The audio packet analysis unit 65 analyzes the received audio packet data 1007, separates it into the audio stream 67 and audio TS 1012, and outputs them to an audio decoding processing unit 68. The video packet analysis unit 66 analyzes the received video packet data 1009, separates it into the video stream 70 and video TS 1013, and outputs them to a video decoding processing unit 71.
The STC reproduction unit 1011 reproduces the STC data 1010 from the received STC data 1008 to always output the same data as the STC data 58 output from the system time clock processing unit 57 on the transmitting side. The STC reproduction unit 1011 outputs the STC data 1010 to the video decoding processing unit 71 and the audio decoding processing unit 68. The audio decoding processing unit 68 decodes the received audio stream 67 with reference to the audio TS 1012 and STC data 1010, and outputs an output audio signal 69 at a time corresponding to the TS.
The video decoding processing unit 71 decodes the received video stream 70 with reference to the video TS 1013 and STC data 1010, and outputs an output video signal 72 at a time corresponding to the TS.
The output operation of each decoding processing unit based on the TS will be described next in detail. The output operation based on the TS is implemented by causing the audio decoding processing unit 68 and the video decoding processing unit 71 to output corresponding data portions when the value of the PTS matches the value of the STC data of each of the audio and video signals.
For example, the value of a PTS corresponding to a syncword An in the audio stream 67 is represented by PTS(An). When the value of the STC data 1010 matches PTS(An), the audio decoding processing unit 68 outputs decoded data corresponding to the syncword An.
Similarly, the value of a PTS corresponding to a picture start code Vn in the video stream 70 is represented by PTS(Vn). When the value of the STC data 1010 matches PTS(Vn), the video decoding processing unit 71 outputs decoded data corresponding to the picture start code Vn.
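The matching operation can be illustrated with a short simulation. The function `present` and the tick-based STC are hypothetical names and simplifications introduced here; the sketch only shows that a decoded unit is released at the tick where the STC equals its PTS.

```python
from collections import deque
from typing import Iterable, List, Tuple


def present(units: Iterable[Tuple[int, str]],
            stc_ticks: Iterable[int]) -> List[Tuple[int, str]]:
    """Release each (pts, label) unit at the tick where the STC equals its PTS."""
    pending = deque(sorted(units))  # earliest PTS first
    shown = []
    for stc in stc_ticks:
        # Several units (e.g. one audio and one video) may share a PTS.
        while pending and pending[0][0] == stc:
            _pts, label = pending.popleft()
            shown.append((stc, label))
    return shown


# Audio unit A0 and video unit V0 carry the same PTS, so they are output together.
schedule = present([(3, "V0"), (3, "A0"), (6, "A1")], range(8))
print(schedule)
```

Because the comparison is an exact match, a unit whose PTS is never reached by the STC is never output in this sketch. A practical decoder would need some tolerance, which is outside what the text above describes.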
Assume that in the moving image compression/decompression transmission system shown in FIG. 29, a time ta elapses from input of the input audio signal 50 to the audio coding processing unit 51 on the transmitting side to output of the output audio signal 69 from the audio decoding processing unit 68 on the receiving side. Similarly, assume that a time tv elapses from input of the input video signal 53 to the video coding processing unit 54 on the transmitting side to output of the output video signal 72 from the video decoding processing unit 71 on the receiving side.
In this case, to ensure synchronization between the video and audio signals, the process time of the transmission processing system is set to make the times from input to output satisfy ta=tv, and the PTS times are set based on the process time. TS calculation in the audio packetization unit 59 and the video packetization unit 60 included in the multiplexer 56 will be described next.
Assume that out of the time ta from input to output of the audio signal, the time until input to the multiplexer 56 is ta1, and the remaining time is ta2. Also assume that data to be packetized in the audio stream 52 input to the multiplexer 56 contains a syncword An. If the value of the STC data 58 when adding a TS to the header portion of the packet is STC2(An), PTS(An) is obtained by
PTS(An)=STC2(An)+ta2  (1)
For the video as well, assume that out of the time tv from input to output of the video signal, the time until input to the multiplexer 56 is tv1, and the remaining time is tv2. Also assume that data to be packetized in the video stream 55 input to the multiplexer 56 contains a picture start code Vn. If the value of the STC data 58 when adding a TS to the header portion of the packet is STC2(Vn), PTS(Vn) is obtained by
PTS(Vn)=STC2(Vn)+tv2  (2)
However, as for a video signal, the video coding processing unit 54 converts it into a variable length code or changes the order of pictures. For this reason, tv1 varies depending on the coding condition. Hence, tv2 also varies (because tv is fixed, and tv2=tv−tv1), and tv2 to be added to STC2(Vn) cannot be preset. It is therefore impossible to obtain the PTS in the same way as in the audio signal. In a video signal, the order of pictures changes. Hence, a DTS must be obtained, too.
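A small numerical sketch may make the difference between the audio and video cases concrete. All values below are hypothetical STC ticks chosen for illustration, and the helper names are not taken from the text. The sketch applies equations (1) and (2) and the relation tv2=tv−tv1.

```python
def pts_from_stc(stc2: int, remaining: int) -> int:
    """PTS = STC value at packetization + delay still to elapse (equations (1), (2))."""
    return stc2 + remaining


def remaining_video_delay(tv: int, tv1: int) -> int:
    """tv2 = tv - tv1, where tv is the fixed end-to-end video delay."""
    return tv - tv1


TA2 = 300                          # hypothetical fixed audio remainder ta2
TV = 500                           # hypothetical fixed end-to-end video delay tv
INPUT_STC = 1000                   # hypothetical STC when each signal enters its coder
tv1_per_picture = [120, 180, 150]  # hypothetical coding delays, different per picture

# Audio: ta2 is constant, so it can be preset and added directly.
audio_pts = pts_from_stc(1000, TA2)

# Video: tv2 changes with tv1, so it must be recomputed for every picture.
tv2_per_picture = [remaining_video_delay(TV, tv1) for tv1 in tv1_per_picture]
video_pts = [pts_from_stc(INPUT_STC + tv1, tv2)
             for tv1, tv2 in zip(tv1_per_picture, tv2_per_picture)]
print(tv2_per_picture, video_pts)
```

In this sketch every picture that entered the coder at the same STC value ends up with the same PTS, even though tv2 differs per picture. This only holds if tv1 is known when the TS is added, which is exactly what cannot be preset.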
Picture order change will be explained here with reference to the schematic view in FIG. 30. In FIG. 30, a delay time associated with picture order change is taken into consideration, though other process times and the like are omitted. Referring to FIG. 30, 30a indicates the input video signal 53 input to the video coding processing unit 54; 30b, the video stream 55 output from the video coding processing unit 54 (or the video stream 70 output from the video packet analysis unit 66); and 30c, the output video signal 72 output from the video decoding processing unit 71. Symbols I, P, and B added to all signals represent picture coding types defined by MPEG2 and MPEG1. In the example shown in FIG. 30, the picture interval of I and P frames is 3. The picture interval of I and P frames is generally called an M value (M value=3 in the example of FIG. 30). Parenthesized numerical values on the respective signals are temporal reference values representing the picture sequence in the input video signal 53.
Picture order change is done in the following way. In the input video signal 53 indicated by 30a, only the B frames are sequentially delayed and inserted next to the succeeding I or P frame to change the order of the pictures, thereby forming the video stream 55 indicated by 30b. The changed state is the same as in the video stream 70.
Next, the video decoding processing unit 71 conversely sequentially delays and inserts the I and P frames of the video stream 70 next to the succeeding consecutive B frames to restore the original order, thereby obtaining the output video signal 72 having the original order, as indicated by 30c. Causing the video decoding processing unit 71 to restore the original order is called reorder.
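The two order changes can be sketched as a pair of list transformations. The picture labels (coding type followed by temporal reference) and the function names are illustrative assumptions; the sequence uses M value=3 but is not a transcription of FIG. 30.

```python
from typing import List


def to_coded_order(display: List[str]) -> List[str]:
    """Coder side: hold B pictures and emit them after the next I or P picture."""
    coded, held_b = [], []
    for pic in display:
        if pic[0] == "B":
            held_b.append(pic)
        else:
            coded.append(pic)
            coded.extend(held_b)
            held_b.clear()
    coded.extend(held_b)  # flush B pictures that had no following I or P picture
    return coded


def to_display_order(coded: List[str]) -> List[str]:
    """Decoder side (reorder): hold each I or P picture until the following
    run of B pictures has been output."""
    display, held_anchor = [], None
    for pic in coded:
        if pic[0] == "B":
            display.append(pic)
        else:
            if held_anchor is not None:
                display.append(held_anchor)
            held_anchor = pic
    if held_anchor is not None:
        display.append(held_anchor)
    return display


source = ["B0", "B1", "I2", "B3", "B4", "P5", "B6", "B7", "P8"]
stream = to_coded_order(source)
restored = to_display_order(stream)
print(stream)
print(restored)
```

Applying the decoder-side function to the coder-side output returns the input sequence, which is the property the reorder is meant to guarantee.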
FIG. 31 shows the relationship between the PTS and DTS in the reorder of the video decoding processing unit 71. This will be explained. The video stream 70 input to the video decoding processing unit 71 is stored in a video decoding buffer 73 to absorb the variation on the time axis generated by variable length coding, and then output to a video decoding circuit 75 as a video stream 74. The video decoding circuit 75 decodes the received video stream 74 into data 76.
The B frame portion of the data 76 is directly output as the output video signal 72. On the other hand, the I or P frame portion of the data 76 is delayed by a video reorder buffer 77 and becomes data 78. The switching unit 79 switches between the data 76 and 78 and outputs the selected data as the output video signal 72. The delay time in the video reorder buffer 77 is (M value×picture cycle). This processing of the video decoding processing unit 71 makes it possible to restore the original order of pictures.
Referring to FIG. 31, the PTS represents the output time of the output video signal 72 from the video decoding processing unit 71. On the other hand, the DTS represents the output wait time of the video stream 74 from the video decoding buffer 73. In MPEG2 and MPEG1, the process time of the video decoding circuit 75 is assumed to be zero in the definition of TS. For this reason, for a B frame, DTS=PTS.
Hence, the PTS and DTS are added to I and P frames, whereas only the PTS is added to a B frame (since DTS=PTS). As described above, TS addition to a video signal is more complex than for an audio signal.
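The timestamp rule for the three picture types can be sketched as follows. The sketch assumes, beyond what is stated above, that one picture leaves the video decoding buffer per picture cycle; the function name, the tuple layout, and the tick values are likewise hypothetical. The zero decoding time and the reorder delay of (M value×picture cycle) are taken from the description.

```python
from typing import List, Optional, Tuple


def assign_timestamps(coded: List[str], first_dts: int, cycle: int,
                      m_value: int) -> List[Tuple[str, int, Optional[int]]]:
    """Return (picture, PTS, DTS) per picture in coded order; DTS is None
    when it equals the PTS and therefore is not added."""
    stamped = []
    for index, pic in enumerate(coded):
        dts = first_dts + index * cycle  # assumed: one picture decoded per cycle
        if pic[0] == "B":
            # Decoding time is treated as zero, so a B picture is output at once.
            stamped.append((pic, dts, None))
        else:
            # I and P pictures wait in the reorder buffer for M picture cycles.
            stamped.append((pic, dts + m_value * cycle, dts))
    return stamped


stamped = assign_timestamps(["I2", "B0", "B1", "P5", "B3", "B4"],
                            first_dts=100, cycle=10, m_value=3)
output_order = [pic for pic, _pts, _dts in sorted(stamped, key=lambda s: s[1])]
print(stamped)
print(output_order)
```

Sorting the pictures by PTS yields the original input order, and only the I and P pictures carry a separate DTS.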
FIG. 32 shows a structural example in which the picture start code, temporal reference, and picture coding type used in the above explanation are added to the video stream 55 in FIG. 29. This is an example of MPEG. As shown in FIG. 32, a coded data portion corresponding to each frame or field of the input video signal 53 (FIG. 29) is called a picture layer and starts with a picture start code that is a unique 32-bit value. A 10-bit temporal reference follows the code. A 3-bit picture coding type exists next to the temporal reference.
A header such as 16-bit video delay time control information (video buffering verifier delay: vbv_delay) is added then, and actual coded data follows the header. The video delay time control information represents a delay time in the buffer (video decoding buffer 73) in the video decoding processing unit 71 necessary for receiving variable-length-coded data at a predetermined rate as its average rate and decoding the data. Buffer read-access control is performed based on the video delay time control information, thereby avoiding underflow or overflow of the buffer.
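The field layout described above (a 32-bit picture start code, a 10-bit temporal reference, a 3-bit picture coding type, and a 16-bit vbv_delay) can be read with a few shifts and masks. The sketch below is illustrative only. The start code value 0x00000100 and the type codes 1, 2, and 3 for I, P, and B are taken from the MPEG video syntax rather than from this description, and the function and class names are hypothetical.

```python
from typing import NamedTuple

PICTURE_START_CODE = b"\x00\x00\x01\x00"  # from the MPEG video syntax (assumption here)
CODING_TYPES = {1: "I", 2: "P", 3: "B"}   # likewise


class PictureHeader(NamedTuple):
    temporal_reference: int
    coding_type: str
    vbv_delay: int


def parse_picture_header(data: bytes) -> PictureHeader:
    """Read the three fields that follow the picture start code."""
    if len(data) < 8 or data[:4] != PICTURE_START_CODE:
        raise ValueError("data does not begin with a picture start code")
    bits = int.from_bytes(data[4:8], "big")  # next 32 bits; the top 29 are used
    temporal_reference = bits >> 22          # 10 bits
    type_code = (bits >> 19) & 0x7           # 3 bits
    vbv_delay = (bits >> 3) & 0xFFFF         # 16 bits
    return PictureHeader(temporal_reference,
                         CODING_TYPES.get(type_code, "?"), vbv_delay)


def build_picture_header(temporal_reference: int, type_code: int,
                         vbv_delay: int) -> bytes:
    """Inverse of parse_picture_header, used here only to make test data."""
    bits = (temporal_reference << 22) | (type_code << 19) | (vbv_delay << 3)
    return PICTURE_START_CODE + bits.to_bytes(4, "big")


header = parse_picture_header(build_picture_header(5, 3, 0x1234))
print(header)
```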
The video delay time control information is generated and added inside the video coding processing unit 54. At this time, the video delay time control information is generally calculated and generated for each picture based on the capacity of the buffer in the video decoding processing unit 71 on the receiving side, the code amount that is the compression result of each picture, and the average rate. In MPEG2, the video delay time control information is called vbv_delay, and the buffer is called a VBV buffer.
A conventional TS calculation method will be described next. FIG. 33 is a view showing the functional arrangement of a conventional timestamp adding apparatus. The video coding processing unit 54 includes an encoder 54a, encoder-side vbv buffer 54b, and output unit 54c. The video packetization unit 60 includes a packet buffer 60a, timestamp adding unit 60b, timestamp calculation unit 60c, and system time clock buffer 60d. 
The input video signal 53 is variable length data and is input at a variable transmission rate. In the transmission path of the system, transmission is normally done at a predetermined rate. To ensure a predetermined transmission rate in the transmission path between the video coding processing unit 54 and the multiplexer 56, the encoder-side vbv buffer 54b is necessary. Using the STC upon inputting the input video signal 53 to the video coding processing unit 54, the timestamp calculation unit 60c generates a timestamp. At this time, signal processing of the video signal requires a certain process time (several tens of milliseconds). The above-described picture order change further increases the delay in coding processing before the STC value is used for the calculation. To absorb this delay, the system time clock buffer 60d is provided.
The packet buffer 60a temporarily holds data to be added with a timestamp and packetized. The packet buffer 60a synchronizes data input to the timestamp adding unit 60b with the input of the timestamp generated by the timestamp calculation unit 60c to the timestamp adding unit 60b, thereby adjusting the timestamp addition timing. Hence, data input from the packet buffer 60a to the timestamp adding unit 60b is done at a predetermined rate.
The above-described conventional technique is disclosed in, for example, Japanese Patent Laid-Open No. 9-307891.
In the prior art, the process time of, for example, video quality enhancement processing is not taken into consideration. If data is too late for the time represented by the timestamp, the image is disordered. If data arrives before the actual timestamp, a buffer for holding data is necessary. For this reason, if the difference between the timestamp and the data arrival time is large, a very large buffer area must be allocated.
In the above-described system, a packet processing apparatus is sometimes arranged between the packet transmitting-side apparatus and the packet receiving-side apparatus. In this case, the process time varies depending on the state of the packet processing apparatus. For this reason, the packet receiving-side apparatus may be unable to output video data at the time represented by the timestamp, resulting in disorder in the image. This phenomenon is called underflow. On the other hand, if the timestamp is set too late, the amount of packet data to be held by the packet receiving-side apparatus may increase, resulting in buffer overflow.
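The two failure modes can be illustrated with a simple receiver-side check. The model below is an assumption made for illustration, not part of the described system: each packet is a hypothetical (arrival time, timestamp, size) triple, packets are listed in arrival order, and a packet occupies the buffer from its arrival until the time given by its timestamp.

```python
from typing import List, Tuple


def check_receiver(packets: List[Tuple[int, int, int]],
                   capacity: int) -> List[Tuple[int, str]]:
    """Return (packet index, "underflow" | "overflow") for each problem found.

    packets: (arrival, timestamp, size) triples sorted by arrival time.
    """
    events = []
    for index, (arrival, timestamp, _size) in enumerate(packets):
        if arrival > timestamp:
            # The data arrived after the time at which it should be output.
            events.append((index, "underflow"))
            continue
        # Data already received whose output time has not yet come.
        held = sum(size for a, ts, size in packets[:index + 1]
                   if a <= arrival < ts)
        if held > capacity:
            events.append((index, "overflow"))
    return events


# Three early packets exceed a capacity of 2; the last packet arrives too late.
events = check_receiver([(0, 10, 1), (1, 11, 1), (2, 12, 1), (20, 15, 1)],
                        capacity=2)
print(events)
```

The sketch shows why the gap between arrival time and timestamp matters in both directions: a negative gap causes underflow, and a large positive gap increases the amount of data that must be held.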
Japanese Patent Laid-Open No. 9-233425 has proposed a technique of, if decoding processing is interrupted in a display apparatus that has received video data, re-executing the decoding processing to prevent disorder of display. However, this method does not necessarily allow display at a time designated by a timestamp.
Japanese Patent Laid-Open No. 2005-217863 has proposed an IP telephone terminal apparatus which causes a transmission apparatus to adjust, for example, the packet transmission interval based on the packet reception time of a reception apparatus. As for video data, however, the frames need to be displayed at a predetermined interval. It is therefore impossible to prevent underflow or overflow only by simple transmission interval adjustment.