In recent years, techniques for processing digital video and audio signals have advanced remarkably. Along with these advances, systems for realizing digital broadcasting and for integrating broadcasting and communication have been developed around the world.
Services in the technical field where broadcasting and communication are being integrated include information delivery services by data streaming. In particular, delivery services using the streaming delivery method are on the increase. The streaming delivery method generally reproduces incoming data in real time. Examples of systems using this method include video on demand (VOD), live video streaming delivery, and teleconferencing systems.
Also, video delivery services over wide-area networks and various other networks, typified by the Internet, have developed rapidly. Most of these video delivery services are streaming delivery services that use a compression technique such as MPEG (Moving Picture Experts Group) or H.264.
In a real-time system, video/audio outputs are reproduced as follows. A decoder extracts a system reference time (PCR: Program Clock Reference or SCR: System Clock Reference) from a system stream, and reproduces an STC (System Time Clock) using the extracted value. Then, the decoder compares the STC with a PTS (Presentation Time Stamp) of each of a video stream and an audio stream, and reproduces video data and audio data. This processing enables real-time output in which video output and audio output are synchronized.
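The comparison between the STC and the PTS described above can be illustrated with a minimal sketch. It is not taken from any particular decoder: the names `Frame`, `SystemTimeClock`, and `frames_due` are hypothetical, and all times are assumed to be expressed in 90 kHz ticks, the unit used for the PTS in MPEG system streams.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # "video" or "audio"
    pts: int   # presentation time stamp, assumed to be in 90 kHz ticks


class SystemTimeClock:
    """Hypothetical STC that is reproduced from the stream's reference time."""

    def __init__(self):
        self.value = None  # unknown until a PCR or SCR has been extracted

    def load_reference(self, reference_time):
        # The STC follows the reference time (PCR or SCR) carried in the stream.
        self.value = reference_time

    def advance(self, ticks):
        # Stands in for the local clock counting between reference samples.
        self.value += ticks


def frames_due(stc, queue):
    """Remove and return every queued frame whose PTS the STC has reached."""
    due = [f for f in queue if f.pts <= stc.value]
    queue[:] = [f for f in queue if f.pts > stc.value]
    return due


stc = SystemTimeClock()
stc.load_reference(0)
queue = [Frame("video", 3000), Frame("audio", 3000), Frame("video", 6000)]

early = frames_due(stc, queue)   # STC = 0: nothing is due yet
stc.advance(3000)
on_time = frames_due(stc, queue) # STC = 3000: one video and one audio frame
```

Because both streams are compared against the same STC, frames that carry the same PTS become due at the same STC value, which is what keeps the two outputs aligned in this simplified model.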
In addition, a conventional technique attaches time codes to audio data and video data, and causes a receiver to synchronize audio output with video output using the time codes (see, e.g., Japanese Laid-open Patent publication No. 09-65303).
The video output needs to be synchronized with a vertical synchronizing signal (VSYNC). Meanwhile, the audio output can be reproduced immediately in synchronization with the STC. Accordingly, the video output is kept waiting until the next VSYNC occurs after the STC reaches the time indicated by the PTS. As a result, the video output is delayed relative to the audio output. To start the video and audio outputs synchronously, a function for absorbing the difference between the video output timing (the occurrence time of the VSYNC) and the audio output timing needs to be added. For example, a function of holding the PTS extracted from a video stream and loading the held value into a counter of a system clock on a receiving side when the VSYNC occurs needs to be added. This function also keeps the audio output waiting while the PTS extracted from the video stream is held. As a result, the video output and the audio output can be precisely synchronized (see, e.g., Japanese Laid-open Patent publication No. 2002-176643).
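The waiting time until the next VSYNC can be made concrete with a small numerical sketch. The function names, the VSYNC period of 3000 ticks (30 frames per second on an assumed 90 kHz time base), and the VSYNC phase are all assumptions chosen for illustration; they do not come from the cited publication.

```python
def next_vsync(t, period, phase=0):
    """Return the earliest VSYNC time (phase + k * period, k >= 0) at or after t."""
    k = max(0, -(-(t - phase) // period))  # ceiling division on integers
    return phase + k * period


def start_times(pts, period, phase=0, hold_audio=False):
    """Return (video_start, audio_start) for first frames sharing one PTS.

    Video can only start on a VSYNC. Audio starts as soon as the STC reaches
    the PTS unless hold_audio is set, in which case it waits for the same VSYNC.
    """
    video_start = next_vsync(pts, period, phase)
    audio_start = video_start if hold_audio else pts
    return video_start, audio_start


PERIOD = 3000  # assumed: 30 frames per second on a 90 kHz time base

unheld = start_times(10000, PERIOD)                  # audio leads the video
held = start_times(10000, PERIOD, hold_audio=True)   # both wait for the VSYNC
offset = unheld[0] - unheld[1]                       # ticks by which video lags
```

With these assumed numbers the video output lags the audio output by 2000 ticks when the audio is not held, and the offset disappears when the audio is kept waiting until the same VSYNC.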
However, real-time transmission assumes that the value of the STC is determined based on the PCR or the SCR. Under this assumption, the STC in a decoder cannot be set to an arbitrary value. That is, it is impossible to realize real-time transmission in which the held value of the PTS is loaded into the STC at an arbitrary timing for the purpose of synchronously outputting video and audio as described in Japanese Laid-open Patent publication No. 2002-176643. Accordingly, the conventional technique is unable to realize real-time transmission capable of absorbing the output error caused by the waiting time until the next VSYNC occurs.
Further, a conventional method for synchronously outputting video and audio assumes only a case where the PTSs of the first video frame and the first audio frame are synchronized. Therefore, this conventional method is not applicable to a case where the PTSs are not synchronized. That is, the input timings of video data and audio data to an encoder may fail to synchronize (the PTS values attached to the video data and the audio data at the start of the input operations differ from each other). In this case, the conventional technique performs the following processing. A video stream and an audio stream are supplied to a decoder at their respective timings. Then, a video source and an audio source each compare the first PTS of the video stream or the audio stream, respectively, with the STC, and determine the video and audio output start timings independently. Therefore, when the first PTS values of the video stream and the audio stream differ from each other, a receiving side fails to synchronize the video and audio output start timings.
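The effect of differing first PTS values can be illustrated with a short sketch in which each stream decides its own start timing from its own first PTS. The function name and the tick values are hypothetical, and the VSYNC wait is deliberately ignored so that only the PTS mismatch is shown.

```python
def independent_start_times(stc_at_arrival, first_video_pts, first_audio_pts):
    """Return (video_start, audio_start, gap) when each stream starts on its own.

    Each output starts once the STC reaches that stream's first PTS, or at
    once if the STC has already passed it when the stream arrives.
    """
    video_start = max(stc_at_arrival, first_video_pts)
    audio_start = max(stc_at_arrival, first_audio_pts)
    return video_start, audio_start, abs(video_start - audio_start)


# Assumed example: the first audio PTS precedes the first video PTS by 4500 ticks.
mismatched = independent_start_times(0, first_video_pts=9000, first_audio_pts=4500)

# When the first PTS values match, the start timings coincide.
matched = independent_start_times(0, first_video_pts=9000, first_audio_pts=9000)
```

In the mismatched case the audio output starts 4500 ticks before the video output, which corresponds to the failure to synchronize the output start timings described above.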
Further, the decoder may receive only a video stream or only an audio stream. Therefore, a measure for distinguishing these cases is indispensable.