Video and audio program signals are converted to a digital format, compressed, encoded and multiplexed in accordance with an established algorithm or methodology. The resulting compressed digital system signal, i.e., the bitstream, includes a video portion, an audio portion, and an informational portion. The bitstream is delivered to a reproducing apparatus either over a transmission line or by way of a recording medium on which it has been stored. A digital reproducing apparatus that reproduces multimedia data obtained by multiplexing video data and audio data, such as a digital versatile disc (DVD) system, a digital video cassette recorder (VCR) or a computer system incorporating a multimedia player, is provided with a decoding means for reproducing the aforementioned bitstream. This decoding means demultiplexes, decompresses and decodes the bitstream in accordance with the compression algorithm and supplies the result as a reproducible signal. The decoded video and audio signals are output to an output device such as a screen or a speaker for presentation to the user.
The video and audio signals are compressed and encoded by a suitable encoder which implements a selected data compression algorithm conforming to a recognized standard or specification agreed to among the senders and receivers of digital video data. Highly efficient compression standards have been developed by the Moving Picture Experts Group (MPEG), including MPEG-1 and MPEG-2, which have been successively improved upon, leading to the proposal of MPEG-4. The MPEG standards enable high-speed or low-speed reproduction in the forward or backward direction, in addition to the normal playback mode, in a VCR, DVD or similar multimedia recording/reproducing apparatus.
The MPEG standards define a synchronization scheme based on an idealized decoder known as a system target decoder (STD). Video and audio data units or frames are referred to as access units (AU) in encoded form, and as presentation units (PU) in unencoded or decoded form. In the idealized decoder, video and audio presentation units are taken from elementary stream buffers and instantly presented to the user at the appropriate presentation time. A presentation time stamp (PTS) indicating the proper presentation time of a presentation unit is transmitted in an MPEG packet header as a part of the system syntax.
The presentation time stamps and the access units are not necessarily transmitted together since they are carried by different layers of the hierarchy. It is therefore necessary for the decoder to associate the presentation time stamp found at the packet layer with the first access unit which follows it. The situation is further complicated by the fact that in a real decoder the system has little control over the presentation times of the presentation units. For example, in the video decoder, video frames (pictures) must be presented at exact multiples of the frame rate for the video to appear smooth, and the audio frames must be presented at exact multiples of the audio frame rate for the audio to be free of clicks.
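The association described above can be illustrated with a minimal sketch. The `Packet` type and the `associate_pts` function are hypothetical names introduced only for illustration, and the packet layout is a simplification rather than the actual MPEG system syntax: each packet is assumed to carry an optional PTS and the access units that begin in its payload.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    """Hypothetical, simplified packet: an optional PTS plus the AUs starting in it."""
    pts: Optional[int]
    access_units: List[str]


def associate_pts(packets: List[Packet]) -> List[Tuple[str, Optional[int]]]:
    """Pair each access unit with a PTS, or with None if no PTS applies to it.

    A PTS found in a packet header is attached only to the first access unit
    that follows it; later access units receive None until a new PTS arrives.
    """
    tagged = []
    pending = None  # PTS seen at the packet layer but not yet attached to an AU
    for packet in packets:
        if packet.pts is not None:
            pending = packet.pts
        for unit in packet.access_units:
            tagged.append((unit, pending))
            pending = None  # the PTS is consumed by the first AU after it
    return tagged


packets = [
    Packet(pts=9000, access_units=["A0", "A1"]),
    Packet(pts=None, access_units=["A2"]),
    Packet(pts=18000, access_units=[]),   # PTS arrives before its access unit
    Packet(pts=None, access_units=["A3"]),
]
for unit, pts in associate_pts(packets):
    print(unit, pts)
```

Access units left without a time stamp would, in a real decoder, have their presentation times derived from the frame rate, which is one reason the decoder has little freedom in choosing presentation times.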
In the idealized MPEG synchronization scheme, a system time clock (STC) which maintains a system clock time is provided in the decoder. The initial value of the system clock time is transmitted in the system stream by the encoder as a system clock reference (SCR) in an MPEG-1 bitstream, or as a program clock reference (PCR) in an MPEG-2 bitstream. The decoder sets its local system time clock to the initial value, and then continues to increment it at a clock rate of 90 kHz.
Subsequently, the encoder transmits a presentation time stamp for an audio or video access unit, followed some time later by the AU itself. The decoder compares the PTS to the local system clock time. When the two are equal, the AU is removed from the elementary stream buffer and instantly decoded to produce the corresponding PU, which is then presented.
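The idealized scheme can be sketched as a small simulation. This is only an illustration under stated assumptions: `run_ideal_decoder` is a hypothetical name, decoding is modeled as instantaneous as in the idealized decoder, and the frame spacing of 3003 ticks is an assumed example value, not a figure taken from the description above.

```python
from collections import deque

STC_HZ = 90_000  # clock rate of the system time clock, as stated in the text


def run_ideal_decoder(clock_reference, buffered_units, total_ticks):
    """Simulate the idealized decoder for total_ticks clock ticks.

    clock_reference: initial clock value (SCR or PCR) sent by the encoder.
    buffered_units:  (pts, unit) pairs waiting in the elementary stream buffer.
    Returns the (stc_value, unit) pairs in the order they are presented.
    """
    stc = clock_reference                  # decoder sets its local clock
    queue = deque(sorted(buffered_units))  # earliest PTS first
    presented = []
    for _ in range(total_ticks + 1):
        # A unit leaves the buffer exactly when its PTS equals the clock time.
        while queue and queue[0][0] == stc:
            _pts, unit = queue.popleft()
            presented.append((stc, unit))
        stc += 1                           # one tick of the 90 kHz clock
    return presented


FRAME_TICKS = 3003  # assumed example: one video frame at about 29.97 frames/s
schedule = run_ideal_decoder(
    clock_reference=1000,
    buffered_units=[(1000 + FRAME_TICKS, "V0"), (1000 + 2 * FRAME_TICKS, "V1")],
    total_ticks=7000,
)
for ticks, unit in schedule:
    elapsed = (ticks - 1000) / STC_HZ
    print(f"{unit} presented at STC={ticks} ({elapsed:.4f} s after start)")
```

The sketch deliberately ignores the case of a PTS that has already passed, since the idealized decoder never misses a presentation time; a real decoder must handle that case.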
However, in conventional multimedia reproducing apparatuses such as the DVD system, the digital VCR or the computer system incorporating a multimedia player, when the user selects a fast or slow playback mode, the video data is reproduced at the designated playback speed while the audio data is muted because of the difficulty of keeping it synchronized with the video data. There are improved reproducing apparatuses which reproduce the audio data together with the video data during the fast or slow playback mode. In this case, however, the audio data samples are output with their presentation time interval simply decreased or increased in accordance with the designated playback speed. In more detail, in the fast playback mode, the presentation time interval of the audio data becomes narrower than during normal playback, so that the tone of the reproduced sound is raised, for example by an octave; conversely, in the slow playback mode, the presentation time interval becomes wider than during normal playback, so that the tone of the reproduced sound is lowered, for example by an octave. This is the so-called tone variation phenomenon.
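The size of the tone variation can be estimated with a short calculation. The sketch below assumes that the audio samples are simply output faster or slower with no other processing, so that every frequency in the signal is multiplied by the playback speed factor; `tone_shift` is a hypothetical helper name, and the 440 Hz tone is an arbitrary example.

```python
import math


def tone_shift(frequency_hz, speed):
    """Return (shifted frequency in Hz, shift in semitones) for a speed factor.

    Assumes samples are merely presented `speed` times faster or slower,
    which scales every frequency by `speed`. Twelve semitones make one octave.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    shifted = frequency_hz * speed
    semitones = 12 * math.log2(speed)
    return shifted, semitones


for speed in (2.0, 1.5, 0.5):
    shifted, semitones = tone_shift(440.0, speed)
    print(f"speed x{speed}: 440.0 Hz -> {shifted:.1f} Hz ({semitones:+.2f} semitones)")
```

Under this assumption, double-speed playback raises the tone by exactly one octave and half-speed playback lowers it by one octave, while intermediate speeds give fractional shifts.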
The tone variation phenomenon described above appears in the same way in analog signal processing apparatuses such as the VCR or the cassette tape recorder. In these systems, if the user changes the playback speed to a high or low speed, the speed at which the reproducing apparatus reads out the signals from the recording medium becomes correspondingly faster or slower. Thus, when the read-out audio signal is output unchanged, the reproduced sound is heard at a higher or lower tone than the sound reproduced at the normal speed.
FIG. 1 shows a functional block diagram of the decoding portion of an MPEG reproducing apparatus for reproducing an MPEG file. The MPEG file supplied from a file source passes through a data input 10 and is separated into video data and audio data by a data separator 12. The separated video data and audio data are supplied to a video decoder 14 and an audio decoder 18, respectively, where they are decoded and restored to the original data, and are then supplied to a video output 16 and an audio output 20, respectively, to be reproduced as video and sound.
If the user instructs high-speed or low-speed reproduction, audio decoder 18 changes the PTS value contained in the header of each audio packet in accordance with the designated playback speed. Thus, the presentation time interval of the respective audio samples is compressed or extended as compared with that of the normal playback mode. Consequently, when a conventional MPEG file is played back in the fast or slow playback mode, the tone of the reproduced sound varies and is heard as a higher or lower sound.
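The conventional PTS adjustment can be sketched as follows. This is a minimal illustration rather than the actual operation of audio decoder 18: `rescale_pts` is a hypothetical name, the time stamps are assumed to be in 90 kHz ticks, and the spacing of 2160 ticks assumes an audio frame of 1152 samples at 48 kHz (24 ms), which is only an example.

```python
def rescale_pts(pts_values, speed):
    """Compress or extend PTS intervals by the playback speed factor.

    The first time stamp is kept as the anchor; every later interval is
    divided by `speed`, so speed > 1 narrows the presentation intervals
    and speed < 1 widens them.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not pts_values:
        return []
    base = pts_values[0]
    return [base + round((pts - base) / speed) for pts in pts_values]


# Assumed example: four audio frames, 2160 ticks (24 ms) apart at normal speed.
normal = [0, 2160, 4320, 6480]
print("fast x2  :", rescale_pts(normal, 2.0))
print("slow x0.5:", rescale_pts(normal, 0.5))
```

Because each frame still contains the same samples, presenting it in a shorter or longer interval changes the effective sample rate, which is what shifts the tone of the reproduced sound.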
The tone variation arises because the conventional reproducing system in the fast or slow reproduction mode simply compresses or extends the presentation time interval of the respective audio signals on the time scale. What is worse, preventing the tone variation requires other signal processing to be applied separately. In other words, an additional scheme is required to prevent the tone variation during the fast or slow reproduction mode.