1. Field of the Invention
The invention relates in general to an audio time stretch method and associated apparatus, and more particularly to a method for audio time stretch by utilizing audio data with low energy and associated apparatus.
2. Description of the Related Art
Internet real-time audio/video transmission techniques, e.g., Voice over Internet Protocol (VoIP), offer people immediate and realistic multimedia services, and are thus among the most important research and development targets for information technology developers.
In Internet real-time audio/video transmission, a transmitting end samples, digitizes and encodes the audio to be transmitted into a plurality of digital audio data, each corresponding to an amplitude sample of the audio. A certain number of audio data are packaged into an Internet packet, which is transmitted to a receiving end. At the receiving end, the received packet is de-packetized, decoded and demodulated to recover the original digital audio data. The digital audio data are then digital-to-analog converted to restore the original analog audio, which is then played.
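The packaging and de-packetizing steps described above can be illustrated with a minimal Python sketch. It is not taken from the invention: the function names `packetize` and `depacketize`, the per-packet sequence number, and the omission of encoding and modulation are all assumptions made for illustration.

```python
def packetize(samples, samples_per_packet):
    """Group amplitude samples into packets of a fixed size.

    Each packet is a (sequence_number, payload) tuple. The sequence
    number is an assumption of this sketch, not part of the source text.
    """
    if samples_per_packet <= 0:
        raise ValueError("samples_per_packet must be positive")
    starts = range(0, len(samples), samples_per_packet)
    return [
        (seq, list(samples[start:start + samples_per_packet]))
        for seq, start in enumerate(starts)
    ]


def depacketize(packets):
    """Recover the sample stream from packets, whatever order they arrived in."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return [sample for _, payload in ordered for sample in payload]


# Ten samples, four samples per packet, delivered in reverse order.
original = list(range(10))
packets = packetize(original, 4)
restored = depacketize(list(reversed(packets)))
```

In this sketch the last packet is shorter than the others when the sample count is not a multiple of the packet size; a real protocol would typically use a fixed packet duration.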
At the transmitting end, each audio data corresponds to a predetermined sampling time sequence. Therefore, at the receiving end, it is essential that the audio data be digital-to-analog converted according to the same sampling time sequence, so as to reconstruct the audio sent by the transmitting end. In order to perform digital-to-analog conversion according to the predetermined time sequence, the receiving end needs to provide the audio data to the digital-to-analog converting mechanism according to a specific time sequence. However, since the audio data are obtained from the packets, the quality of the audio played at the receiving end is degraded in the event that the time sequence of the packets arriving at the receiving end is irregular.
The time sequence of packets transmitted in Internet real-time audio/video transmission is in fact affected by various factors, e.g., jitter and clock drift. When the packets are transmitted via the Internet, they may be routed through different paths due to Internet protocols, such that they do not arrive at the receiving end according to the time sequence in which they were transmitted. This is referred to as “jitter”. Further, different reference clocks utilized by the transmitting end and the receiving end may also lead to timing differences between the two ends. For example, suppose the packet length according to a predetermined protocol is 10 ms, the transmitting end transmits an audio packet every 10.01 ms, and the receiving end plays a packet every 9.99 ms. Over a period during which 100 packets are transmitted, the accumulated time difference between the two ends reaches 2 ms. This is referred to as “clock drift”.
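The clock drift figure in the example above follows from simple arithmetic: the per-packet difference of 0.02 ms accumulates over 100 packets. A short Python sketch reproduces it; the function name `accumulated_drift_ms` is hypothetical and chosen only for illustration.

```python
def accumulated_drift_ms(send_interval_ms, play_interval_ms, packet_count):
    """Timing difference accumulated between sender and player.

    Each packet contributes |send_interval - play_interval| of drift,
    so the total grows linearly with the number of packets.
    """
    return abs(send_interval_ms - play_interval_ms) * packet_count


# Values from the example: 10.01 ms send interval, 9.99 ms playback
# interval, 100 packets.
drift = accumulated_drift_ms(10.01, 9.99, 100)
print(round(drift, 6))
```

The result agrees with the 2 ms stated in the example, up to floating-point rounding.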
At the receiving end, audio time stretch is required in order to provide audio data to the digital-to-analog conversion mechanism according to the predetermined time sequence. When the receiving end fails to acquire the audio data from the packets in time, additional audio data need to be inserted; conversely, when the packets provide more audio data than the receiving end can buffer, the receiving end removes or discards a certain amount of audio data.
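The insert-or-discard behavior just described can be sketched in Python in its most naive form. This is deliberately not the method of the invention: the function name `naive_time_stretch`, the use of zero-valued samples as inserted data, and the choice to discard from the end of the block are assumptions made purely for illustration.

```python
def naive_time_stretch(samples, target_len):
    """Force a block of samples to exactly target_len samples.

    Too few samples: pad with zero-valued (silent) samples.
    Too many samples: discard the excess from the end of the block.
    Both choices are illustrative assumptions, not a recommended method.
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    block = list(samples)
    if len(block) < target_len:
        # Underrun: insert additional audio data.
        return block + [0] * (target_len - len(block))
    # Overrun (or exact fit): keep only what can be played on schedule.
    return block[:target_len]


stretched = naive_time_stretch([3, -2, 5], 5)      # underrun case
compressed = naive_time_stretch([3, -2, 5, 7], 2)  # overrun case
```

Padding and truncating without regard to the audio content introduces abrupt discontinuities in the waveform.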
However, inappropriate time stretch may degrade the quality of audio playback such that noticeable audio imperfections are observed by a listener at the receiving end.