1. Field of the Invention
The invention is in the field of audio communications and specifically in the field of audio processing over networks where packets may be delayed.
2. Related Art
Audio data communicated over networks, such as the Internet, is typically communicated as a sequential series of packets. Each packet includes audio data, sequence information, destination information, and the like, according to a standard such as TCP/IP. The packets are sent from a source to a receiver. Typically, the receiver includes a receive buffer with a limited capacity. A feedback loop is sometimes used to limit the sending of packets from the source to times when space is available in the receive buffer. At the receiver, the audio data is assembled into an audio stream according to the sequence information in each packet. This audio stream is typically presented in real time to a user of the receiver, the user being a person listening to the audio stream.
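By way of illustration only, the assembly of packets into an audio stream according to sequence information may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular receiver: the names Packet, ReceiveBuffer, has_space, push and pop_ready are hypothetical, sequence numbers are assumed to start at zero and increase by one per packet, and the capacity is counted in packets.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int      # sequence number assigned by the source (assumed 0, 1, 2, ...)
    audio: bytes  # audio payload carried by the packet


class ReceiveBuffer:
    """Hypothetical receive buffer with a limited capacity, counted in packets."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = {}   # maps seq -> audio payload awaiting assembly
        self.next_seq = 0   # next sequence number the audio stream needs

    def has_space(self) -> bool:
        # A feedback loop could report this value to the source.
        return len(self.pending) < self.capacity

    def push(self, packet: Packet) -> bool:
        # Reject packets already consumed, duplicates, or packets that do not fit.
        if packet.seq < self.next_seq or packet.seq in self.pending:
            return False
        if not self.has_space():
            return False
        self.pending[packet.seq] = packet.audio
        return True

    def pop_ready(self) -> bytes:
        # Emit every payload that is contiguous with the stream so far.
        out = bytearray()
        while self.next_seq in self.pending:
            out += self.pending.pop(self.next_seq)
            self.next_seq += 1
        return bytes(out)
```

In this sketch, a packet received out of order is held in the buffer until the packets preceding it arrive, at which point pop_ready returns the payloads in sequence order.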
When using unpredictable networks, such as the Internet, there is a possibility that a packet is received late. As used herein, the phrases “received late” and “late packet” are meant to characterize a packet that is received too late for the audio data within the packet to be seamlessly used in the audio stream. For a late packet, the timing of the packet's receipt results in a gap or delay in the audio stream. This gap may be so small, e.g., on the order of a few microseconds, that it does not significantly affect the audio as perceived by the user. Alternatively, the gap may be long enough for the user to hear a disruption of the audio stream. In either case, when a packet arrives late, that part of the audio stream derived from audio data within the late packet both starts and finishes later than it would have if the packet had not been late.
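The effect of a late packet on the timing of the audio stream may be illustrated with a short calculation. The following is a simplified sketch only: it assumes that each packet carries a fixed duration of audio, that packet number seq is scheduled to begin playing at stream_start_ms + seq * packet_ms, and that all times are integer milliseconds. The function names and parameters are hypothetical and chosen for illustration.

```python
def playback_gap_ms(seq: int, arrival_ms: int,
                    stream_start_ms: int, packet_ms: int) -> int:
    """Return the gap introduced when a packet arrives after its scheduled start."""
    scheduled_start = stream_start_ms + seq * packet_ms
    # A packet that arrives on or before its scheduled start introduces no gap.
    return max(0, arrival_ms - scheduled_start)


def playback_interval_ms(seq: int, arrival_ms: int,
                         stream_start_ms: int, packet_ms: int) -> tuple:
    """Return (start, finish) of the audio derived from this packet."""
    gap = playback_gap_ms(seq, arrival_ms, stream_start_ms, packet_ms)
    start = stream_start_ms + seq * packet_ms + gap
    # The late audio both starts and finishes later by the same gap.
    return (start, start + packet_ms)
```

For example, with 20 ms packets and a stream starting at time zero, packet 3 is scheduled to play from 60 ms to 80 ms. If it arrives at 75 ms, the sketch yields a 15 ms gap and a playback interval of 75 ms to 95 ms.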
Late packets can be particularly problematic in systems in which it is important that the audio stream be presented to the user as quickly as possible. These systems include, for example, web conferencing, telepresence, and streamed video gaming. In each of these systems it is desirable to minimize the lag time between an event occurring at the audio receiver, the generation at the audio source of audio data that depends on this event, and/or the presentation of the generated audio data to a user at the receiver. For example, in online video games in which the game audio is generated at a remote game server and delivered to a client, the event may be receipt of a game command to shoot a gun. This game command is communicated from the client to the game server. At the game server, audio, e.g., the sound of a gunshot, is generated and sent back to the client in data packets. At the client, the audio data within the packets is assembled into an audio stream and presented to a user. It is desirable for the user to hear the gunshot as close as possible in time to the entry of the command to shoot. In order to minimize this time lag, the receive buffer on the client is typically kept to a minimum size so that audio data is not stored in this buffer for too long. Unfortunately, a small receive buffer increases the probability that a delayed packet will result in an interruption of the resulting audio stream. Some modification of audio frequency has been performed in the prior art. See, for example, www.screemingbee.com. However, such systems change the frequency of sound without changing the time it takes to present the sound.