1. Field of the Invention
The invention generally relates to communication systems in which audio content is transmitted between entities. In particular, the invention relates to systems and methods for improving perceived audio quality in communication systems in which audio content is transmitted between entities operating in different clock domains.
2. Background
In a communication system in which audio content is transmitted between two entities over a wireless link, clock drift between the two entities can degrade perceived audio quality. For example, in a Bluetooth™ wireless communication system, a wireless link may be established between two Bluetooth™-enabled devices. The wireless link may then be used to transmit audio signals between the two devices, wherein each device uses its own clock for sampling the audio signals. Because the two clocks will inevitably drift relative to one another, each device may periodically be required to drop an audio frame from, or insert an audio frame into, a received audio signal to compensate for the drift. Dropping or inserting an audio frame creates discontinuities that ultimately impair the quality of the received audio signal as perceived by a listener.
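To give a sense of how often such corrections occur, the following Python sketch computes how long it takes a fixed clock mismatch to accumulate one full frame of surplus (or missing) samples. The function name, the 8 kHz sample rate, the 60-sample frame size, and the 20 ppm drift are hypothetical values chosen for illustration and are not parameters of any particular Bluetooth™ device.

```python
def seconds_between_slips(sample_rate_hz: float, frame_samples: int, drift_ppm: float) -> float:
    """Time until a constant clock mismatch accumulates exactly one audio frame of samples."""
    # Surplus (or deficit) samples accumulated per second because the two clocks differ.
    slip_samples_per_second = sample_rate_hz * drift_ppm * 1e-6
    # Once a whole frame's worth has accumulated, a frame must be dropped or inserted.
    return frame_samples / slip_samples_per_second


# Hypothetical values: 8 kHz voice sampling, 60-sample frames, 20 ppm clock mismatch.
interval = seconds_between_slips(8000.0, 60, 20.0)
print(f"One frame must be dropped or inserted every {interval:.0f} s")
# prints: One frame must be dropped or inserted every 375 s
```

The interval scales inversely with the drift, so a larger mismatch or a smaller frame size makes the resulting discontinuities proportionally more frequent.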
To help illustrate this, FIG. 1 shows a conventional wireless audio communication system 100 that includes a Bluetooth™-enabled cellular telephone 102 and a Bluetooth™ headset 104. As will be appreciated by persons skilled in the relevant art(s), a bidirectional Synchronous Connection Oriented (SCO) link 106 may be established between cellular telephone 102 and headset 104, over which audio signals may be wirelessly transmitted both from cellular telephone 102 to headset 104 and from headset 104 to cellular telephone 102. For the purpose of this example, however, only the wireless transfer of an audio signal from headset 104 to cellular telephone 102 will be discussed.
Headset 104 operates in a well-known manner to sample an audio signal from an audio source. Typically, the audio source is a user of the headset and the audio signal represents the user's speech. Discrete segments of the audio signal, termed audio frames, are temporarily stored in a jitter buffer 122 within a Bluetooth™ controller 120. Bluetooth™ controller 120 then reads the audio frames from jitter buffer 122 in first-in, first-out (FIFO) order and transmits them over wireless link 106 to cellular telephone 102. At cellular telephone 102, the wirelessly transmitted audio frames are received by Bluetooth™ controller 110 and temporarily accumulated within a jitter buffer 112 before being transferred in FIFO order to cellular baseband logic 114 for further processing. The interface between Bluetooth™ controller 110 and cellular baseband logic 114, designated interface 116, is typically a pulse-code modulation (PCM) interface.
The clock domain of headset 104 is different from the clock domain of interface 116. By buffering a number of audio frames, jitter buffer 112 can help compensate for this difference, but only to a certain extent. Because jitter buffer 112 has a limited size, when too many audio frames accumulate in jitter buffer 112, Bluetooth™ controller 110 must drop audio frames. Conversely, when too few audio frames are available in jitter buffer 112, Bluetooth™ controller 110 must insert audio frames (typically representing silence) in order to continue providing frames to interface 116. A common method for implementing this is to maintain a fixed high watermark and a fixed low watermark for jitter buffer 112. When the number of audio frames stored in jitter buffer 112 exceeds the high watermark, frames are dropped; when it falls below the low watermark, frames are inserted. Each time a frame is dropped or inserted in this manner, a discontinuity is created in the audio signal. Such discontinuities significantly degrade the quality of the audio signal as perceived by a listener, because human hearing is highly sensitive to abrupt phase changes.
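The fixed-watermark approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class and method names, the watermark values, the 60-byte silence frame, the choice to discard the oldest frame on overflow, and the 11:10 sender-to-interface rate used in the simulation are all hypothetical and are not taken from any particular Bluetooth™ controller.

```python
from collections import deque

SILENCE_FRAME = bytes(60)  # hypothetical frame of 60 zero bytes standing in for silence


class WatermarkJitterBuffer:
    """Sketch of a conventional jitter buffer with fixed high and low watermarks."""

    def __init__(self, low_watermark: int, high_watermark: int,
                 silence_frame: bytes = SILENCE_FRAME) -> None:
        if not 1 <= low_watermark < high_watermark:
            raise ValueError("require 1 <= low_watermark < high_watermark")
        self.low = low_watermark
        self.high = high_watermark
        self.silence = silence_frame
        self.frames = deque()
        self.dropped = 0   # frames discarded on overflow
        self.inserted = 0  # silence frames emitted on underflow

    def push(self, frame: bytes) -> None:
        """Store a received frame; called at the sender's (remote clock) frame rate."""
        self.frames.append(frame)
        # Above the high watermark: discard the oldest frames, creating a discontinuity.
        while len(self.frames) > self.high:
            self.frames.popleft()
            self.dropped += 1

    def pop(self) -> bytes:
        """Supply one frame to the output interface; called at the local frame rate."""
        # Below the low watermark: emit silence instead, which is also a discontinuity.
        if len(self.frames) < self.low:
            self.inserted += 1
            return self.silence
        return self.frames.popleft()


# Hypothetical drift: the sender delivers 11 frames for every 10 the interface consumes.
buf = WatermarkJitterBuffer(low_watermark=2, high_watermark=8)
for tick in range(110):
    buf.push(bytes([tick % 256]) * 60)
    if tick % 11 != 10:  # the slower local clock skips one read in every 11 ticks
        buf.pop()
print(f"dropped={buf.dropped} inserted={buf.inserted} level={len(buf.frames)}")
# prints: dropped=3 inserted=1 level=8
```

In this simulation the buffer level creeps up by one frame per 11 ticks until it reaches the high watermark, after which a frame is dropped once per cycle; the single insertion comes from the buffer starting empty. Each drop or insertion corresponds to a discontinuity of the kind described above.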
FIG. 2 depicts a graph 200 that illustrates how an output audio signal may be affected by the dropping of frames from a jitter buffer in a conventional system implementation. In particular, FIG. 2 shows the magnitude of an audio signal 202 output from a jitter buffer over time. Also shown in graph 200 (as an overlay) are the number of frames in the jitter buffer, designated jitter buffer level 204, and the maximum jitter buffer level 206 over the same time period. As shown in graph 200, when the number of frames in the jitter buffer reaches the maximum jitter buffer level, a number of frames are immediately discarded. In the system from which graph 200 was derived, one half of the frames in the jitter buffer are discarded when the maximum level is reached. As also shown in graph 200, each such discard introduces a corresponding phase discontinuity into audio output signal 202; one example appears in the area circled by dotted line 208. Such discontinuities can significantly degrade the quality of audio output signal 202 as perceived by a listener.
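The phase discontinuity illustrated in graph 200 can be reproduced numerically with the Python sketch below, which splits a pure tone into frames, discards half of the buffered frames when an assumed maximum level is reached, and compares the largest sample-to-sample step with and without the discard. The sample rate, frame size, tone frequency, buffer level, function names, and the choice to discard the oldest half are assumptions for illustration only.

```python
import math

FS = 8000         # hypothetical sample rate in Hz
FRAME = 60        # hypothetical number of samples per audio frame
TONE_HZ = 210.0   # hypothetical test-tone frequency in Hz


def tone_frames(n_frames: int) -> list[list[float]]:
    """Split one continuous sine tone into consecutive fixed-size frames."""
    return [
        [math.sin(2 * math.pi * TONE_HZ * (f * FRAME + i) / FS) for i in range(FRAME)]
        for f in range(n_frames)
    ]


def discard_half(buffered: list) -> list:
    """Overflow policy from the text; dropping the oldest half is an assumption."""
    return buffered[len(buffered) // 2:]


def max_step(samples: list[float]) -> float:
    """Largest jump between adjacent output samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


frames = tone_frames(12)
played, buffered = frames[:4], frames[4:12]  # 8 frames waiting when the maximum level is hit
after_overflow = played + discard_half(buffered)

continuous = [s for fr in frames for s in fr]
with_drop = [s for fr in after_overflow for s in fr]
print(f"max step without drop: {max_step(continuous):.3f}")
print(f"max step with drop:    {max_step(with_drop):.3f}")
```

With these values the discarded span covers 6.3 periods of the tone rather than a whole number of periods, so the output jumps from near the positive peak to a negative value at the splice point. The largest step with the discard is roughly ten times the largest step of the continuous tone.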
As will be appreciated by persons skilled in the relevant art(s), the foregoing problem is not limited to Bluetooth™ wireless communication systems but can also occur in any wireless or wired communication system in which audio signals are transmitted between entities operating in different clock domains.
What is needed, then, is a system and method for discarding or inserting audio frames in a jitter buffer that provides improved audio quality compared to conventional jitter buffer management systems. The desired system and method should be generally applicable to any wireless or wired communication system in which audio signals are transmitted between entities operating in different clock domains.