Conventional wireless audio devices can exchange packets of audio data. For example, a wireless transmitter (TX) may send audio data to a wireless receiver (RX) for playback. The wireless link between the TX and the RX is often asynchronous, i.e., the RX receives and plays the audio stream without being synchronized with the rate of data transmission at the TX. An example of such asynchronous communication is the Advanced Audio Distribution Profile (A2DP), where the data are transferred between the TX and the RX using a Bluetooth standard or protocol. In operation, audio packets are received by the RX at irregular intervals, and are buffered locally on the RX while being continuously consumed and played by the RX. In practical applications, the buffer is necessary to support the asynchronous reception of new data and to provide some robustness against sudden changes in data throughput caused by degraded connectivity. The size of the buffer on the RX is not mandated by the Bluetooth specifications, but most devices on the market adopt a buffer having a capacity corresponding to about 150-200 milliseconds (ms) of the audio data stream as a trade-off among robustness, playback latency and on-chip memory footprint.
FIG. 1 is a schematic view of communication between a TX 30 (also referred to as a “source”) and an RX 40 (also referred to as a “sink”) in accordance with conventional technology. The TX 30 may be an A2DP source that asynchronously transmits data to the RX 40. Because of the asynchronous transmission, data rates at the RX cannot be derived based on the protocol timing.
In operation, the source 30 transmits audio data over an antenna 34 at a rate that is controlled by a clock 32 (“CLOCK1”). The audio data are wirelessly transmitted to an antenna 44 of the RX 40. Generally, the frequencies of the CLOCK1 of the TX and a clock 42 (“CLOCK2”) of the RX are not synchronized. Therefore, the received audio data are processed using a sample rate converter (SRC) 46 that takes into account the differences in the frequencies of CLOCK1 and CLOCK2. The processed data are sent to a digital-to-analog converter (DAC) 47, and further to a speaker 48 for, e.g., sound output.
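The rate compensation performed by the SRC 46 can be illustrated with a minimal sketch. The text above does not specify how the SRC 46 is implemented, so linear interpolation is used here only as a simple stand-in. The function name `resample_linear` and the parameter `ratio` (output rate divided by input rate, e.g., the CLOCK2 rate divided by the CLOCK1 rate) are hypothetical.

```python
def resample_linear(samples, ratio):
    """Resample `samples` by `ratio` = output_rate / input_rate.

    Linear interpolation is an illustrative assumption, not the
    method of any particular SRC.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if len(samples) < 2:
        return list(samples)

    step = 1.0 / ratio          # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    n = 0
    while n * step <= last:
        pos = n * step          # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Weighted average of the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        n += 1
    return out


# A ratio slightly above 1.0 (e.g., 1 + 140e-6) would model an RX clock
# running marginally faster than the TX clock.
upsampled = resample_linear([0.0, 1.0, 2.0], 2.0)
```

In this sketch a ratio of exactly 1.0 returns the input unchanged, and small deviations from 1.0 stretch or compress the stream by the corresponding fraction.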
However, the frequency of the CLOCK1 may drift over time, making it difficult for the RX 40 to reliably reproduce the data at the speaker 48. For example, some conventional sources 30 include clocks that drift by about 140 parts-per-million (ppm). The CLOCK2 may also drift. Therefore, many conventional RXs use buffers to store the data from the TX before processing.
FIG. 2 is a graph 200 showing a filling level for a data buffer in accordance with conventional technology. The horizontal axis represents time in seconds. The vertical axis represents the buffer filling level, expressed as seconds of buffered audio. For example, the buffer filling level of 0.15 on the vertical axis corresponds to the amount of data that the buffer would store after receiving 0.15 seconds' worth of the audio stream without any consumption of the buffered data by the SRC 46. As another example, a sudden drop (denoted by numeral 10) of the buffer filling level from about 0.18 seconds to about 0.055 seconds corresponds to the consumption of about 0.125 seconds' worth of the audio data without any new data being received by the buffer. The sudden drop 10 may result from a failed reception of the data by the RX, followed by a successful re-transmission of the audio data. Therefore, the jagged short-term oscillations of the buffer level are caused by the failed transmissions followed by the re-transmissions.
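The bookkeeping behind the filling level can be sketched as follows, with all quantities expressed in seconds of audio. The function name `buffer_level_after` and the 0.2 second capacity are assumptions chosen for illustration; the capacity is merely taken from the 150-200 ms range mentioned earlier.

```python
def buffer_level_after(level_s, received_s, consumed_s, capacity_s):
    """Return the new filling level in seconds of audio.

    The level rises by the audio received and falls by the audio
    consumed, and is clamped to the range [0, capacity_s].
    """
    new_level = level_s + received_s - consumed_s
    return max(0.0, min(capacity_s, new_level))


# Sudden drop 10: about 0.125 s of audio is consumed while no new data
# arrives. The 0.2 s capacity is an assumed value.
level = buffer_level_after(level_s=0.18, received_s=0.0,
                           consumed_s=0.125, capacity_s=0.2)
```

Starting from 0.18 seconds, the sketch yields a level of about 0.055 seconds, matching the drop described above. A re-transmission burst would be modeled by a subsequent call with a large `received_s`.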
A trend line 20 corresponds to a long-term trend of the buffer filling level. In general, the buffer tends to either overflow or underflow because of the divergence between CLOCK1 and CLOCK2 (e.g., caused by frequency drift) and/or because of failed transmissions followed by retransmissions of the packets of audio data. With some conventional technologies, data overflow can happen in less than 10 minutes of playback. In some cases, the buffer may overflow even earlier, because the normal working level of the buffer is set close to a full level to make the RX more robust against unsuccessful data transmissions.
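The time scale of the long-term trend can be estimated with simple arithmetic. The sketch below assumes a constant net rate mismatch between CLOCK1 and CLOCK2 and a fixed headroom between the normal working level and the buffer limit. Both the 50 ms headroom and the use of 140 ppm as the net mismatch are illustrative assumptions, and the function name is hypothetical.

```python
def seconds_until_limit(headroom_s, mismatch_ppm):
    """Estimate how long the buffer takes to drift by `headroom_s`.

    A mismatch of N ppm changes the filling level by N microseconds of
    audio per second of playback (constant mismatch assumed).
    """
    if mismatch_ppm <= 0:
        raise ValueError("mismatch_ppm must be positive")
    drift_per_second = mismatch_ppm * 1e-6
    return headroom_s / drift_per_second


# Assumed values: 50 ms of headroom, 140 ppm net mismatch.
t = seconds_until_limit(headroom_s=0.05, mismatch_ppm=140)
```

Under these assumptions the limit is reached after roughly 357 seconds, i.e., about 6 minutes, which is of the same order as the overflow time noted above. A smaller headroom shortens this time proportionally.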
Some conventional technologies use a “sample add and drop” approach where the RX discards the audio data when the filling level of the buffer is close to its upper bound, and the RX duplicates the audio data when the filling level is close to its lower bound. However, this brute-force approach causes audible artefacts, as illustrated in a spectrogram 300 of FIG. 3. The horizontal axis of the spectrogram corresponds to the audio playback time in seconds. The vertical axis corresponds to frequencies of the audio signal as played by, for example, the speaker 48 in FIG. 1. The spectrogram 300 shows a pure tone 62 that was emitted by the source, received by the RX, and played by the speaker. The pure tone 62 occupies a relatively narrow band at about 1.7 kHz. However, when the “sample add and drop” method is used to control the filling level of the buffer, the audio stream includes undesirable noise 64 that can typically be heard as clicks at the speaker.
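The origin of the clicks can be illustrated with a minimal sketch that applies a sample drop and a sample duplication to a 1.7 kHz tone. The 48 kHz sample rate, the block size of 10 samples, the splice position, and all function names are assumptions chosen for illustration; they are not taken from any particular device.

```python
import math


def pure_tone(freq_hz, rate_hz, n):
    """Generate n samples of a unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


def drop_samples(samples, start, count):
    """Discard `count` samples beginning at index `start`."""
    return samples[:start] + samples[start + count:]


def duplicate_samples(samples, start, count):
    """Repeat the `count` samples beginning at index `start`."""
    block = samples[start:start + count]
    return samples[:start + count] + block + samples[start + count:]


def max_step(samples):
    """Largest jump between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Assumed parameters: 1.7 kHz tone sampled at 48 kHz.
clean = pure_tone(1700.0, 48000.0, 480)
dropped = drop_samples(clean, start=10, count=10)
duplicated = duplicate_samples(clean, start=10, count=10)

# At this splice position both operations break the waveform's
# continuity, producing a jump far larger than any step in the tone.
print(max_step(clean), max_step(dropped), max_step(duplicated))
```

The abrupt jump at the splice point spreads energy across a wide range of frequencies, which is consistent with the broadband noise 64 described above. The size of the jump depends on where in the waveform the splice falls, so some splices are less audible than others.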
Some conventional technologies attempt to continuously adjust the frequency of CLOCK2 to better regulate the filling level of the buffer, supplementing the “sample add and drop” method and thereby reducing the noise in the audio stream. However, those conventional methods are predicated on a priori knowledge of the true frequencies of both CLOCK1 and CLOCK2, which is not always available under the wireless data transmission standards. Accordingly, there remains a need for methods and systems for wireless communication between the source of the audio data and the RX where the filling level of the buffer can be controlled, while the audible noise is reduced.