Digital music is now pervasive. Many of us carry portable music players to listen to music on the bus, subway or while travelling to school or work. Internet radio and other network-based music delivery mechanisms deliver audio programming streams to a wide variety of player devices. Audio books can be downloaded in an instant for playback on tablet computers, smart phones, and many other devices. Given the importance of music and audio programming to our daily lives, digital music and audio will only become more important in the future.
Digital audio is often compressed to reduce storage size and/or the time required to transmit or download audio files. Audio compression and decompression algorithms are typically implemented by audio “codecs” (coders/decoders) used in many consumer audio and other devices. Such codec technology has enabled “streaming” of digital audio to provide a potentially endless stream of audio program material for playback. Streaming is a technique that allows playback of a long piece of audio without requiring the entirety of the audio data to be loaded into memory. Such streaming is now commonly used, for example, for Internet radio, where a source encoder continually streams music or other digital audio program data over a network to one or many receivers for playback.
In streaming and other arrangements that decode compressed audio data, a playback buffer is commonly used to temporarily store the next portion of encoded data for decoding. Such playback buffers are used e.g. to prevent interruptions in playback due to delay in retrieving additional data over a bus, network, etc. Streaming in some conventional systems can involve looped playback of a relatively small playback buffer. This small playback buffer has its data continually refilled with new data as data is consumed. This is a little like refilling a coffee carafe before it is emptied so the coffee drinker perceives a seemingly-endless supply of coffee. Even though most consumer devices have some type of playback buffer, not all such devices were designed to facilitate playback of streamed encoded data where the playback buffer is continually refilled with new data.
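The refill-while-playing behavior described above can be sketched as a small ring buffer. This is only an illustrative sketch: the class name `LoopedPlaybackBuffer`, the helper `stream`, and the buffer and chunk sizes are hypothetical and are not taken from any particular device.

```python
class LoopedPlaybackBuffer:
    """Fixed-size buffer that is read in a loop while being refilled (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.size = size
        self.read_pos = 0
        self.write_pos = 0
        self.filled = 0  # bytes written but not yet consumed

    def refill(self, source: bytes) -> int:
        """Copy as much of `source` as currently fits; return the number of bytes taken."""
        count = min(len(source), self.size - self.filled)
        for byte in source[:count]:
            self.data[self.write_pos] = byte
            self.write_pos = (self.write_pos + 1) % self.size  # wrap to the start
        self.filled += count
        return count

    def consume(self, count: int) -> bytes:
        """Hand up to `count` buffered bytes to the decoder, wrapping at the end."""
        count = min(count, self.filled)
        out = bytearray()
        for _ in range(count):
            out.append(self.data[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.size
        self.filled -= count
        return bytes(out)


def stream(source: bytes, buffer_size: int = 8, chunk: int = 3) -> bytes:
    """Play `source` through a buffer much smaller than the stream itself."""
    buf = LoopedPlaybackBuffer(buffer_size)
    offset = 0
    played = bytearray()
    while offset < len(source) or buf.filled:
        offset += buf.refill(source[offset:])  # top up before the buffer runs dry
        played += buf.consume(chunk)           # playback drains a little at a time
    return bytes(played)
```

In this sketch the 50-byte stream passed to `stream(bytes(range(50)))` is played in order through an 8-byte buffer, which is the "coffee carafe" behavior: the consumer never sees the buffer's small size.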
Some devices may be designed to play back short loops, repeatedly playing back the same data from their playback buffers. To facilitate this behavior, such a device may reinitialize or reset decoder state values each time the decoder loops back to the start of its playback buffer. This may make the device unsuitable for playing back streamed data, as this behavior can have the effect of de-synchronizing the streaming data encoder from the decoder, resulting in substantial audible distortions in the decoded samples.
For example, errors may occur when the decoder resets its predictor and/or step values when it loops back to the beginning of the buffer, instead of using state values based on previous samples. The user can hear such errors as amplitude variations, pops, clicks, etc. Because the playback buffer typically retrieves playback data in blocks, such errors will naturally occur at block boundaries.
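The de-synchronization can be illustrated with a toy ADPCM-style codec. The sketch below is an assumption-laden simplification: the geometric step table `STEPS`, the index adjustment table, the 64-sample block size, and the sine test signal are all hypothetical stand-ins, not the tables or parameters of any specific codec or device.

```python
import math

# Hypothetical step table and index adjustments (illustrative only).
STEPS = [round(7 * 1.1 ** i) for i in range(89)]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def clamp(value, low, high):
    return max(low, min(high, value))


def decode_code(code, predictor, index):
    """Apply one 4-bit code to the decoder state (predictor, step index)."""
    magnitude = code & 7
    delta = (2 * magnitude + 1) * STEPS[index] // 8
    if code & 8:
        delta = -delta
    predictor = clamp(predictor + delta, -32768, 32767)
    index = clamp(index + INDEX_ADJUST[magnitude], 0, len(STEPS) - 1)
    return predictor, index


def encode(samples):
    """Streaming encoder: its state is carried across the whole signal."""
    predictor, index = 0, 0
    codes = []
    for sample in samples:
        diff = sample - predictor
        magnitude = min(7, abs(diff) * 4 // STEPS[index])
        code = magnitude | (8 if diff < 0 else 0)
        codes.append(code)
        predictor, index = decode_code(code, predictor, index)  # track the decoder
    return codes


def decode(codes, reset_every=None):
    """Decoder; with `reset_every` set, state is reset at each block start."""
    predictor, index = 0, 0
    out = []
    for n, code in enumerate(codes):
        if reset_every and n % reset_every == 0:
            predictor, index = 0, 0  # the loop-back reset described above
        predictor, index = decode_code(code, predictor, index)
        out.append(predictor)
    return out


BLOCK = 64  # hypothetical block size in samples
signal = [round(8000 * math.sin(2 * math.pi * n / 50)) for n in range(4 * BLOCK)]
codes = encode(signal)
continuous = decode(codes)                   # state carried, stays in sync
looping = decode(codes, reset_every=BLOCK)   # state reset at each block
error_continuous = abs(continuous[BLOCK] - signal[BLOCK])
error_looping = abs(looping[BLOCK] - signal[BLOCK])
```

Under these assumptions, the first block decodes identically in both cases, while at the first sample of the second block the resetting decoder restarts near zero even though the encoder assumed a predictor close to the signal value. Comparing `error_looping` with `error_continuous` shows the jump that would be heard as a pop or click.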
It is sometimes possible in such a system to use an additional mechanism (e.g., a programmed microprocessor) to provide a decoder capable of handling streaming data. While this approach has the possibility of providing excellent sound quality, it also increases computational loading of the system (e.g., 10% of the CPU in one example). Providing a way to decode/playback streamed ADPCM or other audio data using an existing decoder not designed for streaming could eliminate the need for an additional (e.g., software based) decoder, and thus reduce memory usage, playback bus traffic, playback CPU load and/or buffering requirements in main memory. Thus, it would be desirable to eliminate distortions and maintain synchronization between the encoder and decoder despite the resetting behavior of the decoder to thereby decode/playback streamed data. Such gains desirably would come with only a slight sound quality penalty versus the best sounding option provided by a soft decoder.
Different methods might be used to calculate the common predictor and step index values with the goal of reducing pops/clicks at the block boundaries. One possible method, for example, would be to set Predictor=x[0] (i.e., the first input sample) and to set Step Index=f(x[0]−x[1]) (i.e., a table lookup based on the difference between the first and second input samples). An additional possible “Zero” method would be for encoder 114 to reset the predictor and step index values to 0 (Predictor=0, Step Index=0) with the idea that 0 is a better average value for all of the blocks. Another possible “averaging method” would set the predictor and step indexes to average values under an assumption that the actual average is better than an assumed average. For example, it might be possible to use multi-pass weighted averaging. It might also be possible to favor blocks with high error so that the resultant predictor and step index values would be skewed to favor reducing big errors. While these solutions may objectively reduce the error signal, noticeable pops/clicks may still exist.
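The three candidate methods can be sketched roughly as follows. Everything here is an assumption made for illustration: the step table `STEPS` is a hypothetical geometric table, the lookup f is taken to be "smallest step that covers the difference", and the averaging method is reduced to a plain single-pass mean of per-block values rather than the multi-pass weighted averaging mentioned above.

```python
from bisect import bisect_left

# Hypothetical, non-decreasing step table (illustrative only).
STEPS = [round(7 * 1.1 ** i) for i in range(89)]


def step_index_for(difference):
    """Assumed form of the lookup f(.): index of the smallest step >= |difference|."""
    return min(bisect_left(STEPS, abs(difference)), len(STEPS) - 1)


def first_sample_method(samples):
    """Predictor = x[0], Step Index = f(x[0] - x[1])."""
    return samples[0], step_index_for(samples[0] - samples[1])


def zero_method():
    """Predictor = 0, Step Index = 0."""
    return 0, 0


def averaging_method(blocks):
    """Plain mean of the per-block first-sample values (a simplification)."""
    per_block = [first_sample_method(block) for block in blocks]
    predictor = round(sum(p for p, _ in per_block) / len(per_block))
    index = round(sum(i for _, i in per_block) / len(per_block))
    return predictor, index
```

For instance, `averaging_method([[100, 110], [300, 310]])` yields a predictor of 200 in this sketch, halfway between the two block starts, which shows why a common value can still miss every individual block by a wide margin.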
To solve these problems, embodiments herein generate a compensation signal to provide compensation at block boundaries. A compensation signal operation may involve for example injecting a band-limited pseudo-random noise or ultra low frequency signal at boundary points. It may be possible to inject pre-error into the encoder per se, or to inject such a signal in the source signal before encoding. Thus, after deciding what the predictor value will be, it is possible to add an error signal to the original signal such that the predictor error is 0 and if possible, the step error is 0 also.
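One way to picture adding an error signal so that the predictor error is 0 at each boundary is sketched below. This is a simplified illustration, not the embodiments' actual signal design: it assumes a single known reset predictor value, uses a piecewise-linear, slowly varying offset between boundaries instead of band-limited noise, and ignores step error and 16-bit clipping. The function names are hypothetical.

```python
def compensation_signal(samples, block_size, reset_predictor=0):
    """Slowly varying offset that cancels the mismatch at every block start."""
    total = len(samples)
    boundaries = list(range(0, total, block_size))
    # Offset needed at each boundary so that sample + offset == reset_predictor.
    offsets = [reset_predictor - samples[b] for b in boundaries]
    comp = [0.0] * total
    for i, start in enumerate(boundaries):
        next_offset = offsets[i + 1] if i + 1 < len(offsets) else offsets[i]
        for j in range(start, min(start + block_size, total)):
            t = (j - start) / block_size  # 0.0 at the boundary, rising toward 1.0
            comp[j] = (1 - t) * offsets[i] + t * next_offset
    return comp


def apply_compensation(samples, block_size, reset_predictor=0):
    """Mix the compensation into the source signal before encoding."""
    comp = compensation_signal(samples, block_size, reset_predictor)
    return [int(round(s + c)) for s, c in zip(samples, comp)]
```

In this sketch, every sample at a block start of the combined signal equals `reset_predictor`, so a decoder that resets its predictor to that value begins each block with zero predictor error. Between boundaries the offset changes gradually, once per block, which is the sense in which it is low frequency here.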
In one exemplary illustrative non-limiting implementation, an ultra low frequency (inaudible) signal can be used so the user cannot hear the compensation signal and the signal does not have to be filtered by band-limited output filters. It is also possible to optimize block sizes to use a block size that results in minimizing differences in sample values at the start of block boundaries. These techniques reduce and/or eliminate audible errors in the decoded signal, despite resetting the state values used by the decoder.
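The block size optimization can be sketched as a simple search over candidate sizes. The cost function (mean absolute difference between each block's first sample and the reset predictor), the candidate range, and the test signal are all assumptions chosen for illustration.

```python
import math


def boundary_mismatch(samples, block_size, reset_predictor=0):
    """Mean absolute gap between each block's first sample and the reset value."""
    starts = samples[::block_size]
    return sum(abs(s - reset_predictor) for s in starts) / len(starts)


def best_block_size(samples, candidates, reset_predictor=0):
    """Pick the candidate block size with the smallest boundary mismatch."""
    return min(
        candidates,
        key=lambda size: boundary_mismatch(samples, size, reset_predictor),
    )


# Hypothetical test signal: a sine with a period of 50 samples.
signal = [round(8000 * math.sin(2 * math.pi * n / 50)) for n in range(500)]
chosen = best_block_size(signal, range(40, 61))
```

With this periodic test signal, the search settles on a block size of 50, because every block then starts on a zero crossing and matches a reset predictor of 0. Real program material is not periodic, so in practice such a search would only reduce the mismatch rather than remove it.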
Thus, one aspect of certain exemplary embodiments relates to a method, system and/or non-transitory computer readable medium for encoding an audio signal to reduce and/or eliminate errors due to resetting of state values at (some) audio streaming data block boundaries. A compensation signal can be mixed with or included in the audio signal, and the combined signal is encoded. The compensation signal has a characteristic selected so that the encoded audio signal substantially matches the reset decoder state value at block boundaries.
Another aspect of certain exemplary embodiments relates to a portable electronic device that includes a playback buffer for receiving an encoded audio signal, and a decoder programmed or configured to decode the encoded mixed audio signal, using state values that are reset based on playback buffer access. The encoded audio signal includes a compensation signal having a characteristic selected so that the encoded audio signal substantially matches decoder reset state values at block boundaries.
Where the encoding/decoding used are ADPCM encoding/decoding, the state values may be a predictor and/or step index value.