The present invention relates to digital audio playback, and more particularly to random access in decoding audio files.
Traditionally, speech coder/decoders (codecs) are used for two-way real-time communication to reduce bandwidth requirements over limited capacity channels. Examples include cellular telephony, voice over internet protocol (VoIP), and limited-capacity long-haul telephone communications using codecs such as the G.7xx series (e.g., G.723, G.726, G.729) or AMR-NB and AMR-WB (Advanced multi-rate narrow band and wideband). In recent years new applications have used speech codecs to compress audio data for storage and playback at a later time; this contrasts with the original two-way real-time communication codec design. Specifically, AMR-NB and AMR-WB speech codecs originally intended for cellular telephony are being increasingly used for audio compressed storage. For example, using such a method, live audio (and optionally video also) can be recorded using a cell phone for forwarding and sharing with other cell phone users.
Applications such as these are expected to be regular features in 3G cell phones connected to the GSM network. The 3GPP standards body has defined the evolution of the GSM network and services to address these applications and has specified the Adaptive Multi-Rate (AMR) family of codecs as mandatory for encoding and decoding of audio.
There are two flavors of AMR:                Narrowband (AMR-NB) supporting sampling frequency of 8 KHz and bit rates ranging from 4.65 kbps to 12.2 kbps.        Wideband (ARM-WB) supporting sampling frequency of 16 KHz and bit rates ranging from 6.6 kbps to 23.85 kbps.        
Originally, the primary purpose of the AMR codecs was speech coding for real-time communication to reduce bandwidth requirements in cell phones. AMR offers high quality at low bit rates, and thence reduced storage requirements if used in a non-real-time storage scenario. AMR has the advantage of greatly reduced complexity as compared to popular audio encoders such as MP3/AAC. As a result, AMR is the preferred codec for recording and playback of audio in 3G cell phones; although, AMR-NB is primarily for speech.
Traditionally, speech standards (including AMR) define the bit syntax for transmission purposes. The input audio is typically divided into fixed-length frames and a variable number of bits are used to specify the encoded data in each frame. AMR is an algebraic code-excited linear-prediction (ACELP) method with the differing bit rates reflecting the total number of bits allocated to the frame parameters (LP coefficients, pitch, excitation pulses, and gain).
Since storage is almost never a primary goal during standardization, typically the speech codec standards do not specify the file format that must be used wherever the codec is used in a storage application. However, for some specific speech codecs, simple file storage formats have been defined. One important example is the AMR file format specified by the Internet Engineering Task Force (IETF) RFC 3267, which has been adopted by 3GPP. IETF RFC 3267 defines file storage formats for AMR NB and AMR WB codecs. The basic structure of an AMR file is shown in FIG. 8. The AMR data format specified in RFC 3267 has the following properties:                The data in each audio frame is composed of two concatenated components: (i) a “frame header” which indicates the length of the audio payload in the frame and (ii) the audio payload. Note that the size of the audio payload is variable.        There are no synchronization symbols indicating the start of each individual AMR frame.        
These properties lead to the following problems for playback applications:                The AMR file has to be played sequentially from start to end. There are no random access points (e.g., synchronization symbols) in the recorded audio file. This prevents the user from starting the audio playback from any arbitrary time instant (e.g., time proportional to a fraction of file size).        It is not possible to easily fast forward or rewind through the audio file.        
To summarize, given an arbitrary starting point in the file, it is impossible to decode the file correctly without performing sequential decoding starting from the first frame in the file.
As a result of the foregoing problems, many 3G phone manufacturers are forced to disable useful features such as playback starting from an arbitrary point as well as fast forward/rewind of audio.