This invention relates to improvements in a method and system for the storing and playback of audio data stored as track information on a storage medium.
A music compact disc (CD) commonly has multiple tracks composed of musical data. These tracks are separated from one another by a pause or period of silence that indicates to the listener that a new track is about to start. However, some CDs do not use this convention. For example, a live recording or classical CD might not have any pauses yet, like an ordinary CD, still has multiple tracks; for purposes of this disclosure, this type of CD is considered a No-Pause CD. In this case, the tracks are played as one continuous audio stream. Ordinarily, a listener does not even notice the transition from one track to the next unless he or she is watching the track number displayed on the player.
Audio Data Retrieval Across a Network (100)
However, in certain instances, a listener does notice the transition from one track to another. In particular, a problem arises when CD music is purchased across a network or sent to be duplicated at a reproduction facility. FIG. 1 illustrates a prior art system (100) comprising a network overview for audio data retrieval from an optical disk storage rack. This system comprises: an optical storage rack (102) comprising multiple CDs, a network file server (104), a network (106) and workstations (108a to 108n). Data is extracted from each track of a particular CD from the optical rack (102), and is compressed by the network file server (104) or a dedicated extraction-compression node not shown in FIG. 1. This data is then transmitted over the network (106), and received at the workstations (108a to 108n) where the individual tracks are decompressed and recombined. Finally, the tracks are burnt onto a CD or stored on a hard disk. For a common CD with pauses between tracks, this scenario works well. However, for a No-Pause CD, this process produces an unpleasant side effect that is annoying to the listener. When a No-Pause CD is transmitted across a network for reproduction at another location, it is first split into multiple tracks; each track is then compressed individually and later decompressed and put back together at the receiving end. As a result of the compression/decompression process and of the transmission, the seams where the tracks are rejoined always have some distortion, and the connecting points in the audio waveform are mismatched, i.e., discontinuous. Thus, a high-frequency spike is created in the waveform, which a listener hears as a clicking sound.
A prior art solution to this problem is to store an entire CD as one audio stream. However, individual track audio streams offer greater flexibility, because individual songs can be downloaded, and make it easier to manage a digital music library of songs, play-lists, and albums. Therefore, this prior art solution does not facilitate an effective playback apparatus.
Encoding Decoding Standard (200 and 300)
Further, a typical prior art protocol and system for encoding and decoding of compact disc information is illustrated with regard to FIGS. 2 and 3. This system and protocol are highlighted in U.S. Pat. No. 5,809,474, which is hereby incorporated by reference. The international audio coding standard IS 11172-3, proposed by the Moving Picture Experts Group (MPEG), provides a high-quality audio playback signal for use in, for example, a compact disk (CD), at 128 Kbps per audio channel. This international standard can be used to store an audio signal in a digital storage medium such as a CD, a digital audio tape (DAT), or a hard disk, and an audio signal may be reconstructed by connecting the storage medium to a decoder directly or through other means such as a communication line. Further, a bit stream encoded by an encoder may be directly reconstructed to an audio signal in a decoder through a communication line.
While implementing such an encoder and decoder in a system, analysis and synthesis filtering algorithms perform the most computations in the whole system. In particular, in an audio decoder, most of the time is consumed in a band synthesis filtering algorithm. Hence, the issue of how to efficiently realize the analysis and synthesis filtering algorithms is closely related to efficiently implementing the audio encoder and decoder.
That is, realizing the audio encoder and decoder in exclusive-use hardware with an efficient implementation of the analysis and synthesis filtering algorithms reduces the time required for encoding and decoding. Thus, the encoder and decoder may be realized using a slower and cheaper processor. Further, due to the increasing use of multimedia devices along with the development of computers, communications, and broadcasting, there is a growing need to reconstruct an audio signal by decoding an encoded bit stream using software, rather than exclusive-use hardware. Although improved multimedia device performance makes real-time processing on a high-performance general-purpose processor more likely, a fast algorithm enables real-time processing software to run on a wider range of general-purpose processors.
Encoding Unit (200)
FIG. 2 is a block diagram (200) of an audio encoder adopting an MPEG audio standard IS 11172-3 encoder. The audio encoder of FIG. 2 has a mapper (202) for analysis-windowing and time/frequency mapping an input signal, a psychoacoustic model (208) for assigning bits to each band by using psychoacoustic characteristics, a quantizer/encoder (204) for quantizing and encoding the mapped signal according to the number of bits assigned to a band, and a frame packer (206) for generating a bit stream. The encoded bit stream is stored onto an optical disk (210) by a laser servo (212) or sent across a network.
The mapper (202) classifies an input audio bit stream according to frequency band using an analysis window. Time/frequency mapped samples are called sub-band samples in layer I or II of MPEG, or transformed sub-band samples in layer III. Classifying the signal by band helps keep the noise caused by quantization from spreading across all bands when the signal is reconstructed.
The psycho-acoustic model (208) models the human perception of sound, relying in particular on two psycho-acoustic characteristics: the masking phenomenon and the critical band. The psycho-acoustic model (208) produces a data set for controlling quantization and encoding.
The quantizer/encoder (204) performs quantization and encoding to prevent errors involved in signal reconstruction from being perceived by a human being, using the result of computations in the psycho-acoustic model.
The frame packer (206) efficiently combines quantized data with information needed for decoding, and produces a bit stream by the Huffman coding method.
Decoding Unit (300)
FIG. 3 is a block diagram (300) of an audio decoder adopting a high-speed band synthesis filtering algorithm, here, an MPEG audio standard IS 11172-3 decoder. The decoder of FIG. 3 has a frame un-packer (302) for unpacking a signal from an input bit stream or from a storage device (308) using a laser servo (310), a decoder/inverse-quantizer (304) for decoding and inverse-quantizing the quantized signal, and an inverse-mapper (306) for time/frequency inverse-mapping and synthesis-windowing the inverse-quantized signal.
The frame un-packer (302) separates quantized audio data and other additional information to be decoded from an encoded bit stream.
The decoder/inverse-quantizer (304) reconstructs the quantized audio data to the values prior to quantization using the quantization step-size.
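The role of the step size in inverse quantization can be illustrated with a minimal sketch. It assumes a plain uniform quantizer, which is a simplification and not the quantization scheme defined by IS 11172-3; the function names and the step size value are hypothetical. With such a quantizer, the reconstructed value lies within half a step of the original.

```python
def quantize(value, step_size):
    """Map a sample to the index of the nearest quantization level."""
    return round(value / step_size)


def inverse_quantize(index, step_size):
    """Rebuild an approximate sample value from its level index."""
    return index * step_size


step_size = 0.25  # hypothetical step size chosen for illustration
samples = [0.10, -0.62, 0.49, 0.90]

indices = [quantize(s, step_size) for s in samples]
rebuilt = [inverse_quantize(i, step_size) for i in indices]
errors = [abs(s - r) for s, r in zip(samples, rebuilt)]

print(indices)
print(rebuilt)
print(max(errors) <= step_size / 2)
```

A smaller step size lowers the reconstruction error at the cost of more bits per sample, which is the trade-off the psycho-acoustic model is used to steer.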
The inverse-mapper (306) converts frequency-domain data to time-domain data. The time-domain sample values are synthesis-windowed and converted to the time-domain signal by overlap-and-add (OLA).
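Overlap-and-add can be sketched in a few lines. The example below is a simplified stand-in and not the synthesis filter bank of IS 11172-3: it assumes a periodic Hann window with a hop of half the frame length, so that overlapping windows sum to one. The names `hann` and `overlap_add` and the frame length are hypothetical.

```python
import math


def hann(frame_len):
    """Periodic Hann window; copies shifted by frame_len / 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
            for n in range(frame_len)]


def overlap_add(signal, frame_len=8):
    """Window each frame and add the overlapping frames back together."""
    hop = frame_len // 2
    window = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        for n in range(frame_len):
            out[start + n] += window[n] * signal[start + n]
    return out


signal = [math.sin(0.3 * n) for n in range(32)]
rebuilt = overlap_add(signal)

# Samples covered by two frames are recovered; the last half frame is
# covered by only one window and is therefore attenuated.
interior_error = max(abs(rebuilt[n] - signal[n]) for n in range(4, 28))
edge_error = abs(rebuilt[30] - signal[30])
print(interior_error, edge_error)
```

The interior of the signal is reconstructed because every sample there receives contributions from two overlapping frames. Samples at the very start and end have only one contributing frame, which is one reason the edges of an independently processed segment are sensitive to what the coder assumes lies beyond them.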
Shortcomings with Compression
As previously described, distortions and waveform mismatches in a compact disc playback process originate from an audio compression process. The compression program uses the past, present and future audio data to generate compressed audio. When a track is compressed at the beginning of the track, the compression program assumes silence for the past data; the present and future audio data is available from the current track. At the end of the current track, the compression program assumes silence for future audio data; the past and present data are available from the track. These assumptions are valid if the track actually has silence at the beginning and at the end of the tracks as is true for tracks in most CDs.
However, for a No-Pause CD, this is not true. At the beginning of any given track except the first track on the No-Pause CD, the past data is not silence; rather, it is the audio information at the end of the previous track. Further, at the end of the current track, the future audio data is not silence; rather, it is the beginning of the next track. Therefore, the compression program's assumptions for a regular CD are inapplicable and incorrect for a No-Pause CD. Because of these incorrect assumptions, the compressed/decompressed audio track becomes noticeably distorted upon playback at the beginning and at the end of the tracks.
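The effect of the silence assumption can be reproduced with a toy model. The sketch below does not implement any real compression program; it assumes a hypothetical stand-in, `smooth`, whose only relevant property is that each output sample depends on past, present and future input, with missing context treated as silence (zero).

```python
def smooth(samples, radius=2):
    """Stand-in for a codec: average over past, present and future samples.

    Samples outside the given list are treated as silence (zero).
    """
    n = len(samples)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(i - radius, i + radius + 1):
            if 0 <= k < n:
                total += samples[k]
        out.append(total / (2 * radius + 1))
    return out


stream = [1.0] * 12                      # continuous, non-silent audio
track_a, track_b = stream[:6], stream[6:]

whole = smooth(stream)                   # processed as one stream
joined = smooth(track_a) + smooth(track_b)  # processed per track, rejoined

seam_error = [abs(w - j) for w, j in zip(whole, joined)]
print([round(e, 2) for e in seam_error])
```

Away from the seam the two results agree. Around the split point (indices 4 to 7 in this example) the per-track result dips, because each track was processed as if its neighbor were silent. In an audio waveform, a local mismatch of this kind at the seam is what the listener hears as a click.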
Accordingly, what is needed is a solution to the distortion and the mismatched connecting points of the audio wave that are formed as a result of a continuous audio signal being broken down into individual tracks and later being recombined.
Moreover, a need exists for a method and apparatus to remove the high-frequency spike in the waveform that is created and heard by a listener as a clicking sound.
Distortions and waveform mismatches in a compact disc playback process originate from an audio compression process. A compression program uses the past, present and future audio data to generate compressed audio. When a track is compressed at the beginning of the track, the compression program assumes silence for the past data; the present and future audio data is available from the current track. At the end of the current track, the compression program assumes silence for future audio data; the past and present data are available from the track. These assumptions are valid if the track actually has silence at the beginning and at the end of the tracks as is true for tracks in most CDs. However, for a No-Pause CD, this is not true. At the beginning of any given track except the first track on the No-Pause CD, the past data is not silence; rather, it is the audio information at the end of the previous track. Further, at the end of the current track, the future audio data is not silence; rather, it is the beginning of the next track. Therefore, the compression program's assumptions for a regular CD are inapplicable and incorrect for a No-Pause CD. Because of these incorrect assumptions, the compressed/decompressed audio track becomes noticeably distorted upon playback at the beginning and at the end of the tracks. The solution for this problem is to append overlapping boundary data to the beginning and end of each track. By doing this, the ending data from the previous track and the beginning data from the succeeding track are available for the compression process. The compression program then uses the additional appended data in order to generate the compressed audio. Later, by severing the appended overlapping data before recombining the tracks, the resulting No-Pause CD audio stream is free from distortion and mismatch.
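The append-then-sever idea can be sketched with a toy model. This is an illustration under stated assumptions, not the disclosed implementation: `smooth` is a hypothetical stand-in for a compression program that needs past and future context, `process_with_overlap` is a hypothetical helper, and the overlap length is assumed to equal the amount of context the stand-in uses.

```python
def smooth(samples, radius=2):
    """Stand-in for a codec; context outside the list counts as silence."""
    n = len(samples)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(i - radius, i + radius + 1):
            if 0 <= k < n:
                total += samples[k]
        out.append(total / (2 * radius + 1))
    return out


def process_with_overlap(stream, boundaries, radius=2):
    """Process each track with neighbor data appended, then sever it."""
    pieces = []
    for start, end in boundaries:
        lo = max(0, start - radius)            # end of the previous track
        hi = min(len(stream), end + radius)    # start of the next track
        padded = stream[lo:hi]
        processed = smooth(padded, radius)
        offset = start - lo
        # Sever the appended overlap, keeping only the track itself.
        pieces.append(processed[offset:offset + (end - start)])
    return pieces


stream = [float(i % 5) for i in range(20)]
tracks = [(0, 7), (7, 13), (13, 20)]           # hypothetical track ranges

pieces = process_with_overlap(stream, tracks)
rejoined = [s for piece in pieces for s in piece]
reference = smooth(stream)                     # whole stream in one pass
print(max(abs(a - b) for a, b in zip(rejoined, reference)))
```

In this toy model the rejoined tracks match the result of processing the whole stream in one pass, while each track can still be stored and handled on its own. How much boundary data a real compression program needs depends on how far its analysis reaches into past and future audio.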