1. Field of the Invention
The invention relates generally to the field of data extraction and encoding, and more particularly to an improved system and method of extracting digital audio data from a medium to one or more playable files.
2. Background of the Invention
Digital Audio Extraction (“DAE”), also known generally as “ripping,” is the process of copying a track from an audio disc, usually music, to a hard drive or other storage medium by creating a file (or group of files) in any number of encoded and/or compressed formats (e.g., WAV or MP3). A wide variety of software packages that utilize DAE are now available, and the average computer user can easily “rip” any number of tracks from a CD collection to one or more files on a computer hard drive. Subsequently, these tracks can be played back with software designed to read and play extracted audio files.
Although ripping has become a common practice for many computer users, high quality audio extraction can be difficult because of the complexities inherent in the way data are stored on audio discs. Audio CD data are organized into sectors in order to ensure a constant read rate. Each sector consists of 2,352 bytes of sound data along with synchronization, error correction, and control/display bits. These sectors are further broken down into sound samples. Each sector contains 588 samples of sound for each of two stereo channels, and each sample contains two bytes (16 bits) of sound data. The standard sampling rate of CD players is 44,100 samples per second.
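The sector figures quoted above are mutually consistent, which the following minimal sketch checks. The constant and function names are illustrative assumptions and do not come from any DAE program.

```python
# Figures quoted in the paragraph above; the names are illustrative.
SAMPLES_PER_SECTOR = 588   # samples per channel in one sector
CHANNELS = 2               # stereo
BYTES_PER_SAMPLE = 2       # 16 bits
SAMPLE_RATE = 44_100       # samples per second, per channel


def sector_bytes() -> int:
    """Bytes of sound data carried by one sector."""
    return SAMPLES_PER_SECTOR * CHANNELS * BYTES_PER_SAMPLE


def sectors_per_second() -> int:
    """Number of sectors consumed by one second of playback."""
    return SAMPLE_RATE // SAMPLES_PER_SECTOR
```

Under these figures, one sector carries 2,352 bytes of sound data and one second of playback spans 75 sectors, which is why addresses are resolved to 1/75th of a second.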
Sectors are not arranged in distinct physical units. Instead, the data in one sector are interleaved with data in other sectors so that a defect in the disc will not destroy a single sector beyond correction. In addition, each track's location, or address, is recorded in the disc's Table of Contents (“TOC”), which is stored in the “lead in” area of every disc. Accordingly, an audio disc's TOC, much like a book's, is a good resource for determining where tracks begin and end. The TOC indicates the minute, second, and sector (to 1/75th of a second) at which each track begins.
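The minute, second, and sector addressing described above can be illustrated with a short sketch that converts a TOC entry into an absolute sector count. The function names are hypothetical, and the sketch deliberately ignores any fixed offset that a particular drive or interface may apply between TOC addresses and its own block numbers.

```python
SECTORS_PER_SECOND = 75  # TOC addresses resolve to 1/75th of a second


def msf_to_sector(minute: int, second: int, sector: int) -> int:
    """Convert a (minute, second, sector) address to an absolute sector count."""
    if minute < 0 or not 0 <= second < 60 or not 0 <= sector < SECTORS_PER_SECOND:
        raise ValueError("address out of range")
    return (minute * 60 + second) * SECTORS_PER_SECOND + sector


def track_length_sectors(start: tuple, next_start: tuple) -> int:
    """Length of a track, taken as the distance to the next track's start."""
    return msf_to_sector(*next_start) - msf_to_sector(*start)
```

For example, a track starting at 0:02:00 whose successor starts at 3:25:40 would, under this sketch, span 15,265 sectors.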
Extraction of audio/video content from a compact disc to a hard disk using current DAE software can be a difficult task. Every byte of a 2,352-byte sector of audio data is used strictly for audio. Essentially, no header exists; there is no information in the sector that allows for the exact positioning of a read head over a specific sector. To address an audio sector, a CD-ROM drive uses the TOC data to approximate how far out along the CD it must scan in order to find the beginning of a specified track. Drives typically reach an audio address that is within ± four sector addresses of the address being sought (± 4/75 of a second in playback time), so a read request may return any one of the nine sectors in that range. This inexact positioning may cause undesired clicks and pops, commonly referred to as “jitter,” in extracted audio files.
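The positioning uncertainty described above can be made concrete with a small sketch. The ± four sector tolerance is taken from the text; the function names are hypothetical, and real drives may behave differently.

```python
def candidate_sectors(target: int, tolerance: int = 4) -> list:
    """Sectors a read request for `target` might actually return."""
    first = max(0, target - tolerance)  # sector addresses cannot be negative
    return list(range(first, target + tolerance + 1))


def max_timing_error_seconds(tolerance: int = 4, sectors_per_second: int = 75) -> float:
    """Worst-case positioning error expressed in playback time."""
    return tolerance / sectors_per_second
```

A request for sector 1000 could therefore land anywhere from sector 996 through sector 1004, which is nine candidates and an error of up to 4/75 of a second.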
Graph 110 of FIG. 1 is a plot (not to scale) of the audio level (e.g., audio volume, audio intensity, or audio amplitude) of an audio recording 120 over time (horizontal axis). Track divisions 130 represent where tracks (e.g., songs) of audio recording 120 begin and end. Threshold 140 represents a predetermined level threshold, and lines 150 represent the points at which the sound level of audio recording 120 rises above or drops below threshold 140. For example, the level of audio recording 120 is below threshold 140 during time lapses 152 and 154. Time lapses 152 and 154 represent the dead silence that may exist at the beginnings and ends of songs on a CD, respectively. Lines 160 represent the points at which the sound level of audio recording 120 drops significantly but does not drop below threshold 140. For example, the level of audio recording 120 drops significantly during time lapse 164. Time lapse 164 represents a lull in the level of audio recording 120 that may occur between tracks, such as clapping in between songs of a live album. Finally, no lull in the level of audio recording 120 exists between tracks 3 and 4. This is an example of two tracks that blend into each other during playback without any lull in sound level.
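The three situations shown in FIG. 1 (dead silence, a lull, and no lull at all) can be distinguished with a sketch such as the following. This is an illustration only: the function name, the `reference_level` argument, and the `lull_ratio` parameter are assumptions introduced here to give “drops significantly” a concrete meaning, and none of them comes from the figure.

```python
def classify_gap(gap_levels, threshold, reference_level, lull_ratio=0.5):
    """Classify the audio level around a track division.

    gap_levels      -- level measurements taken around the division
    threshold       -- level below which audio counts as silence
    reference_level -- typical level of the surrounding music (assumed input)
    lull_ratio      -- hypothetical fraction of reference_level that
                       counts as a significant drop
    """
    if not gap_levels:
        raise ValueError("gap_levels must not be empty")
    peak = max(gap_levels)
    if peak < threshold:
        return "dead silence"   # like time lapses 152 and 154
    if peak < reference_level * lull_ratio:
        return "lull"           # like time lapse 164
    return "continuous"         # like the division between tracks 3 and 4
```

With a threshold of 0.05 and a reference level of 1.0, measurements of 0.2 to 0.3 would be classified as a lull under these assumed parameters, while measurements near 0.9 would be treated as continuous sound.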
Current DAE software can be used to extract audio recording 120, and FIG. 2 shows how current DAE programs function. A current DAE program will extract each track of audio recording 120 separately and create a pulse-code modulation (PCM) file for each track (PCM files 201–204). These PCM files can eventually be converted to encoded file formats (encoded files 211–214) that may be read for playback of audio recording 120. These encoded file formats may be uncompressed or compressed (e.g., the WAV or MP3 file formats, respectively).
One disadvantage of current extraction techniques is that the software extracts each track from the source CD separately. First, the software reads the CD TOC to determine the locations of the tracks to be extracted. Then each track is extracted from a beginning point that may or may not be where the track actually starts, and extraction ends at a point that may or may not be where the track actually ends. Again, the read head's accuracy in finding sector addresses is low, and it can only approximately find the start of a track. Given these uncertainties, one or more sectors of a track may be lost during extraction, or one or more sectors may be unintentionally added. For example, FIG. 3 illustrates some of the drawbacks of using DAE techniques currently known in the art. Audio track 310, beginning at time 302 and ending at time 304, can be extracted from an audio CD. The resulting PCM data file may contain missing sectors (e.g., PCM file 320), extra sectors (e.g., PCM file 330), or both missing and extra sectors (e.g., PCM file 340) at either or both ends of the file. This problem is further exacerbated when a file contains extra sectors that overlap with sectors contained in a consecutive track (e.g., overlapping sector 360 of PCM file 350). Overlapping sectors will cause increased jitter when extracted tracks are played back in their original order. Jitter is particularly noticeable between extracted tracks that are intended to blend into each other during playback (e.g., a segue, a house mix, or a live recording). These types of recordings may not have the same dead silence between tracks that typical multi-track recordings have.
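The missing, extra, and overlapping sectors illustrated in FIG. 3 can be counted with a sketch like the one below. The function names and the use of half-open sector ranges are assumptions made for illustration; the sketch also assumes the range actually read intersects the true track, as it would for errors of only a few sectors.

```python
def compare_extraction(true_start, true_end, read_start, read_end):
    """Count missing and extra sectors for one extracted track.

    Ranges are half-open: [start, end). Assumes the read range
    intersects the true range.
    """
    missing = max(0, read_start - true_start) + max(0, true_end - read_end)
    extra = max(0, true_start - read_start) + max(0, read_end - true_end)
    return {"missing": missing, "extra": extra}


def overlap_sectors(first_read_end, second_read_start):
    """Sectors present in both of two consecutively extracted tracks."""
    return max(0, first_read_end - second_read_start)
```

For a track that truly occupies sectors 100 through 199, a read covering sectors 102 through 202 would lose two sectors at the start and add three at the end. If the next track's read began at sector 199, four sectors would appear in both files.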
The problems described above arise because current DAE programs do not analyze the bridges between tracks to determine whether there is dead silence or just a lull in the sound, as in a live recording. Instead, current programs simply add a small amount of silence between extracted tracks during playback, even though that silence may be undesirable for certain track sets. Finally, if there is some noticeable sound between tracks, there is a clear loss of sound quality during playback because current DAE techniques cannot adequately compensate for jitter.