The present invention relates to an apparatus and concomitant method for converting an analog audio source into a digital format. Specifically, the present invention provides a method for converting and improving an existing analog audio recording as it is converted into a digital format. In performing the conversion function, the present invention is also capable of identifying and maintaining various index information stored in the original analog audio signal, e.g., the identification of various index tones.
As digital technologies continue to gain wide acceptance, a vast amount of previously stored information must be adapted to the new digital standards. Such previously stored information includes a vast library of existing analog-recorded books. To preserve the huge investment in such analog recordings, these recordings are being converted into digital formats such as the Digital Talking Book (DTB), in accordance with the “Daisy” consortium specifications.
The electronic book is already available in various forms to the mainstream consumer. However, none of the current versions of widely available electronic books are accessible to the print disabled. The print disabled community includes blind as well as learning disabled readers. Previously, this population has been afforded the option of the analog talking book. The analog talking book generally includes audio with embedded index tones indicating pages and chapters. The user must listen to the audio at fast forward to hear index tones and use the tones to navigate linearly to the desired section of the recorded book. Most recently, there have been efforts to produce digital books, which afford users the opportunity to navigate non-linearly through the recorded material.
There is currently an effort underway to convert large libraries of valuable analog talking books to new digital talking book formats. Since the digital talking books represent a revolutionary advantage for the print disabled community, there is a great deal of urgency attached to these conversion efforts. To satisfy this urgent need, an analog to digital talking book conversion system is required.
One particularly problematic aspect of the conversion has been automatic and accurate index tone detection and interpretation. Talking books have, especially in the United States, been largely produced by volunteer organizations. As such, these books display a wide range of non-uniform characteristics. This lack of consistency makes the problem of building an automated conversion system a difficult one, since the system must accommodate a wide range of inputs. A number of different technologies, including phrase detection methods, have been tested and proven unreliable and unable to accommodate the wide range of inputs when used to convert some formats of analog talking book. Efforts to convert books manually have also proven difficult. While brute force methods can be utilized to accomplish the desired complex tasks, such solutions are often too costly to be practical.
Therefore, there is a need for an apparatus and method for converting and improving an existing analog audio recording as it is converted into a digital format with reliable tone detection and interpretation.
One embodiment of the present invention provides an apparatus and method for converting and improving an existing analog audio recording as it is converted into a digital format with reliable tone detection and interpretation. Specifically, the digitized audio undergoes three tone detection processing steps: silence detection and removal, anomaly detection and removal, and page/chapter identification.
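The first of these steps, silence detection and removal, can be illustrated with a minimal sketch. It is not the method of the invention: it assumes a simple frame-wise RMS threshold, and the names `frame_rms`, `find_silent_runs`, and `remove_runs`, as well as the frame length, threshold, and minimum run length, are hypothetical values chosen only for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_silent_runs(samples, frame_len=4, threshold=0.01, min_frames=2):
    """Return (start, end) sample ranges whose frames all fall below threshold.

    Assumption: silence is any run of at least `min_frames` consecutive
    frames with RMS below `threshold`. Real recordings would need values
    tuned to the tape noise floor.
    """
    runs = []
    run_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        quiet = frame_rms(frame) < threshold
        if quiet and run_start is None:
            run_start = i  # a quiet run begins at this frame
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                runs.append((run_start * frame_len, i * frame_len))
            run_start = None
    # Close a quiet run that extends to the last full frame.
    if run_start is not None and n_frames - run_start >= min_frames:
        runs.append((run_start * frame_len, n_frames * frame_len))
    return runs


def remove_runs(samples, runs):
    """Return a copy of samples with the given (start, end) ranges cut out."""
    out = []
    pos = 0
    for start, end in runs:
        out.extend(samples[pos:start])
        pos = end
    out.extend(samples[pos:])
    return out
```

For example, calling `find_silent_runs` on four loud samples, eight zero samples, and four loud samples reports one silent range covering the zeros, and `remove_runs` then returns only the eight loud samples.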
Finally, the audio undergoes audio optimization processing, in which it is adjusted from 2× speed playback to standard speed playback by performing pitch correction on the audio files. Pitch correcting the audio by −11 semitones slows it for standard speed playback. The audio is then optimized for conversion to audio compression formats by removing frequencies below 60 Hz and boosting the high bass (120-200 Hz) and upper midrange (2-6 kHz) frequencies. The files are then normalized, so that all audio has equal amplitude, and saved.
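The arithmetic behind the pitch correction and normalization can be sketched as follows. This is an illustration under stated assumptions, not the invention's implementation: it assumes equal-tempered semitones (a frequency ratio of 2^(n/12) for n semitones) and simple peak normalization, and the function names `semitone_ratio` and `normalize_peak` are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift, assuming equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def normalize_peak(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak.

    Assumption: "equal amplitude" is read here as equal peak level;
    a loudness-based measure would be an equally plausible reading.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-zero or empty input is left unchanged
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Under this assumption, a shift of −11 semitones corresponds to a ratio of roughly 0.53, while a full octave (−12 semitones) corresponds to exactly 0.5.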