The present application relates to capture and processing of digital audio signals, and more particularly to adjusting clarity of digital audio signals during and after signal capture.
Note that the points discussed below may reflect the hindsight gained from the disclosed inventive scope, and are not necessarily admitted to be prior art.
Generally, audio recordings, audio amplifiers, and audio signal processing, whether of high or low resolution, do not accurately reflect the original dynamic range, transient response, and frequency balance of the sound(s) they were intended to capture, stream and/or amplify. This is because components in the audio capture/processing chain generally are not “ideal” examples of their class of component (much as, contrary to “ideal” assumptions in certain physics calculations, air is not frictionless). As a result, recorded or captured/streamed audio contains unintended low level distortion byproducts. The audio industry is aware of this situation, and uses equipment, recording techniques, and audio signal processing steps to compensate. Common wisdom has held that these steps have been successful in compensating for known sources of distortion, especially with respect to high resolution audio. However, some listeners can hear consistent remaining distortion in typical recording products. (Some listeners cannot hear this distortion. Some listeners can hear this distortion and find it unpleasant, while others find it pleasant or are not bothered by it.)
Some listeners are aware of remaining distortion-related shortcomings in captured and computer-generated music, while other listeners do not hear such degradation. This is because different people possess different levels of ability to hear “spectral dynamic range”-related distortion, an unwanted by-product of the typical audio recording/capture chain. “Spectral dynamic range” is defined and further described hereinbelow.
The audio recording industry recognizes that the various elements of the audio recording and/or signal processing chain have limitations, and has attempted to compensate for the recognized shortcomings in (for example) audio amplifier circuits, audio digital to analog converters and analog to digital converters, computer processing, and playback equipment. To do so, the audio industry has employed various techniques.
To correct distortion and make other desired changes to an audio program material, audio engineers typically utilize, for example:
a) Multiple microphone recording techniques, in which each instrument's sound is captured with one or more microphones dedicated to that instrument. This means the microphones used for each instrument pick up the sounds of the desired instrument, as well as lower level (e.g., lower amplitude and distorted) versions of sounds from the other instruments and/or voices in the ensemble (which have their own microphone(s)). The lower level sounds of instruments not corresponding to a particular microphone are generally undesired, and add low level distortion to the recorded (or otherwise captured) sounds.
b) Various types of audio signal processing approaches, implemented using hardware or software. Some audio signal processing techniques used by audio engineers to compensate for noise and/or distortion in audio program material (a recorded, computer-generated, or otherwise captured set of audio data) use “tone control”. Tone control raises or lowers the amplitude of audio program material over a frequency range, and can be readily described using sound spectrographs.
FIG. 1A shows an example of a sound spectrogram 100 of an audio program material. A sound spectrogram is a visual representation of a spectrum of audio frequencies as they vary with time, and corresponds to a time history of many spectrographs. A sound spectrogram can be created from a digital audio program material using a transformation from a time domain to a frequency domain, such as a Fourier transform. For simplicity and clarity, the spectrogram of FIG. 1A shows only some of the frequency information of the corresponding signal. FIGS. 4A, 4B, 5A, 5B, 6A, and 6B are examples of sound spectrograms of live-recorded audio program material.
FIG. 1B shows an example of a sound spectrograph 102 of one moment in time of the audio program material of FIG. 1A. A sound spectrograph 102 shows amplitude information for a range of audio frequencies at a moment in time. The identified data points at particular frequencies (different shapes) in FIG. 1B correspond to the identified data points at the same frequencies (corresponding shapes) in FIG. 1A.
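The relationship between a spectrogram and a spectrograph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `spectrograph_db` and `spectrogram_db`, the 8000 Hz sample rate, the 64-sample frame, the Hann window, and the 1000 Hz test tone are all hypothetical choices and are not taken from the figures. A direct (slow) discrete Fourier transform is used so that the sketch needs only the standard library.

```python
import cmath
import math

def spectrograph_db(frame, floor_db=-120.0):
    """Amplitude in dB for each non-negative frequency bin of one frame."""
    n = len(frame)
    # A Hann window reduces leakage between neighboring frequency bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    levels = []
    for k in range(n // 2 + 1):
        # Direct discrete Fourier transform of bin k (slow, dependency-free).
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mag = abs(acc) / n
        # Clamp very small magnitudes to an assumed display floor.
        levels.append(max(20 * math.log10(mag), floor_db) if mag > 0 else floor_db)
    return levels

def spectrogram_db(samples, frame_len=64, hop=32):
    """Time history of spectrographs: one list of dB levels per frame."""
    return [spectrograph_db(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]

SAMPLE_RATE = 8000   # Hz, assumed for illustration
FRAME_LEN = 64       # samples per frame, so bins are 125 Hz apart

# Hypothetical test signal: a steady 1000 Hz sine wave.
samples = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spectrogram = spectrogram_db(samples, FRAME_LEN, hop=32)
one_moment = spectrogram[0]  # a single spectrograph
peak_bin = max(range(len(one_moment)), key=one_moment.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(f"{len(spectrogram)} frames, peak near {peak_hz:.0f} Hz")
```

Each entry of `spectrogram` plays the role of one spectrograph, and stacking the entries in time order gives the spectrogram. Practical tools would normally use a fast Fourier transform and much longer frames.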
FIG. 2A shows an example of a sound spectrograph 200 of an audio program material. A particular frequency range 202, 750 Hz to 1300 Hz, is selected in FIG. 2A for modification using tone control. The selected frequency range 202 has an amplitude range of 10 dB from peak 204 to trough 206 (the peak 204 has an amplitude of about −52 dB, and the troughs 206 have an amplitude of about −62 dB).
FIG. 2B shows a prior art example of a sound spectrograph 208 of the audio program material of FIG. 2A, after application of tone control. In FIG. 2B, tone control has been used to raise the amplitude of the selected frequency range 202 (750 Hz to 1300 Hz) by 10 dB, while preserving the 10 dB peak-to-trough amplitude range. (However, note that the peak-to-trough frequency range for adjacent peaks 204 is decreased by tone control.)
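The tone control operation of FIGS. 2A and 2B can be approximated numerically as follows. This is a minimal sketch, not the processing actually used to generate the figures: the helper names `apply_tone_control` and `peak_to_trough_db` are hypothetical, and the spectrograph values are invented data points loosely modeled on the −52 dB peak and −62 dB troughs described above.

```python
def apply_tone_control(freqs_hz, levels_db, low_hz, high_hz, gain_db):
    """Shift the level of every bin inside [low_hz, high_hz] by gain_db."""
    return [level + gain_db if low_hz <= freq <= high_hz else level
            for freq, level in zip(freqs_hz, levels_db)]

def peak_to_trough_db(freqs_hz, levels_db, low_hz, high_hz):
    """Difference between the highest and lowest level inside the band."""
    selected = [level for freq, level in zip(freqs_hz, levels_db)
                if low_hz <= freq <= high_hz]
    return max(selected) - min(selected)

# Hypothetical spectrograph samples (frequency in Hz, level in dB).
freqs_hz = [500, 750, 900, 1025, 1150, 1300, 1600]
levels_db = [-70.0, -62.0, -57.0, -52.0, -57.0, -62.0, -68.0]

boosted_db = apply_tone_control(freqs_hz, levels_db, 750, 1300, 10.0)

range_before = peak_to_trough_db(freqs_hz, levels_db, 750, 1300)
range_after = peak_to_trough_db(freqs_hz, boosted_db, 750, 1300)
print(boosted_db)
print(range_before, range_after)
```

Adding a constant number of decibels is equivalent to multiplying the linear amplitude of every selected bin by the same factor (about 3.16 for 10 dB), which is why the peak-to-trough range inside the band is unchanged while the levels outside the band stay where they were.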
Typical audio signal processing techniques include:
1) Audio compression, also called dynamic range limiting, uses tone control to artificially reduce the amplitude difference between the lowest-amplitude sounds and the highest-amplitude sounds. This is generally done in an attempt to maximize perceived loudness. Typically, use of dynamic range limiting is related to the belief (generally erroneous) that compressing the audio dynamic range (the difference between a frequency range's maximum amplitude and its noise floor) to a minimum, then boosting the level of all of the audio program material (that is, increasing the amplitude of all sounds in the audio program material), will result in the most perceived loudness. This does not take into account the distinction between level and volume (level corresponds to measurable/objective amplitude, in contrast to volume, which corresponds to perceived amplitude).
2) Frequency equalizers use tone control to boost or reduce audio levels in selected portions of the frequency range of an audio signal. This is used, for example, to achieve a desired “sound” (a characteristic tonal balance of the audio signal) of the resultant audio that may be purposely different than the original (i.e., deliberately distinct from the natural, as-produced sound prior to capture), or to increase vocal presence by boosting voice content over music by emphasizing a frequency range containing significant human vocal content.
3) Transient “restorers” are generally used to compensate for the fact that compression and equalization tend to significantly reduce the original transient response, which reduces the perceived “naturalness” of the resultant sound. However, use of artificial means to emulate the original transient attacks tends to further alter and degrade the original frequency, amplitude and phase relationships within the content of the audio signal.
4) Pre-adjustment of the sound is generally used to adjust some aspects of the frequency response to compensate ahead of time for playback on less than optimal quality reproduction equipment (e.g., speakers, headphones, computer and/or cell phone speakers, and earbuds). Whether or not a listener will consider the result an improvement in audio quality may depend on, for example, whether they are listening using equipment included in a sound adjuster's “target audience.” That is, optimal pre-adjustment is generally different for different sound reproduction (playback) equipment. (There is an assumption herein that “optimal” adjustment of a sound recording is typically an adjustment that most closely reproduces the sound (the series of pressure waves) which was recorded. This is not always the case, depending, in part, on listener preferences and an intended character of the sound recording after processing, on playback.)
5) “Spectral repair” is used to remove sounds in a selected time window of audio program material. Spectral repair uses an interpolation of selected areas on a spectrogram (typically, including sounds from before and after the sounds to be removed) to fill in, or fully or partially replace, sounds to be removed. This is similar to using an image processing tool to cover a blemish in an image with a pattern derived from a similar portion of the image. FIG. 3A shows an example of a sound spectrogram 300 of an audio program material. The sound spectrogram 300 includes an undesired transient sound 302 (the sudden, broad-spectrum energy peaks within the box), such as the sound of a dropped book. FIG. 3B shows an example of a prior art sound spectrogram 304 of the audio program material of FIG. 3A on which spectral repair has been performed. (FIGS. 3A and 3B are greyscale images generated from color spectrograms, in which color indicated amplitude.) To perform the spectral repair, sounds near the blemished region are used to interpolate predicted sounds for the blemished region. The interpolated sounds are then substituted for the undesired transient sound 302. In FIG. 3B, the amplitude of the undesired transient sound 302 has been reduced to (or near) zero. However, comparison of FIGS. 3A and 3B shows that some sounds which appear to have been part of the as-performed audio program material (for example, item 306) have reduced amplitude after spectral repair. That is, spectral repair can (and generally, will) result in a loss of fidelity.
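The interpolation step of spectral repair, and the loss of fidelity it can cause, can be illustrated with a deliberately simplified sketch. Several assumptions are made here that do not come from the text: the spectrogram is modeled as a list of frames of dB levels, the hypothetical function `repair_region` interpolates linearly in time (and directly on dB values) between the intact frames on either side of the damaged region, and phase is ignored entirely. Actual spectral repair tools are considerably more elaborate.

```python
def repair_region(spectrogram, first_bad, last_bad):
    """Return a copy in which frames first_bad..last_bad (inclusive) are
    replaced by per-bin linear interpolation between the neighboring frames."""
    if first_bad < 1 or last_bad >= len(spectrogram) - 1 or first_bad > last_bad:
        raise ValueError("damaged region needs an intact frame on each side")
    before = spectrogram[first_bad - 1]
    after = spectrogram[last_bad + 1]
    span = last_bad - first_bad + 2  # number of steps from 'before' to 'after'
    repaired = [list(frame) for frame in spectrogram]
    for t in range(first_bad, last_bad + 1):
        weight = (t - first_bad + 1) / span
        repaired[t] = [(1 - weight) * b + weight * a
                       for b, a in zip(before, after)]
    return repaired

# Hypothetical spectrogram: 5 time frames x 3 frequency bins, levels in dB.
# Frame 2 holds a broad-spectrum transient (such as a dropped book) that
# masks whatever the performers were playing at that moment.
spectrogram = [
    [-60.0, -50.0, -70.0],
    [-60.0, -50.0, -70.0],
    [-20.0, -20.0, -20.0],
    [-60.0, -54.0, -70.0],
    [-60.0, -54.0, -70.0],
]

repaired = repair_region(spectrogram, 2, 2)
print(repaired[2])
```

The repaired frame is a prediction derived from its neighbors, not a recovery of what was actually performed during that frame, so any wanted sound that occurred only inside the repaired region is replaced along with the transient.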
The net result of using some or all of the above-described audio processing techniques is typically music or audio with significantly diminished resemblance to the original audio, and which is lacking in dynamic range, transient response, and natural sound quality. As a result, processed sound tends not to realistically emulate the original audio material.
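The reduction in dynamic range associated with dynamic range limiting (item 1 above) can be shown with a conventional static compression curve followed by a level boost. This is a sketch under stated assumptions: the function name `compress_level_db`, the −20 dB threshold, the 4:1 ratio, and the 15 dB boost are hypothetical values chosen for illustration, and the curve is applied to overall levels rather than to a selected frequency range.

```python
def compress_level_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static compression: the part of a level above the threshold is
    divided by the ratio, then the whole signal is boosted by makeup_db."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db

# Hypothetical levels of four passages, from quiet to loud, in dB.
levels_db = [-40.0, -20.0, -8.0, 0.0]
compressed_db = [compress_level_db(level, makeup_db=15.0) for level in levels_db]

dynamic_range_before = max(levels_db) - min(levels_db)
dynamic_range_after = max(compressed_db) - min(compressed_db)
print(compressed_db)
print(dynamic_range_before, dynamic_range_after)
```

With these assumed settings the loudest passage stays at the same level while the quietest passage is raised by 15 dB, so the difference between them shrinks from 40 dB to 25 dB.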
The inventor endeavors to disclose new and advantageous approaches to adjusting clarity of captured, streaming and other digitized audio signals.