1. Field of the Invention
Some audio processing systems make use of amplitude or frequency adjustment or both, others rely on optimizing the Group Delays of frequencies; however, the current invention described herein converts audio information (changes in amplitude and frequency) into a phase space, where changes in amplitude and frequency become phase shifts of portions of a digital pulse. Regular loudspeakers can automatically decode the phase information, resulting in a virtually distortion-free analog acoustic signal whose bandwidth is approximately 0-50 KHz and whose characteristics include not only changes in amplitude and frequency but also changes in phase as a function of frequency. It is obvious from the scientific literature (see references) that no coherent theory of hearing currently exists. The wavelengths of low frequency acoustic signals are much too large to enter the ear canal, and high frequencies (above 6 KHz) cause bifurcation of neuronal firing in the brain, allowing for subconscious processing of frequencies considered outside the normal (20 Hz-20 KHz) hearing range.
The current invention expands the bandwidth of the audio signal from the "normal" frequency range to about 0-50 KHz, separates frequencies as a function of time, and converts ordinary stereo signals into phase distributed monaural. The result is a signal that interacts with the human brain in a new way to produce an effect that resembles a live performance, restoring what was lost during the transduction and recording sequences. It was discovered in 1932 that hearing was not strictly physical, that psychological factors also contributed to our perception of sound (Sivian and White). That phase angles of even pure tones are sensed by humans was established in 1956 (Zwislocki and Feldman).
The ear is a non-linear device--current models and theories of hearing are based, primarily, on old, outdated linear models and concepts (Moore and Patterson). Some musical harmonics are non-linear (e.g., violin overtones) (Meinel). The interaction of acoustic signals from various musical instruments (including human voices and electronic synthesizers) creates interference patterns (Hall) that are embedded in the recording medium (e.g., tape), but whose characteristics are ignored with current transducers and recording and playback equipment. Just as laser processing of images focused on photographic emulsions can bring out the three-dimensional information in the two-dimensional image by retrieving phase information in the interference patterns, so this invention restores 3D information lost during transduction and recording. The result is a restoration of the "live" performance.
Classical theory indicates that acoustic events can be described in at least two ways; in a time domain or a frequency domain, each convertible into the other via a Fourier transformation. The mathematical formulation for this process is well known. The time-domain characterization of an acoustical event is a scalar, while the frequency-domain representation is a complex vector quantity containing amplitude and phase information. The time domain representation can also be expressed as a complex quantity. The scalar portion of the time domain vector represents performance based on impulse excitation; the imaginary part of the vector is the Hilbert transform of the scalar.
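The relationship between the real time-domain response and its Hilbert transform can be illustrated with a minimal Python sketch. It assumes a sampled, even-length signal and builds the complex time-domain representation by discarding the negative-frequency half of a discrete Fourier transform. The function names `dft`, `idft`, and `analytic_signal` and the 32-sample cosine input are illustrative choices and are not part of the disclosure.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for short examples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def analytic_signal(x):
    """Return a complex signal whose real part is x and whose imaginary part
    is the discrete Hilbert transform of x. Assumes len(x) is even."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC and Nyquist bins, double positive frequencies, drop negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    weights[n // 2] = 1.0
    for k in range(1, n // 2):
        weights[k] = 2.0
    return idft([w * s for w, s in zip(weights, spectrum)])


# Hypothetical test input: three cycles of a cosine over 32 samples.
N = 32
signal = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
z = analytic_signal(signal)

for t in range(4):
    print(f"t={t}: real={z[t].real:+.4f} imag={z[t].imag:+.4f}")
```

For this cosine input the imaginary part comes out as the corresponding sine, that is, the same tone shifted by 90 degrees.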
Loudspeakers and electrical networks which transfer energy from one form to another can be characterized by response to an impulse function, because the impulse response can be manipulated to predict the behavior of the system in response to any arbitrary signal. Fourier transforms work for predictive systems as well as causal systems. However, the group velocity of a set of audio signals is not related to time delay for all possible systems, and uniform group delay does not ensure a distortionless system.
Group Delay is derived from phase delay, which is defined as the phase shift in a system at a given frequency. Group delay is associated with a group of frequencies around a central carrier, such as those encountered in modulated communications, but it also finds some relevance in describing how a system responds to a change in frequency. Group Delay follows an inverse square law. The value is fixed for DC but it approaches a finite value (near zero) at infinity. For a given function, and given the appropriate values for the resistor and capacitor, this logarithmic response will appear across the audio range. Starting with $T_{gd} = 2\alpha_0/(\alpha_0^2 + \omega^2)$, it can be shown that: $T_{gd}(\omega \approx \alpha_0) \approx (2.3/\alpha_0)\,\log(\alpha_0/\omega)$.
For a simple case it is possible to relate a logarithmic approximation to the group delay. The approximation was developed around a region where alpha equals omega. A more general equation for a larger region is presented below. It was derived using similar techniques but spans the area from omega equals alpha to omega "large" (50K radians or so). Small values of omega are not permissible, and the error at omega equals alpha is significant. These logarithmic equations are not specifically necessary for the design process but when the user works with practical circuits, it will be noted (on test equipment) that the Group Delay of the audio bandwidth changes logarithmically with frequency. The following equation can be used to validate the observations; however, it is noted that because of the foregoing, Group Delay is rather meaningless and phase shift more accurately describes the true action of the circuit. Group Delay is included here to provide an alternate way of analyzing the circuit's action.
These two equations are generally equivalent for $\omega > 5\alpha_0$: $T_{gd}(\omega) = 2\alpha_0/(\alpha_0^2 + \omega^2)$ and $T_{gd}(\omega) = (2/\alpha_0) \cdot \ln[1 + (\alpha_0/\omega)^2]$.
The same equation rewritten for standard logarithms:
$T_{gd}(\omega) \approx (4.6/\alpha_0) \cdot \log[1 + (\alpha_0/\omega)^2]$
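The agreement between the inverse-square form and the two logarithmic forms above can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, and the value of alpha (100 radians per second) is taken from the circuits discussed later in the text.

```python
import math


def group_delay_exact(omega, alpha0):
    """Inverse-square form: 2*alpha0 / (alpha0^2 + omega^2)."""
    return 2.0 * alpha0 / (alpha0 ** 2 + omega ** 2)


def group_delay_ln(omega, alpha0):
    """Natural-log form: (2/alpha0) * ln[1 + (alpha0/omega)^2]."""
    return (2.0 / alpha0) * math.log(1.0 + (alpha0 / omega) ** 2)


def group_delay_log10(omega, alpha0):
    """Base-10 form: (4.6/alpha0) * log10[1 + (alpha0/omega)^2]."""
    return (4.6 / alpha0) * math.log10(1.0 + (alpha0 / omega) ** 2)


ALPHA0 = 100.0  # rad/s, illustrative value

for ratio in (5, 10, 50, 500):
    omega = ratio * ALPHA0
    exact = group_delay_exact(omega, ALPHA0)
    approx = group_delay_ln(omega, ALPHA0)
    rel_err = abs(approx - exact) / exact
    print(f"omega = {ratio:3d} * alpha0: exact = {exact:.3e} s, "
          f"ln form = {approx:.3e} s, relative error = {rel_err:.2%}")
```

At omega equal to five times alpha the two forms differ by about 2 percent, and the difference shrinks as omega grows, which is consistent with the stated region of validity.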
Interaural time difference, necessary for determining the position of a source, also has bearing on pitch. Further, most of the critical interaural time differences are in the range of plus or minus 1 millisecond. Thus, when the group delay is modified, so is the perception of the sound.
A generalized version of the All-pass response Group Delay is presented below. This equation can be used, with reasonable accuracy, to predict the group delay of a specific frequency for various RC combinations. It also accounts for Gain adjustments. Using these equations, one can tailor the Group Delay response.
Referring to FIG. 5:
$\alpha_0 = 1/(R_1 C)$ and $A = R_3/R_2$
The general transfer function is: $T(s) = -\dfrac{A s - \alpha_0}{s + \alpha_0}$
which means the gain of the circuit is: $|T(s)| = -A$.
The phase response is: $\phi(\omega) = -2\tan^{-1}\left(\omega\sqrt{A}/\alpha_0\right)$
and the Group Delay is given by: $T_{gd}(\omega) = \dfrac{(A+1)\,\alpha_0}{\alpha_0^2 + \omega^2 A} + A \cdot 50\ \text{ns} + 100\ \text{ns}$
The second and third terms are included because their exclusion yields increasingly poor results with increasing frequencies. The above equations may be interpreted in the following physical sense: alpha determines the range over which the group delay responds logarithmically. An increase in alpha will tend to shift the range to higher frequencies, but will reduce the group delay itself, i.e., the actual delay times will decrease. A decrease in Gain, A, can be used to offset this decrease in delay. Conversely, for a given alpha, adjusting Gain can be used to set the delay time at the expense of the overall frequency range. Increasing Gain increases the maximum delay presented by the system (at very low omega), but the delay of a specific frequency compared to unity gain will be smaller due to the shift in frequency range; adjusting alpha can be used to compensate.
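The trade-off between alpha and Gain described above can be explored with a short Python sketch of the generalized Group Delay expression, read here as (A+1)·alpha/(alpha² + A·omega²) plus the two fixed terms A·50 ns and 100 ns. The function name and the component values (R1 = 100 kilohms, C = 0.1 microfarad, R2 = 10 kilohms) are hypothetical and were chosen only so that alpha is about 100 radians per second.

```python
def allpass_group_delay(omega, r1, c, r2, r3):
    """Group delay in seconds at angular frequency omega (rad/s)."""
    alpha0 = 1.0 / (r1 * c)   # alpha_0 = 1/(R1*C)
    gain = r3 / r2            # A = R3/R2
    first_term = (gain + 1.0) * alpha0 / (alpha0 ** 2 + gain * omega ** 2)
    # Fixed terms from the text: A * 50 ns + 100 ns.
    return first_term + gain * 50e-9 + 100e-9


# Hypothetical component values giving alpha_0 of about 100 rad/s.
R1 = 100e3    # ohms
C = 0.1e-6    # farads
R2 = 10e3     # ohms

for gain in (1.0, 2.0):
    r3 = gain * R2
    for omega in (0.0, 100.0, 1.0e4):
        delay = allpass_group_delay(omega, R1, C, R2, r3)
        print(f"A={gain:.0f} omega={omega:8.0f} rad/s delay={delay * 1e3:.6f} ms")
```

With these values the delay near DC is largest for the higher gain, while at 10,000 radians per second the ordering reverses, which matches the behavior described in the preceding paragraph.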
The circuits shown in FIGS. 3 & 5 all utilize an alpha of about 100 radians each. Increasing alpha will tend to emphasize lower frequencies, and decreasing alpha will tend to emphasize higher frequencies. In any case, a specifically desired response will require the adjustment of both Gain and alpha.
FIG. 3 shows the cascaded implementation. The effect of the cascade is a linear addition of the delays. The general effect of cascading is to delay a broader range of frequencies by a greater amount, thus enhancing the effect.
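The linear addition of delays in a cascade follows from the fact that the phases of cascaded stages add. A minimal numerical check in Python is sketched below. It assumes unity-gain first-order all-pass stages with T(s) = -(s - alpha)/(s + alpha) and a hypothetical three-stage cascade; the actual stage count and component values of FIG. 3 are not reproduced here.

```python
import cmath


def allpass(omega, alpha0):
    """Unity-gain first-order all-pass response at s = j*omega (assumed form)."""
    s = 1j * omega
    return -(s - alpha0) / (s + alpha0)


def cascade(alphas):
    """Return the frequency response of stages connected in series."""
    def response(omega):
        out = 1.0 + 0.0j
        for alpha0 in alphas:
            out *= allpass(omega, alpha0)
        return out
    return response


def group_delay(response, omega, step=1e-3):
    """Group delay as the negative phase slope, by central difference.
    The phase of the ratio is used so that phase wrapping is avoided."""
    dphi = cmath.phase(response(omega + step) / response(omega - step))
    return -dphi / (2.0 * step)


ALPHA0 = 100.0   # rad/s, the value quoted for the circuits
OMEGA = 50.0     # rad/s, arbitrary test frequency

single = group_delay(cascade([ALPHA0]), OMEGA)
triple = group_delay(cascade([ALPHA0, ALPHA0, ALPHA0]), OMEGA)
print(f"one stage:    {single * 1e3:.4f} ms")
print(f"three stages: {triple * 1e3:.4f} ms")
```

The three-stage figure comes out as three times the single-stage delay, and the single-stage value agrees with 2·alpha/(alpha² + omega²).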
Because the time and frequency domains are two ways of describing the same event, accurate time domain representation cannot be obtained from limited frequency domain information. For example, the time delay of a frequency component passing through a system with nonuniform response cannot be determined with accuracy. However, a joint time-frequency characterization can be made using first and second order all-pass networks. This is consistent with ordinary human experience. At any frequency there are multiple arrivals of the audio signal at the listener's location as a function of time.
The individual time-frequency components of an audio signal, predicted mathematically, overlap in the time and frequency domains. Therefore, a graphical presentation is not possible, because it is impossible to separate simultaneous arrival times in a single time domain plot.
Potential energy (i.e., pressure expressed in dB) and direct comparisons of input to output signals (a measure of distortion) do not completely describe the performance of audio equipment such as loudspeakers, microphones, and electrical networks. Total sound energy provides phase distortion information and, although phase is not detectable consciously for simple signals, there are indications that the human hearing mechanism is capable of processing complex functions and perceiving phase information as part of total sound perception.
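The square root of the total energy density vector, E, is equal to the sum of the square root of the potential energy vector and the imaginary component of the square root of the kinetic energy vector: $\sqrt{E} = \sqrt{E_p} + j\sqrt{E_k}$, where $E_p$ and $E_k$ denote the potential and kinetic energy vectors, respectively.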
Attempts to measure the total energy density at a microphone responding to a remote sound source will only yield part of the total energy density of the source. Thus, at any given moment, a microphone will not directly measure E. Essentially, a microphone compresses complex spatial, multi-dimensional acoustic signals into a single point in time and space, effectively making the signal two-dimensional as a function of time. However, the information necessary to unravel the entire original signal is contained in the compressed signal and can be retrieved if processed properly.
Although the threshold of hearing has been established in terms of vector orientation and frequency of pure tones (see, e.g., L. Sivian and S. White, "On Minimum Audible Sound Fields," J. Acoust. Soc. Am., Vol. 4, pp. 288-321 (1933)), pure tones have no Fourier transforms. The human hearing mechanism processes total energy density, not just the "minimum audible pressure" associated with a pure audio tone.
The ability to localize direction and distance from a sound source has something to do with the orientation of the ear with respect to the vector components of sound. For pure tones, simply the phase differences between arrival of the signal at the two ears provides a clue to the direction of the source. See Kinsler and Frey, Fundamentals of Acoustics (New York: John Wiley and Sons, 1950), pp. 370-392. Thus, the minimum audible field for binaural hearing varies with amplitude, frequency, and azimuth relative to the source signal.
J. Zwislocki and R. Feldman (1956) "Just Noticeable Differences in Dichotic Phase", J. Acoust. Soc. Am., Vol. 28, No. 5, p. 860 (September 1956) pointed out that the ears may not be able to detect phase or time differences above 1300 Hertz and the only directional clues above 1300 Hz are contained in relative intensity differences at the ears.
In reality, the human auditory system binaurally localizes sounds in complex, spherical, three dimensional space using two sensors (ears) that are unlike microphones, a computer (brain) that is unlike any computer constructed by man, and, at a live performance, the eyes. The eyes allow us to "hear" direction by providing a sensory adjunct to the ears for localization of sound in azimuth, distance and height. During reconstruction of a familiar sound, such as a symphony orchestra, the brain remembers instrument placement and correlates this information with auditory clues to provide a more complete sense of the individual orchestra sections and sometimes of the locations of individual instruments. Techniques for localizing sound direction by the ears, neural pathways, and the brain have been termed "psychoacoustics".
In addition to direction, the brain will interpret distance as a function of intensity and time of arrival differences. These clues can be provided by reflected sound in a closed environment such as a concert hall, or by other means for sound originating in environments where no reflections occur, such as in a large open field. In a closed environment, there is a damping effect as a function of frequency due to reverberations. When acoustic energy is reflected from a surface, a portion of the energy is lost in the form of heat. Low frequencies tend to lose less energy and are transmitted more readily, whereas high frequencies tend to be absorbed more quickly. This makes the decay time of high frequencies shorter than that of low frequencies. The air itself absorbs all frequencies, with greater absorption occurring at high frequencies.
In "Biophysical Basis of Sound Communication" by A. Michelsen (in B. Lewis (ed.), Bioacoustics, A Comparative Approach (London: Academic Press, 1983)), at pages 21-22, the absorption of sound in air is described as a combination of dissipation due to heat and other factors not well understood. In air, the absorption coefficient in dB/100 meters is 1 at about 2 KHz. At about 9 KHz, the signal is down by 10 dB; at 20 KHz it is down by 100 dB; and at 100 KHz (the upper harmonics of a cymbal crash), it is down by about 1000 dB. Thus, higher harmonics generated by musical instruments are drastically attenuated (in a logarithmic fashion) by even a distance of a few feet when traveling to microphones, and then even more when traveling from speakers to the listener's ears.
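The effect of these absorption figures over short distances can be worked through with a small Python sketch. It assumes that each quoted figure is a coefficient in dB per 100 meters and that attenuation in dB scales linearly with distance; the 3 meter path length and the function names are hypothetical.

```python
# Absorption figures quoted in the text, read here as dB per 100 m (assumption).
ABSORPTION_DB_PER_100M = {
    2_000: 1.0,
    9_000: 10.0,
    20_000: 100.0,
    100_000: 1000.0,
}


def attenuation_db(freq_hz, distance_m):
    """Attenuation in dB over distance_m, assuming linear scaling with distance."""
    return ABSORPTION_DB_PER_100M[freq_hz] * distance_m / 100.0


def amplitude_ratio(db_down):
    """Fraction of the original pressure amplitude remaining after db_down dB."""
    return 10.0 ** (-db_down / 20.0)


DISTANCE_M = 3.0  # hypothetical source-to-microphone distance

for freq_hz in sorted(ABSORPTION_DB_PER_100M):
    loss = attenuation_db(freq_hz, DISTANCE_M)
    print(f"{freq_hz / 1000:6.0f} KHz: {loss:6.2f} dB down, "
          f"amplitude ratio {amplitude_ratio(loss):.3f}")
```

Under these assumptions a 100 KHz component loses 30 dB over 3 meters, while a 2 KHz component loses only 0.03 dB.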
With conventional stereophonic sound reproduction systems, it is necessary to be equidistant from the speakers in order to experience the proper stereo effect. With earphones, standard stereo provides a strange ping-pong effect coupled with an elevated "center stage" in the middle and slightly above the head. At best, ordinary stereo is an attempt to spread sound out for increased realism, but it is still basically two-dimensional.
In the 1920s Sir Oliver Lodge tested human hearing range out to 100 KHz. It has been suggested that the true range of human hearing is not completely known. However, the outer ear, inner ear (cochlea), auditory nerve, and human brain are capable of detecting, routing, and processing frequencies in excess of 100 KHz, and possibly to 300 KHz and beyond. However, conscious hearing is limited by the brain to roughly 20 Hz to 20 KHz.
There is no currently accepted theory of how humans actually hear outside the voice range of acoustic signals. Below about 200 Hz, the wavelength of an acoustic pressure wave is too large to enter the ear canal. Experience with low frequency standing waves suggests an interaction with the cochlea or auditory nerve directly. Indeed, standing wave acoustic emitters produce the perception of distortion-free sound throughout the hearing range. Above about 6 KHz, the "volley" theory and active cochlear processes could account for an increase in hearing range beyond 20 KHz. The volley theory is derived from the fact that there is not a single stimulus-response event per nerve; rather, higher frequency stimulation results in a multiplicity of neural firings. The process is one of bifurcation wherein the higher frequencies cause a greater number of neurons to fire. This suggests the possibility of fractal pattern generation. How the brain interprets the volley of information presented to it is unknown, however.
In Auditory Function, edited by G. Edleman, W. Gall, and W. Cowan (New York: John Wiley & Sons, 1986), a class of experiments is described which demonstrate acoustic emissions from animal and human ears. The cochlea can function as a generator of acoustic signals which can combine with incoming signals to produce higher frequencies. Both empirical and theoretical studies (indicating that active cochlear processes are necessary for basilar membrane tuning properties) support the concept.
P. Zurek, in "Acoustic Emissions from the Ear--A Summary of Results from Humans and Animals", J. Acoust. Soc. Am., Vol. 78, No. 1, pp. 340-344 (July 1985), indicates that frequency selectivity results from active cochlear processes. When the ear is presented with a non-linear pulse, in addition to the stimulus response mechanism, another response with an 8 millisecond (or longer) delay is produced. This phase-shifted signal, generated by the ear, may play a role in the actual way in which we hear music and other high frequency sounds. When musical instruments produce sound, the various Fourier waveforms are not simply produced independently of each other, but exist in a phase space wherein there are phase interactions among all of the sounds. Even a single string plucked on a harp or struck on a piano will produce phase-related signals and harmonics, not simply frequencies and amplitudes. Thus, the ear must be capable of decoding phase information in order to properly transduce complex sounds such as music.
The evoked time-delayed response in the ear is not simply a phase-shifted replica of the original sound, because the higher frequency components are time delayed less (about 8-10 milliseconds) than the lower frequency components of the emission (about 10-15 milliseconds). Also, the amplitude of the evoked response is non-linear with respect to the stimulus for high stimulus levels, amounting to about 1 dB for every 3 dB increase in the stimulus. The interaction of the stimulus and acoustic emission occurs increasingly with lower and lower levels of input, suggesting that the ear may have a compensation mechanism for low level signals. People with certain types of hearing loss do not produce acoustic emissions. At low levels of auditory stimulus, the emissions are almost equal in amplitude to the incoming signal itself, and they occur even for pure tones. The ear can generate continuous signals, and generated signals as high as 8 KHz have been observed.
As noted earlier, the conscious hearing range is roughly between 20 Hz and 20 KHz. Audio equipment has been designed to be optimal within that range. Also, most equipment has been designed to accurately reproduce that which has been transduced and recorded. However, live sound is a transient phenomenon. It is not possible to compare a live sound with anything, because in order to do so, it must be transduced and recorded in some way. It is this fact that forms the motivation for the present invention, and discussion of the prior art that follows.
2. Description of Prior Art
There have been many attempts to unravel the compressed information in recorded sound to provide the information that was present in the live performance. Most of these attempts have colored the sound and failed because how we hear is not yet fully understood. However, progress has been made, and new theories have pointed the way toward a path that provides answers to previous questions concerning exactly how the human ear and brain interpret audio information.
Byrd in 1990 (PCT/US91/09375) described a circuit for adding phase information to audio signals in order to restore information that was lost during the transduction and recording process.
Tominari (1989) described a phase shift network that delayed low frequencies in time to provide an echo. Other attempts, although different than Tominari's, suffered from the same basic problem: how to restore the feeling of the live performance without causing some unwanted side effects. Even Byrd's design suffered from loss of a "center" such that vocals seemed to be in a tunnel, although instrumental music came alive with no side effects (there is no created center in most instrumental recordings).
Visser (1985) cites 35 U.S. and foreign patents that had attempted to create sound in various new ways. He considered his idea to be better than any of them, yet his approach of adding high-frequency broadband noise to audio signals and transforming the resultant into a binary coded pulse-width modulation (while useful for some applications) is an unnecessary complication for purposes of creating a phase space out of an analog signal. The current invention overcomes all shortcomings of all the prior art and not only produces a re-creation of the live performance, but also provides a means for converting the processed signals into distortion-free digital duty-cycle modulation and amplifying the result to virtually any desired power level at the highest efficiency and lowest cost possible.
Disclosure Document #344171 describes in block diagram form a circuit that was reduced to practice on Nov. 19, 1992. Units were loaned under nondisclosure agreements for evaluation, and when it was apparent that a significant advance in sound processing had occurred, Disclosure Document #374894 was filed on Apr. 24, 1995 detailing the circuits involved. The PWM chip that converts the output of the device from analog to digital and the amplifier circuit were added in July 1995. The circuits described herein represent a significant and unobvious improvement to the circuits described in PCT/US91/09375.
Although prior art has attempted to correct some of the problems associated with distortion in audio systems due to phase shifts as a function of frequency, and spatial distortion due to the inherent inaccuracies in standard stereo, these attempts have not completely succeeded in restoring lost realism to recorded sound. At best, some prior art processors create the illusion of ambience.
The prior art provides single and, in some cases, multiple corrections to recorded signals. The object of the prior art is, in general, to control the location of sound cues and provide phase correction, to increase the quality of the sound by putting back in to the signal what was removed by the transduction, recording, and
As previously pointed out, microphones compress signals that can consist of many fundamental frequencies from different instruments at different spatial locations. These signals also contain complex interactions for the fundamentals and harmonics produced by the same instruments. When cymbals crash, for example, the harmonics produced reach above 100,000 Hertz. As the complex signal develops from these interactions, it can become non-linear and sub-harmonics will be present.
At first, it would appear impossible to retrieve or reconstruct a complex signal whose spectral content has been compressed by microphones in both the time and spatial domains. The digital sampling rate of information that is recorded on compact discs and digital audio tapes, for example, results not only in a loss of information, but also in an absolute frequency cutoff that is lower than the upper harmonics produced by some musical instruments. The present invention arises from the recognition that, if the harmonics and subharmonics of recorded sound are allowed to develop from the fundamental frequencies, and if the spectral content of the signal is spatially separated, the original live sound can be recreated and converted into a digital signal that contains all necessary information for the ear and brain to interpret and recreate the original live sound.
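The frequency cutoff imposed by digital sampling can be made concrete with a brief Python sketch. The sample rates used (44,100 Hz for compact disc and 48,000 Hz for digital audio tape) are the commonly used standard values and are not stated in this document; the 100,000 Hz figure for cymbal harmonics is taken from the text. The highest representable frequency is taken as half the sample rate.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2.0


# Common standard sample rates (assumed here, not given in the text).
FORMATS = {
    "compact disc": 44_100,
    "digital audio tape": 48_000,
}

CYMBAL_HARMONIC_HZ = 100_000  # upper cymbal harmonics, as cited in the text

for name, rate in FORMATS.items():
    cutoff = nyquist_hz(rate)
    print(f"{name}: cutoff {cutoff:.0f} Hz, "
          f"cymbal harmonic retained: {CYMBAL_HARMONIC_HZ <= cutoff}")
```

Both formats cut off well below the cited cymbal harmonics, which is the loss the paragraph above refers to.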