1. Field of the Invention
The present invention relates to audio signal processing and, more particularly, to a signal processor that produces an effect that resembles a live performance, restoring what was lost during the transduction and recording sequences.
2. Description of the Background
Some audio processing systems make use of amplitude or frequency adjustment or both; others rely on optimizing the Group Delays of frequencies. The invention described herein, however, converts audio information (changes in amplitude and frequency) into a phase space, where changes in amplitude and frequency become phase shifts of portions of a digital pulse. Regular loudspeakers can automatically decode the phase information, resulting in a virtually distortion-free analog acoustic signal whose bandwidth is approximately 0-50 kHz and whose characteristics include not only changes in amplitude and frequency but also changes in phase as a function of frequency. It is obvious from the scientific literature that no coherent theory of hearing currently exists. The wavelengths of low frequency acoustic signals are much too large to enter the ear canal, and high frequencies (above 6 kHz) cause bifurcation of neuronal firing in the brain, allowing for subconscious processing of frequencies considered outside the 20 Hz-20 kHz hearing range. The current invention expands the bandwidth of the audio signal from the "normal" frequency range to about 0-50 kHz, separates frequencies as a function of time, and converts ordinary stereo signals into phase distributed monaural. The result is a signal that interacts with the human brain in a new way to produce an effect that resembles a live performance, restoring what was lost during the transduction and recording sequences.
It was discovered in 1932 that hearing was not strictly physical and that psychological factors also contributed to our perception of sound (On Minimum Audible Sound Fields, by Sivian and White, presented to the Acoustical Soc. of Am. at Ann Arbor, Mich., Nov. 29, 1932). That phase angles of even pure tones are sensed by humans was established in 1956 (Just Noticeable Differences in Dichotic Phase, by Zwislocki and Feldman, in J. Acous. Soc. of Am., Vol. 28, #5, September 1956). The ear is a non-linear device; current models and theories of hearing are based, primarily, on old, outdated linear models and concepts (Auditory Frequency Selectivity, edited by Moore and Patterson, NATO ASI series A, Vol 119, 1986). Some musical harmonics are non-linear (violin overtones, e.g.) (Regarding the Sound Quality of Violins and a Scientific Basis for Violin Construction, by H. Meinel, in J. Acous. Soc. Am., Vol 29, #7, July 1957). The interaction of acoustic signals from various musical instruments (including human voices and electronic synthesizers) creates interference patterns that are embedded in the recording medium (tape, e.g.), but whose characteristics are ignored with current transducers and recording and playback equipment (Both Sides of the Mirror: Integrating Physics and Acoustics with Personal Experience, by Helen Hall, in Leonardo Music Journal, vol 3, pp. 17-23, 1993). Just as laser processing of images focused on photographic emulsions can bring out the three-dimensional information in the two-dimensional image by retrieving phase information in the interference patterns, so this invention restores 3D information lost during transduction and recording. The result is a restoration of the "live" performance.
Classical theory indicates that acoustic events can be described in at least two ways; in a time domain or a frequency domain, each convertible into the other via a Fourier transformation. The mathematical formulation for this process is well known. The time-domain characterization of an acoustical event is a scalar, while the frequency-domain representation is a complex vector quantity containing amplitude and phase information. The time domain representation can also be expressed as a complex quantity. The scalar portion of the time domain vector represents performance based on impulse excitation; the imaginary part of the vector is the Hilbert transform of the scalar.
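For reference, the standard relations alluded to above may be written as follows. This is only a sketch in conventional notation; the symbols $x(t)$, $X(\omega)$, $z(t)$ and $\mathcal{H}$ are introduced here for illustration and do not appear elsewhere in this description.

```latex
\begin{align}
X(\omega) &= \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt, &
x(t) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{i\omega t}\, d\omega, \\
z(t) &= x(t) + i\,\mathcal{H}\{x\}(t), &
\mathcal{H}\{x\}(t) &= \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{x(\tau)}{t-\tau}\, d\tau .
\end{align}
```

Here $x(t)$ is the real time-domain signal, $X(\omega)$ its complex frequency-domain representation, and $z(t)$ the complex time-domain form whose imaginary part is the Hilbert transform of $x(t)$.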
Loudspeakers and electrical networks which transfer energy from one form to another can be characterized by response to an impulse function, because the impulse response can be manipulated to predict the behavior of the system in response to any arbitrary signal. Fourier transforms work for predictive systems as well as causal systems. However, the group velocity of a set of audio signals is not related to time delay for all possible systems, and uniform group delay does not ensure a distortionless system. Group Delay is derived from phase delay, which is defined as the phase shift in a system at a given frequency. Group delay is associated with a group of frequencies around a central carrier, such as those encountered in modulated communications, but it also finds some relevance in describing how a system responds to a change in frequency. Group Delay follows an inverse square law. The value is fixed for DC but it approaches a finite value (near zero) at infinity. For a given function, and given the appropriate values for the resistor and capacitor, this logarithmic response will appear across the audio range. Starting with $T_{gd} = 2\alpha_0/(\alpha_0^2 + \omega^2)$, it can be shown that:
$$T_{gd}(\omega \approx \alpha_0) \approx \frac{2.3}{\alpha_0}\,\log\!\left(\frac{\alpha_0}{\omega}\right)$$
For a simple case it is possible to relate a logarithmic approximation to the group delay. The approximation was developed around a region where alpha equals omega. A more general equation for a larger region is presented below. It was derived using similar techniques but spans the area from omega equals alpha to omega "large" (50K radians or so). Small values of omega are not permissible, and the error at omega equals alpha is significant. These logarithmic equations are not specifically necessary for the design process but when the user works with practical circuits, it will be noted (on test equipment) that the Group Delay of the audio bandwidth changes logarithmically with frequency. The following equation can be used to validate the observations; however, it is noted that because of the foregoing, Group Delay is rather meaningless and phase shift more accurately describes the true action of the circuit. Group Delay is included here to provide an alternate way of analyzing the circuit's action.
These two equations are generally equivalent for $\omega$ greater than $5\alpha_0$:
$$T_{gd}(\omega) = \frac{2\alpha_0}{\alpha_0^2 + \omega^2}$$
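$$T_{gd}(\omega) = \frac{2}{\alpha_0}\,\ln\!\left[1 + \left(\frac{\alpha_0}{\omega}\right)^2\right]$$
The same equation rewritten for standard logarithms:
$$T_{gd}(\omega) = \frac{4.6}{\alpha_0}\,\log\!\left[1 + \left(\frac{\alpha_0}{\omega}\right)^2\right]$$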
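The stated equivalence can be checked numerically. The following is a minimal sketch, not part of the disclosed circuit; it assumes the logarithmic form is read as (2/alpha0)·ln[1 + (alpha0/omega)^2], and the function names are illustrative only.

```python
import math


def group_delay_exact(omega, alpha0):
    """First-order all-pass group delay: 2*alpha0 / (alpha0**2 + omega**2)."""
    return 2.0 * alpha0 / (alpha0 ** 2 + omega ** 2)


def group_delay_log(omega, alpha0):
    """Assumed logarithmic form: (2/alpha0) * ln(1 + (alpha0/omega)**2)."""
    return (2.0 / alpha0) * math.log(1.0 + (alpha0 / omega) ** 2)


def relative_error(omega, alpha0):
    """Relative difference between the logarithmic and exact expressions."""
    exact = group_delay_exact(omega, alpha0)
    return abs(group_delay_log(omega, alpha0) - exact) / exact


if __name__ == "__main__":
    alpha0 = 100.0  # rad/s, the value the text later quotes for FIGS. 3 and 5
    for multiple in (1, 2, 5, 10, 100):
        omega = multiple * alpha0
        print(multiple, group_delay_exact(omega, alpha0), relative_error(omega, alpha0))
```

Under this reading the two expressions agree to within a few percent once omega exceeds about five times alpha, while the discrepancy at omega equal to alpha is large, which matches the remarks above.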
Interaural time difference, necessary for determining the position of a source, also has bearing on pitch. Further, most of the critical interaural time differences are in the range of plus or minus 1 millisecond. Thus, when the group delay is modified, so is the perception of the sound.
A generalized version of the All-pass response Group Delay is presented below. This equation can be used, with reasonable accuracy, to predict the group delay of a specific frequency for various RC combinations. It also accounts for Gain adjustments. Using these equations, one can tailor the Group Delay response. Referring to FIG. 5:
$\alpha_0 = 1/(R_1 C)$ and $A = R_3/R_2$.
The general transfer function is:
$$T(s) = -A\,\frac{s - \alpha_0}{s + \alpha_0}$$
which means the gain of the circuit is:
$$|T(s)| = -A$$
The phase response is:
$$\phi(\omega) = -2\tan^{-1}\!\left(\frac{\omega\sqrt{A}}{\alpha_0}\right)$$
and the Group Delay is given by:
$$T_{gd}(\omega) = \frac{(A+1)\,\alpha_0}{\alpha_0^2 + \omega^2 A} + A \cdot 50\ \text{ns} + 100\ \text{ns}$$
The second and third terms are included because their exclusion yields increasingly poor results with increasing frequencies. The above equations may be interpreted in the following physical sense: alpha determines the range over which the group delay responds logarithmically. An increase in alpha will tend to shift the range to higher frequencies, but will reduce the group delay itself, i.e., the actual delay times will decrease. A decrease in Gain, A, can be used to offset this decrease in delay. Conversely, for a given alpha, adjusting Gain can be used to set the delay time at the expense of the overall frequency range. Increasing Gain increases the maximum delay presented by the system (at very low omega), but the delay of a specific frequency compared to unity gain will be smaller due to the shift in frequency range; adjusting alpha can be used to compensate.
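The trade-off between Gain and alpha described above can be explored numerically. The sketch below is illustrative only: it assumes the denominator of the generalized expression is read as alpha0^2 + A·omega^2, and the component values used in the example are hypothetical, chosen merely so that alpha0 comes out near 100 radians per second.

```python
def allpass_group_delay(omega, r1, c, r2, r3):
    """Generalized all-pass group delay in seconds.

    alpha0 = 1/(R1*C) and A = R3/R2, as defined with reference to FIG. 5.
    The 50 ns and 100 ns terms are the correction terms given in the text.
    """
    alpha0 = 1.0 / (r1 * c)
    gain = r3 / r2
    ideal = (gain + 1.0) * alpha0 / (alpha0 ** 2 + gain * omega ** 2)
    correction = gain * 50e-9 + 100e-9
    return ideal + correction


if __name__ == "__main__":
    # Hypothetical parts: 100 kohm and 0.1 uF give alpha0 of about 100 rad/s.
    r1, c = 100e3, 0.1e-6
    for r3 in (10e3, 20e3, 40e3):  # gain A = 1, 2, 4 with R2 = 10 kohm
        low = allpass_group_delay(0.0, r1, c, 10e3, r3)
        high = allpass_group_delay(1000.0, r1, c, 10e3, r3)
        print(r3 / 10e3, low, high)
```

With these assumed values, raising the Gain increases the delay at very low omega but lowers the delay at 1000 radians per second relative to unity gain, in line with the interpretation given above.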
The circuits shown in FIGS. 3 and 5 all utilize an alpha of about 100 radians each. Increasing alpha will tend to emphasize lower frequencies, and decreasing alpha will tend to emphasize higher frequencies. In any case, a specifically desired response will require the adjustment of both Gain and alpha.
FIG. 3 shows the cascaded implementation. The effect of the cascade is a linear addition of the delays. The general effect of cascading is to delay a broader range of frequencies by a greater amount, thus enhancing the effect.
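The additive behavior of the cascade can be verified with a short numerical sketch. It models each stage as an ideal unity-gain first-order all-pass, which is an assumption made for illustration and not a description of the actual circuit of FIG. 3; the function names are likewise illustrative.

```python
import cmath


def stage_response(omega, alpha0):
    """Ideal unity-gain first-order all-pass: -(s - alpha0)/(s + alpha0), s = j*omega."""
    s = 1j * omega
    return -(s - alpha0) / (s + alpha0)


def cascade_response(omega, alpha0s):
    """Frequency response of several all-pass stages connected in series."""
    total = 1.0 + 0.0j
    for alpha0 in alpha0s:
        total *= stage_response(omega, alpha0)
    return total


def numeric_group_delay(response, omega, step=1e-3):
    """Group delay as -d(phase)/d(omega), by central difference.

    The phase of the ratio of two nearby responses is small, so no
    phase unwrapping is needed.
    """
    ratio = response(omega + step) / response(omega - step)
    return -cmath.phase(ratio) / (2.0 * step)


if __name__ == "__main__":
    alpha0s = [100.0, 100.0]
    omega = 200.0
    single = numeric_group_delay(lambda w: stage_response(w, 100.0), omega)
    both = numeric_group_delay(lambda w: cascade_response(w, alpha0s), omega)
    print(single, both)
```

For two identical stages the cascade delay comes out as twice the single-stage delay, which is the linear addition referred to above.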
Because the time and frequency domains are two ways of describing the same event, accurate time domain representation cannot be obtained from limited frequency domain information. For example, the time delay of a frequency component passing through a system with non-uniform response cannot be determined with accuracy. However, a joint time-frequency characterization can be made using first and second order all-pass networks. This is consistent with ordinary human experience. At any frequency there are multiple arrivals of the audio signal at the listener's location as a function of time.
The individual time-frequency components of an audio signal, predicted mathematically, overlap in the time and frequency domains. Therefore, a graphical presentation is not possible, because it is impossible to separate simultaneous arrival times in a single time domain plot.
Potential energy (i.e., pressure expressed in dB) and direct comparisons of input to output signals (a measure of distortion) do not completely describe the performance of audio equipment such as loudspeakers, microphones, and electrical networks. Total sound energy provides phase distortion information and, although phase is not detectable consciously for simple signals, there are indications that the human hearing mechanism is capable of processing complex functions and perceiving phase information as part of total sound perception.
The square root of the total energy density vector E is equal to the sum of the square root of the potential energy vector and the imaginary component of the square root of the kinetic energy vector:
$$\sqrt{E} = \sqrt{P} + i\sqrt{K}$$
Attempts to measure the total energy density at a microphone responding to a remote sound source will only yield part of the total energy density of the source. Thus, at any given moment, a microphone will not directly measure E. Essentially, a microphone compresses complex spatial, multi-dimensional acoustic signals into a single point in time and space, effectively making the signal two-dimensional as a function of time. However, the information necessary to unravel the entire original signal is contained in the compressed signal and can be retrieved if processed properly.
Although the threshold of hearing has been established in terms of vector orientation and frequency of pure tones (see Sivian and White, supra), pure tones have no Fourier transforms. The human hearing mechanism processes total energy density, not just the "minimum audible pressure" associated with a pure audio tone.
The ability to localize direction and distance from a sound source has something to do with the orientation of the ear with respect to the vector components of sound. For pure tones, simply the phase differences between arrival of the signal at the two ears provides a clue to the direction of the source. See, Kinsler and Frey, Fundamentals of Acoustics (New York: John Wiley and Sons, 1950), pp. 370-392. Thus, the minimum audible field for binaural hearing varies with amplitude, frequency, and azimuth relative to the source signal.
J. Zwislocki and R. Feldman, "Just Noticeable Differences in Dichotic Phase", J. Acoust. Soc. Am., Vol. 28, No. 5, p. 860 (September 1956), pointed out that the ears may not be able to detect phase or time differences above 1300 Hertz and that the only directional clues above 1300 Hz are contained in relative intensity differences at the ears.
In reality, the human auditory system binaurally localizes sounds in complex, spherical, three dimensional space using two sensors (ears) that are unlike microphones, a computer (brain) that is unlike any computer constructed by man, and, at a live performance, the eyes. The eyes allow us to "hear" direction by providing a sensory adjunct to the ears for localization of sound in azimuth, distance and height. During reconstruction of a familiar sound, such as a symphony orchestra, the brain remembers instrument placement and correlates this information with auditory clues to provide a more complete sense of the individual orchestra sections and sometimes of the locations of individual instruments. Techniques for localizing sound direction by the ears, neural pathways, and the brain have been termed "psychoacoustics".
In addition to direction, the brain will interpret distance as a function of intensity and time of arrival differences. These clues can be provided by reflected sound in a closed environment such as a concert hall, or by other means for sound originating in environments where no reflections occur, such as in a large open field. In a closed environment, there is a damping effect as a function of frequency due to reverberations. When acoustic energy is reflected from a surface, a portion of the energy is lost in the form of heat. Low frequencies tend to lose less energy and are transmitted more readily, whereas high frequencies tend to be absorbed more quickly. This makes the decay time of high frequencies shorter than that of low frequencies. The air itself absorbs all frequencies, with greater absorption occurring at high frequencies.
In "Biophysical Basis of Sound Communication" by A. Michelson, in B. Lewis (ed.), Bioacoustics, A Comparative Approach (London: Academic Press 1983), pages 21-22, the absorption of sound in air is described as a combination of dissipation due to heat and other factors not well understood. In air, the absorption coefficient in dB/100 meters is 1 at about 2 kHz. At about 9 kHz, the signal is down by 10 dB; at 20 kHz it is down by 100 dB; and at 100 kHz (the upper harmonics of a cymbal crash), it is down by about 1000 dB. Thus, higher harmonics generated by musical instruments are drastically attenuated (in a logarithmic fashion) by even a distance of a few feet when traveling to microphones, and then even more when traveling from speakers to the listener's ears.
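The effect of these figures over short distances can be estimated with simple arithmetic. The sketch below assumes that all four quoted values are absorption coefficients per 100 meters and that attenuation in dB scales linearly with distance; the table and function names are illustrative.

```python
# Absorption figures quoted in the text, taken here as dB per 100 m (assumption).
ABSORPTION_DB_PER_100M = {
    2_000: 1.0,
    9_000: 10.0,
    20_000: 100.0,
    100_000: 1000.0,
}


def attenuation_db(freq_hz, distance_m):
    """Loss in dB over distance_m, assuming the loss grows linearly with distance."""
    return ABSORPTION_DB_PER_100M[freq_hz] * distance_m / 100.0


def amplitude_ratio(loss_db):
    """Fraction of the original sound pressure amplitude left after loss_db of loss."""
    return 10.0 ** (-loss_db / 20.0)


if __name__ == "__main__":
    distance_m = 3.0  # roughly ten feet, a hypothetical microphone distance
    for freq_hz in sorted(ABSORPTION_DB_PER_100M):
        loss = attenuation_db(freq_hz, distance_m)
        print(freq_hz, loss, amplitude_ratio(loss))
```

On these assumptions a 100 kHz harmonic loses about 30 dB over three meters, while a 2 kHz component loses a negligible 0.03 dB over the same path.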
With conventional stereophonic sound reproduction systems, it is necessary to be equidistant from the speakers in order to experience the proper stereo effect. With earphones, standard stereo provides a strange ping-pong effect coupled with an elevated "center stage" in the middle and slightly above the head. At best, ordinary stereo is an attempt to spread sound out for increased realism, but it is still basically two-dimensional.
In the 1920s Sir Oliver Lodge tested human hearing range out to 100 kHz. It has been suggested that the true range of human hearing is not completely known. However, the outer ear, inner ear (cochlea), auditory nerve, and human brain are capable of detecting, routing, and processing frequencies in excess of 100 kHz, and possibly to 300 kHz and beyond. Conscious hearing, however, is limited by the brain to roughly 20 Hz to 20 kHz.
There is no currently accepted theory of how humans actually hear outside the voice range of acoustic signals.
Below about 20 Hz, the wavelength of an acoustic pressure wave is too large to enter the ear canal. Experience with low frequency standing waves suggests an interaction with the cochlea or auditory nerve directly. Indeed, standing wave acoustic emitters produce the perception of distortion-free sound throughout the hearing range. Above about 6 kHz, the "volley" theory and active cochlear processes could account for an increase in hearing range beyond 20 kHz. The volley theory is derived from the fact that there is not a single stimulus-response event per nerve; rather, higher frequency stimulation results in a multiplicity of neural firings. The process is one of bifurcation wherein the higher frequencies cause a greater number of neurons to fire. This suggests the possibility of fractal pattern generation. How the brain interprets the volley of information presented to it is unknown, however.
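The wavelengths involved follow from the relation wavelength = speed of sound / frequency. The short sketch below assumes a speed of sound of 343 meters per second (dry air near 20 degrees C), a value not stated in this description.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def wavelength_m(freq_hz):
    """Acoustic wavelength in meters for a given frequency in hertz."""
    return SPEED_OF_SOUND_M_PER_S / freq_hz


if __name__ == "__main__":
    for freq_hz in (20.0, 6_000.0, 20_000.0):
        print(freq_hz, wavelength_m(freq_hz))
```

Under this assumption a 20 Hz wave is roughly 17 meters long, whereas a 20 kHz wave is under 2 centimeters.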
In "Auditory Function", edited by G. Edleman, W. Gall, and W. Cowan (New York: John Wiley and Sons, 1986), a class of experiments is described which demonstrate acoustic emissions from animal and human ears. The cochlea can function as a generator of acoustic signals which can combine with incoming signals to produce higher frequencies. Both empirical and theoretical studies (indicating that active cochlear processes are necessary for basilar membrane tuning properties) support the concept.
P. Zurek, in "Acoustic Emissions from the Ear: A Summary of Results from Humans and Animals", J. Acoust. Soc. Am., Vol. 78, No. 1, pp. 340-344 (July 1985), indicates that frequency selectivity results from active cochlear processes. When the ear is presented with a nonlinear pulse, in addition to the stimulus response mechanism, another response with an 8 millisecond (or longer) delay is produced. This phase-shifted signal, generated by the ear, may play a role in the actual way in which we hear music and other high frequency sounds. When musical instruments produce sound, the various Fourier waveforms are not simply produced independently of each other, but exist in a phase space wherein there are phase interactions among all of the sounds. Even a single string plucked on a harp or struck on a piano will produce phase-related signals and harmonics, not simply frequencies and amplitudes. Thus, the ear must be capable of decoding phase information in order to properly transduce complex sounds such as music.
The evoked time-delayed response in the ear is not simply a phase-shifted replica of the original sound, because the higher frequency components are time delayed less (about 8-10 milliseconds) than the lower frequency components of the emission (about 10-15 milliseconds). Also, the amplitude of the evoked response is non-linear with respect to the stimulus for high stimulus levels, amounting to about 1 dB for every 3 dB increase in the stimulus. The interaction of the stimulus and acoustic emission occurs increasingly with lower and lower levels of input, suggesting that the ear may have a compensation mechanism for low level signals. People with certain types of hearing loss do not produce acoustic emissions. At low levels of auditory stimulus, the emissions are almost equal in amplitude to the incoming signal itself, and they occur even for pure tones. The ear can generate continuous signals, and generated signals as high as 8 kHz have been observed.
As noted earlier, the conscious hearing range is roughly between 20 Hz and 20 kHz. Audio equipment has been designed to be optimal within that range. Also, most equipment has been designed to accurately reproduce that which has been transduced and recorded. However, live sound is a transient phenomenon. It is not possible to compare a live sound with anything, because in order to do so, it must be transduced and recorded in some way. It is this fact that forms the motivation for the present invention, and discussion of the prior art that follows.
There have been many attempts to unravel the compressed information in recorded sound to provide the information that was present in the live performance. Most of these attempts have colored the sound and failed because our understanding of how we hear has yet to be determined. However, progress has been made, and new theories have pointed the way toward a path that provides answers to previous questions concerning exactly how the human ear and brain interpret audio information.
Byrd in 1990 (PCT/US91/09375) described a circuit for adding phase information to audio signals in order to restore information that was lost during the transduction and recording process.
Tominari (1989) described a phase shift network that delayed low frequencies in time to provide an echo.
Other attempts, although different from Tominari's, suffered from the same basic problem: how to restore the feeling of the live performance without causing some unwanted side effects. Even Byrd's design suffered from loss of a "center" such that vocals seemed to be in a tunnel, although instrumental music came alive with no side effects (there is no created center in instrumental recordings).
Visser (1985) cites 35 US and foreign patents that had attempted to create sound in various new ways. He considered his idea to be better than any of them, yet his addition of high-frequency broadband noise to audio signals and transformation of the result into a binary coded pulse-width modulation (while useful for some applications) is an unnecessary complication for purposes of creating a phase space out of an analog signal. The current invention overcomes all shortcomings of all the prior art and not only produces a re-creation of the live performance, but also provides a means for converting the processed signals into distortion-free digital duty-cycle modulation and amplifying the result to virtually any desired power level at the most efficient and lowest cost possible.
Disclosure Document #344/171 describes in block diagram form a circuit that was reduced to practice on Nov. 19, 1992. Units were loaned under nondisclosure agreements for evaluation, and when it was apparent that a significant advance in sound processing had occurred, Disclosure Document #374/894 was filed on Apr. 24, 1995 detailing the circuits involved. The PWM chip that converts the output of the device from analog to digital and the amplifier circuit were added in July 1995. The circuits described herein represent a significant and unobvious improvement to the circuits described in PCT/US91/09375.
Although the prior art has attempted to correct some of the problems associated with distortion in audio systems due to phase shifts as a function of frequency, and spatial distortion due to the inherent inaccuracies in standard stereo, these attempts have not completely succeeded in restoring lost realism to recorded sound. At best, some prior art processors create the illusion of ambience.
The prior art provides single and, in some cases, multiple corrections to recorded signals. The object of the prior art is, in general, to control the location of sound cues and provide phase correction, to increase the quality of the sound by putting back into the signal what was removed by the transduction, recording, and playback systems.
As previously pointed out, microphones compress signals that can consist of many fundamental frequencies from different instruments at different spatial locations. These signals also contain complex interactions for the fundamentals and harmonics produced by the same instruments. When cymbals crash, for example, the harmonics produced reach above 100,000 Hertz. As the complex signal develops from these interactions, it can become non-linear and sub-harmonics will be present.
At first, it would appear impossible to retrieve or reconstruct a complex signal whose spectral content has been compressed by microphones in both the time and spatial domains. The digital sampling rate of information that is recorded on compact discs and digital audio tapes, for example, results not only in a loss of information, but also in an absolute frequency cutoff that is lower than the upper harmonics produced by some musical instruments. The present invention arises from the recognition that, if the harmonics and subharmonics of recorded sound are allowed to develop from the fundamental frequencies, and if the spectral content of the signal is spatially separated, the original live sound can be recreated and converted into a digital signal that contains all necessary information for the ear and brain to interpret and recreate the original live sound.
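The frequency cutoff imposed by digital sampling can be made concrete. The sketch below uses the standard sampling rates of 44.1 kHz for compact discs and 48 kHz for digital audio tape, which are not stated in this description, together with the Nyquist limit of half the sampling rate.

```python
# Standard consumer sampling rates (assumed; not given in the text).
SAMPLE_RATES_HZ = {
    "compact disc": 44_100,
    "digital audio tape": 48_000,
}


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sample_rate_hz / 2.0


def is_representable(freq_hz, sample_rate_hz):
    """True if freq_hz lies below the Nyquist limit for sample_rate_hz."""
    return freq_hz < nyquist_limit_hz(sample_rate_hz)


if __name__ == "__main__":
    cymbal_harmonic_hz = 100_000  # upper cymbal harmonics, per the text
    for medium, rate in SAMPLE_RATES_HZ.items():
        print(medium, nyquist_limit_hz(rate), is_representable(cymbal_harmonic_hz, rate))
```

Both media cut off near 22 to 24 kHz, well below the 100,000 Hertz cymbal harmonics mentioned earlier.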
It is, therefore, an object of the present invention to cause harmonics and sub-harmonics to develop for all frequencies, by continuously correcting the phase of the signal logarithmically as a function of frequency, by spatially separating the spectral content of the signal, by increasing the audio bandwidth of the signal, and digitizing the result.
The invention is based, in part, on the recognition that the human hearing mechanism for sensing audio signals (in contrast to electromagnetic and tactile signals) is different from the electronic circuits used to construct amplifiers, microphones, tape recorders, and other types of audio equipment. Thus, when humans hear or sense an audio signal, it is processed differently than by standard apparatus attempting to transduce, record, and play back the original signals.
The present invention provides a new way to process and amplify sound in a way that converts amplitude, frequency, and phase information into duty cycle modulation of a high frequency digital pulse. The signal is integrated by the voice coil of ordinary loudspeakers and the phase information is interpreted by the brain so as to provide three dimensional sound. The acoustic hologram so produced is perceived to be like a live performance. Simple digital switching amplifiers can be added to yield any desired power level.
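The general principle of duty cycle modulation followed by integration can be sketched as follows. This is a simplified model for illustration only, not the disclosed chip or amplifier: each sample is mapped to the high time of one pulse period, and a per-period average stands in for the integrating action attributed to the voice coil. All names and the choice of 100 steps per period are assumptions.

```python
def duty_cycle(sample):
    """Map an audio sample in [-1, 1] to a duty cycle in [0, 1]."""
    clipped = max(-1.0, min(1.0, sample))
    return (clipped + 1.0) / 2.0


def pulse_period(sample, steps=100):
    """One carrier period as a list of +1/-1 levels; the high time encodes the sample."""
    high = round(duty_cycle(sample) * steps)
    return [1] * high + [-1] * (steps - high)


def integrate_period(levels):
    """Average over one period: a crude stand-in for low-pass integration."""
    return sum(levels) / len(levels)


if __name__ == "__main__":
    samples = [0.0, 0.5, -0.2, 1.0]  # hypothetical audio samples
    recovered = [integrate_period(pulse_period(x)) for x in samples]
    print(recovered)
```

In this model the averaged pulse train returns the original sample values to within the resolution set by the number of steps per period.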
The invention has several objects:
to create an expanded bandwidth for recorded (and live) sound in order to take advantage of harmonics outside the "normal" (20 Hz-20 kHz) hearing range;
to create a phase shift of frequencies such that higher frequencies effectively reach the ear after lower frequencies (this creates the three dimensional characteristics of the sounds);
to allow natural harmonics to be generated (this provides a sense of being closer to the source);
to convert the amplitude, phase, and frequency information into duty cycle modulation of a high frequency digital pulse (greater than 43 kHz) in order to encode the information in a way the ear and loudspeaker can precisely recreate the original information;
and to amplify the result with a low distortion, simple, inexpensive amplifier that has no feedback.
In accordance with one aspect of the present invention, an audio signal processor is provided comprising an input terminal for receiving an audio signal, first, second, and third processing stages for processing the audio signal, and an output terminal for coupling the processed audio signal to an output device. The first and second signal processing stages are arranged in a series or cascade configuration, and each stage functions to phase shift fundamental and harmonic frequencies as a function of frequency. The phase shift increases in a negative direction with increasing frequency, so that higher frequency signals lag the lower frequency signals. Also, the left and right channels are crossed over twice in order to homogenize the signal into phase distributed monaural. The output is then fed into a digital chip that converts the amplitude, frequency, and phase information into a form of duty cycle modulation.
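The increasingly negative phase shift of the two cascaded stages can be illustrated numerically. The sketch below assumes each stage follows the all-pass phase response given earlier, phi = -2·arctan(omega·sqrt(A)/alpha0), with unity Gain and an alpha of 100 radians per second; it does not model the channel crossover or the digital conversion, and the function names are illustrative.

```python
import math


def stage_phase(omega, alpha0, gain=1.0):
    """Phase in radians of one all-pass stage: -2*atan(omega*sqrt(gain)/alpha0)."""
    return -2.0 * math.atan(omega * math.sqrt(gain) / alpha0)


def cascade_phase(omega, alpha0s, gain=1.0):
    """Total phase of stages in series: the individual phases add."""
    return sum(stage_phase(omega, alpha0, gain) for alpha0 in alpha0s)


if __name__ == "__main__":
    alpha0s = [100.0, 100.0]  # two stages, assumed identical
    for omega in (10.0, 100.0, 1_000.0, 10_000.0):
        print(omega, math.degrees(cascade_phase(omega, alpha0s)))
```

Under these assumptions the total phase runs from near zero at low frequencies toward minus 360 degrees at high frequencies, so higher frequencies lag lower ones.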
The present invention is implemented by means of a relatively simple electronic circuit that can be manufactured and sold at very low cost. The principal components of the circuit can, if desired, be reduced to a single dual inline package (DIP) which can be incorporated into existing types of audio equipment. The invention can be utilized with nearly all existing types of power amplifiers, stereo tuners, and phonographs with preamplifiers, as well as with compact disk (CD) players, digital audio tape (DAT) players, and conventional analog tape recorders and players. All recorded media can be reproduced with a sound that is close to that of a live performance.
The invention can be used with any number of audio channels or speakers; the resulting sound will be dimensionalized, to some extent, with even a single speaker. The signal processing that is carried out by the present invention transfers to tape and to virtually any other type of recording medium. Thus, for example, a digital CD output can be processed using the present invention, and the result can be recorded on ordinary stereo audio tape. The present invention restores information that has been lost during digital or analog processing, as well as during the transduction of the original sound, and may be employed at a radio or television broadcasting station to improve the quality of the audio signal received by the listeners.
Further objectives, advantages and novel features of the invention will become apparent from the detailed description which follows.