In considering what is meant by the phrase “High Fidelity Loudspeaker,” we must first have a very clear understanding of precisely what “Fidelity” means. A good working definition, compiled from numerous dictionary entries, is:
Fidelity: The Degree of Accuracy with which Music is Recorded and Reproduced.
Some Synonyms for Accuracy include: Exactness, Precision, Correctness.
In order to understand how this definition should be applied to optimum loudspeaker design, it is essential to first understand the basic form of recorded music. Shown in FIG. 1 is a sample of images which illustrate a brief moment of a single channel of recorded music in visual form. These are “screenshots” (aka “brief moments in time”) taken from a high-performance digital storage oscilloscope being fed a recorded music signal from a high-fidelity preamplifier. The horizontal (X) axis represents Time, and the vertical (Y) axis represents Amplitude. In this case, amplitude is in units of voltage, as that is the conventional basic unit of recording and playback. Note the scale of the screenshots: Time is 500 us (500 microseconds) per block, or 5 ms (0.005 seconds) for the entire screen. Each tiny division is therefore 100 us (0.0001 seconds). Amplitude is 100 mV (100 millivolts) per block, or 20 mV per tiny division.
The exact names (artists, songs, albums) of these particular images do not matter at all—these images are intended only to give an understanding of what music actually “looks like” in real time. Within the entire catalog of recorded music known to mankind, there are literally billions upon billions of such unique images. (A single standard CD alone can hold nearly a million of these screenshots.) And these screenshots are, philosophically speaking, exactly like snowflakes—they all have certain inherent properties which they all share, and yet you can look for the rest of your life and never find two which are exactly identical—every single one of them is absolutely unique.
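As a rough check of the parenthetical figure above, the number of 5 ms screens on a disc can be counted directly. The sketch below assumes a playing time of 74 to 80 minutes (typical CD capacities, not stated in the text) and counts a single channel only. The function name is hypothetical.

```python
# Rough count of 5 ms oscilloscope screens on one channel of a CD.
# Assumption (not from the text): playing time of 74 or 80 minutes.

SCREEN_SECONDS = 0.005  # full screen width stated for FIG. 1 (5 ms)


def screens_per_channel(minutes: float) -> int:
    """Number of non-overlapping 5 ms screens in `minutes` of audio."""
    return round(minutes * 60 / SCREEN_SECONDS)


for minutes in (74, 80):
    print(minutes, "min ->", screens_per_channel(minutes), "screens per channel")
```

Under these assumptions, both playing times give a count somewhat below one million screens per channel.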
So, based on these visual images, what are the inherent, defining properties of music itself (and therefore, high fidelity recorded music)?
1. It is Continuous. It never jumps from one value to a completely different value in zero time, but rather, it flows continuously from one value to the next over time.
2. It is Singular. At every single moment in time, it has one and only one single, specific amplitude, never more than one nor less than one. In other words: It traces a single line through time.
3. It is Complex. It is not reducible to a simple equation, and it is constantly changing shape in unpredictable ways. Another way of saying this is: Music is always transient in nature.
4. It is Unique. At every single, precise, unique instant of time, it has a single, precise, unique corresponding amplitude. This fact is at the very heart and soul of every piece of music ever played, and every piece of music ever recorded. If you change either the amplitude at a precise moment in time, or the time at which a precise amplitude occurs, the music is no longer itself, and the reproduction can no longer be considered “High Fidelity,” because the fundamental unique shape of the waveform has been changed. In other words: Time and Amplitude are absolutely inseparable if the music is to remain as it was originally, or if music is to be considered “High Fidelity” when reproduced.
Next, in order to understand what capabilities are absolutely essential to a “High Fidelity” loudspeaker, and more specifically, how good each of those capabilities must be, we must first investigate the capabilities (and limitations) of the human hearing system. Any loudspeaker (or other component) which aspires to “High Fidelity” must meet at least a minimum level of performance in all of these areas, or else the human hearing system will be able to detect very easily that the “reproduced music” is fundamentally wrong compared with “real music.” The following four criteria are all different, but every single one is fundamentally important to high fidelity music reproduction:
1. Frequency Response: The range of human hearing is traditionally stated as 20 Hz-20 kHz. Music can have a wider range, but most music is within these limits. (Some basic facts: The lowest frequency attained by common instruments is A0 on the standard 88-key piano, at 27.5 Hz. The lowest frequency on a standard four-string bass is E1, at 41.2 Hz. During music reproduction, most domestic (and mastering) rooms exhibit “room gain” in the deep bass, beginning around 40 Hz and increasing at lower frequencies, and thus it is advantageous to have the loudspeaker begin a very gentle rolloff at around 40 Hz, to avoid overpressure at extremely low frequencies. Finally, most adults cannot hear much above 16 kHz, regardless of what information is above that.) Thus, in the real world, we can say that the loudspeaker system should have relatively flat anechoic response from 40 Hz-20 kHz, with a very gentle rolloff below that, keeping the in-room response flat from 20 Hz-20 kHz.
2. Dynamic Range and Signal-to-Noise Ratio: These are two very similar criteria, so they are discussed together. The human hearing system has a basic dynamic range of 0 dB-120 dB SPL, from the quietest detectable sound to the limit of brief exposure before physical pain or hearing damage. Typical extremely quiet rooms, with very good acoustic isolation, have a background noise level of 20 dB (below which any signal gets buried under the background noise), with typical very quiet rooms around 30 dB background noise, and typical untreated rooms around 40-50 dB background noise. Thus, we can state that we should strive for an S/N ratio, in any reproduction system, of at least 100 dB (120 dB minus 20 dB), and a minimum usable dynamic range of 100 dB also (20 dB-120 dB SPL). And 120 dB for both figures would be welcome. Because most real music has a maximum in spectral energy content in the octaves on either side of 200 Hz (i.e., 100 Hz-400 Hz), this is generally where the highest output is necessary, with slightly lower requirements over the remainder of the audio band.
3. Amplitude Resolution: Under ideal laboratory conditions, the human hearing system can resolve an amplitude difference of 0.5 dB. In the real world, while playing music, a 1.5 dB difference in amplitude is somewhat difficult to resolve, even for expert listeners, while 3 dB is rather easy even for untrained listeners. Of course, these numbers represent huge increments in loudness level. A change of 3 dB is literally twice the acoustic power (or half the power), meaning a change in signal voltage level by a factor of 1.414 (the square root of 2). Even a 1.5 dB change in level represents over a 40% change in acoustic power, or nearly a 20% change in signal voltage. To think about it another way, even if we say that a good listener can distinguish 1.5 dB increments at any volume level while listening to music, there are only 80 discrete music volume levels that his/her hearing system can possibly distinguish, from softest to loudest! (120 dB divided by 1.5 dB.) In other words, the human hearing system is really quite insensitive to changes in signal amplitude. Nonetheless, the traditional standard +/−3 dB specification for frequency response in loudspeakers is quite appropriate as a basic requirement for “high fidelity” music reproduction. And +/−1.5 dB would be preferable.
4. Time Resolution: Under ideal laboratory conditions, the human hearing system can resolve time differences of less than 10 us (0.00001 seconds, or 10 microseconds). Recent scientific experiments have shown that this is true of both binaural hearing (via sound localization studies) and monaural hearing (meaning that each individual ear has the same inherent 10 us time resolution capability, as would logically be predicted). In the real world, while playing music, a 40 us time difference is somewhat difficult to resolve, even for expert listeners, while 80 us (0.00008 seconds) is rather easy even for untrained listeners. (As an easily understood example, 80 us represents an “image shift” in a stereo playback system, from dead-center to 10 degrees off-axis. This image shift will be easily noticed by even casual listeners. More attentive listeners will be able to notice image shifts from center to only 5 degrees to one side (equal to 40 us), and many listeners can do even better than this. Similar time-resolution capabilities apply to each ear individually, even if stereo image shift is not used as the test.) Thus, similar to our amplitude data above, we can state that a “high fidelity” playback system should introduce time errors of no more than 80 us in the signal, and preferably no more than 40 us. This standard should apply throughout the majority of the audible frequency spectrum, but can be relaxed significantly in the low bass and high treble, as the human hearing system becomes quite insensitive to timing at very low and very high frequencies.
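The decibel figures quoted in criteria 2 and 3 above can be reproduced with the standard conversions (a power ratio is 10^(dB/10), a voltage ratio is 10^(dB/20)). The following is a minimal sketch for checking that arithmetic. The function and variable names are illustrative and do not come from the text.

```python
def power_ratio(db: float) -> float:
    """Power ratio corresponding to a level change of `db` decibels."""
    return 10 ** (db / 10)


def voltage_ratio(db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a level change of `db` decibels."""
    return 10 ** (db / 20)


for db in (1.5, 3.0):
    p = power_ratio(db)
    v = voltage_ratio(db)
    print(f"{db} dB: power x{p:.3f} ({(p - 1) * 100:.1f}% change), "
          f"voltage x{v:.3f} ({(v - 1) * 100:.1f}% change)")

# Figures from criterion 2: 120 dB SPL ceiling, 20 dB room noise floor.
usable_range_db = 120 - 20

# Figure from criterion 3: number of 1.5 dB steps across a 120 dB range.
distinguishable_steps = 120 / 1.5

print("usable dynamic range:", usable_range_db, "dB")
print("1.5 dB steps in 120 dB:", distinguishable_steps)
```

The 3 dB case comes out very close to a doubling of power, and the 1.5 dB case to roughly a 41% change in power and a 19% change in voltage, consistent with the figures stated above.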
Now that we have a basic understanding of human hearing capabilities, let's briefly revisit the screenshots of music in FIG. 1. If we insist on time errors no greater than 40 us, and amplitude voltage errors of no more than 20% (both the “preferable” requirements for high fidelity above), we notice that the eyes and the ears do not see (or hear) things the same way at all. At the scale of these screenshots, a time error of 40 us is only 4/10 of one tiny division! This is extremely difficult for the eye to resolve. On the other hand, with a peak-to-peak voltage of 4 blocks (20 tiny divisions) as seen on these screenshots, a 20% change in voltage amplitude is 4 full tiny divisions of error in amplitude, 10 times more than the allowable visual error in the time scale, and incredibly easy for the eye to resolve. If we reduced the displayed amplitude to where a 20% change in peak-to-peak amplitude represented the same visual error as on the time scale, the vertical signal voltage displayed would have an amplitude of only +/−1 tiny division! In other words, it would be so shrunken in vertical scale that the eyes would hardly be able to resolve any changes in amplitude in the signal at all. This should give a visual illustration of just how critically important time errors are, relative to amplitude errors. Listeners should not allow their eyes to deceive them about the capabilities of their ears: they are two entirely different physiological systems, and their relative capabilities are not at all the same. The human hearing system is vastly more sensitive to Time than it is to Amplitude.
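The comparison of visual scales above can be worked through step by step. The sketch below uses only the oscilloscope settings and error limits stated in the text. The variable names are illustrative.

```python
# Scale of the FIG. 1 screenshots, as stated in the text.
US_PER_TINY_DIV = 100      # horizontal: 100 us per tiny division
TINY_DIVS_PER_BLOCK = 5    # vertical: 100 mV per block / 20 mV per tiny division

time_error_us = 40             # "preferable" time-error limit
voltage_error_fraction = 0.20  # "preferable" voltage-error limit (about 1.5 dB)
peak_to_peak_blocks = 4        # signal height seen on the screenshots

# Width of the allowable time error, in tiny divisions.
time_error_divs = time_error_us / US_PER_TINY_DIV

# Height of the allowable amplitude error, in tiny divisions.
peak_to_peak_divs = peak_to_peak_blocks * TINY_DIVS_PER_BLOCK
amplitude_error_divs = voltage_error_fraction * peak_to_peak_divs

# How much larger the amplitude error looks than the time error.
visual_ratio = amplitude_error_divs / time_error_divs

# Peak-to-peak height at which a 20% change would look as small as 40 us.
equal_visual_pp_divs = time_error_divs / voltage_error_fraction

print("time error:", time_error_divs, "tiny divisions")
print("amplitude error:", amplitude_error_divs, "tiny divisions")
print("visual ratio:", round(visual_ratio, 6))
print("equal-visibility height: +/-", equal_visual_pp_divs / 2, "tiny divisions")
```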
It should be emphasized once again that the above four criteria should all be met simultaneously, in order for a music playback system to present reproduced music in a form which the human hearing system will recognize as “like real music.” Any system which does not meet all four criteria simultaneously should not be described as “High Fidelity,” because the human hearing system's innate capabilities will easily be able to recognize that it is not.
It is now necessary to investigate the inherent capabilities and limitations of the major types of historical loudspeakers, and then to understand why those limitations fundamentally prevent them from attaining the label “High Fidelity,” regardless of cost.
1. Horn Loudspeakers: The earliest form of sound reproduction device, dating to the 1800s and used by Edison in the earliest forms of sound recording and playback. Still used extensively for low-fidelity sound reinforcement applications, where output capability and efficiency are paramount. Problems include: (a) Non-linear air pressure swings during compression vs. rarefaction, resulting in audible distortions, (b) “Horn Colorations” due to suboptimal physical horn geometry, also an audible form of distortion, (c) limited bandwidth of individual horns, necessitating the use of multiple drivers with crossovers, which automatically precludes high fidelity (discussed in more detail below), and (d) Necessity of use either with dynamic woofers (with all the problems discussed below), or with bass horns which, if sized for true 20 Hz extension, are the size of entire rooms.
2. Electrodynamic or Dynamic (“direct radiator”) Loudspeakers: Also rather old, with the earliest crude forms dating back to the late 1800s. The basic modern form of this type was described by Rice and Kellogg in 1925, nearly 100 years ago, and all modern iterations operate on the same fundamental physics. The fundamental limitation of the dynamic loudspeaker is that it operates (in physics terms) as a mass on a spring. This will be covered in much greater detail below. Briefly put, because it has mass, it has inertia, and because it has inertia, it is always and forever trying (unsuccessfully) to catch up to the input signal. It can't be started moving when it should, and it can't be stopped when it should either. And at every point in between, it is always behind where it should be, in the time domain. Even worse, its time lag is both transient-dependent and frequency-dependent, meaning that its time delays are not consistent across the frequency spectrum: the lower-frequency components of the signal are delayed more than the higher frequencies. These problems therefore cannot be fixed by simple physical driver offsets; it is mathematically impossible. Therefore, it cannot meet the basic requirements for “High Fidelity,” even as a single driver without the additional problems of crossovers, because it is a complete disaster in the time domain relative to the requirements of “High Fidelity.”
3. Multiway Electrodynamic (Dynamic) Loudspeakers: A variation of the above, but with multiple drivers, each of which covers a limited frequency range, usually with crossovers dividing the signal between individual drivers. By far the most popular modern form of the loudspeaker. This type takes the fundamental Achilles' Heel of the electrodynamic driver above (the “mass on a spring” problem), and makes it even worse in the time domain. There are two main reasons for this:
3.1 Woofer diaphragms have 5-10 times the mass of midrange diaphragms, which in turn have 5-10 times the mass of tweeter diaphragms. Yet the drivers all have relatively similar magnetic field strengths. This means, based on basic physics (F=ma), that the acceleration of tweeters is vastly faster than midranges, which in turn are vastly faster than woofers. This can be seen very clearly by looking at the impulse response of a multiway loudspeaker, even many which claim to be “time aligned”: First to arrive is the tweeter impulse, followed (after a delay of typically 200 us) by the midrange impulse, followed (after an even longer delay, typically 1000 us) by the woofer impulse. This is the natural consequence of a mass responding to an input force: A lot more mass takes a lot longer to get it moving. And notice the delay times: all of them are extremely obvious relative to the known real-world capability of the human hearing system at 40 us. Furthermore, we have already established that all music is transient in nature. Thus, whenever the musical signal changes direction unpredictably (which, as we already know, is all the time), the tweeter's change in response to that signal will arrive at the ears long before the midrange's, which in turn will arrive long before the woofer's.
3.2 The crossovers typically used in multiway systems contribute even more frequency-dependent non-linear phase shift, and those phase shift errors are added to the innate responses of the drivers. And this problem gets worse as the crossover slope goes higher. It is a mathematical fact that no crossover type above first-order can possibly sum correctly in time and amplitude under transient conditions (aka real music). It is not merely difficult; it is mathematically impossible. And since these phase errors are again non-linear with frequency, they contribute non-linear time errors to the system's response. And again, these time errors cannot possibly be fixed with physical driver offsets, because they vary with frequency. When combined with the inherent mass-related time delays above, it is normal in multiway dynamic systems to have phase error differences in the range of 720 degrees or more across the frequency spectrum. This is a complete disaster in the time domain.
The practical consequence of this behavior, in all conventional dynamic loudspeakers, regardless of type or cost, is that for any instrument which generates fundamentals and overtones (which includes virtually any instrument one could possibly name), many overtones will arrive at the ears long before the fundamentals. Certainly a single-driver speaker is superior in this regard relative to a non-time-aligned multiway with high-order crossovers, but the fundamental problem remains. Imagine just how incredibly irritating this is to the human hearing system, to constantly be bombarded by high frequency overtones long before the arrival of the lower frequency fundamentals. This, in a nutshell, is the source of “brightness” and “glare” and “listener fatigue” in speakers which otherwise may measure “flat” in frequency response, and also the fundamental reason why dynamic speakers are instantly recognized by the human hearing system as “speakers” and “not real.” It is also the reason why many dynamic loudspeakers have a deliberate pronounced “downward slope” in frequency response from bass to treble, often 10 dB or more: Their designers are trying to compensate for the irritation caused by the early arrival of the high frequencies, relative to the low frequencies, by progressively boosting the lower frequencies. This is basically a very crude attempt to try to fool the ear into paying more attention to the (late-arriving) lower frequencies, because they are louder relative to the (early-arriving) higher frequencies, thus supposedly “balancing out” the perceived sound. But this does not work because it is impossible to fix an inherent problem in the time domain by creating an equally egregious problem in the amplitude domain.
In conventional dynamic loudspeakers, given the magnitude of the time delays between various frequency components in the music, even from a single dynamic driver, it is obvious to the ears that something is very, very wrong. But because this type of (time arrival) error occurs nowhere in nature and nowhere in natural sounds, humans have never adapted to it evolutionarily, and the ear can't recognize what the problem is, although it knows for sure that something is very wrong. It knows that there is a very big difference between what it's hearing, and what real natural music sounds like.
4. Panel Dipole (Electrostatic or similar) Loudspeakers: First seen in 1957, some 60 years ago, in Peter Walker's legendary Quad. Historically speaking, the last big breakthrough in loudspeaker performance, and the first wide-range transducer in the history of the world to have, at least approximately, correct Time vs. Amplitude characteristics. (And also the reason that it actually sounds like real music in the upper half of the human hearing range.) However, the electrostat (or any planar dipole variation) cannot be considered “high fidelity” due to the fact that it is a dipole. Because it is a dipole, it creates a full-power inverted-phase acoustical backwave at exactly the same time as the front wave. And at frequencies beginning in the midrange and steadily worsening at lower frequencies, the inverted-phase backwave becomes progressively less directional, and begins to combine with the front wave, but with a large time delay. This results in enormous errors in both time and amplitude, with the result being that dipoles, by definition, cannot be considered “high fidelity” loudspeakers. Furthermore, the limited excursion available in all electrostats creates power-handling problems in the bass which, added to dipolar bass cancellation, seriously compromise amplitude accuracy and dynamic range at lower frequencies. Many speakers have tried to mate dynamic woofers to electrostats with crossovers, but they all suffer from the same (unsolvable) problems in the time domain as multiway dynamics.
5. Bending-Wave Loudspeakers: These fall into both flexible-diaphragm and semi-rigid-diaphragm types, with many variations. However, all of them suffer from the same problems: (a) Presence of flexure and mechanical standing waves on diaphragms, resulting in significant errors in both time and amplitude, and (b) limited bandwidth, typically resulting in the necessity (yet again) of combining them with dynamic woofers and crossovers, again precluding high fidelity.
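The frequency-dependent delay described for dynamic drivers (item 2) and crossovers (item 3.2) above can be illustrated numerically. The sketch below is not taken from the text. It assumes an idealized driver modeled as a second-order high-pass filter (a common textbook stand-in for a mass on a spring) with a hypothetical resonance of 50 Hz and Q of 0.707, and it compares a first-order crossover with a second-order Linkwitz-Riley crossover at a hypothetical 2 kHz crossover frequency. All function names and parameter values are illustrative assumptions.

```python
import cmath
import math


def driver_group_delay(f_hz: float, f0_hz: float, q: float) -> float:
    """Group delay in seconds of H(s) = s^2 / (s^2 + s*w0/Q + w0^2).

    Assumption: the driver behaves as this ideal second-order high-pass.
    The closed form follows from differentiating the phase of the denominator.
    """
    w = 2 * math.pi * f_hz
    w0 = 2 * math.pi * f0_hz
    return (w0 / q) * (w0**2 + w**2) / ((w0**2 - w**2) ** 2 + (w0 * w / q) ** 2)


def first_order_sum(f_hz: float, fc_hz: float) -> complex:
    """Sum of first-order low-pass and high-pass sections at frequency f_hz."""
    s = 1j * f_hz / fc_hz  # complex frequency normalized to the crossover
    return 1 / (1 + s) + s / (1 + s)


def lr2_sum(f_hz: float, fc_hz: float) -> complex:
    """Sum of second-order Linkwitz-Riley sections, high-pass polarity inverted."""
    s = 1j * f_hz / fc_hz
    return 1 / (1 + s) ** 2 - s**2 / (1 + s) ** 2


F0_HZ, Q = 50.0, 0.707   # hypothetical driver parameters
FC_HZ = 2000.0           # hypothetical crossover frequency

for f in (50, 100, 1000, 5000):
    tau_us = driver_group_delay(f, F0_HZ, Q) * 1e6
    print(f"{f:>5} Hz: modeled driver group delay {tau_us:10.1f} us")

for f in (500, 2000, 8000):
    h1 = first_order_sum(f, FC_HZ)
    h2 = lr2_sum(f, FC_HZ)
    print(f"{f:>5} Hz: first-order |H|={abs(h1):.3f} "
          f"phase={math.degrees(cmath.phase(h1)):7.2f} deg; "
          f"LR2 |H|={abs(h2):.3f} phase={math.degrees(cmath.phase(h2)):7.2f} deg")
```

Under these assumptions, the modeled group delay falls steadily as frequency rises, the first-order sections sum to exactly 1 at every frequency, and the second-order pair sums to unit magnitude with a phase that rotates with frequency (an all-pass response).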
It is against this background that the present invention has been developed.