Systems for recording and reproducing sounds produced by a plurality of sound sources are generally known. In the musical context, for example, systems for recording and reproducing live performances of bands and orchestras are known. In those cases, the sound sources are the musical instruments and performers' voices. More generally, however, a sound source is any object that produces sound. In a basic sense, sound is a series of physical disturbances in a medium (e.g., air). Typically, sound is created when an object (a sound source) vibrates, sending out a series of waves that propagate through air (or other media). In air, sound waves comprise fluctuations in air pressure above and below the normal atmospheric pressure (e.g., 14.7 psi). These fluctuations are referred to as compressions and rarefactions. When compressions and rarefactions impinge upon our eardrums, we perceive sound. The greater the change in air pressure above and below normal atmospheric pressure, the greater the amplitude of the sound. Since most objects vibrate with a periodic back-and-forth motion or oscillation, most sound waves (and nearly all musical sounds) have a periodic repetition, replicating the object's motion. Thus, a sound wave can be characterized by frequency and amplitude and can be represented generally by a sine wave. However, real sounds and musical signals are actually complex waves made up of many sound waves of different frequencies superimposed on one another. One reason for this is that a vibrating object (and therefore a sound wave produced by that object) includes a fundamental frequency (its lowest frequency) and overtones or harmonics, which are multiples of the fundamental frequency. The presence of these harmonics contributes to a musical instrument's characteristics, such as its timbre or tonal color.
Thus, two instruments (e.g., a piano and a violin) both played at the same fundamental frequency will sound different because they have different harmonic structures. For example, a violin produces stronger harmonics, extending higher in frequency, than those of a piano.
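By way of illustration only, the following Python sketch builds two complex tones that share a fundamental frequency but differ in harmonic structure, and then measures the strength of individual harmonics. The sample rate, the 440 Hz fundamental, and the two sets of harmonic amplitudes are hypothetical values chosen for the example; they are not measurements of any actual piano or violin.

```python
import math

SAMPLE_RATE = 8000      # samples per second (assumed for the example)
NUM_SAMPLES = 800       # 0.1 s, a whole number of cycles for every partial below
FUNDAMENTAL_HZ = 440.0  # both tones share this fundamental

# Hypothetical relative amplitudes of harmonics 1..4 (not measured instrument data).
MELLOW_TONE = [1.0, 0.3, 0.1, 0.03]
BRIGHT_TONE = [1.0, 0.8, 0.6, 0.5]


def synthesize(harmonic_amplitudes):
    """Superimpose sine waves at integer multiples of the fundamental."""
    samples = []
    for i in range(NUM_SAMPLES):
        t = i / SAMPLE_RATE
        samples.append(sum(
            amp * math.sin(2 * math.pi * FUNDAMENTAL_HZ * (k + 1) * t)
            for k, amp in enumerate(harmonic_amplitudes)
        ))
    return samples


def amplitude_at(samples, freq_hz):
    """Estimate the amplitude of one frequency by correlating with sine and cosine."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


mellow = synthesize(MELLOW_TONE)
bright = synthesize(BRIGHT_TONE)

for harmonic in range(1, 5):
    freq = FUNDAMENTAL_HZ * harmonic
    print(f"{freq:7.1f} Hz  mellow={amplitude_at(mellow, freq):.3f}  "
          f"bright={amplitude_at(bright, freq):.3f}")
```

Both tones show the same amplitude at the fundamental, while the upper harmonics differ, which is the sense in which two sources at the same pitch can have different tonal color.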
Another factor that affects the perception of sound is phase. The term phase refers to the time relationship between two or more sound waves. A phase shift refers to a time displacement of a wave (e.g., a sine wave) relative to a fixed point. Phase shift has important consequences when sine waves are combined or superimposed. If two sine waves of equal frequency and the same phase are superimposed, their combination will create a wave of greater amplitude. If, however, the two waves have equal amplitude and one of them is phase-shifted by 180 degrees, then the two waves will cancel each other and produce no signal.
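The effect of phase on superimposed sine waves can be sketched numerically as follows. The 1 kHz frequency, the 48 kHz sample rate, and the helper names are assumptions made for the example.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumed)
NUM_SAMPLES = 480    # 10 ms, i.e. ten full cycles at 1 kHz
FREQ_HZ = 1000.0


def sine(phase_degrees):
    """Unit-amplitude sine wave with the given phase shift."""
    phase = math.radians(phase_degrees)
    return [math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE + phase)
            for i in range(NUM_SAMPLES)]


def superimpose(a, b):
    """Add two signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


reference = sine(0)
in_phase = superimpose(reference, sine(0))        # same phase: reinforcement
out_of_phase = superimpose(reference, sine(180))  # 180 degrees: cancellation

print("single wave peak: ", peak(reference))
print("in-phase sum peak:", peak(in_phase))
print("180-degree sum peak:", peak(out_of_phase))
```

The in-phase sum reaches about twice the peak of a single wave, while the 180-degree sum is zero apart from floating-point rounding.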
Recording and reproducing sound produced by a sound source typically involves detecting sound waves produced by the sound source, converting the sound waves to audio signals (digital or analog), storing the audio signals on a recording medium and subsequently reading and amplifying the stored audio signals and supplying them as an input to one or more loudspeakers to reconvert the audio signals back to sound. Audio signals are typically electrical signals that correspond to actual sound waves; however, this correspondence is “representative”, not “congruent”, due to various limitations intrinsic to the process of capturing and converting acoustical data. Other forms of audio signals (e.g., optical), although more reliable in the transmission of acoustical data, encounter similar limitations due to capturing and converting the acoustical data from the original sound field.
The reproduction of sound by use of loudspeakers typically involves moving a loudspeaker cone back and forth to recreate a pattern of compressions and rarefactions. The movement of the cone is controlled by inputting audio signals to a driver that drives the loudspeaker. As a result, the quality of the sound produced by a loudspeaker partly depends on the quality of the audio signal input to the loudspeaker, and partly depends on the ability of the loudspeaker to respond to the signal accurately. Ideally, to enable precise reproduction of sound, the audio signals should correspond exactly to (i.e., be a perfect representation of) the original sound and the reconversion of the audio signals back to sound should be a perfect conversion of the audio signal to sound waves. In practice, however, such perfection has not been achieved due to various phenomena that occur in the various stages of the recording/reproducing process, as well as deficiencies that exist in the design concept of “universal” loudspeakers.
Additional problems are presented when trying to precisely record and reproduce sound produced by a plurality of sound sources. One significant problem encountered when trying to reproduce sounds from a plurality of sound sources is the inability of the system to recreate what is referred to as sound staging. Sound staging is the phenomenon that enables a listener to perceive the apparent physical size and location of a musical presentation. The sound stage includes the physical properties of depth and width. These properties contribute to the ability to listen to an orchestra, for example, and be able to discern the relative position of different sound sources (e.g., instruments). However, many recording systems fail to precisely capture the sound staging effect when recording a plurality of sound sources. One reason for this is the methodology used by many systems. For example, such systems typically use one or more microphones to receive sound waves produced by a plurality of sound sources (e.g., drums, guitar, vocals, etc.) and convert the sound waves to electrical audio signals. When one microphone is used, the sound waves from each of the sound sources are typically mixed (i.e., superimposed on one another) to form a composite signal. When a plurality of microphones are used, the plurality of audio signals are typically mixed (i.e., superimposed on one another) to form a composite signal. In either case, the composite signal is then stored on a storage medium. The composite signal can be subsequently read from the storage medium and reproduced in an attempt to recreate the original sounds produced by the sound sources. However, the mixing of signals, among other things, limits the ability to recreate the sound staging of the plurality of sound sources. Thus, when signals are mixed, the reproduced sound fails to precisely recreate the original sounds. This is one reason why an orchestra sounds different when listened to live as compared with a recording.
This is one major drawback of prior sound systems. Other problems are caused by mixing as well.
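One way to see why mixing discards information is that superposition is a many-to-one operation: different sets of source signals can produce the same composite signal. The following sketch illustrates this with short, hypothetical sample sequences; the source names and values are invented for the example.

```python
def mix(*signals):
    """Superimpose any number of equal-length signals into one composite signal."""
    return [sum(values) for values in zip(*signals)]


# Hypothetical sample values for two sources captured together.
drums = [0.5, -0.25, 0.0, 0.25]
guitar = [0.25, 0.5, -0.5, 0.0]

# A different, equally hypothetical pair of sources.
other_source_a = [0.75, 0.0, -0.25, 0.125]
other_source_b = [0.0, 0.25, -0.25, 0.125]

composite = mix(drums, guitar)
other_composite = mix(other_source_a, other_source_b)

print(composite)
print(other_composite)
print("identical composites:", composite == other_composite)
```

Because both pairs yield the same stored composite, the composite alone does not determine which individual source signals produced it.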
While attempts have been made to address these drawbacks, none has adequately overcome the problem. For example, in some cases, the composite signal includes two separate channels (e.g., left and right) in an attempt to spatially separate the composite signal. In some cases, a third (e.g., center) or more channels (e.g., front and back) are used to achieve greater spatial separation of the original sounds produced by the plurality of sound sources. Two popular methodologies used to achieve a degree of spatial separation, especially in home theater audio systems, are Dolby Surround and Dolby Pro Logic. Dolby Pro Logic is the more sophisticated of the two and combines four audio channels into two for storage and then separates those two channels into four for playback over five loudspeakers. Specifically, a Dolby Pro Logic system starts with left, center and right channels across the front of the viewing area and a single surround channel at the rear. These four channels are stored as two channels, reconverted to four and played back over left, center and right front loudspeakers and a pair of monaural rear surround loudspeakers that are fed from a single audio channel. While this technique provides some measure of spatial separation, it fails to precisely recreate the sound staging and suffers from other problems, including those identified above.
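The four-into-two-into-four approach can be sketched with a simplified, amplitude-only matrix of the kind often used to explain matrix surround. This is an illustrative assumption and not Dolby's actual encoder or decoder: practical encoders also phase-shift and filter the surround channel, and Pro Logic decoders add steering logic, all of which is omitted here. The function names and the 1/sqrt(2) gain are choices made for the example.

```python
import math

GAIN = 1 / math.sqrt(2)  # roughly -3 dB, used for the shared channels (assumed)


def encode(left, center, right, surround):
    """Fold four channel samples into two (simplified; no phase shifting)."""
    left_total = left + GAIN * center + GAIN * surround
    right_total = right + GAIN * center - GAIN * surround
    return left_total, right_total


def passive_decode(left_total, right_total):
    """Recover four channel samples from two using sums and differences only."""
    return {
        "left": left_total,
        "right": right_total,
        "center": GAIN * (left_total + right_total),
        "surround": GAIN * (left_total - right_total),
    }


# A sound present only in the original left channel.
decoded = passive_decode(*encode(left=1.0, center=0.0, right=0.0, surround=0.0))
for name, value in decoded.items():
    print(f"{name:9s}{value:.3f}")
```

In this simplified model a left-only sound also appears in the decoded center and surround channels at about 0.707 of its original level, which illustrates why two stored channels cannot fully preserve the separation of four.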
Other techniques for creating spatial separation have been tried using a plurality of channels. However, regardless of the number of channels, such systems typically involve mixing audio signals to form one or more composite signals. Even systems touted as “discrete multi-channel” base the discreteness of each channel on a “directional component” (e.g., Dolby's AC-3 discrete 5.1 multichannel surround sound is based on five discrete directional channels and one omni-directional bass channel). “Directional components” help create a more engulfing acoustical effect, but do not address the critical losses of veracity within the audio signal itself.
Other separation techniques are commonly used in an attempt to enhance the recreation of sound. For example, each loudspeaker typically includes a plurality of loudspeaker components, with each component dedicated to a particular frequency band to achieve a frequency distribution of the reproduced sounds. Commonly, such loudspeaker components include woofers or bass drivers (lower frequencies), mid-range drivers (moderate frequencies) and tweeters (higher frequencies). Components directed to other specific frequency bands are also known and may be used. When frequency-distributed components are used for each of multiple channels (e.g., left and right), the output signal can exhibit a degree of both spatial distribution and frequency distribution in an attempt to reproduce the sounds produced by the plurality of sound sources. However, maximum recreation of the original sounds is not fully achieved.
Another problem resulting from the mixing of either sounds produced by sound sources or the corresponding audio signals is that this mixing typically requires that these composite sounds or composite audio signals be played back over the same loudspeaker(s). It is well known that effects such as masking preclude the precise recreation of the original sounds. Masking can render one sound inaudible when accompanied by a louder sound; the inability to hear a conversation in the presence of loud amplified music is one example. Masking is particularly problematic when the masking sound has a similar frequency to the masked sound. Other types of masking include loudspeaker masking, which occurs when a loudspeaker cone is driven by a composite signal as opposed to an audio signal corresponding to a single sound source. When driven by a signal corresponding to a single sound source, the loudspeaker cone directs all of its energy to reproducing one isolated sound; when driven by a composite signal, the loudspeaker cone must “time-share” its energy to reproduce a composite of sounds simultaneously.
Another problem with mixing sounds or audio signals and then amplifying the composite signal is intermodulation distortion. Intermodulation distortion refers to the fact that when a signal of two (or more) frequencies is input to an amplifier that is not perfectly linear, the amplifier will output the two frequencies plus the sum and difference of these frequencies. Thus, if an amplifier input is a signal with a 400 Hz component and a 20 kHz component, the output will be 400 Hz and 20 kHz plus 19.6 kHz (20 kHz - 400 Hz) and 20.4 kHz (20 kHz + 400 Hz).
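The 400 Hz and 20 kHz example can be checked numerically. The sketch below assumes a hypothetical amplifier model with a small second-order nonlinearity (output = input + 0.1 × input²); the model, the coefficient, and the 96 kHz sample rate are assumptions for illustration, and real amplifiers generally exhibit additional distortion terms.

```python
import math

SAMPLE_RATE = 96000   # samples per second (assumed)
NUM_SAMPLES = 960     # 10 ms, a whole number of cycles for every frequency below
LOW_HZ = 400.0
HIGH_HZ = 20000.0
SECOND_ORDER = 0.1    # hypothetical nonlinearity coefficient


def two_tone_input():
    """Composite input: unit-amplitude 400 Hz and 20 kHz sine waves superimposed."""
    return [math.sin(2 * math.pi * LOW_HZ * i / SAMPLE_RATE)
            + math.sin(2 * math.pi * HIGH_HZ * i / SAMPLE_RATE)
            for i in range(NUM_SAMPLES)]


def ideal_amplifier(samples):
    """Perfectly linear amplifier (unity gain)."""
    return list(samples)


def nonlinear_amplifier(samples):
    """Hypothetical amplifier with a small squared term added to the output."""
    return [s + SECOND_ORDER * s * s for s in samples]


def amplitude_at(samples, freq_hz):
    """Estimate the amplitude of one frequency by correlating with sine and cosine."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


signal = two_tone_input()
linear_out = ideal_amplifier(signal)
distorted_out = nonlinear_amplifier(signal)

for freq in (LOW_HZ, HIGH_HZ, HIGH_HZ - LOW_HZ, HIGH_HZ + LOW_HZ):
    print(f"{freq:8.0f} Hz  linear={amplitude_at(linear_out, freq):.4f}  "
          f"nonlinear={amplitude_at(distorted_out, freq):.4f}")
```

Under these assumptions the linear output contains only the two input frequencies, while the nonlinear output also contains components at 19.6 kHz and 20.4 kHz whose amplitude equals the assumed coefficient.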
Another problem with existing loudspeakers is that they usually perform well at certain frequencies but not at others. Some are well suited for one type of music (e.g., rock), but not for others (e.g., a symphony). Furthermore, different frequency ranges require different levels of amplification to achieve an otherwise harmonious magnification. Current technology provides methods for suppressing such incongruities, but the methods are artificial and present a very limited linear solution to a nonlinear problem. Also, the directional qualities of existing loudspeakers are limited.
Thus, despite significant research and development, prior systems suffer various drawbacks and fail to maximize the ability of the system to precisely reproduce the original sounds.