Methods and systems for recording and reproducing sounds produced by a plurality of sound sources are generally known. In the musical context, for example, systems for recording and reproducing live performances of bands and orchestras are known. In those cases, the sound sources include the musical instruments and performers' voices.
Recording and reproducing sound produced by a sound source typically involves detecting the physical sound waves produced by the sound source, converting the sound waves to audio signals (digital or analog), storing the audio signals on a recording medium and subsequently reading and amplifying the stored audio signals and supplying them as an input to one or more loudspeakers to reconvert the audio signals back to physical sound waves.
Audio signals are typically electrical signals that correspond to actual sound waves; however, this correspondence is “representative”, not “congruent”, due to various limitations intrinsic to the process of capturing and converting acoustical data. Other forms of audio signals (e.g., optical), although more reliable in the transmission of acoustical data, encounter similar limitations because the acoustical data must still be captured and converted from the original sound field.
The quality of the sound produced by a loudspeaker partly depends on the quality of the audio signal input to the loudspeaker, and partly depends on the ability of the loudspeaker to respond to the signal accurately. Ideally, to enable precise reproduction of sound, the audio signals should correspond exactly to (i.e., be a perfect representation of) the original sound, including its spatial (3D) properties, and the reconversion of the audio signals back to sound should be a perfect conversion of the audio signal to sound waves, including their spatial (3D) properties. In practice, however, such perfection has not been achieved due to various phenomena that occur in the various stages of the recording/reproducing process, as well as deficiencies that exist in the design concept of “universal” loudspeakers.
Additional problems are presented when trying to precisely record and reproduce sound produced by a plurality of sound sources. One significant problem encountered when trying to reproduce sounds from a plurality of sound sources is the inability of the system to recreate what is referred to as sound staging. Sound staging is the phenomenon that enables a listener to perceive the apparent physical size and location of a musical presentation. The sound stage includes the physical properties of depth and width. These properties contribute to the ability to listen to an orchestra, for example, and be able to discern the relative position of different sound sources (e.g., instruments). However, many recording systems fail to precisely capture the sound staging effect when recording a plurality of sound sources. One reason for this is the methodology used by many systems. For example, such systems typically use one or more microphones to receive sound waves produced by a plurality of sound sources (e.g., drums, guitar, vocals, etc.) and convert the sound waves to electrical audio signals. When one microphone is used, the sound waves from each of the sound sources are typically mixed (i.e., superimposed on one another) to form a composite signal. When a plurality of microphones is used, the plurality of audio signals are typically mixed (i.e., superimposed on one another) to form a composite signal. In either case, the composite signal is then stored on a storage medium. The composite signal can be subsequently read from the storage medium and reproduced in an attempt to recreate the original sounds produced by the sound sources. However, the mixing of signals, among other things, limits the ability to recreate the sound staging of the plurality of sound sources. Thus, when signals are mixed, the reproduced sound fails to precisely recreate the field definition and source resolution of the original sounds.
This is one reason why an orchestra sounds different when listened to live as compared with a recording. This is one major drawback of prior sound systems. Other problems are caused by mixing as well.
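The loss caused by mixing can be illustrated with a minimal Python sketch. It treats mixing as sample-by-sample superposition, as described above; the function name `mix` and the sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: mixing as sample-by-sample superposition.
def mix(signals):
    """Superimpose equal-length signals into one composite signal."""
    return [sum(samples) for samples in zip(*signals)]

# Two different pairs of made-up source signals (integer samples for exactness).
drums_a, guitar_a = [3, 0, -2, 5], [1, 4, 2, -1]
drums_b, guitar_b = [2, 2, 1, 0], [2, 2, -1, 4]

composite_a = mix([drums_a, guitar_a])
composite_b = mix([drums_b, guitar_b])

# Both pairs collapse to the same composite, so the composite alone
# cannot tell us which pair of sources produced it.
print(composite_a)
print(composite_a == composite_b)
```

Because different sets of sources can yield an identical composite, the individual source signals generally cannot be recovered from the stored composite signal alone.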
While attempts have been made to address these drawbacks, none has adequately overcome the problem. For example, in some cases, the composite signal includes two separate channels (e.g., left and right) in an attempt to spatially separate the composite signal. In some cases, a third (e.g., center) or more channels (e.g., front and back) are used to achieve greater spatial separation of the original sounds produced by the plurality of sound sources. Two popular methodologies used to achieve a degree of spatial separation, especially in home theater audio systems, are Dolby Surround and Dolby Pro Logic. Dolby Pro Logic is the more sophisticated of the two and combines four audio channels into two for storage and then separates those two channels into four for playback over five loudspeakers. Specifically, a Dolby Pro Logic system starts with left, center and right channels across the front of the viewing area and a single surround channel at the rear. These four channels are stored as two channels, reconverted to four and played back over left, center and right front loudspeakers and a pair of monaural rear surround loudspeakers that are fed from a single audio channel. While this technique provides some measure of spatial separation, it fails to precisely recreate the sound staging and suffers from other problems, including those identified above.
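Why folding four channels into two limits separation can be seen in a simplified Python sketch of a four-to-two-to-four matrix. This is not the actual Dolby implementation: it assumes plain sum and difference matrixing with a −3 dB gain on the center and surround channels, and it omits the surround phase shifts, filtering and active steering logic used in real systems. The names `encode` and `passive_decode` are hypothetical.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB gain assumed for the matrixed center/surround

def encode(left, center, right, surround):
    """Fold four channel samples into two (simplified; no phase shifts)."""
    lt = left + G * center + G * surround
    rt = right + G * center - G * surround
    return lt, rt

def passive_decode(lt, rt):
    """Estimate four channels from two using sums and differences."""
    return {
        "left": lt,
        "center": G * (lt + rt),
        "right": rt,
        "surround": G * (lt - rt),
    }

# A signal present only in the center channel...
decoded = passive_decode(*encode(left=0.0, center=1.0, right=0.0, surround=0.0))
# ...comes back in the center, but also leaks into left and right.
for name, value in decoded.items():
    print(f"{name}: {value:.3f}")
```

In this sketch a center-only input reappears in the decoded left and right channels at reduced level, which illustrates why a matrixed system provides only partial spatial separation.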
Other techniques for creating spatial separation have been tried using a plurality of channels. However, regardless of the number of channels, such systems typically involve mixing source signals to form one or more composite signals. Even systems touted as “discrete multi-channel” typically base the discreteness of each channel on a “directional component” (e.g., Dolby's AC-3 discrete 5.1 multi-channel surround sound is based on five discrete directional channels and one low-frequency effect channel). Surround sound using discrete channels for directional cues helps create a more engulfing acoustical effect, but it addresses neither the critical losses of veracity within the representative audio signal nor the reproduction of the intraspace dynamics created by individual sound sources interacting with one another in a defined space.
Other separation techniques are commonly used in an attempt to enhance the recreation of sound. For example, each loudspeaker typically includes a plurality of loudspeaker components, with each component dedicated to a particular frequency band to achieve a frequency distribution of the reproduced sounds. Commonly, such loudspeaker components include woofers or bass drivers (lower frequencies), mid-range drivers (moderate frequencies) and tweeters (higher frequencies). Components directed to other specific frequency bands are also known and may be used. When frequency distributed components are used for each of multiple channels (e.g., left and right), the output signal can exhibit a degree of both spatial distribution and frequency distribution in an attempt to reproduce the sounds produced by the plurality of sound sources. However, maximum recreation of the original sounds is not fully achieved because the source signal continues to be a composite signal as a result of the “mixing” process.
Another problem resulting from the mixing of either sounds produced by sound sources or the corresponding audio signals is that this mixing typically requires that these composite sounds or composite audio signals be played back over the same loudspeaker(s). It is well known that effects such as masking preclude the precise recreation of the original sounds. Masking can render one sound inaudible when accompanied by a louder sound; the inability to hear a conversation in the presence of loud amplified music is an example. Masking is particularly problematic when the masking sound has a similar frequency to the masked sound. Other types of masking include loudspeaker masking, which occurs when a loudspeaker cone is driven by a composite signal as opposed to an audio signal corresponding to a single sound source. In the latter case, the loudspeaker cone directs all of its energy to reproducing one isolated sound, whereas in the former case the loudspeaker cone must “time-share” its energy to reproduce a composite of sounds simultaneously.
Another problem with mixing sounds or audio signals and then amplifying the composite signal is intermodulation distortion. Intermodulation distortion refers to the fact that when a signal of two (or more) frequencies is input to an amplifier that is not perfectly linear, the amplifier will output the two frequencies plus the sum and difference of these frequencies. Thus, if an amplifier input is a signal with a 400 Hz component and a 20 kHz component, the output will be 400 Hz and 20 kHz plus 19.6 kHz (20 kHz − 400 Hz) and 20.4 kHz (20 kHz + 400 Hz).
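The 400 Hz and 20 kHz example can be checked numerically with a short Python sketch. The amplifier model here is an assumption made purely for illustration: a linear gain plus a small squared term, with hypothetical parameters `gain` and `a2`. The sample rate and signal length are likewise assumed values, chosen so that every tone of interest completes a whole number of cycles.

```python
import cmath
import math

FS = 96_000           # sample rate in Hz (assumed)
N = 4_800             # 50 ms of signal, so each tone below fits whole cycles
F1, F2 = 400, 20_000  # the two input frequencies from the text

def amplifier(x, gain=10.0, a2=0.05):
    """Hypothetical amplifier: linear gain plus a small squared term."""
    return gain * (x + a2 * x * x)

def amplitude_at(signal, freq):
    """Amplitude of one frequency component via a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Composite input: two unit-amplitude tones superimposed.
x = [math.sin(2 * math.pi * F1 * n / FS) + math.sin(2 * math.pi * F2 * n / FS)
     for n in range(N)]
y = [amplifier(s) for s in x]

# Compare input and output levels at the two tones and at their
# difference (19.6 kHz) and sum (20.4 kHz) frequencies.
for f in (F1, F2, F2 - F1, F2 + F1):
    print(f"{f:>6} Hz: input {amplitude_at(x, f):.3f}, "
          f"output {amplitude_at(y, f):.3f}")
```

Under this model the input contains no energy at 19.6 kHz or 20.4 kHz, while the output does. Setting `a2` to zero makes the amplifier perfectly linear and the sum and difference components vanish. A squared term also produces harmonics and a DC offset, which this sketch does not print.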
The mixing of signals can also dictate the use of “universal loudspeakers”, meaning that a given loudspeaker must be capable of reproducing a full or broad spectrum of possible sounds. With the exception of frequency range breakout (e.g., electronic crossovers), loudspeakers are typically capable of reproducing a full range of sound sources. Subwoofers and tweeters are exceptions to this rule, but their mandate for separation is based on frequency, not “sound source type”. The drawback of “universal” and “frequency dependent” loudspeakers is that they are not capable of being configured to achieve a full integral sound wave (including full directivity patterns) for a given sound source. By being “universal” and “non-configurable”, they cannot be optimized for the reproduction of a specific sound source.
More specifically, existing sound recording systems typically use two or three microphones to capture sound events produced by a sound source, e.g., a musical instrument. The captured sounds can be stored and subsequently played back. However, various drawbacks exist with these types of systems. These drawbacks include the inability to accurately capture three-dimensional information concerning the sound and spatial variations within the sound (including full spectrum “directivity patterns”). This leads to an inability to accurately produce or reproduce sound based on the original sound event.
A directivity pattern is the resultant sound field radiated by a sound source (or distribution of sound sources) as a function of frequency and observation position around the source (or source distribution). The possible variations in pressure amplitude and phase as the observation position is changed are due to the fact that different field values can result from the superposition of the contributions from all elementary sound sources at the field points. This is correspondingly due to the relative propagation distances to the observation location from each elementary source location, the wavelengths or frequencies of oscillation, and the relative amplitudes and phases of these elementary sources.
It is the principle of superposition that gives rise to the radiation patterns characteristic of various vibrating bodies or source distributions. Because existing recording systems do not capture this 3-D information, they cannot accurately model, produce or reproduce 3-D sound radiation based on the original sound event.
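The superposition that produces a directivity pattern can be sketched in Python for the simplest case. The sketch assumes idealized point sources radiating spherical waves in free space, with the speed of sound taken as 343 m/s; the function name `pressure` and the source layout are hypothetical and not taken from any particular system.

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s (assumed room-temperature value)

def pressure(observer, sources, freq):
    """Complex pressure at `observer` from point sources (x, y, amplitude, phase)."""
    k = 2 * math.pi * freq / C  # wavenumber
    total = 0j
    for sx, sy, amplitude, phase in sources:
        r = math.hypot(observer[0] - sx, observer[1] - sy)
        # Each contribution depends on its propagation distance, the
        # wavelength, and the source's own amplitude and phase.
        total += amplitude * cmath.exp(1j * (phase - k * r)) / r
    return total

freq = 1000.0
half_wavelength = C / freq / 2
# Two equal, in-phase elementary sources half a wavelength apart on the x axis.
sources = [(-half_wavelength / 2, 0.0, 1.0, 0.0),
           (+half_wavelength / 2, 0.0, 1.0, 0.0)]

radius = 100.0
for degrees in (0, 30, 60, 90):
    theta = math.radians(degrees)
    point = (radius * math.cos(theta), radius * math.sin(theta))
    print(f"{degrees:>3} deg: |p| = {abs(pressure(point, sources, freq)):.6f}")
```

Along the axis joining the two sources (0 degrees) the path lengths differ by half a wavelength and the contributions nearly cancel; perpendicular to it (90 degrees) the paths are equal and the contributions reinforce. Changing `freq` changes the pattern, consistent with the directivity pattern being a function of both frequency and observation position.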
On the playback side, prior systems typically use “Implosion Type” (IMT) sound fields. That is, they use two or more directional channels to create a “perimeter effect” sound field. The basic IMT method is “stereo,” where a left and a right channel are used to attempt to create a spatial separation of sounds. More advanced IMT methods include surround sound technologies, some providing as many as five directional channels (left, center, right, rear left, rear right), which create a more engulfing sound field than stereo. However, both stereo and surround sound are considered perimeter systems and fail to fully recreate original sounds. Perimeter systems typically depend on the listener being in a stationary position for maximum effect. Implosion techniques are not well suited for reproducing sounds that are essentially a point source, such as stationary sound sources or sound sources in the nearfield (e.g., musical instruments, human voice, animal voice, etc.) that should retain their full spectrum directivity patterns and radiate sound in all or many directions.
Despite significant improvements over the last two decades in signal processing and equipment design, the goal of “perfect sound reproduction” remains elusive.
Another problem with existing systems of sound reproduction is the paradigmatic and other distortions created in an original event right from the beginning of the recording and reproduction process. Such distortions include: (1) lack of true field definition (source signals are mixed together and rely on perceptual effects for definition); (2) lack of source resolution (source rendering is via plane wave transducers, not integral wave transducers); and (3) lack of spatial congruency (when source signals are mixed together, sound staging is an approximation at best, once again relying heavily on perceptual effects). These distortions are passed down through the recording and reproduction chain, so that each phase of the chain adds its own colorations to the original distortions created by the paradigm itself.
For example, in a typical stereo reproduction system, when an original event is captured, a multi-dimensional sound wave is represented by a two-dimensional (left/right) signal which is then mixed together with other two-dimensional signals representing other original sound sources within the same sound event, creating a mixture of two-dimensional signals. Once “spatial” and “mixing” distortions have been captured and processed they are passed along to the storage, recall, and reproduction parts of the recording and reproduction chain where additional colorations may be added, compounding the nature of the paradigmatic distortions.
Other contextual issues, such as paradigms within paradigms (or sub-paradigms), are often a result of protocol and/or design issues. An example of a sub-paradigm issue is that of “perceptual” effects versus “physical” effects. Perceptual methods of sound reproduction are designed to trick the ear into perceiving certain elements such as spatial qualities and sound stage. Physical objectives for reproduction are focused on physically reproducing source dynamics, including primary sources (sound-producing entities) and secondary sources (sound-affecting entities like room acoustics).
Yet another problem in sound reproduction is amplification. The current concept of sound amplification has remained essentially unchanged for over 40 years, in that the output signal equals the input signal but at an elevated level. The problem with this approach is that the input signal may be a distorted representation of the original event and most of the time is a compilation of mixed signals representing the original event. When these signals are amplified, the distortions that are present due to the paradigm are amplified and as a result become more noticeable and have a greater impact on the reproduced event.
Another aspect of the problem relates to the issue of the “film” paradigm versus the “music” paradigm. The film paradigm utilizes surround sound very well because, with the exception of dialog, most of the soundtrack is a far-field, moving, dynamic type of sound field (e.g., traffic, outdoor environments, etc.) or ambiance-related sound field (e.g., indoor venue, etc.), both of which do well with surround sound formats. Music, on the other hand, is typically a stationary sound event, usually in the near-field, and usually with a more intimate divergent type wave front as opposed to a convergent type wave front created from mid-field and far-field reproductions used in the film industry. Sub-paradigm issues such as these must be harmonized in accordance with the goals of the broader reproduction paradigm if the paradigmatic context is to be optimized and the paradigmatic distortion minimized or eliminated.
Another issue in the present state of sound recording and reproduction is the objectivism vs. subjectivism issue of how closely the reproduced event matches the original sound event. Within the current state-of-the-art paradigm, objective measurements can be made (e.g., input signal vs. output signal), but the comprehensive evaluation of a given sound event remains somewhat subjective, primarily because of a flawed context: the comparison is between an integral form (original event) and a facsimile form (reproduced event). Only when the reproduction system can generate a synthetic sound event in the same integral form as an original event can we expect to render an objective evaluation of the reproduced event. Subjectivity will always play a role in determining which variations, deviations, etc. to an original event are preferable from one person to the next, but the quantifiable evaluation of a reality event and its corresponding synthetic event should ultimately be an objective analysis.
The problem with trying to use a term like “realism” as a reference standard is not that it is inherently subjective (“reality” is actually inherently objective; it can be objectively measured and modeled, e.g., by acoustical holography), but rather that it cannot be adequately synthesized in the same integral form as the original event. The subjective element arises when the audio community attempts to compare various distorted synthetic realities (reproduced events) to their corresponding undistorted original realities (original events), or worse yet, to one another. Even if perfection is interpreted differently by different people, that should not change the fact that the comparison of a reproduced event A to its corresponding original event A should be an objective analysis. Even if an original source is unnatural or a hybrid of a natural sound, the objective is still to reproduce the source's integral state as determined by an artist and/or producer. A drawback of current systems is the lack of a means for developing reference standards for the articulation of all definable sound sources, and a means for describing derivatives, hybrids, and any other type of deviation from a given reference sound.
Thus, despite significant research and development, prior systems suffer various drawbacks and fail to maximize the ability of the system to precisely reproduce the original sounds.