This invention relates to the recording of orchestral sounds, choirs, or any type of music or sound effect and to the subsequent playback in a manner that simulates a particular position on a sound stage and particular features thereof (such as size, shape and acoustics). By way of example, the invention can be used to recreate the sound of a conventional symphonic recording setup that uses numerous spot microphones and five to seven surround sound (sometimes called “Decca Tree”) microphones.
As few as one microphone and one channel of actual audio for sound storage (in digital or analog format) of each individual sound source may be used. Multiple sources (e.g., multiple instruments within an orchestra) are recorded individually and stored discretely on recording media. The invention is particularly well suited for use with a multiplicity of sound sources which are distributed around a real or virtual environment, and for reproduction of the sounds made by those sources on an expanded stereo or a 7.1 stereo surround sound system (although it is suitable also for fewer or more reproduction channels).
Unlike conventional reverberation and spatial effects systems that are entirely post-production oriented, meaning they are used after sound is recorded, this invention may benefit from a specific microphone placement and recording technique used in conjunction with specialized processing upon playback. As a result, the invention may capture the sound quality of a recorded space, and it may permit continuously variable control of the size and liveliness of the apparent recording environment.
The invention enhances the quality and controllability of sampled sound libraries, and it is also applicable to many other situations where one wishes to have real-time, continuously variable control of the apparent size of the acoustic sound field upon playback of recorded or synthesized sound. The invention is applicable for both studio production use as well as for live performances of sampled or synthesized music and for computer gaming. It can also be applied to benefit broadcast and other means of sound distribution such as CD, DVD, Internet and other means, including future means, of audio distribution, storage and reproduction.
Many years ago, recordings for home listening were made in simple 2-channel (left-right) stereo. (This, of course, was subsequent to the “ancient” monaural days.) For sound reproduction in motion picture theaters, a center channel was added to the stereo channels to keep the “audio image” from appearing to drift. So, theatrical recordings were made in 3-channel (left-center-right) stereo. Then along came 5-channel stereo recording that created expanded stereo for wide-screen presentations (adding far left and far right to the left-center-right sound field). (Other channel allocations and speaker placements were used as well.) A “subwoofer” or very low bass channel was added, and this arrangement can be referred to as 5.1 (pronounced “five-dot-one”) expanded stereo. Typically in 5.1 expanded stereo sound production, the subwoofer channel is artificially created from the 5-channel stereo mix, although special effects can be added to the subwoofer channel (such as rumble to simulate rocket engines, explosions, earthquakes and so forth). Left and right rear speakers have also been added. These rear speakers are often referred to as “surround” speakers and the audio channel feeds to them are referred to as “surround” channels. In many motion picture theaters today, there are three speakers (a left speaker, a center speaker and a right speaker) behind the projection screen (i.e., at the front of the auditorium), a subwoofer (low frequency speaker) usually in front and below the screen, and two speakers in the rear of the auditorium (a left rear (or left surround) speaker and a right rear (or right surround) speaker), with a separate audio channel feeding each speaker. This system is usually referred to as a “5.1 stereo surround” system or simply a “5.1” system. Many home theater systems emulate the motion picture theater systems and 5.1 stereo surround has become a standard for high-quality motion picture soundtracks and digital video disc (DVD) soundtracks.
Some high-end productions are being released in 7.1 (“seven-dot-one”), adding far left and far right channels to the speakers behind the screen. Some are experimenting with other formats involving different numbers of channels.
In the prior art recording of orchestras, for greater realism and ease of production, symphonic recording sessions would be done in a very large room with 50 to 100 musicians all playing as though they were in a live concert. This assembly of musicians typically would be recorded using from 15 to 25 microphones (“mics”). The audio signals from the mics would be mixed down (combined) to fewer channels, which channels were ultimately preserved on discrete tracks of a professional digital or analog recording system.
This “ideal” situation is seldom available, and more frequently symphonic recordings (such as for motion picture soundtrack scoring) are made with sections of about 10 instruments at a time in multiple recording sessions. The effect of a “full orchestra” is created by mixing together the results of these multiple sessions. This compromise is made primarily to avoid the high cost of hiring a large orchestra and of renting a large enough studio or hall in which to record it. Using a process called overdubbing, a musician can listen in headphones to previously recorded portions of the program while recording additional material, thus allowing the musician to synchronize his/her playing and to control the timbre and volume of the instrument similarly to what would naturally occur in a full orchestra environment. Thus, one musician can, in sequence, play several parts or different instruments. However, the overdubbing process does not provide the same sonic quality as can be achieved with all members of the orchestra actually present at once in a large space.
Despite the pressure for economy, the need for superb sonic quality has not diminished, especially for large-sounding motion picture (or television) scores. There is still a demand for the “large sound.”
Primarily in response to the high cost of live musical scoring, digitally sampled music technology has been taking over the recording industry. Using sampling technology, a library of recorded instrument sounds can be played to create soundtracks without the need for any (or at least not as many) live players. The libraries are played using digital devices known as samplers, which may be embodied as computer programs, or the samplers may be embodied in dedicated hardware systems. Unfortunately, there are myriad difficulties when using today's sample libraries or even using synthesizers.
The overall systems (library and sampler) seldom deliver their promised realism, economy, or simplicity of use. Many sampled instrument libraries only go as far as 2-channel stereo, although some have been created to satisfy the 5.1 format. The 7.1 format has not yet been tackled by samplers, and even the latest releases of 5.1 libraries pose serious technical and performance issues in practical application. Only a handful of libraries sold in recent years are entirely new productions, with most libraries using at least some reworked recordings from older, poorer quality libraries. Since leading-edge sound quality is difficult to find in samplers, and is not uniform across the sounds offered, and since the 5.1 sampler systems are either too complex or just plain fail to function satisfactorily when pressed to the limit (i.e., recreating the sound of about one hundred different instruments at a time to simulate a full orchestra), simple 2-channel stereo libraries have continued to sell. In other words, there is a large gap between what the market would like to have at its disposal, and what is practically achievable from currently available products.
To extend the utility of 2-channel stereo libraries, end users have attempted to simulate the 5.1 surround sound environment by using 5.1 surround reverb software programs. Typical embodiments would be the plug-in software “processors” like those sold by Lexicon of Sandy, Utah or TC Electronics of Risskov, Denmark, and intended for use with ProTools™ software/hardware platforms of Digidesign, a division of Avid Technology, Inc., of Daly City, Calif. Although these are useful tools, they do not accurately simulate the sound of multiple microphones picking up all the instruments during a live recording session in a single space (such as an auditorium or studio).
Today's hardware-based and software-based 5.1 reverbs treat each signal as though it were being picked up by five room microphones, typically in a so-called “Decca Tree” configuration. A Decca Tree arrangement usually consists of five microphones many feet above and in front of the sound source (three in a frontal triangle plus added far left and far right microphones). This arrangement was first popularized in Decca Records' London studios many years ago. Other arrangements of five microphones are simply known as 5-channel sound. Such reverbs do not simulate the additional spot microphones that are present during recordings of a full complement of musicians in a larger (or even in a smaller) recording studio or hall.
The difference between typical 5.1 reverb-processed surround and live 5.1 surround recording must be understood in order to appreciate all that the invention described herein does to advance the state of the art. Reference here is made to FIG. 10, which includes an overhead view of a traditional live orchestra arrangement for over a hundred musicians playing an even greater number of instruments 22 (including violins, flutes, brass, percussion and so forth), which are depicted in the drawing as rectangular and circular shapes, positioned in the orchestral stage portion 32 of a recording studio 34 (which could be a concert hall). Spot mics 24 are distributed throughout the area of the sound stage where the orchestra is positioned, and room mics 26 are positioned near the front of the orchestra and significantly higher than the spot mics. The spot mics are typically directional mics and primarily pick up the sonic character of the instruments near them. In FIG. 10 twenty-five spot mics are shown, which would be within the range typically used for recordings of a full orchestra. The room mics are typically omnidirectional mics and pick up more of the sound of the environment as well as the blended sound of many instruments. In FIG. 10, seven room mics 26 are shown. One each is positioned, respectively, at the far left, left, center, right and far right above the front of the orchestral stage (which is coincident with the front of the part of the studio representing the audience area 36). The two remaining room mics are positioned to the far left and far right of the rear of the audience area 36. These two remaining mics are intended to pick up the sounds which would reverberate from the rear of the audience area 36.
During a live orchestral recording session, each microphone, including each of the spot mics 24 and room mics 26 in the studio (or on stage), picks up sounds from all the instruments present. This “mic bleed” is what gives live ensemble recordings a feeling of space and depth as well as the impression that all the musicians were playing in the same location at once. This is true whether the recording is for a full performance of a musical program or whether it is for the purpose of deriving digital samples that capture the sound of the studio or hall. Orchestral recording is discussed here because it is among the most complex and challenging, but the descriptions apply equally to almost any kind of sound recording. The five front room microphones (far left—left—center—right—far right) are located high in the air and near the front of the orchestra, and they serve to capture much of the sound of the studio (or hall) as well as of the orchestra itself (or the sound of a section of the orchestra if the recording is being done with a subgroup of musicians), but the room or surround mics are not the only mics contributing to the recording. Numerous spot mics, which are placed lower and closer to specific instruments or instrument sections, are mixed into the overall recording, at which time they are panned to the appropriate left-to-right location in the stereo sound field.
Many recording engineers have tried to use only or primarily the room microphones (typically in a “Decca Tree” microphone array or some variation thereof) to capture the entire performance, but they have generally found the results to be unsatisfactory. Such room-mic-only recordings can seem “muddy” or lacking in clarity and definition. The “presence” (i.e., the midrange/high frequency spectral content) of any given sound source is better captured by a nearby microphone, and that is why spot mics are almost always used in conjunction with room mics to achieve satisfactory results. However, because the room and spot mics are picking up the same sound but in different locations, there often are phase-related sonic cancellations, which make it very difficult to optimize the placement and mixing of the various mics. Among its other benefits, the invention controls this phase cancellation problem.
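The scale of this phase-cancellation problem can be illustrated with a brief numeric sketch (an assumption added for exposition, not part of the invention or its claims): when the same source reaches two mics over paths of different lengths, mixing the two signals produces comb-filter cancellations, the first of which falls where the delayed copy arrives half a cycle out of phase.

```python
# Illustrative sketch (not the invention's method): comb-filter cancellation
# between two mics hearing the same source over paths of different lengths.

SPEED_OF_SOUND_FT_PER_S = 1130.0  # approximate speed of sound in air

def first_comb_null_hz(path_difference_ft):
    """Frequency (Hz) of the first phase cancellation between two mics
    whose distances to the source differ by path_difference_ft feet."""
    delay_s = path_difference_ft / SPEED_OF_SOUND_FT_PER_S
    return 1.0 / (2.0 * delay_s)  # half-wavelength (180-degree) cancellation

# A 2 ft path difference puts the first null near 283 Hz, squarely in the
# musical midrange -- one reason mic placement is so difficult to optimize.
```

Because the nulls repeat at odd multiples of this frequency, no single mic position avoids them all, which is consistent with the observation above that placement and mixing are hard to optimize by hand.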
Typically, in prior art methods, spot mics that pick up sounds to be similarly panned are sub-mixed into “stems” (in which case each spot mic is not recorded discretely, but instead is recorded along with other spot mics as part of a group or stem). This means that existing master recordings are generally unsuitable for use with this invention, as will become evident once the method of this invention is understood. Although the microphone closest to one instrument will predominantly pick up the sound of that instrument, it also picks up the sound of every other instrument playing, the phenomenon previously mentioned as “mic bleed.” The bleed sound from more distant instruments is usually detected at a lower volume level than the sound of the instrument(s) nearest to a given spot mic due to normal acoustic attenuation over distance. Also, because sound travels at a finite speed (about 1.1 feet per millisecond), the sound from a particular source reaches the spot mic nearest it sooner than its sound reaches a remotely spaced spot mic. Additionally, the sound arriving at the remotely spaced spot mic is diminished in high frequency content due to the differential attenuation of higher frequencies by the air itself. Thus the spectral balance of an instrument in the bleed sound picked up by each more distant spot mic is not the same as sound picked up by the spot mic nearest to the instrument. The more distant a sound source is from a given spot mic, the more reverberation is present (rapid, blended sound reflections primarily from sound bouncing off the floor, ceiling and walls). Such reverberation decays more slowly than the direct sound from the sound source. Also, the orientation of each spot mic (or room mic, for that matter) with respect to a given sound may differ, introducing additional phase shift with respect to that same sound as it is sensed by each of the various mics.
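The delay and attenuation relationships just described can be sketched numerically. The following is an illustrative calculation only (the distances are hypothetical examples, not figures from the invention), using the stated speed of sound of about 1.1 feet per millisecond and the free-field rule of a 6 dB level drop per doubling of distance; real rooms add reverberation and high-frequency air loss on top of this.

```python
import math

SPEED_FT_PER_MS = 1.1  # approximate speed of sound, as stated above

def bleed_at_mic(distance_ft, ref_distance_ft=1.0):
    """Return (delay_ms, relative_level_db) of a source's bleed at a mic
    distance_ft away, relative to its level at ref_distance_ft.

    Level follows the inverse-square law: -20*log10 of the distance ratio,
    i.e. -6 dB per doubling of distance."""
    delay_ms = distance_ft / SPEED_FT_PER_MS
    level_db = -20.0 * math.log10(distance_ft / ref_distance_ft)
    return delay_ms, level_db

# A source 3 ft from its own spot mic versus 44 ft from a distant spot mic:
near = bleed_at_mic(3.0)   # ~2.7 ms delay, modest level drop
far = bleed_at_mic(44.0)   # ~40 ms delay, much lower level
```

Each spot mic thus receives its own distinct delay/level/spectrum “version” of every source, which is why the bleed cannot be recreated by applying a single reverb to a mixed-down signal.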
Thus, the bleed in different spot mics comprises different “versions” of the sound from a given source, depending largely upon the spot mic distance from the source. With each instrument (or other sound source) bleeding into all the other spot microphones, the overall captured sound field from all these spot mics imbues a “color” and a recognizable characteristic to the recorded sound, one that gives the impression of a particular sized recording space.
The sound mix of all the spot mics provides most of the character and impression of size to a recording. The sounds recorded by the spot mics are enhanced by the elevated room microphones such as those in a Decca Tree. By way of example, if one records a single instrument in a large hall with only the nearest spot mic, and no other spot mics—that recording does not sound as spacious or appear to be as realistic as one recorded with all the typical spot mics and Decca Tree room mics contributing to the recording. In fact, it sounds almost as though it were recorded in a much smaller room. Adding conventional sound effects and reverberation processing will make the sound appear “larger” but cannot fully overcome the lack of realism and spaciousness, particularly when recording with multiple “spot-mic'd” sources. Even with a five room mic sound array added to the mix, the sound still lacks the depth and spaciousness that can be heard when all the spot mics and room mics pick up bleed from all the instruments and contribute this sound to the mix.
For prior art sampled sound, several obstacles arise for sampler playback if the samples were recorded with the typical 15 to 25 spot mics plus the 5 room mics picking up the orchestra (or any group of musicians or singers). The first and most obvious issue to anyone who has tried to use sample libraries is that the sound of the hall is “locked in” through this technique, particularly if the sample recording was done in a large space. In this instance, the long time delays and natural sound reflections in a large recorded space become part of the sampled sounds and cannot later be removed or altered appreciably. This is an insurmountable obstacle when one needs the sound of a smaller environment and the samples were recorded in a large environment. Going the other way, small studio recordings can be “stretched” somewhat through the addition of artificial reverberation and delay processing, but this does not accurately create the sound field achieved from mic bleed when spot microphones would be laid out further apart in a larger space. Because one does not have access to each spot microphone in a mixed-down multi-instrument sample recording (even if it is in 5.1 format), there is no way to alter the relationship of the spot mic contributions, nor do conventional reverbs provide means to simulate the way the spot mics function to create the impression of the live sound field.
When smaller numbers of musicians are recorded in smaller studios, as is more often the case today, the deficiency of not having the correct spot microphone scaling is especially evident. Many engineers, composers and musicians may be unaware of why the sound seems “too small” or incorrect, and so when they attempt to scale up the sound of a smaller studio using conventional 5.1 reverb embodied in software- or hardware-based systems, they find that no amount of conventional post-production reverberation effects can truly emulate the spaciousness of the sound which would have been achieved in the larger environment with the mic bleed. Even if they manage to come close to the “right sound” for one or a few instruments, the effect deteriorates when a whole section of instruments or an entire orchestra is processed—even using the most advanced 5.1 reverb systems. This inability to satisfactorily scale the orchestra (or other group) occurs because the individual instruments in the sample library lack the correct and unique spot mic “bleed” contributions; treating the sum of all the instruments with reverb processors as though they came from one (or even 5) locations simply cannot emulate what happens with an array of five or more room mics and a large array of many spot mics.
The inability to accurately and continuously scale the apparent size of the recorded space, particularly to make it smaller, is a drawback with conventional sample libraries and, as stated above, it is a deficiency that can barely be compensated using reverberation and effects processing. However, there is another major obstacle to the implementation of surround stereo recordings with sampler technology. That is, today's best full-orchestral library samplers with surround sound capability are difficult to set up and use. The means by which conventional samplers function to provide changes in the characteristics of notes played typically requires that several “versions” of a given note be loaded at once, multiplying the number of samples that must be processed in real time. Because the demand on computer resources rises and falls as a prior art sampler is played, the internal computer data busses and input/output ports can suddenly and unpredictably “choke” (create glitches in the sound or crash completely) if the user demands playback of “just that one more note.” Consequently, real-time, one-take performance is difficult to achieve from a large-scale 5.1 sample library.
Setting aside the practical aspects noted above for a moment, there are serious sonic issues even assuming one manages to get the prior art sample library playing satisfactorily and reliably. True 5.1 recordings of a sound have many open mics (that is, they have many microphones picking up and recording the sound). Not only do the mics pick up the sounds of the instruments, all these mics also pick up the sound of the room—which includes undesirable noise such as air conditioning rumble, coughs, flipping pages of music and shuffling feet. Besides room noise, for each “live” mic there is a degree of electronic noise present (even a simple carbon resistor generates electrical noise due to thermally-stimulated molecular activity). When one plays back a 5.1 recording of a typical complement of 25 spot and room microphones, one hears 25 mics' worth of electronic noise, plus five or six times the ambient room noise. This noise is then multiplied by each note being played; add another note to a chord, and you get another dose of all that room and mic noise. Indeed, noise builds up rapidly, even with what seemed like a very quiet studio to the naked ear. Thus, even a genuine, well-recorded prior-art 5.1 sample library would not sound as good as would a live performance or a single-take recording of an orchestra with just one set of open mics that are heard just once. As discussed below, a sample used with the current invention has the sound picked up by only one mic (typically a spot mic) or two mics (a spot mic and a room mic).
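The noise buildup described above can be quantified with a back-of-the-envelope calculation (my own arithmetic, added for illustration; it is not taken from the invention): uncorrelated noise sources add in power, so N simultaneous open mics raise the noise floor by 10·log10(N) dB over a single mic, and each additional sampled note layers that combined noise into the mix again.

```python
import math

def noise_buildup_db(num_open_mics):
    """dB increase of the summed noise floor over one open mic.

    Uncorrelated noise adds in power, so the floor rises by
    10*log10(N) dB for N open mics."""
    return 10.0 * math.log10(num_open_mics)

# 25 open mics raise the noise floor by roughly 14 dB over a single mic,
# before any note-by-note layering in the sampler multiplies it further.
full_array = noise_buildup_db(25)
```

By contrast, a sample carrying the sound of only one or two mics, as described for the present invention, contributes close to a single mic's noise per note.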
Yet another problem occurs with conventional sampled libraries. Each individual recorded note is not simply stored in the sampler but rather it is digitally edited. In some cases this is done to make it possible to sustain a note (as if it were played by a bowed violin, a horn, or a piano with the sustain pedal depressed) for as long as a keyboard's key is held down, however long the original recorded note might have been. Thus, instead of storing the full note, the sampler version has a truncated note, with the ends “chopped off” at precisely chosen times (or more accurately stated, precise locations in the sonic continuum of the note) selected so that the end of the note can be dovetailed into the beginning of the note in a process known as “looping.” When a key is released (i.e., when a note is no longer wanted), any looped sound ceases and the sampler now plays the “decay” or final portion of the recorded sound. With a conventionally recorded note's sound, the reverberation of the room in which the note was played is largely heard at the end of the note and is inextricably mixed with the sound of the note trailing off to silence. Sometimes the person playing these sampled notes wants the sound to end more quickly than was originally recorded, and so a sampler function can be invoked which truncates (chops off the decaying sound) at the end of such notes. Unfortunately, truncating a note causes a significant loss of the reverberant information as well, so the impression of the size and depth of the recorded environment is lost. Using the subject invention to process recorded notes, even if they are truncated, overcomes the problem found with conventionally truncated samples in that the room characteristic is still present (i.e., added by the invention's processing that creates “lingering” reverberation when a note is released).
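The looping behavior described above can be sketched in a few lines. This is a simplified illustration (the function name and loop points are hypothetical, and real samplers crossfade the loop seam rather than butt-splicing it, a detail omitted here): a middle region of the stored note repeats while the key is held, and the recorded decay, carrying most of the room reverberation, plays on release.

```python
# Simplified sketch of sampler "looping" (hypothetical names and loop
# points, not the invention's processing): the attack plays once, a chosen
# middle region repeats while the key is held, and the recorded decay
# (which carries most of the room's reverberation) plays on release.

def play_looped(note_samples, loop_start, loop_end, hold_length):
    """Sustain note_samples by repeating [loop_start:loop_end] until
    hold_length samples have played, then append the recorded decay."""
    out = list(note_samples[:loop_start])      # attack portion, played once
    loop = note_samples[loop_start:loop_end]   # sustained (looped) region
    while len(out) < hold_length:
        out.extend(loop)
    out = out[:hold_length]                    # key released here
    out.extend(note_samples[loop_end:])        # decay / reverberation tail
    return out
```

The sketch makes the drawback concrete: if the tail appended after release is itself truncated, the reverberant information goes with it, which is the loss the invention's “lingering” reverberation processing is described as overcoming.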