The present invention relates to signal and waveform processing and analysis. It further relates to the identification and separation of simpler signals contained in a complex signal and to the modification of the identified signals.
Audio signals, especially those relating to musical instruments or human voices, have a characteristic harmonic content that defines how the signal sounds. It is customary to refer to the frequency components of such a signal as partials. The signal consists of a fundamental frequency (first harmonic f1), which is typically the lowest frequency (or partial) contained in a periodic signal, and higher-ranking frequencies (partials). Partials that have a mathematical relationship to the fundamental frequency are referred to as harmonics. The harmonics are typically integer multiples of the fundamental frequency, but may have other relationships dependent upon the source.
The modern equal-tempered scale (or Western musical scale) is a method by which a musical scale is adjusted to consist of 12 equally spaced semitone intervals per octave. This scale is the culmination of research and development of musical scales and musical instruments going back to the ancient Greeks and even earlier. The frequency of any given half-step is the frequency of its predecessor multiplied by the 12th root of 2, approximately 1.0594631. This generates a scale where the frequencies of all octave intervals are in the ratio 1:2. These octaves are the only consonant intervals; all other intervals are dissonant.
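By way of non-limiting illustration, the half-step relationship described above may be sketched in Python as follows. The reference pitch of 440 Hz and the names `SEMITONE_RATIO` and `semitone_up` are assumptions made for the example only.

```python
# 12th root of 2, approximately 1.0594631
SEMITONE_RATIO = 2 ** (1 / 12)


def semitone_up(frequency_hz: float, steps: int = 1) -> float:
    """Return the frequency `steps` half-steps above (below, if negative)."""
    return frequency_hz * SEMITONE_RATIO ** steps


REFERENCE_HZ = 440.0  # assumed reference pitch, not taken from the text

# Thirteen entries: the reference note, eleven half-steps, and the octave.
scale = [semitone_up(REFERENCE_HZ, k) for k in range(13)]

for k, frequency in enumerate(scale):
    print(f"{k:2d} half-steps: {frequency:8.3f} Hz")
```

Twelve successive half-steps multiply the frequency by two, which is the 1:2 octave ratio stated above.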
The scale's inherent compromises allow a piano, for example, to play in all keys. To the human ear, however, instruments such as the piano accurately tuned to the tempered scale sound quite flat in the upper register, so the tuning of some instruments is "stretched," meaning the tuning contains deviations from pitches mandated by simple mathematical formulas. These deviations may be either slightly sharp or slightly flat to the notes mandated by simple mathematical formulas. In stretched tunings, mathematical relationships between notes and harmonics still exist, but they are more complex. Listening tests show that stretched tuning and stretched harmonic rankings are unequivocally preferred over unstretched. The relationships between and among the harmonic frequencies generated by many classes of oscillating/vibrating devices, including musical instruments, can be modeled by a function
$$f_n = f_1 \times G(n)$$
where $f_n$ is the frequency of the $n$th harmonic, $f_1$ is the fundamental frequency, known as the 1st harmonic, and $n$ is a positive integer which represents the harmonic ranking number. Examples of such functions are
$$f_n = f_1 \times n \qquad \text{(a)}$$
$$f_n = f_1 \times n \times S^{\log_2 n} \qquad \text{(b)}$$
$$f_n = f_1 \times n \times \left[1 + (n^2 - 1)\beta\right]^{1/2} \qquad \text{(c)}$$
where $S$ and $\beta$ are constants which depend on the instrument or on the string of multiple-stringed devices, and sometimes on the frequency register of the note being played. The function $f_n = f_1 \times n \times S^{\log_2 n}$ is a good model of harmonic frequencies because it can be set to approximate natural sharping in broad resonance bands, and, more importantly, it is the one model which simulates consonant harmonics, e.g., harmonic 1 with harmonic 2, 2 with 4, 3 with 4, 4 with 5, 4 with 8, 6 with 8, 8 with 10, 9 with 12, etc. When used to generate harmonics, those harmonics will reinforce and ring even more than natural harmonics do.
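By way of non-limiting illustration, the three example functions may be evaluated as in the following Python sketch. The fundamental of 110 Hz and the values chosen for the constants S and beta are assumptions made for the example; actual values depend on the instrument, as stated above.

```python
import math


def harmonic_integer(f1: float, n: int) -> float:
    """Function a): harmonics are exact integer multiples of f1."""
    return f1 * n


def harmonic_log_stretch(f1: float, n: int, s: float) -> float:
    """Function b): f1 * n * S**log2(n). With S = 1 this reduces to a)."""
    return f1 * n * s ** math.log2(n)


def harmonic_stiff(f1: float, n: int, beta: float) -> float:
    """Function c): f1 * n * sqrt(1 + (n**2 - 1) * beta)."""
    return f1 * n * math.sqrt(1.0 + (n * n - 1) * beta)


F1 = 110.0     # assumed fundamental frequency in Hz
S = 1.002      # assumed stretch constant for function b)
BETA = 1.0e-4  # assumed constant for function c)

for n in range(1, 9):
    print(
        n,
        harmonic_integer(F1, n),
        round(harmonic_log_stretch(F1, n, S), 3),
        round(harmonic_stiff(F1, n, BETA), 3),
    )
```

Because n × S^(log2 n) equals n raised to the power (1 + log2 S), the ratio between two harmonics under function b) depends only on the ratio of their ranking numbers. Harmonics 3 and 4 therefore stand in the same ratio as harmonics 6 and 8, and harmonics 1 and 2 in the same ratio as harmonics 2 and 4.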
Each harmonic has an amplitude and phase relationship to the fundamental frequency that identifies and characterizes the perceived sound. When multiple signals are mixed together and recorded, the characteristics of each signal are predominantly retained (superimposed), giving the appearance of a choppy and erratic waveform. This is exactly what occurs when a song is created in its final form, such as that on a compact disk, cassette tape, or phonograph recording. The harmonic characteristics can be used to extract the signals from the mixed, and hence more complex, audio signal. This may be required in situations where only a final mixture of a recording exists, or, for example, a live recording may have been made where all instruments are being played at the same time.
Musical pitch corresponds to the perceived frequency that a human listener recognizes and is measured in cycles per second. It is almost always the fundamental or lowest frequency in a periodic signal. A musical note produced by an instrument has a mixture of harmonics at various amplitudes and phase relationships to one another. The harmonics of the signal give the strongest indication of what the signal sounds like to a human, or its timbre. Timbre is defined as "The quality of sound that distinguishes one voice or musical instrument from another". The American National Standards Institute defines timbre as "that attribute of auditory sensation in terms of which a listener can judge two sounds similarly presented and having the same loudness and pitch are dissimilar."
Instruments and voices also have characteristic resonance bands, which shape the frequency response of the instrument. The resonance bands are fixed in frequency and can be thought of as a further modification of the harmonic content. Thus, they do have an impact on the harmonic content of the instrument, and consequently aid in establishing the characteristic sound of the instrument. The resonance bands can also aid in identifying the instrument. An example diagram is shown in FIG. 1 for a violin. Note that the peaks show the mechanical resonances of the instrument. The key difference is that the harmonics are always relative to the fundamental frequency (i.e. moving linearly in frequency in response to the played fundamental), whereas the resonance bands are fixed in frequency. Other factors, such as harmonic content during the attack portion of a note and harmonic content during the decay portion of the note, give important perceptual cues to the human ear. During the sustaining portion of sounds, harmonic content has a large impact on the perceived subjective quality.
Each harmonic in a note, including the fundamental, also has an attack and decay characteristic that defines the note's timbre in time. Since the relative levels of the harmonics may change during the note, the timbre may also change during the note. In instruments that are plucked or struck (such as pianos and guitars), the string relies entirely on the initial energy input to sustain the note, and higher order harmonics decay at a faster rate than the lower order harmonics. For example, a guitar player picks or plucks a guitar string, which produces the sound by the emission of energy from the string at a frequency related to the length and tension of the string. In the case of the guitar, the harmonics have their largest amount of energy at the initial portion of the note and then decay. In instruments that are continually exercised, including wind and bowed instruments (such as flute or violin), harmonics are continually generated. This is because the source continually drives the instrument, through the movement of the bow across the string or the breath of a wind player. For example, a flute player must continue to blow across the mouthpiece in order to produce a sound. Thus, each oscillation cycle puts additional energy into the mouthpiece, which continually forces the oscillatory resonance to sound and subsequently continues to produce the note. The higher order harmonics are thus present throughout most or all of the sustain portion of the note. Examples for a flute and a piano are shown in FIGS. 2A and 2B, respectively.
As an example, an acoustic guitar consists of 6 strings attached at one end to a resonating cavity (called the body) via an apparatus called a bridge. The bridge serves the purpose of firmly holding the strings to the body at a distance that allows the strings to be plucked and played. The body and bridge of the guitar provide the primary resonance characteristics of the guitar, and convert the oscillatory energy in the strings into audible energy to be heard. When a string is plucked or picked on the guitar, the string oscillates at the fundamental frequency. However, there are also harmonics that are generated. These harmonics are the core constituents of the generated timbre of the note. A variety of factors subsequently help shape the timbre of the note that is actually heard. The two largest contributions come from the core harmonics created by the strings and the body resonance characteristics. The strings generate the fundamental frequency and the core set of harmonics associated with the fundamental. The body primarily shapes the timbre further by its resonance characteristics, which are non-linear and frequency dependent. Many other components on the guitar also contribute to the overall tonal qualities of the guitar.
Resonant frequency responses of instruments also vary slightly depending on the portion of the note being played. The attack portion of a note, the sustain portion of a note, and the decay portion of a note may all exhibit slightly different resonance characteristics. They may also vary greatly between different instruments.
Musical instruments typically have a range of notes that they can produce. The notes correspond to a range of fundamental frequencies that can be produced. These characteristic ranges of playable notes by the instrument of interest can also aid in identifying the instrument in a mixture of signals, such as in a recorded song. In addition to instruments that play specific notes, there are instruments that create less note-related signals. For example, a snare drum produces a broad array of harmonics that have little correlation to one another. These may be referred to herein as chaos harmonics. There is still a typical range of frequencies contained in the signal.
In addition to the range of fundamental frequencies an instrument creates, the overall range of frequencies produced or generated by an instrument gives characteristic clues as to the instrument creating the signal.
Instruments are often played in certain ways that give further clues as to what type of instrument is creating the notes or frequencies. Drums are played in rhythmic patterns, and bass guitar notes also may be fairly regular and rhythmic in time. However, a bass guitar fundamental frequency overlaps few percussive instruments.
Research into analysis and processing of superimposed signals has been occurring for decades. The more common usage has been directed towards voice signal identification or removal, and noise reduction or elimination. Noise reduction and elimination has often revolved around statistical properties of noise, but still often utilizes first-step analysis techniques similar to that of voice processing. Voice processing has diverged into several pathways, including voice recognition systems. Voice recognition systems utilize analysis techniques that differ from the focus of the present patent, although the method of the present invention can be used for voice recognition. Voice enhancement, on the other hand, can be approached in two ways. The first focuses on the characteristics of signals other than the one of interest. The second focuses on the characteristics of the signal itself. In either case, the information gathered is used for subsequent processing to either enhance or remove unwanted information.
One should keep in mind that the present invention includes multiple, in some cases alternative, steps in analysis of one to many signals included in the superimposed signal. It is also a goal of the present invention to retain the original information contained within the superimposed signals.
Maher, in "An Approach for the Separation of Voice in Composite Signals", Ph. D. Thesis, 1989, Univ. of Illinois, approached the problem of automatically separating two musical signals recorded on the same recording track. Maher's approach relies on a Short Time Fourier Transform (STFT) process developed by McAulay and Quatieri in 1986. Maher focuses on two signals with little or no overlap in fundamental frequencies. Where there is harmonic frequency collision or overlap, Maher describes three methods of separation: a) linear equations, b) analysis of beating components, and c) signal models, interpolation or templates. Maher outlines some related information in his thesis. Maher has noted that his approach is limited where information overlaps in frequency or where other "noise", whether desired or not, inhibits the algorithm employed.
Danisewicz and Quatieri, "An Approach to co-channel talker interference suppression using a sinusoidal model for speech", 1998, MIT Lincoln Laboratory Technical Report 794, approached speech separation using a representation of time-varying sinusoids and least-squared error estimation when two talkers were at nearly the same volume level.
Kyma-5 is a combination of hardware and software developed by Symbolic Sound. Kyma-5 is the latest software that is accelerated by the Capybara hardware platform. Kyma-5 is primarily a synthesis tool, but the inputs can be from existing recorded sound files. It has real-time processing capabilities, but predominantly is a static-file processing tool. Kyma-5 is able to re-synthesize a sound or passage from a static file by analyzing its harmonics and applying a variety of synthesis algorithms, including additive synthesis in a purely linear, integer manner.
A further aspect of Kyma-5 is the ability to graphically select partials from a spectral display of the sound passage and apply processing. Kyma-5 approaches selection of the partials visually and identifies "connected" dots of the spectral display within frequency bands, not by harmonic ranking number. Harmonics can be selected if they fall within a manually set band.
Another method is implemented in a product called Ionizer, which is sold/produced by Arboretum Systems. The method starts by using a "pre-analysis" to obtain a spectrum of the noise contained in the signal, which is characteristic only of the noise. This is actually quite useful in audio systems, since tape hiss, recording player noise, hum, and buzz are recurrent types of noise. The sound print so obtained can be used as a reference to create "anti-noise" and subtract that (not necessarily directly) from the source signal. The part of this type of product that begins to seem similar to the present invention is the use of gated equalization on the passage within the Sound Design portion of the program. The product implements a 512-band gated EQ, which can create very steep "brick wall" filters to pull out individual harmonics or remove certain sonic elements. It also implements a threshold feature that allows the creation of dynamic filters. But, yet again, the methods employed do not follow or track the fundamental frequency, and harmonic removal again must fall in a frequency band, which then does not track the entire passage for an instrument.
The present invention provides methods for calculating and determining the characteristic harmonic partial content of an instrument or audio or other signal from a single source when mixed in with a more complex signal. The present invention also provides a method for the removal or separation of such signal from the more complex waveform. Successive, iterative and/or recursive applications of the present invention allow for the complete or partial extraction of the individual source signals contained within a complex/mixed signal, hereinafter referred to as shredding.
The shredding process starts with the identification of unambiguous note sequences, sometimes of short duration, and the transfer of the energy packets which make up those segments from the original complex signal file to a unique individual note segment file. Each time a note segment is placed into the individual note segment file, it is removed from the master note segment file. This facilitates the identification and transfer of additional note segments.
The difficulty in attempting to remove one instrument's or source's waveform from a co-existing signal (superimposed signal) lies in the fact that the energies of the partials or harmonics may have the same (or very close) frequency as those of another instrument. This is often referred to as a "collision of partials". Thus, the amount of energy contributed by one instrument or source must be known such that the remaining energy may be left intact, i.e. the energy for that frequency contributed by one or more other instruments or sources. Thus, the present invention focuses on methods by which the appropriate amount of energy can be attributed to the current instrument or source of interest.
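By way of non-limiting illustration, one simple way of attributing energy at a colliding partial may be sketched in Python as follows. The rule used here is an assumption made for the example and is not stated in the text: the energy expected from the source of interest is estimated from its fundamental energy and a known harmonic energy ratio, and no more than the observed energy is attributed to it. The function name and the numeric values are hypothetical, and the sketch treats energies as additive, which ignores phase.

```python
def split_colliding_energy(
    observed_energy: float,
    fundamental_energy: float,
    expected_ratio: float,
) -> tuple:
    """Split the energy observed at one partial frequency.

    Returns (attributed, remainder): the energy attributed to the source of
    interest and the energy left intact for other sources. This is an
    assumed rule for illustration, not the method of the invention.
    """
    expected = fundamental_energy * expected_ratio
    attributed = min(observed_energy, expected)
    return attributed, observed_energy - attributed


# Hypothetical numbers: the source's 2nd harmonic is expected to carry half
# the energy of its fundamental, and another instrument shares the frequency.
attributed, remainder = split_colliding_energy(10.0, 8.0, 0.5)
print(attributed, remainder)
```

When the observed energy is lower than the expected energy, the whole of it is attributed to the source of interest and nothing remains for other sources.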
The present invention is carried out using several steps, each of which can aid in the discernment and identification of an individual instrument or source. The methods are primarily carried out on digital recorded material in static form, which may be contained in Random Access Memory (RAM), non-volatile forms of memory, or on computer hard disk or other recorded media. It is envisioned that the methods may be employed in quasi real-time environments, dependent upon which method of the present invention is utilized. Quasi-real time refers to a minuscule delay of up to approximately 60 milliseconds (it is often described as about the duration of two frames in a motion-picture film).
In one step, a library of sounds is utilized to aid in the matching and identification of the sound source when possible. This library contains typical spectra for a sound for various note frequency ranges (i.e. low notes, middle notes, and high notes for that instrument or sound). Furthermore, each frequency range will also have a characteristic example for low, middle, and high range volumes. Interpolation functions for volume and frequency are used to cover the intermediate regions. The library further contains stretch constant information that provides the harmonic stretch factor for that instrument. The library also contains overall energy rise and energy decay rates, as well as long term decay rates for each harmonic for when the fundamental frequency of a note is known.
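By way of non-limiting illustration, interpolation between library entries over note frequency may be sketched in Python as follows. Linear interpolation, the layout of the library entries, and all numeric values are assumptions made for the example; the text does not specify the interpolation function. Interpolation over volume could be handled in the same manner.

```python
from bisect import bisect_right


def interpolate_spectrum(anchors, fundamental_hz):
    """Linearly interpolate relative harmonic amplitudes between entries.

    `anchors` is a list of (fundamental_hz, amplitudes) pairs sorted by
    frequency, e.g. one entry each for low, middle and high notes.
    Frequencies outside the covered range use the nearest entry.
    """
    frequencies = [frequency for frequency, _ in anchors]
    if fundamental_hz <= frequencies[0]:
        return list(anchors[0][1])
    if fundamental_hz >= frequencies[-1]:
        return list(anchors[-1][1])
    upper = bisect_right(frequencies, fundamental_hz)
    f_low, s_low = anchors[upper - 1]
    f_high, s_high = anchors[upper]
    t = (fundamental_hz - f_low) / (f_high - f_low)
    return [a + t * (b - a) for a, b in zip(s_low, s_high)]


# Hypothetical library entries for one instrument at one volume level.
LIBRARY = [
    (100.0, [1.0, 0.5, 0.25]),  # low note
    (200.0, [1.0, 0.3, 0.15]),  # middle note
    (400.0, [1.0, 0.1, 0.05]),  # high note
]

print(interpolate_spectrum(LIBRARY, 150.0))
```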
In another step, an energy file is utilized that allows the tracking of energy levels at specified time intervals for desired frequency widths for the purpose of analyzing the complex signal. Increases in energy are used to identify the beginning of notes. By analyzing the energies in the time period just preceding the beginning of the attack period, the notes that are still sounding (being sustained) can be isolated. The rate of decay for the harmonics may also be utilized to identify the note and instrument.
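By way of non-limiting illustration, the use of energy increases to mark the beginning of notes may be sketched in Python as follows. The rise factor, the floor value, and the representation of the energy file as a list of energies at successive time intervals for one frequency width are assumptions made for the example.

```python
def find_onsets(energies, rise_factor=2.0, floor=1e-9):
    """Return the indices of time intervals at which a note appears to begin.

    An onset is marked where the energy is at least `rise_factor` times the
    energy of the preceding interval. Consecutive rising intervals are
    treated as one attack and reported once.
    """
    onsets = []
    rising = False
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)  # avoid comparing against zero
        jumped = energies[i] >= rise_factor * previous
        if jumped and not rising:
            onsets.append(i)
        rising = jumped
    return onsets


# Hypothetical energies for one frequency width at successive intervals.
band_energy = [0.1, 0.1, 4.0, 3.0, 2.0, 1.5, 6.0, 5.0]
print(find_onsets(band_energy))
```

The energy in the interval just before each marked onset (here, the values at indices 1 and 5) indicates what was still sounding when the new note began.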
After an entire passage has been stepped through in time and all time periods have been marked, significant repeating rhythm patterns are identified which aid in the determination of instruments or signal source. The identified energy packets are subsequently removed from the master energy file and placed in an individual note energy file. The removal from the master energy file aids in the subsequent determination and identification of notes and instruments.
There are circumstances where an adequate library does not exist for a given sound source, either because the sound source is quite unique or because sufficient information (i.e. library information) has not been collected. In this case, an iterative process is used to develop a fingerprint of the instruments in a recorded passage. The fingerprint is defined by three or more basic characteristics which include 1) the fundamental frequency, 2) the energy ratios of the harmonics with respect to the fundamental and/or other harmonics, and 3) the energy decay rate for each harmonic. The fingerprint can then be used as a template for isolating note sequences and identifying other notes produced by the same instrument. The process starts by using the lowest frequency available in a passage to begin developing the fingerprint. The method progresses to the next higher frequency available that is consistent with the fingerprint, and so on. This is continued until all unambiguous note sequences are identified and removed. At this point, identifiable notes that match the fingerprint have been removed or isolated to a separate energy file. There are likely to be many voids of notes played by a single instrument throughout the passage. An interactive routine permits a user to listen to the incomplete part, which helps check that appropriate items were shredded out. The process can be repeated as desired with the reduced energy file. New unambiguous note sequences will then be revealed in order to fill in previously unidentified note sequences and complete the previously shredded parts. The entire sequence is then repeated until all subsequent instruments are identified and shredded out.
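By way of non-limiting illustration, a fingerprint holding the three basic characteristics listed above, together with a simple template comparison, may be sketched in Python as follows. The class and function names, the relative tolerance test, and all numeric values are assumptions made for the example; the text does not specify how a candidate note is compared against the fingerprint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fingerprint:
    fundamental_hz: float         # 1) the fundamental frequency
    harmonic_ratios: List[float]  # 2) harmonic energy / fundamental energy
    decay_rates: List[float]      # 3) energy decay rate for each harmonic


def matches(fingerprint, observed_ratios, observed_decays, tolerance=0.2):
    """Return True if every observed value lies within a relative tolerance
    of the fingerprint value. Extra or missing harmonics are ignored because
    zip() stops at the shorter sequence."""
    pairs = list(zip(fingerprint.harmonic_ratios, observed_ratios))
    pairs += list(zip(fingerprint.decay_rates, observed_decays))
    return all(abs(obs - ref) <= tolerance * abs(ref) for ref, obs in pairs)


# Hypothetical fingerprint built from the lowest note found in a passage.
template = Fingerprint(110.0, [1.0, 0.5, 0.25], [3.0, 6.0, 9.0])

print(matches(template, [1.0, 0.55, 0.22], [3.2, 5.5, 9.5]))
print(matches(template, [1.0, 0.90, 0.25], [3.0, 6.0, 9.0]))
```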
In additional steps, the libraries are still utilized. However, notes that are shredded, each defined as a fundamental frequency and the accompanying harmonic spectra, are divided into three categories. The first category, math harmonics, are notes that are mathematically related in nature, and the adjacent harmonics contained therein will be separated in frequency by an amount that equals the fundamental frequency. The second category, math harmonics plus chaos harmonics, are notes with added nonlinear harmonics in the attack and/or sustain portion of the notes. An example is a plucked guitar note where the plucked harmonics (produced from the noise of the guitar pick striking the string) have little to do with the fundamental frequency. Another example is a snare drum, where the produced harmonic spectra include frequencies related to the drum head, but also contain chaos harmonics that are produced from the snares on the bottom side of the drum. The third category, chaos harmonics, are notes with harmonic content that has nothing to do with a fundamental frequency. An example is the guttural sounds of speech produced by humans.
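By way of non-limiting illustration, sorting a note into the three categories may be sketched in Python as follows. The sketch assumes a candidate fundamental frequency is available, tests each partial against integer multiples of it, and uses an assumed tolerance; stretched harmonics would require one of the other harmonic functions. The function name and numeric values are hypothetical.

```python
def classify_partials(fundamental_hz, partials_hz, tolerance=0.03):
    """Classify a note by how many partials sit near integer multiples of a
    candidate fundamental. `tolerance` is a fraction of the fundamental."""
    if not partials_hz:
        raise ValueError("at least one partial is required")
    harmonic_count = 0
    for partial in partials_hz:
        rank = round(partial / fundamental_hz)  # nearest harmonic ranking
        deviation = abs(partial - rank * fundamental_hz)
        if rank >= 1 and deviation <= tolerance * fundamental_hz:
            harmonic_count += 1
    if harmonic_count == len(partials_hz):
        return "math harmonics"
    if harmonic_count == 0:
        return "chaos harmonics"
    return "math harmonics plus chaos harmonics"


print(classify_partials(110.0, [110.0, 220.0, 330.5, 440.0]))
print(classify_partials(110.0, [110.0, 220.0, 275.0, 503.0]))
print(classify_partials(110.0, [163.0, 275.0, 391.0]))
```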
Software divides the recorded signal into individual notes by determining which areas have frequencies that rise and fall in energy together. The recording is also preprocessed to extract any "easy to find" information. Next, the recording is recursively divided into the individual parts by utilizing further signatures related to harmonic content, resonance bands, frequency bands, overall frequency ranges, fundamental frequency ranges, and overall resonance band characteristics.
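By way of non-limiting illustration, grouping frequencies that rise and fall in energy together may be sketched in Python as follows. The use of the Pearson correlation between energy envelopes, the greedy grouping rule, the threshold, and all numeric values are assumptions made for the example; the text does not specify how co-varying frequencies are detected.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is
    constant, so that flat envelopes never join a group)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_bands(envelopes, threshold=0.9):
    """Greedily group frequency bands whose energy envelopes correlate with
    the first (lowest) band of an existing group."""
    groups = []
    for band in sorted(envelopes):
        for group in groups:
            if pearson(envelopes[band], envelopes[group[0]]) >= threshold:
                group.append(band)
                break
        else:
            groups.append([band])
    return groups


# Hypothetical energy envelopes (band centre in Hz -> energy per interval).
envelopes = {
    110: [0, 8, 6, 4, 2, 1],
    220: [0, 4, 3, 2, 1, 0.5],
    150: [0, 0, 0, 9, 7, 5],
    300: [0, 0, 0, 3, 2.5, 2],
}
print(group_bands(envelopes))
```

In this example the bands at 110 Hz and 220 Hz form one group and the bands at 150 Hz and 300 Hz form another, each group being a candidate note.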
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.