A vocal effect processor is a device that is capable of modifying an input vocal signal to change the sound of the voice. Pitch correction processors shift the pitch of an input vocal signal, usually to improve the intonation of the vocal signal such that it better matches the notes of a musical key or scale. Pitch correction processors can be classified as “non real-time” or “real-time.” Non real-time pitch correction processors are generally run as file-based software plug-ins and can use multi-pass processing to improve the quality of the processing. Real-time pitch correction processors operate with fast processing using minimal look-ahead such that the processed output voices are produced with very short delays of less than about 500 ms and preferably less than about 25 ms, making it practical for use during a live performance. Typically, a pitch correction processor will have at least a microphone connected to the input at which a monophonic signal is expected, and will produce a monophonic output signal. Pitch correction processors may also incorporate other vocal effects such as reverb and compression, for example.
Pitch correction is a method of correcting the intonation of an input audio signal to better match a desired target pitch that is musically correct. Pitch correction processors work by detecting the input pitch being sung by a performer, determining the desired output note, and then shifting the input signal such that the output signal pitch is closer to the desired note. One of the most important aspects of all pitch correction systems is the mapping between the input pitch and the desired target pitch. In some systems, the musically correct or target pitch is known at every instant in time. For example, when pitch correcting to a known guide track or channel, such as the melody notes in a MIDI file, each target note is known in advance. Therefore, the mapping simply reduces to choosing the target pitch regardless of the input pitch. However, in most situations, the intended target pitch is not known in advance and therefore must be inferred based on the input notes and possibly other information, such as a predetermined key and scale, for example.
This disclosure provides representative embodiments for music corresponding with the western 12-tone scale, although it will be clear to those of ordinary skill in the art that this description can be adapted to any musical system or scale that defines discrete notes. In some systems, the target scale is assumed to be a chromatic scale that encompasses all 12 tones in a scale according to a predetermined scale reference frequency such as A=440 Hz. In other systems, the target or predefined scale may include a subset of the available tones. For example, a C♯-major scale that includes a predefined subset of seven notes may be used. In either case, the vocal effect processor needs to include a mapping between all the possible input pitches, and the discrete set of desired output notes.
There are several problems with the existing state of the art in pitch correction. For example, when a chromatic scale is used and the singer misses the desired target note by more than half a semitone, the wrong target note will generally be selected. Also, when a singer is using vibrato or some other pitch effect that has a large pitch deviation, the correction may result in the selected output note jumping or oscillating between two notes. Using a scale with fewer output notes than a chromatic scale, such as the seven notes in a major scale, can help to alleviate both of these problems. However, this often results in another major problem: many songs have short sections in which the localized key or tonal center is different from the global key of the song. For example, an A-major chord, which includes the notes of A, C♯, and E may be played during a song that is globally in the key of G-major, which does not include C♯. In this case, the melody may include a note (C♯) that is not part of the global key (G-major), and therefore will not be selected by the pitch correction input to output mapping.
Another common complaint about the existing state of the art in pitch correction is the fact that, mostly as a consequence of the pitch detection and pitch shifting operations, there is always a time delay between the input audio and output audio of the pitch correction processor. In existing state of the art real-time pitch correction systems, this delay is approximately 20 ms. Singing with delays greater than about 10 ms can be difficult for many people, as the delay is similar to an echo that is very distracting to the performer.
Systems and methods according to embodiments of the present disclosure provide pitch correction while overcoming various shortcomings of previous strategies. In various embodiments, systems and methods for pitch correction dynamically adapt a mapping between detected input notes and corresponding corrected output notes. Note boundaries may be dynamically adjusted based on notes detected in an input vocal signal and/or an input accompaniment signal. The pitch of the input vocal note may then be adjusted to match a mapped output note. In various embodiments, delay of pitch shifting is dynamically adjusted in response to detecting a stable voiced note to reduce delay for note onsets and increase delay for stable notes, including voiced notes with vibrato.
In one embodiment, a system or method for processing a vocal signal and a non-vocal signal include detecting vocal input notes in the vocal signal, generating a vocal input note histogram based on number of occurrences of each detected vocal input note, detecting non-vocal input notes in the non-vocal signal, generating a non-vocal note histogram based on number of occurrences of each detected non-vocal input note, combining the vocal note histogram and non-vocal note histogram to generate a combined note histogram, mapping the vocal input notes to corresponding vocal output notes based on associated upper and lower note boundaries, shifting pitch of the vocal input notes to a pitch associated with the corresponding vocal output notes, adjusting the upper and/or lower note boundaries in response to the combined note histogram, determining if a pitch of a vocal input note is stable, and adjusting delay of pitch shifting based on whether the pitch of the vocal input note is stable.
In one embodiment, a system for adjusting pitch of an audio signal includes a first input configured to receive a vocal signal, a second input configured to receive a non-vocal signal, an output configured to provide a pitch-adjusted vocal signal, and a processor in communication with the first and second inputs and the output. The processor executes instructions stored in a computer readable storage device to detect input vocal notes in the vocal signal and input non-vocal notes in the non-vocal signal, map the input vocal notes to output vocal notes, each output vocal note having an associated upper note boundary and lower note boundary, modify at least one of the upper note boundary and the lower note boundary of at least one output note in response to previously received input vocal notes and input non-vocal notes, shift the pitch of the vocal signal to substantially match an output note pitch of a corresponding output vocal note, and generate a signal on the output corresponding to the shifted pitch vocal signal. The processor may be further configured to dynamically modify a delay for shifting the pitch in response to stability of an input vocal note. Various embodiments may include adjusting one or more note boundaries based on a likelihood of an associated note occurring. The likelihood of an associated note occurring may be based on previously identified notes, which may be reflected in corresponding note histograms, or a table of relative likelihood of occurrences, for example.
Embodiments according to the present disclosure may provide various advantages. For example, systems and methods according to the present disclosure dynamically adapt input to output mapping over the course of a song to accommodate local key changes or shifts in tonal center from a global key without requiring user input or a guide track. This results in musically correct output notes while accommodating an occasional output note that is not within the global key or scale, i.e. not diatonic.