Many kinds of interfaces have been designed to allow a user to interact with sound production circuitry. Some are mechanical devices built to resemble traditional musical instruments such as a guitar or saxophone. Others are unique to electronic instruments, and have all manner of fanciful constructions. User interfaces include circuitry to generate electronic signals called control signals, responsive to user interaction gestures. Additional circuitry generates electrical signals that produce sound responsive to these control signals. Control signals include discrete events and continuously varying streams of data.
The continuous data is referred to as “control rate data”. Control rate data is commonly generated responsive to continuous user interaction gestures including, for example, manipulation of control operators such as wheels and joysticks, with the generated control rate data representing displacement of such operators. Control rate data may also be generated automatically responsive to a discrete user interaction gesture such as depression of a key. Further, control rate data may be generated by processing functions responsive to input control rate data.
In the present invention, control rate data is any stream of data that has a frequency of vibration below that which may be perceived as an audio tone. The frequency may be determined using the period of a control rate signal, as represented by its start point and end point, by inflection points of a user gesture, or by zero crossing points of a repeating wave form. In general, control rate signals operate in the same frequency range as human gestures.
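The zero crossing method of determining frequency mentioned above can be sketched as follows. This is a minimal illustration, not the method of the invention. It assumes a 20 Hz boundary between control rate and audio rate (the text only says "below that which may be perceived as an audio tone"). The helper names `sine`, `zero_crossing_frequency` and `is_control_rate` are hypothetical.

```python
import math

# Assumed boundary between control rate and audio rate signals (hypothetical value).
AUDIBLE_THRESHOLD_HZ = 20.0


def sine(freq_hz, sample_rate, seconds):
    """Generate a test wave form (hypothetical helper for illustration)."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def zero_crossing_frequency(samples, sample_rate):
    """Approximate the frequency of a repeating wave form from its zero crossings.

    A full cycle contains two zero crossings, so the crossing rate is halved.
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # Count every change of sign between consecutive samples.
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration)


def is_control_rate(samples, sample_rate):
    """True if the estimated frequency lies below the assumed audible threshold."""
    return zero_crossing_frequency(samples, sample_rate) < AUDIBLE_THRESHOLD_HZ
```

The estimate is coarse over short windows, because partial cycles at the edges are not fully counted. A 5 Hz wave observed for one second comes out at about 4.5 Hz. That error is still far from the audible threshold, so the classification holds.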
This data may be generated only when it changes value, or at a constant sample rate determined by a clock. Thus the functions described herein may be called in response to data representing a change from previously received data, or at a constant clock rate. When control rate data is sampled from physical controllers, a clock rate higher than the rate of change of the data must be used to obtain useful accuracy. The sample rate of control rate signals is usually in the range 20–10,000 samples per second, but may be higher or lower. The periodic frequency of audio tones is in about the same range as the sampling rate of control rate signals. This is significant because control rate signals are typically used to vary the pitch, volume or timbre of audio tones. Pitch is related to the periodic frequency of zero crossing points of an audio wave form. Volume is related to the peak amplitude of an audio wave form. Timbre is related to inflection points or wave shape of an audio wave form, and may also involve other aspects of a tone, including noise content, changes over time, and salient characteristics. So each data point of a control rate signal corresponds roughly to a single cycle of an audio wave form. Control rate signals may also be sampled at higher rates, such as those generally used to represent audio rate data. However, the rate at which control rate data changes value is so much lower than that of audio rate data that such high sampling rates are generally considered wasteful of computer processing power.
In addition to continuous control rate data, discrete data values representing user interaction gestures are generally used to represent “notes”. The term note as used herein refers to any audio signal, including signals having primarily noise or varying pitch content, as well as typical musical note audio signals having perceivable discrete pitch content.
The wide availability of MIDI based audio equipment has made it commonplace for piano style keyboards, and other interfaces designed to facilitate playing of musical tones, to be used to play sounds of any description.
User interaction gestures representing selection of audio signals by, for example, depressing the keys of a piano style keyboard are referred to herein as note selections. Note selections may also be used to determine the value of one parameter of an audio signal. For example, MIDI note numbers selected with a piano style keyboard typically represent a selected pitch of an audio signal having perceivable discrete pitch content.
Thus in the present invention, note selections refer to discrete selections made by user interaction gestures that intersect a fixed location on a continuous line or plane. Examples are striking the keys on a piano style keyboard, or selecting a position along a fretboard or continuous membrane. Such a selection may be seen as the endpoint of a continuous user gesture that is activated perpendicular to the line or plane. Such note selections occur at what is defined here as “interaction rate” and generate “interaction rate data”. Interaction rate data occurs at roughly the same rate as the zero crossings, peak amplitudes, or inflection points of control rate signals. So in the present invention, interaction rate data may also be generated responsive to continuous user interaction gestures such as manipulation of a wheel or lever. Such data may be generated when the gesture crosses a position, velocity, or acceleration threshold, when motion starts, stops or changes direction, or when the gesture intersects a time threshold. All gestures performed by a user to interact with an electronic audio system may thus generate and be represented by interaction rate data.
In the present invention, additional interaction rate data may be synthesized by processing functions, such as counting, multiplexing, indexing, and combining interaction rate data using logic operations. Advantageously, logic functions referred to as latches may be used to store results of previous processing of interaction rate data. This creates interaction rate data hierarchies that represent the hierarchical decision process a musician uses to create musical phrases and melodies. This hierarchical interaction rate data, generated by a hierarchy of conditional latches encoded in the interface electronics, provides a means of changing the roles played by the physical devices used to interact with the sound production circuitry. This reflects the way a musician temporarily changes the position and/or orientation of arms, hands and fingers in order to interact with an instrument in different ways.
The user interaction gestures a musician employs to play music may be separated into “selection gestures”, “activation gestures” and “modulation gestures”. For example, a note is selected on a violin using one hand, and then the note is activated using the other. In the present invention, the action of bowing may be represented by interaction rate event data for start, change of direction and stop points, and by control rate signals connecting these points, representing velocity or position of the bow. Once the note is activated, additional notes may be both selected and activated by performing selection gestures on the fingerboard.
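The derivation of interaction rate data from a continuous gesture can be illustrated with a short sketch. It detects position threshold crossings and the start, stop and change of direction of motion. It assumes the gesture arrives as a list of sampled positions from a wheel or lever. The event names are hypothetical, and velocity, acceleration and time thresholds are left out for brevity.

```python
def interaction_events(positions, threshold):
    """Derive interaction rate events from a sampled continuous gesture.

    Emits an event when the position crosses `threshold`, and when motion
    starts, stops, or reverses direction. Event names are hypothetical.
    """
    events = []
    prev_dir = 0  # -1 moving down, 0 at rest, +1 moving up
    for i in range(1, len(positions)):
        prev, cur = positions[i - 1], positions[i]
        # Position threshold crossings.
        if prev < threshold <= cur:
            events.append((i, "threshold_up"))
        elif prev >= threshold > cur:
            events.append((i, "threshold_down"))
        # Start, stop, and change of direction of motion.
        delta = cur - prev
        direction = (delta > 0) - (delta < 0)
        if direction != prev_dir:
            if prev_dir == 0:
                events.append((i, "start"))
            elif direction == 0:
                events.append((i, "stop"))
            else:
                events.append((i, "reverse"))
            prev_dir = direction
    return events
```

For example, a lever pushed up past a threshold, held, and then pulled back yields a start, a threshold crossing, a stop, a new start, and a downward threshold crossing. These are sparse interaction rate events extracted from a denser control rate stream.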
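A hierarchy of conditional latches of the kind described above might be modeled as follows. This is only a sketch under stated assumptions. A child latch may be set only while its parent is set, and resetting a latch also resets the latches beneath it. The function `key_role` and its role names are hypothetical. They show how latch states could change the role played by the same physical key.

```python
class Latch:
    """A conditional latch that remembers the result of an earlier gesture.

    A child latch can be set only while its parent is set, so latches form a
    hierarchy; resetting a latch also resets every latch beneath it.
    """

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.state = False
        if parent is not None:
            parent.children.append(self)

    def set(self):
        # Setting succeeds only if there is no parent or the parent is set.
        if self.parent is None or self.parent.state:
            self.state = True
        return self.state

    def reset(self):
        # Clear this latch and everything that depended on it.
        self.state = False
        for child in self.children:
            child.reset()


def key_role(mode, submode):
    """Hypothetical mapping from latch states to the role a key plays."""
    if not mode.state:
        return "select_and_activate"
    if not submode.state:
        return "select_only"
    return "modulate"
```

Because the latch keeps its state, the gesture that set it can end. The same key is then free to act in the new role.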
Typical modulation gestures include vibrato, portamento slides, and other pitch variations, as well as subtle variations in volume and/or timbre of a sounding note. Modulation gestures may likewise be represented by interaction rate event data and connecting control rate data.
In general, selection gestures refer to note selection and activation gestures refer to note activation. Modulation gestures refer to modification of activated notes. However, each of these three types may also be subdivided into selection, activation and modulation gestures. That is, just as notes are selected, activated and modulated, gestures are also selected, activated, and modified, and each of these may be represented by interaction rate data.
For example, notes may be activated on a guitar by picking with an up or down stroke, or by bowing or striking a string. So an activation gesture itself is selected using a gesture that positions the hand to perform the selected gesture. Then the activation gesture is itself activated, which activates the note.
There may also be more than one type of note selection gesture. Notes may be selected on a guitar by fingering a string, or by releasing a string leaving the string open, or by fingering and then bending a string prior to activation. Note selection also consists of three parameters, representing three ranges of pitch resolution, which may be selected separately. These are base note selection (open string), note increment selection (fret), and fine note selection (bend) all of which may be done prior to activation. Each of these three ranges of pitch selections may be made separately or together, before or after activation of a note. Volume and timbre may also be selected and varied in similar ranges.
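The three ranges of pitch selection described above can be combined numerically. This sketch assumes MIDI note numbering, one semitone per fret, bends expressed in fractional semitones, and equal temperament with note 69 = A4 = 440 Hz. None of these conventions is specified by the text. For example, the open high E string of a guitar is MIDI note 64, and its fifth fret gives A4.

```python
def selected_pitch(open_string, fret, bend=0.0):
    """Combine three ranges of pitch selection into one MIDI-style pitch number.

    open_string: base note selection (MIDI note number of the open string)
    fret:        note increment selection (assumed one semitone per fret)
    bend:        fine note selection, in fractional semitones
    """
    return open_string + fret + bend


def pitch_to_hz(pitch, a4_hz=440.0):
    """Equal-tempered frequency, taking MIDI note 69 as A4."""
    return a4_hz * 2 ** ((pitch - 69) / 12)
```

Each range can be set independently, before or after activation, and only their sum determines the sounding pitch.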
Some of the many types of modulation gestures have been mentioned. However, sometimes a gesture is not so easily classified. For example, after a note is activated, a new note may be selected by hammering on a new fret or fingerboard position. This is a selection gesture that is also a modulation gesture of a sounding note. In a sense, it is also an activation gesture, since the action of hammering the string down on a fret produces perturbations of the sound that are distinguishable aurally as a characteristic of a new note activation. The new note selected by a hammer-on gesture may then be activated anew with a picking gesture, which is a commonly used technique.
So there is a blurring of the distinction between selection, modulation, and activation gestures. Each gesture may serve more than one purpose, each of which is a contributing factor to the listener's perception of its meaning within a musical context. Gestures may blend one into another. This is not just sloppy musicianship. It is intentional and very effective for creating continuity within a musical “phrase”. A musical phrase may thus be represented by a series of interaction rate event data representing user selection, activation and modulation gestures, each of which may overlap to create a sense of continuity.
The MIDI data protocol is a widely accepted standard specification for representing interaction rate data typically generated by playing notes on a piano-style keyboard, known as note-ons and note-offs. The MIDI specification also specifies continuous control rate data that represents manipulating simple operators such as joysticks, wheels and levers.
When playing a keyboard based MIDI synthesizer, as when playing a piano, notes are selected by a transverse gesture along the keyboard and activated by a motion perpendicular to the selected key. However MIDI note-on data consists of a single packet that represents simultaneous selection and activation of notes. That is, both selection and activation events are combined into a MIDI note-on event. In the present invention a number of advantages are obtained by separating note selection and activation.
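The separation of note selection and activation can be sketched as a small state holder. It emits a standard MIDI note-on (status 0x9n) only once a selected note is activated, and a note-off (status 0x8n) on deactivation. The class name and its methods are hypothetical. This is an illustration of the idea, not the circuitry of the invention.

```python
class SplitNote:
    """Keeps note selection and note activation as separate events.

    A combined MIDI note-on is emitted only after both have occurred.
    """

    def __init__(self, channel=0):
        self.channel = channel  # MIDI channel 0-15
        self.selected = None    # note number chosen by the selection gesture
        self.sounding = None    # note number currently activated

    def select(self, note):
        if not 0 <= note <= 127:
            raise ValueError("MIDI note numbers range from 0 to 127")
        self.selected = note
        return []  # selection alone produces no sound

    def activate(self, velocity):
        if not 1 <= velocity <= 127:
            raise ValueError("note-on velocity must be 1-127")
        if self.selected is None or self.sounding is not None:
            return []
        self.sounding = self.selected
        return [bytes([0x90 | self.channel, self.sounding, velocity])]

    def deactivate(self):
        if self.sounding is None:
            return []
        msg = bytes([0x80 | self.channel, self.sounding, 0])
        self.sounding = None
        return [msg]
```

A piano style keyboard collapses `select` and `activate` into a single keystroke. Keeping them apart allows a note to be chosen by one operator and sounded by another.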
Some MIDI wind controllers, such as a Yamaha WX7, internally specify separate selection and activation data, used to generate MIDI note-on and note-off data. Similarly, guitar controllers may also use separate selection and activation data as well as channelization of the strings, so gestures may be performed separately for each note activated by each string. Janosy, ICMC 1994, specifies separating MIDI note selection and activation data for the purpose of simulating guitar playing on a piano style keyboard. Janosy also specifies a chord mode channel for playing chords selected by one finger, and also a kind of solo mode for playing lines, that includes recognition of modulation gestures. Further, Janosy identifies a number of typical gesture types used in guitar playing.
However, the above prior art instruments fail to specify means of selecting and activating gestures, or, significantly, of blending gestures to create phrases. They also fail to recognize that notes may be advantageously channelized according to activation gesture or modulation gesture, as well as selection gesture, or that these represent performance modes that may themselves be selected and activated by the user. Thus, advantages may be obtained by separating selection and activation of notes, gestures and performance modes, and by making these selections and activations available to the user via an Interactive Performance Interface, as will be seen presently.
Besides the problem of MIDI note-ons and note-offs, it is a well-known limitation of MIDI that continuous control rate data generated by specified control operators modulates all sounding notes at the same time. This is rarely the case in a real musical performance, and generally makes for uninteresting sounding music. A new standard called ZIPI was developed, but not commercialized, which is based on the model of an orchestra rather than a piano style keyboard. It specifies that control rate data may be directed to individual notes so that each note may be modulated separately. However, for an individual musician to modulate each note separately and distinctly from other notes, he or she must perform separate modulation gestures for each note, a difficult if not impossible task for polyphonic music. Otherwise, an entire orchestra of musicians is required, each playing a separate electronic instrument that plays and modulates one note of a polyphonic ZIPI synthesizer.
One enhancement to the MIDI specification that addresses this limitation is called “Polyphonic Aftertouch”. This provides a type of MIDI control rate data that is applied to each note individually. It is designed so that pressure pads under each note of a piano style keyboard may be used to modulate that note discretely. However, this arrangement requires that each modulation gesture be started after a note is activated and resolved before the note is deactivated, which only occasionally occurs in traditional music performance. Besides that, modulation gestures cannot be applied selectively to groups of notes, or across a sequence of notes using Polyphonic Aftertouch.
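The difference between channel-wide control data and Polyphonic Aftertouch can be shown with the two real MIDI status types involved. A control change (0xBn) applies to every note on its channel, and polyphonic key pressure (0xAn) names a single note. This sketch simplifies on purpose: it assumes one channel and a single modulation destination per note, and it ignores which controller number a control change carries.

```python
def apply_message(modulation, message):
    """Apply one MIDI channel message to per-note modulation amounts.

    modulation: dict mapping sounding note number -> modulation amount (0-127)
    message:    (status, data1, data2) tuple for a single channel (assumption)
    """
    status, data1, data2 = message
    kind = status & 0xF0
    updated = dict(modulation)
    if kind == 0xB0:
        # Control change: every sounding note on the channel is modulated alike.
        # The controller number (data1) is ignored in this simplified model.
        for note in updated:
            updated[note] = data2
    elif kind == 0xA0:
        # Polyphonic key pressure: only the note named in data1 is modulated.
        if data1 in updated:
            updated[data1] = data2
    return updated
```

With a three-note chord sounding, a control change moves all three notes together. Polyphonic key pressure reaches only the one key whose pad is pressed.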
Another enhancement to MIDI is called MONO mode. This allows for only one note at a time to sound. If a second note is played before releasing the previous note, the second note continues the first note with only a change in pitch. However, new notes played this way in MONO mode simply start a new note without the attack portion of the pre-programmed control envelope. If a note is played after all previous notes have been released, the new note is re-attacked. Such an arrangement is ineffective for creating realistic or varied transitions between notes.
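The MONO mode behavior described above reduces to a single decision: re-attack only when no other key is held. The sketch below models just that decision. The class name and the action labels are hypothetical. What happens when one of several held keys is released varies between instruments, so that case is deliberately left unmodeled.

```python
class MonoVoice:
    """Models the MONO-mode note transitions described in the text."""

    def __init__(self):
        self.held = []  # keys currently held down, in the order pressed

    def key_down(self, note):
        # Re-attack only when no other key is held; otherwise the sounding
        # note simply changes pitch, with no new attack.
        action = "legato" if self.held else "attack"
        self.held.append(note)
        return (action, note)

    def key_up(self, note):
        self.held.remove(note)
        if self.held:
            return None  # other keys still held; behavior varies, not modeled
        return ("release", note)
```

Only two transition types exist in this model, "attack" and "legato". That is the limitation the text identifies for creating varied transitions between notes.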
This problem is partially addressed in U.S. Pat. No. 5,216,189, to Kato (1993), which specifies selectable preset curves which can be used to create note transitions. Unfortunately this requires a fingering scheme commonly known as “fingered portamento” or “fingered legato” that requires that a note be held down while additional notes are selected in order to effect simulated note transitions. This requirement is awkward at best and does not allow for lateral displacement of the hand along the keyboard, which is a staple piano keyboard playing technique.
In U.S. Pat. No. 6,316,710, Lindemann, a variety of pre-composed audio files are provided that may be activated to create realistic sounding note transitions. Lindemann also provides a “sound segment sequencer”, that processes user interaction gestures and makes decisions about what audio file to play. Unfortunately, Lindemann only provides an example similar to the fingered portamento scheme described above, wherein a MIDI style note-on signal, such as from a wind controller, must be maintained in order to create slur transitions. The above examples fail to recognize the importance of different performance modes for interacting with electronic instruments, as represented by different hierarchies of selection, activation and modulation gestures, including selection, activation and possibly modulation gestures for the performance modes themselves. This is because these inventions fail to account for the role different modes of operation of arms, hands and fingers play in a musical performance.
In contrast to the prior art, the inventor has discovered that a flexible, easy to use performance interface for an audio system can be implemented by electronically modeling the decisions and actions a musician makes in performance. This discovery stems from the inventor's previous discovery that audio synthesis systems work by mirroring perceptual modes used to hear and process audio signals. This insight can be extended to physical devices such as loudspeakers that mirror human ears, and control devices that mirror human limbs. The inventor further extended this theory to include the simulation of muscle activation required when interacting with a musical instrument, which was the subject of U.S. Pat. No. 6,066,794, Longo. In the present invention, the inventor further extends the analogy of reflection to include the operation of joints and limbs via decisions a musician-user makes, to create a flexible performance interface.
Traditional musical instruments necessarily reflect the modes of movement of the human body. But the interface of an acoustic instrument is limited because it must support an internally resonant acoustical system. This is why acoustical instruments usually take years to learn to play well. For example, a guitar fretboard is arranged so a musician's fingers may be positioned transversely across the strings, in order to select chord voicings. The musician's hand can also be rotated, so the fingers fall longitudinally in succession along the frets of a single string. These are two performance modes for a guitar. However, these gestures are notoriously difficult for a beginning guitarist to perform, because they also require interacting with strings stretched taut in a manner designed to cause the guitar body to resonate.
However performance modes such as those described above can be modeled electronically using circuits that represent the action of the fingers and hand. The decision to use one performance mode or another can also be modeled, and the option to switch from one to another provided to the performer via an electronic performance interface. Because such an electronic interface does not suffer from the limitations imposed by the necessity of manipulating a resonant acoustical system, both the selection of modes and performance actions may be made available to the user via simple operators that are comfortable to use.
Control envelopes are known in the art. They were invented by early audio synthesizer pioneers to represent a series of gestures a musician typically performs to activate, sustain and release a note. In a conventional electronic instrument, control envelopes are triggered together with an audio signal, both responsive to a received note-on event, and then they perform automatic modulations to the audio signal. Control envelopes typically consist of a series of “breakpoints” and “line segments” that connect the breakpoints. Lamentably, automatically generated control envelopes always sound the same. Some prior art instruments specify envelopes for which minor variations can be introduced by several means known in the art, but these are generally limited to a single variation of a single segment of a control envelope.
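A conventional control envelope of breakpoints joined by line segments can be evaluated with linear interpolation. This is a generic sketch, not any particular instrument's envelope. It assumes breakpoints given as (time, level) pairs sorted by time, and it uses Python's standard `bisect` module to find the segment containing a given time.

```python
from bisect import bisect_right


def envelope_value(breakpoints, t):
    """Level of a breakpoint envelope at time t, using linear line segments.

    breakpoints: list of (time, level) pairs sorted by time (assumed format).
    Before the first breakpoint or after the last, the end level is held.
    """
    times = [bp[0] for bp in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)  # index of the first breakpoint after t
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

Once triggered, such an envelope depends only on elapsed time. This is why, as noted above, automatically generated envelopes always sound the same.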
Control envelopes are sometimes referred to in the art as wave forms. Envelope wave forms generally have a longer period than control rate signals such as low frequency oscillators, which further modulate an audio signal under control of an envelope. Envelope wave forms represent “human actions” consisting of a series of gestures combined to produce a given result. For example, the action of throwing a baseball consists of a combined series of sequential and simultaneous gestures of many joints in the human body, mainly the shoulder, arm, elbow, wrist and hand. Each of the separate gestures has a start point and an end point, each of which may be represented by interaction rate data. These may also be seen as data points of an interaction rate signal representing the human action of throwing a baseball. The breakpoints of a control envelope also occur at interaction rate, and thus may also be thought of as an interaction rate signal.
The breakpoints and line segments of a control envelope also have a hierarchical structure. This structure can be represented by a hierarchical arrangement of conditional latches. In the present invention, the breakpoints and an unlimited number and variety of line segments of control envelopes can be selected, activated and modulated in real time, using a limited number of easily manipulated control operators. This can be accomplished because latches remember the result of combinations of gestures. Once a latch is activated and interaction rate data synthesized, it becomes possible for the same operators used to activate the latch to then perform other functions enabled by the latch. In addition, some operators, such as the keys of a typical piano style keyboard, automatically return to a starting point when released by the user, thus changing the state of the operator. In the present invention, once a latch has been activated, the user may release such an operator, while the latch remembers the result of the previous gesture, thus freeing his or her hand to manipulate other operators.
This gives the musician precise control over continuous control functions usually performed automatically by control envelopes. In addition, the functions of envelopes may be expanded, for example by using Longo's gesture synthesis functions, to generate more interesting sounding control rate interpolations between breakpoints than provided for by conventional envelope line segments. Further, synthesized interaction rate data may be used to activate additional audio rate signals, note sequences and other effects. Such a hierarchical latch activated structure is referred to in the present invention as an “interactive control envelope”.
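The latch behavior behind an interactive control envelope can be sketched in a few lines. The sketch assumes that each activation gesture advances the envelope to its next breakpoint. Releasing a self-returning key jumps to the final (release) breakpoint unless a latch has been set, in which case the current stage is remembered. The class and method names are hypothetical. Only breakpoint levels are tracked, with no interpolated line segments between them.

```python
class InteractiveEnvelope:
    """Breakpoints reached on user interaction events instead of on a timer."""

    def __init__(self, levels):
        self.levels = list(levels)  # breakpoint levels; the last is the release level
        self.stage = 0
        self.latched = False

    def gesture(self):
        """An activation gesture advances to the next breakpoint, if any."""
        if self.stage < len(self.levels) - 1:
            self.stage += 1
        return self.levels[self.stage]

    def latch(self):
        """Remember the current stage so the key can be released."""
        self.latched = True

    def key_released(self):
        """Unlatched: releasing the key jumps to the release breakpoint.

        Latched: the current stage is held, freeing the hand for other operators.
        """
        if not self.latched:
            self.stage = len(self.levels) - 1
        return self.levels[self.stage]
```

Unlike the timed envelope, each breakpoint here arrives only when the performer makes a gesture. The latch lets the hand leave the key without ending the note.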
Thus the present invention provides a performance interface architecture for an audio synthesizer. The provided interface can simulate musical decisions as well as the performance actions of musicians using interactive performance modes. It suffers from neither the physical difficulties imposed by the construction of traditional instruments, nor from the narrow performance options provided by prior art electronic interface circuitry. Since the present invention is implemented in the electronic domain, it is ideally flexible for representing the kinds of intricate, overlapping musical relationships between notes and gestures found in traditional music. The advantages obtained are empowering for electronic musicians and the musical results that can be achieved are startling.