The practice of algorithmic composition has a long history ranging from mechanical devices (such as wind chimes and automata), through musical dice games (Musikalisches Würfelspiel, attributed to Mozart among others), mathematical, statistical, random and stochastic composition (e.g. the works of Iannis Xenakis) to computer programs such as Cybernetic Composer by Charles Ames (see Ames, Charles. 1987. “Automated Composition in Retrospect: 1956-1986.” Leonardo 20/2: 169-185; 1989. “The Markov Process as a Compositional Model: A Survey and Tutorial.” Leonardo 22/2: 175-187). These antecedents have been extensively surveyed in Computers and Musical Style (Cope 1991. Computers and Musical Style. Madison, Wis.: A-R Editions, pp. 1-18).
While various patents have aspects related to the topic of music composition, the inventor's own prior work is of more significance relative to the present invention. In particular, the inventor's own work, commonly known as the “Emmy Algorithm” or “Emmy”, is a software package that uses recombinant algorithmic composition and has been taught through the publication of volumes of work, including books and articles, by David Cope, including:
1991a. Computers and Musical Style. Madison, Wis.: A-R Editions;
1991b. “Recombinant Music.” Computer 24/7: 22-28;
1992. “Computer Modeling of Musical Intelligence in EMI.” Computer Music Journal 16/2: 69-83;
1996. Experiments in Musical Intelligence. Madison, Wis.: A-R Editions;
2000. The Algorithmic Composer. Madison, Wis.: A-R Editions;
2001. Virtual Music. Cambridge, Mass.: MIT Press;
2003. “Computer Analysis of Musical Allusions.” Computer Music Journal 27/1: 11-28;
2006. Computer Models of Musical Creativity. Cambridge, Mass.: MIT Press.
The process used within the Emmy software has been referred to throughout the inventor's work generically as Experiments in Musical Intelligence. The fundamental algorithmic sequence of the Emmy software can be represented by the logic flow illustrated in FIG. 1, which shows, from a music database 100, operations that include pattern matching step 110, segmentation step 120, hierarchical analysis step 130, non-linear recombination step 140, which result in the output 150.
The music database 100 is essentially the embodiment of a musical composition, or musical performance, in a tangible or legible form, format, language, or code that can be interpreted and executed by devices such as a computer, a musical instrument with a digital interface, a sound synthesizer, a digital-to-analog audio reproduction system, or any combination of electrical, electromechanical, or mechanical musical devices. For instance, a musical score is, in and of itself, a musical database of a composition, and a phonograph recording is an analog musical database of a performance. In an Emmy database, the groupings of notes in a musical phrase, or more precisely the “events” in a musical phrase, are assigned numerical values according to their pitch, duration, location in the work, voice, amplitude and/or other sonic and temporal qualities which characterize the notes. A single event is the grouping of notes which constitute a single beat in a musical work. Collections of numerical values of notes are compiled to represent successively longer measures, phrases, sections, and so forth. These compilations are called event lists, and are susceptible to processing by digital list-processing computer applications (such as the computer language known as LISP, short for List Processor).
Event lists describe the various attributes of each note with a single list of parameters of at least five separate but related elements, as follows:
The first element of the event list is the on-time, or the time elapsed between the beginning of the work and the initiation of the note. On-times are assigned numerical values in Emmy based on a standard metric of 1,000 ticks per second, which is usually equated with the length of a quarter note. On-times are relative, not absolute. As with printed music, the actual on-time of a pitch is determined by a combination of on-time (location in the score) and tempo (pace of playing). For example, an on-time of 1,000 could begin 1 second after 0 with a tempo of m.m. 60, 2 seconds after 0 with a tempo of m.m. 30, half a second after 0 with a tempo of m.m. 120, and so forth. Events describe only sound events (note ons and note offs), not silences or rests, relieving databases of vast amounts of unnecessary data. Silences, or rests, are represented by default as the result of a lack of events.
The second entry of the event list is pitch. In Emmy, pitch is assigned a numerical value using the established Musical Instrument Digital Interface (“MIDI”) standard, with middle C (approximately 261.6 cycles per second) equal to MIDI note number 60. Additions and subtractions of 12 produce C in various octaves, and additions and subtractions of 1 create half steps. Thus, the numerical sequence 60-62-64-65-67-69-71-72 represents the C major scale, with the intervening numbers (61, 63, 66, 68, 70) producing chromaticism relative to that key.
The third entry of the event list is duration. Duration, as with on-time, is figured to a quarter note's equaling 1,000 ticks; relative durations are figured from that standard. The duration of an event implies the MIDI note off-time, which can be independently figured as the addition of the on-time plus the duration. Thus, an event with an on-time of 6,000 and a duration of 1,000 has an off-time of 7,000. Duration, as with on-time, is relative, being a factor of its value within the current tempo.
The fourth entry of the event list is channel number. The channel numbers indicate the original voice separation of the music entered into that database (e.g.: soprano, alto, tenor, bass; or, trumpet, saxophone, guitar, drums, etc.). Channel numbers are used to indicate the voice from which events were harvested or will be assigned for performance in the score of the new composition or, perhaps, for performance by a digitally enabled instrument (e.g. an instrument compatible with the industry-standard Musical Instrument Digital Interface). Channel numbers are theoretically unlimited, but in practice, 64 channels are sufficient for most music.
The fifth entry of the event list represents dynamics. Dynamics are based on 0 equaling silence and 127 equaling fortissimo, with the numbers between these values being relative to these extremes.
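The five-element event described above can be sketched in Python as follows. This is a minimal illustration, not the Emmy implementation: the names Event, off_time, and seconds_at_tempo are hypothetical, and the conversion to seconds assumes that the quarter note carries the beat.

```python
from typing import NamedTuple

TICKS_PER_QUARTER = 1000  # the text's metric: a quarter note equals 1,000 ticks


class Event(NamedTuple):
    """One note, in the five-element order given in the text (names are assumptions)."""
    on_time: int   # ticks from the start of the work (relative, not absolute)
    pitch: int     # MIDI note number, middle C = 60
    duration: int  # ticks
    channel: int   # voice the note belongs to
    dynamics: int  # 0 (silence) to 127 (fortissimo)


def off_time(event: Event) -> int:
    """The off-time is the on-time plus the duration."""
    return event.on_time + event.duration


def seconds_at_tempo(ticks: int, beats_per_minute: float) -> float:
    """Convert relative ticks to seconds, assuming the quarter note gets the beat."""
    return (ticks / TICKS_PER_QUARTER) * (60.0 / beats_per_minute)


# The C major scale as an event list: one quarter note per scale degree,
# all on channel 1 at a moderate dynamic (both values chosen arbitrarily).
C_MAJOR_SCALE = [60, 62, 64, 65, 67, 69, 71, 72]
scale_events = [
    Event(i * TICKS_PER_QUARTER, pitch, TICKS_PER_QUARTER, 1, 64)
    for i, pitch in enumerate(C_MAJOR_SCALE)
]
```

With these definitions, an event with an on-time of 6,000 and a duration of 1,000 has an off-time of 7,000, and an on-time of 1,000 falls 1 second, 2 seconds, or half a second after the start at tempos of m.m. 60, 30, and 120 respectively. Rests need no representation: they are simply the ticks at which no event sounds.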
Numbering systems, while logical, are arbitrary, and therefore many alternative numbering systems are possible. For instance, a base metric of 10,000 ticks per second could be used for event duration, and a scale of 0 for silence to 254 for fortissimo. Additional entries, a sixth, seventh and so on, into databases may be made as needed for other musical qualities and quantities pertaining to musical notes or events, such as tremolo, aftertouch, and so forth. Events are open-ended, that is, one may add any desired parameter to the end of event lists with no ill effects on the first five elements. Events are compiled into collections of larger phrase, section, or work lists. Events are typically ordered sequentially (i.e. beat one, then beat two, and so forth) to make visual event reading simple and logical. Databases can be created by manually translating scores into event lists, by software that automatically scans printed scores (sheet music) and translates them into event lists, by performing the work through a digitally-enabled instrument (sometimes referred to as “step entry”), by software that automatically analyzes performed music and translates it into event lists, or by any combination of these techniques.
As with any large collection of data, in order for a database incorporating many musical events to be manageable it is beneficial to clarify the data, or make it homogeneous, in ways that preserve its essential characteristics and variety while facilitating analysis and processing. For instance, all works may be transposed into the same key signature, tempo, and so forth, without radically altering their distinctive melodic and harmonic characteristics and the intervallic relationships between their notes. The precise format of the list will be determined by the type of application which will be used to process the data.
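Transposition into a common key can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, events are plain five-element tuples, and the tonic pitch class of each work is assumed to be known in advance (key detection is not addressed here).

```python
def transpose(events, semitones):
    """Shift every pitch by the same number of half steps; other fields are untouched."""
    return [(on, pitch + semitones, dur, ch, dyn) for on, pitch, dur, ch, dyn in events]


def normalize_to_c(events, tonic_pitch_class):
    """Move a work whose tonic has the given pitch class (C=0 ... B=11) to C.

    The smaller of the two possible shifts (down or up) is chosen, which is
    an assumption of this sketch rather than something the text prescribes.
    """
    shift = -tonic_pitch_class if tonic_pitch_class <= 6 else 12 - tonic_pitch_class
    return transpose(events, shift)


def melodic_intervals(events):
    """Half-step distances between successive pitches."""
    pitches = [event[1] for event in events]
    return [b - a for a, b in zip(pitches, pitches[1:])]


# G-A-B in G major (tonic pitch class 7) becomes C-D-E after normalization.
g_major_fragment = [(0, 67, 1000, 1, 64), (1000, 69, 1000, 1, 64), (2000, 71, 1000, 1, 64)]
in_c = normalize_to_c(g_major_fragment, 7)
```

Because every pitch moves by the same amount, the interval sequence of the fragment is the same before and after the shift, which is the property the paragraph above relies on.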
The Emmy algorithm assumes that every work of music contains an inherent set of instructions, or rules, for creating different but highly-related replications of itself—an assumption which is generally agreed to by musicologists. These instructions, when analyzed and interpreted correctly, lead to important discoveries about this music's structure as well as providing a key to producing new instances of music that are stylistically-faithful to it.
The pattern matching step 110 is the process of comparing event lists representing musical works or phrases in the musical database to discover what elements they have in common. Highly recurrent patterns in a single work typically represent thematic material, such as a particular melodic line and associated chord progression. However, patterns which recur in more than one work can be construed as the essence of the style of a particular composer or genre. Style is inherent in recurrent patterns of the relationships between the musical events, in more than one work. The primary constituents of these patterns are the quantities and qualities captured and represented in the musical database event lists (essentially pitch, duration, and temporal location in the work), although other factors such as dynamics and timbre may come into play. Patterns may be discerned in vertical, simultaneous relationships, such as harmony; in horizontal, time-based relationships, such as melody; and in amplitude-based relationships (dynamics) and timbral relationships. Patterns might be identical, almost identical, identical but reversed, identical but inverted, similar but not identical, and so forth. The Emmy algorithm searches the databases for such patterns using controllers that either restrict the search to detecting patterns that are highly similar, or widen the search to detect patterns that are loosely similar. The essence of this process is to iteratively select the event list of differing portions of the music and look for other instances of the same, or similar, event lists elsewhere in the database, and to compile catalogues of matching event lists, ranking them by frequency of occurrence, type, and degree of similarity. The objective of this search, whether the pattern-matching net is cast tightly or widely, is to detect patterns that characterize the commonalities, or “style,” of the bodies of music in the musical databases.
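The idea of a controller that tightens or widens the search can be illustrated with a small sketch. The similarity measure used here (melodic intervals compared one by one, each allowed to differ by at most a given number of half steps) is an assumption made for illustration; the text does not specify how Emmy measures similarity, and all names are hypothetical.

```python
def interval_windows(pitches, size):
    """Return every run of `size` consecutive melodic intervals in one voice."""
    steps = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(steps[i:i + size]) for i in range(len(steps) - size + 1)]


def similar(a, b, tolerance):
    """True when the two interval runs differ by at most `tolerance` at every position."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def count_matches(pattern, works, tolerance):
    """Count the windows, across all works, that match `pattern` within `tolerance`."""
    return sum(
        similar(pattern, window, tolerance)
        for pitches in works
        for window in interval_windows(pitches, len(pattern))
    )


# Three short melodies given as MIDI pitch sequences.
works = [[60, 62, 64, 65], [67, 69, 71, 72], [60, 62, 63, 65]]
tight = count_matches((2, 2), works, tolerance=0)  # exact matches only
loose = count_matches((2, 2), works, tolerance=1)  # near matches allowed
```

With a tolerance of 0 the pattern of two whole steps is found twice; widening the tolerance to 1 admits every window in these three melodies, six in all. Working on intervals rather than absolute pitches also means that a pattern is recognized regardless of the key in which it occurs.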
Matches that are long in duration and loosely similar, for instance, characterize the form which the works in the database share. In a rudimentary example, the basic twelve-bar blues form—(AAB)—would be discovered and registered as a match in a database containing several blues. Formal patterns, generally of long duration, will often have widely varied content within the components of the pattern. The pattern matching controllers are therefore set to discover and compare larger musical structures while de-emphasizing or ignoring the details within the form. (By analogy, a poetry-form pattern matcher would discover the sonnet form by finding commonalities in the number of lines, meter, and rhyming scheme, while ignoring the words. Even if some sonnets in the database were in English and others were in Italian, the form uniting them could be discovered.)
Matched patterns of shorter duration, such as beats, measures and short phrases, are also sought and catalogued. These commonalities are denominated “signatures” in Emmy. In the recomposition process, signatures are preserved and serve to ensure that stylistic qualities are inherent in the musical output.
In the Emmy algorithm, superficial, thematic material specific to a particular work should not be mistaken for deeper commonalities shared by many works by the same composer, or in the same genre. For instance, the “di-di-di-dah” motif of Beethoven's Fifth Symphony is a thematic component specific to that work and is not a signature that is found in very many or all of Beethoven's works. The pattern-matching controllers in Emmy can be adjusted to reject thematic material as superficial and irrelevant to the discovery of signatures. This is achieved by rejecting matches that occur with relatively high frequency in a single work but occur with relatively low frequency, or are entirely absent, in other works.
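The rejection of thematic material can be sketched as a simple test on how a pattern's occurrences are distributed across works. The function name and the 0.8 threshold are assumptions chosen for illustration; the text says only that a pattern concentrated in one work and rare or absent elsewhere is rejected.

```python
def is_thematic(counts_per_work, dominance=0.8):
    """Flag a pattern whose occurrences are concentrated in a single work.

    `counts_per_work` holds the number of times the pattern occurs in each
    work of the database. `dominance` is a hypothetical threshold: the share
    of all occurrences that one work must hold for the pattern to be treated
    as that work's theme rather than as a signature.
    """
    total = sum(counts_per_work)
    if total == 0:
        return False
    return max(counts_per_work) / total >= dominance


# A motif heard twelve times in one work and once elsewhere is rejected;
# a pattern spread fairly evenly over three works is kept as a candidate signature.
theme_like = is_thematic([12, 0, 1])
signature_like = is_thematic([3, 4, 2])
```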
The essential outcomes of the pattern-matching step 110 are two-fold: long forms present in the source material, particularly forms of musical-phrase length and larger, are identified for use as templates for future recomposition; and stylistic signatures are captured so that instances of them can be protected (not broken apart) and re-implanted in the composition process.
In order to recombine music it must, self-evidently, be broken into constituent elements first. This process is referred to in the Emmy software as segmentation step 120. Segments, typically, consist of beats—the groupings of notes which correspond with one beat in the music. However, segmentation of existing musical works into smaller components, and haphazard recombination of them into new orders, would produce musical gibberish, as would fragmenting written language sentences into words and haphazardly recombining the words without regard to grammar (syntax) or meaning (semantics). Although segmentation in the Emmy software is fundamentally straightforward—the identification of each beat and its conversion into an event list—each segment will become progressively more complex as contextual analysis is applied to it. Each beat-segment will accumulate and carry with it at least the following information: the destination note for each note in the beat (i.e. the note in the corresponding voice which follows it in the original work); the grouping of beats, or phrase, to which it belongs; the location of the phrase within the work; its SPEAC value (see below); and whether it is part of a signature that will be protected and not broken apart in the re-composition process.
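Beat segmentation and the recording of destination notes can be sketched as follows. This is an illustration under stated assumptions, not Emmy's code: events are five-element tuples (on-time, pitch, duration, channel, dynamics), one beat is taken to be 1,000 ticks, and the function names are hypothetical.

```python
from collections import defaultdict

TICKS_PER_BEAT = 1000  # assumes the quarter note (1,000 ticks) is the beat


def segment_by_beat(events):
    """Group events by the beat in which they begin; beats with no events are skipped."""
    beats = defaultdict(list)
    for event in events:
        beats[event[0] // TICKS_PER_BEAT].append(event)
    return [beats[index] for index in sorted(beats)]


def destination_pitches(events):
    """Map each event to the pitch that follows it in the same channel (None at the end)."""
    by_channel = defaultdict(list)
    for event in sorted(events):  # tuples sort by on-time first
        by_channel[event[3]].append(event)
    destinations = {}
    for voice in by_channel.values():
        for current, following in zip(voice, voice[1:]):
            destinations[current] = following[1]
        destinations[voice[-1]] = None
    return destinations


events = [
    (0, 60, 1000, 1, 64), (0, 48, 1000, 2, 64),
    (1000, 62, 500, 1, 64), (1500, 64, 500, 1, 64), (1000, 47, 1000, 2, 64),
]
segments = segment_by_beat(events)
destinations = destination_pitches(events)
```

Here the first segment holds the two notes that begin on beat one and the second holds the three that begin on beat two; the upper voice's C (60) is recorded as moving to D (62), and the lower voice's C (48) as moving to B (47). The remaining per-segment information named above (phrase membership, SPEAC value, signature protection) would be attached in the same way.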
While the pattern matching step 110 analyzes form from an essentially syntactical point of reference, hierarchical, or SPEAC, analysis step 130 investigates the semantic structure of music and provides tools for ensuring that when music is recombined, syntactically correct music is also semantically intelligible.
The differentiation of two or more apparently identical but functionally different musical events by analyzing the context in which they occur is extremely important in the Emmy algorithm. The musical function of a note, or chord, in a piece of music depends on its context, particularly the musical interval between the notes or chords that precede or follow it. This may not be intuitively obvious to a non-musician, but can be illustrated by an analogy. In spoken language, homonyms (words spelled and spoken the same) can have quite different functions and meanings, for instance in the sentence “I saw the saw saw.” The word “saw” appears three times in this sentence, with each appearance having a different meaning and making a different syntactic contribution (subject verb, object noun, object verb, etc.) and semantic meaning (because we know that saws cannot see, we infer that the final appearance cannot be a part of the verb “to see” and must therefore refer to the act of sawing). Only the context distinguishes each word's true function and meaning. The same may be said of music.
Tonal-music leading tones provide an example of how hierarchical analysis differentiates between apparently identical functional motions in music. The leading-tone note in the key of C Major (B), for example, strongly leans toward the tonic note when found in dominant, dominant-seventh, and leading-tone harmonies. However, the same leading-tone note appearing as the fifth of the mediant triad does not necessarily lean toward the tonic note (C), but in fact often moves more naturally elsewhere—the submediant note (A), for example. Thus, the same leading-tone note can be analyzed differently depending on its context. This insight provides a very important foundation for Emmy approaches to structural analysis.
For these reasons the Emmy software adopts a hierarchical approach to musical analysis, which is based on a combination of musical tension and musical context that are analyzed, evaluated and assigned a numerical weighting. This weighting combination closely parallels the manner in which one hears music, almost regardless of its style or period of composition, and hence represents the core of the analysis component of Emmy composing programs. The hierarchical approach uses a process that goes by the acronym SPEAC, which is formed from the identifiers Statement (S), Preparation (P), Extension (E), Antecedent (A), and Consequent (C) that will be assigned to events and groupings of events. SPEAC analysis also parses these selected groupings of events to extract information about their role in increasingly large musical structures, from beats to measures, to phrases, sections, and even to whole works. While traditional tonal functions provide analysis of surface detail, the SPEAC approach provides deeper insights into musical structure. In other words, SPEAC derives musical meanings from context as well as from content. SPEAC identifiers function in the following ways:
S=Statement; is stable—a declaration of material or ideas. Statements typically precede or follow any SPEAC function.
P=Preparation; is unstable—an introductory gesture. Preparations precede any SPEAC function though more typically occur prior to statements and antecedents.
E=Extension; is stable—a continuation of material or ideas. Extensions usually follow statements but can follow any SPEAC function.
A=Antecedent; is very unstable—requires a consequent function. Antecedents typically precede consequents.
C=Consequent; is very stable—results in consequent gestures. Consequents must be preceded directly or indirectly (with intervening extensions) by antecedents.
SPEAC identifier assignments follow an A-P-E-S-C stability order with the most unstable identifier to the left, and the most stable identifier to the right. Therefore, A and P require resolution while E, S, and C do not. Thus, progressions of identifiers such as PSEAC and SEA are musically logical, whereas progressions such as AEPS and SAPC, while not impossible, are less logical. David Cope: “Algorithmic Composer” p. 194 provides an example of SPEAC analysis as applied to a Bach Chorale.
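One of the ordering constraints listed above, that a consequent must be preceded directly or through intervening extensions by an antecedent, can be checked mechanically. The sketch below encodes only that single rule together with the A-P-E-S-C stability order; it is not a complete test of musical logic, and the function names are hypothetical.

```python
STABILITY_ORDER = "APESC"  # most unstable (A) on the left, most stable (C) on the right


def stability_rank(identifier):
    """Position of an identifier in the A-P-E-S-C order (0 = most unstable)."""
    return STABILITY_ORDER.index(identifier)


def consequents_are_prepared(progression):
    """Check that every C is preceded by an A, with only E's allowed in between."""
    for position, identifier in enumerate(progression):
        if identifier != "C":
            continue
        back = position - 1
        while back >= 0 and progression[back] == "E":
            back -= 1  # skip over intervening extensions
        if back < 0 or progression[back] != "A":
            return False
    return True
```

Under this check the progression PSEAC passes, because its C directly follows an A, while SAPC fails, because a P stands between the A and the C. A progression such as AEPS contains no consequent at all and so passes this particular rule; judging it less logical would require further rules that this sketch does not attempt.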
While these approaches to defining roots, tensions, and groupings seem logical in principle, methods of converting these roots and tensions into numerical values for contextual comparison and analysis are not obvious. To achieve conversion, the Emmy software uses an empirically derived formula:
f(x) = y + (cos((−1*z) + x/z))/2
where x is the pitch-class interval, y represents the y coordinate, and z is a constant. This formula roughly accounts for the primary intervals (seconds, thirds, and fourths).
Secondary intervals (sixths and sevenths, as inversions of thirds and seconds respectively) then approximately mirror the primary intervals, with the fifth treated uniquely because it has very little tension. Intervals greater than an octave have slightly less (0.02) tension than their related less-than-octave-separated equivalents, because of their octave separation. When this formula is applied to the intervals within several chords in a particular work, the relative totals produced by the formula will indicate what the SPEAC role of each chord is. For instance, a chord which produced a total of 0.5 would be the antecedent, “A,” in the context of a preceding chord with a value of 0.2 and a succeeding chord with a value of 0.3; however, the identical chord with its value of 0.5 would be a consequent, “C,” in the context of a preceding 0.8 and a succeeding 0.4. While the intervals in the event determine its fixed value, or weighting, the context in which the event occurs determines its relative position in the SPEAC hierarchy, both in the original works in the database and in recombined works.
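For reference, the formula given above can be transcribed literally as a function. The text does not state the values of y and z, so both are left as required parameters, and the values in the example are arbitrary placeholders rather than Emmy's constants. The reading of the cosine argument as (−1·z) + (x/z), following ordinary operator precedence, is likewise an assumption, as is the function name.

```python
import math


def interval_tension(x, y, z):
    """Literal transcription of f(x) = y + cos((-1 * z) + x / z) / 2.

    x is the pitch-class interval; y and z are the offset and constant
    named in the text, whose actual values are not given there.
    """
    return y + math.cos((-1 * z) + x / z) / 2


# Placeholder parameters only, chosen to show the call shape.
example = interval_tension(2, y=0.5, z=1.0)
```

Whatever constants are used, the cosine term can contribute at most 0.5 in either direction, so every value the function produces lies between y − 0.5 and y + 0.5.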
After SPEAC hierarchical analysis step 130 has been performed, every event and grouping of events, will carry with it its SPEAC identifiers and weightings, which will be essential in order to accomplish musically-logical, structurally sound, and context-sensitive re-location of events and event-groupings in the recombination process.
The Emmy software employs a recombination step 140, which is made possible by prior pattern-matching step 110, segmentation step 120, and prior SPEAC hierarchical analysis step 130 of the database 100. Non-linear recombination step 140 is the compositional process that synthesizes the results of the pattern-matching step 110 (form, and signature detection) and the hierarchical SPEAC analysis step 130 (context-sensitive, structural analysis) components of the Emmy algorithm.
Tonal music follows well-known principles governing pitch (notably major and minor scale derivation and complementary chromaticism), melody (primarily stepwise motion with leaps often followed by stepwise contrary motion), harmony (having prescribed functions and syntax), voice-leading (mostly stepwise motion with voice independence), hierarchical form (phrases, sections, and movements governed by logical repetitions, variations, and contrasts), and so on. One way to algorithmically create tonal music is by programming rules for each of these principles. Unfortunately, as has been shown (see David Cope: 2001a), this rules-based approach produces technically correct, but musically stale imitations. Additionally, this approach requires programming a new set of rules for every composer or genre of tonal music under consideration.
Recombinancy, on the other hand, is a method for producing new and logical, i.e. musically logical, collections of musical events (i.e. new compositions) by recombining existing data into new logical orders on the basis of the rules which have been acquired through analysis of specific works or bodies of work (as distinct from the imposition of generic rules).
Recombinancy appears ubiquitously in natural processes as well as in human creative processes. As a simple human-creative example, all the great works of literature in the English language result from combination of the twenty-six letters of the alphabet into words, and recombination of those words. Similarly, most of the great works of Western art music consist of combinations of the twelve pitches of the equal-tempered scale, their octave equivalents, and the recombinations of groupings (melodies, harmonies, and so on) that result from these combinations.
As stated previously, the Emmy algorithm assumes that every work, or stylistically consistent body of music, contains an implicit set of instructions, or rules, for creating different but highly-related replications of itself. Consequently, recombinancy, based on rules acquisition (as distinct from rules imposition) provides logical and successful approaches to composing new, highly-related replications of the original work(s).
One of the most important impacts that SPEAC analysis has on algorithmic composition involves the order in which groupings of events are selected and embedded in new compositions in a way which is faithful to the sets of instructions, both formal (i.e. discovered by pattern-matching analysis), and context sensitive (i.e. identified and evaluated by SPEAC analysis), that have been acquired from the database. SPEAC allows for a non-linear approach to recombinant composition. Significant (relatively high SPEAC values) antecedent (A) and statement (S) groupings of a new work are selected first, and the remaining groupings follow in SPEAC priority order. In short, key components (groupings of musical events) of a new work may appear in many places within an overall formal structure initially, simultaneously, rather than appearing first at the beginning of a new work in progress and continuing to be added until the end is reached, in a linear manner. This non-linear process closely resembles how human composers create large-scale works, by envisioning an overall form, or structure, for the work, and then progressively filling in the details.
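The non-linear order of selection can be illustrated with a short sketch. The form is represented here as a list of slots, each carrying a SPEAC identifier and a weighting; antecedent and statement slots are filled first, highest weighting first, and the remaining slots follow. Ordering the remaining slots by descending weighting is an assumption of this sketch, since the text does not spell out the priority order, and the names are hypothetical.

```python
def fill_order(slots):
    """Return slot positions in the order they would be filled.

    `slots` is a list of (identifier, weighting) pairs in score order.
    A and S slots come first, then all others; within each group the
    higher weighting is filled earlier (an illustrative assumption).
    """
    def priority(position):
        identifier, weighting = slots[position]
        return (identifier not in ("A", "S"), -weighting)

    return sorted(range(len(slots)), key=priority)


# A five-slot form in score order: P S E A C.
form = [("P", 0.3), ("S", 0.6), ("E", 0.4), ("A", 0.9), ("C", 0.2)]
order = fill_order(form)  # positions are zero-based
```

For this form the fourth slot (the antecedent) is filled first and the second slot (the statement) next, followed by the third, first, and fifth, so material is placed throughout the structure before the opening of the work has been decided.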
The non-linear recombination step 140 of the Emmy software algorithm is founded on the principles of Augmented Transition Networks (ATNs) widely employed in computational linguistics and adapted in the Emmy software for processing music instead of language. The implementation of ATNs in the Emmy software is highly complex, in large part because music, unlike spoken language, has few universal rules of syntax (grammatical rules). Short sequences of musical events do not have commonly-agreed meanings (comparable to the dictionary definition of a part-of-speech made up of letters of the alphabet). Nor do longer musical phrases have established meanings in the way that a sentence usually does. By way of illustration, many English speakers could readily explain how the sentence “The dog eats a bone” is grammatically correct and also makes sense, whereas the sentence “The dog sang a bone,” although equally grammatically correct, is illogical. But most listeners and even musicians would find it difficult if not impossible to explain why one piece of music might make less musical sense than another structurally similar one. However, music does share with language such qualities as form (poetic literary forms have much in common with musical forms) and style (the characteristics of certain writers can be recognized and to some extent defined). Music also has rules of syntax, but they are not universal. Rather, they are specific to genres, composers, and in some cases even to individual works. Music is not a language, but many languages, and many families of languages with countless idioms and dialects. It is precisely because there is no universal set of rules that can be discovered in all music that the Emmy software focuses on rules acquisition from the musical database in use. This enables the Emmy software to analyze almost any musical database and recompose in any style, rather than be restricted to a single style by a single set of rules.
The output 150 of the Emmy software is a musical database with an entire new composition, stylistically faithful to compositions in the original database and resembling them, derived from them but not replicating them, resulting from the integration and synthesis of the process steps described above. It can be manifested as a score to be played by musicians, a digital file which can be input into an electromechanical device, such as a MIDI-enabled instrument or sound synthesizer, or readily converted into any other form of musical expression or performance.
While the Emmy software has many advantages in many environments, it is complex and requires considerable memory just to store the executable code. Due to its high-level AI functions, linguistics-based ATNs and other expert systems, it is also computationally intensive, requiring a very large number of computing cycles to execute the encoded instructions of the algorithm. It is therefore highly unsuited to the rapid, recombinant re-composition of short musical works, such as would be deployed in telephone ringtones, musical toys, videogames, music boxes, and other similar applications requiring rapid and repetitive iterations of new music based on existing bodies of music.