The present invention relates to the field of digital representation of music and to techniques for allowing a user to enter a selection of a realization of the music.
Most of today's audio data, at the professional as well as at the consumer level, is distributed and stored in digital format. This has greatly improved the general handling of recorded audio material, such as transmission of audio files and modification of audio files.
Techniques for navigating among audio data files have been developed. For example, a track number and time are used as a navigation means for compact discs (CDs). A variety of more sophisticated techniques for navigating among program segments and for otherwise processing audio files are known from the prior art:
U.S. Pat. No. 6,199,076 shows an audio program player including a dynamic program selection controller. This includes a playback unit at the subscriber location to reproduce the program segments received from a host and a mechanism for interactively navigating among the program segments.
U.S. Pat. No. 5,393,926, is a virtual music system. It includes a multi-element actuator that generates a plurality of signals in response to being played by a user. The system also has an audio synthesizer that generates audio tones in response to control signals. There is a memory storing a musical score for the multi-element actuator, the stored musical score including a sequence of lead notes and an associated sequence of harmony note arrays. Each harmony note array of the sequence corresponds to a different one of the lead notes and contains zero, one or more harmony notes. The instrument also includes a digital processor receiving the plurality of signals from the multi-element actuator and generating a first set of control signals therefrom. The digital processor is programmed to identify from among the sequence of lead notes in the stored musical score a lead note which corresponds to a first one of the plurality of signals. The digital processor is also programmed to map a set of the remainder of the plurality of signals to whatever harmony notes are associated with the selected lead note, if any. Moreover, the digital processor is programmed to produce the first set of control signals from the identified lead note and the harmony notes to which the signals of the plurality of signals are mapped. The first set of control signals causes the synthesizer to generate sounds representing the identified lead note and the mapped harmony notes.
U.S. Pat. No. 5,390,138, is a system for connecting an audio object to various multimedia objects to enable an object-oriented simulation of a multimedia presentation using a computer with a storage and a display. A plurality of multimedia objects are created on the display including at least one connection object and at least one audio object. Multimedia objects are displayed, including at least one audio object. The multimedia object and the audio object create a multimedia presentation.
U.S. Pat. No. 5,388,264, is a system for connecting a Musical Instrument Digital Interface (MIDI) object to various multimedia objects to enable an object-oriented simulation of a multimedia presentation using a computer with a storage and a display. A plurality of multimedia objects are created on the display including at least one connection object and at least one MIDI object in the storage. The multimedia object and the MIDI object are connected, and information is routed there between to create a multimedia presentation.
U.S. Pat. No. 5,317,732 is a process performed in a data processing system that includes receiving an input selecting one of a plurality of multimedia presentations to be relocated from a first memory to a second memory, scanning the linked data structures of the selected multimedia presentation to recognize a plurality of resources corresponding to the selected multimedia presentation, and generating a list of names and locations within the selected multimedia presentation corresponding to the identified plurality of resources. The process also includes renaming the names on the generated list, changing the names of the identified plurality of resources in the selected multimedia presentation to the new names on the generated list, and moving the selected multimedia presentation and the resources identified on the generated list to the second memory.
U.S. Pat. No. 5,262,940 is a portable audio/audio-visual media tracking device.
U.S. Pat. No. 5,247,126, is an image reproducing apparatus, image information recording medium, and musical accompaniment playing apparatus.
U.S. Pat. No. 5,208,421, is a method and apparatus for audio editing of MIDI files. The invention may be utilized to ensure the integrity of a source MIDI file, a copied or lifted section or a target file by automatically inserting matching note on or note off messages into a file or file section to correct inconsistencies created by such editing. Additionally, program status messages are automatically inserted into source files, copied or lifted sections, or target files to yield results that are consistent with the results that may be obtained by editing digital audio data. Timing information is selectively added or maintained such that MIDI files may be selectively edited without requiring a user to learn a complex MIDI sequencer.
U.S. Pat. No. 5,153,829, is an information processing apparatus. The invention has a unit for displaying on a screen a musical score, keyboard, and tone time information to be inputted. There is also a unit for designating the position of the keyboard, and tone time information, respectively displayed on the display unit. Moreover, the invention includes a unit for storing musical information produced through designation by the designating unit of the position of the keyboard and tone time information displayed on the display unit. Additionally, there is a unit for controlling the display of the musical score, keyboard, and tone time information on the screen of the display unit. This unit also controls the display of a pattern of musical tone or rest on the musical score on the display unit in accordance with the position of the keyboard and tone time information respectively designated by the designating unit. Finally, there is a unit for generating a musical tone by reading the musical information stored in the storage unit.
U.S. Pat. No. 5,142,961, is a method for storage, transcription, manipulation and reproduction of music on system-controlled musical instruments which faithfully reproduces the characteristics of acoustic musical instruments. The system comprises a music source, a central processing unit (CPU) and a CPU-controlled plurality of instrument transducers in the form of any number of acoustic or acoustic hybrid instruments. In one embodiment, performance information is sent from a music source MIDI controller to the CPU, edited in the CPU, converted into an electrical signal, and sent to instrument transducers via transducer drivers. In another embodiment, individual performances stored in a digital or sound tape medium are reproduced at will through the instrument transducers, or converted into MIDI data by a pitch/frequency detection device for storage, editing or performance in the CPU. In still another embodiment, performance information is extracted from an electronic recording medium or live performance by a pitch/frequency detection device, edited in the CPU, converted into an electrical signal, and sent to any number of instrument transducers. The device also eliminates typical acoustic musical instrument delay problems.
U.S. Pat. No. 5,083,491, is a method and apparatus for re-creating expression effects on solenoid actuated music producing instruments contained in musical renditions recorded in MIDI format for reproduction on solenoid actuated player piano systems. Detected strike velocity information contained in the MIDI recording is decoded and correlated to strike maps stored in a controlling microprocessor. The strike maps contain data corresponding to desired musical expression effects. Time differentiated pulses of fixed width and amplitude are directed to the actuating solenoids in accordance with the data in the strike maps, and the actuating solenoids in turn strike the piano strings. Thereafter, pulses of uniform amplitude and frequency are directed to the actuating solenoids to sustain the strike until the end of the musical note. The strike maps dynamically control the position of the solenoid during the entire duration of the strike to compensate for non-linear characteristics of solenoid operation and piano key movement, thus providing true reproduction of the original musical performance.
U.S. Pat. No. 5,046,004 is a system using a computer and keyboard for reproducing music and displaying words to the music. Data for reproducing music and displaying words are composed of binary-coded digital signals. Such signals are downloaded via a public communication line, or data corresponding to a plurality of musical pieces or songs are previously stored in an apparatus, and the stored data are selectively processed by a central processing unit of a computer. In the instrumental music data, trigger signals are existent for progression of processing the words data, whereby the reproduction of music and the display of words are linked to each other. The music thus reproduced is utilized as background music or for enabling the user to sing to the accompaniment thereof while watching the words displayed synchronously with such music reproduction.
U.S. Pat. No. 4,744,281, is an automatic music player system having an ensemble playback mode of operation using a memory disk having recorded thereon a piece of music composed of at least two combined parts to be reproduced separately of each other, the parts being recorded in the form of at least two data subblocks. The system comprises a first sound generator to mechanically generate sounds when mechanically or electrically actuated, at least one second sound generator to electronically generate sounds when electronically actuated, and a control unit connected to the first and second sound generators. One of the two or more subblocks of the data read from the disk is discriminated from another, whereupon the discriminated one of the data subblocks is transmitted to the first sound generator and another data subblock is transmitted to the second sound generator. Additionally, the transmission of data to the second sound generator is continuously delayed by a predetermined period of time from the transmission of data to the first sound generator so that the two sound generators are enabled to produce sounds concurrently and in concert with each other.
It is a common disadvantage of the prior art that navigating among audio data is cumbersome and seriously lacks precision.
Accordingly it is an aspect of the present invention to provide an improved method of generating a link between a note of a digital score and a realization of the score as well as a corresponding computer program product. Further the invention provides an electronic audio device with improved navigation capabilities.
The invention enables the creation of a link between a representation of a piece of music and a recorded realization of the music. This allows a user to select a note of a digital score in order to automatically begin playback of the realization starting with the selected note.
In accordance with a preferred embodiment of the invention the digital score is visualized on a computer monitor. By means of a graphical user interface a user can select a note of the digital score. For example, this can be done by "clicking" on a note by means of a computer mouse. This way a link which is associated with the note is selected. The link points to a location of a recorded realization of the music which corresponds to the user selected note. Further, selecting the note automatically generates a signal which starts playback of the realization at the location indicated by the link which is associated with the selected note.
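The selection mechanism described above can be illustrated by a minimal sketch. It assumes that the links are held in a simple table from note identifiers to playback positions in seconds; the names `make_note_selection_handler`, `links` and `start_playback`, as well as the note identifiers, are hypothetical and serve only as an illustration.

```python
def make_note_selection_handler(links, start_playback):
    """Return a callback that a user interface could invoke when a note is selected.

    links          -- hypothetical table: note identifier -> position in seconds
    start_playback -- hypothetical callable that starts playback at a position
    """
    def on_note_selected(note_id):
        position = links[note_id]   # follow the link associated with the note
        start_playback(position)    # the signal that starts playback
        return position
    return on_note_selected


# Illustrative link table for three notes of a score (values are made up).
links = {"m1-n1": 0.00, "m1-n2": 0.52, "m2-n1": 1.97}
played = []                         # stands in for an audio player
handler = make_note_selection_handler(links, played.append)
handler("m1-n2")                    # simulates clicking the second note
```

In this sketch the playback device is replaced by a list that records the requested start positions; a real player would seek to the position instead.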
In accordance with a further preferred embodiment of the invention the digital score is analyzed to determine significant audio events in the music. This is done by selecting a time unit that allows all notes of the score to be expressed as integer multiples of this time unit. This way the time axis is divided into logical time intervals.
The number of onsets of the score in each of the time intervals is determined. This results in the number of onsets over time. This onset curve is filtered. One way of filtering the onset curve is to apply a threshold to the onset curve. This means that the accumulated onsets of time intervals which do not surpass the predefined threshold are removed from the onset curve. This way insignificant audio events are filtered out.
The filtered onset curve determines a series of time intervals with accumulated onsets above the threshold. This series of time intervals is to be aligned with a corresponding series of time intervals being representative of the same audio events in the recorded realization of the music.
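The score analysis described above may be sketched as follows. The sketch assumes that each note is given as a pair of onset and duration expressed as fractions of a whole note, and that the time unit is chosen as the largest value dividing all onsets and durations; the function names and the example notes are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from math import gcd


def time_unit(values):
    """Largest Fraction of which every given value is an integer multiple."""
    denom = reduce(lambda x, y: x * y // gcd(x, y),
                   (v.denominator for v in values), 1)   # common denominator
    numer = reduce(gcd, (int(v * denom) for v in values), 0)
    return Fraction(numer, denom)


def onset_curve(notes):
    """Count the onsets falling into each logical time interval."""
    values = [d for _, d in notes] + [o for o, _ in notes if o != 0]
    unit = time_unit(values)
    counts = Counter(int(onset / unit) for onset, _ in notes)
    return unit, counts


def filter_onsets(counts, threshold):
    """Keep only intervals whose accumulated onsets surpass the threshold."""
    return sorted(i for i, n in counts.items() if n > threshold)


F = Fraction
notes = [(F(0), F(1, 4)), (F(0), F(1, 4)), (F(0), F(1, 2)),
         (F(1, 4), F(1, 4)), (F(1, 2), F(1, 8)), (F(1, 2), F(1, 8)),
         (F(5, 8), F(1, 8))]
unit, counts = onset_curve(notes)
significant = filter_onsets(counts, threshold=1)
```

For these example notes the time unit is an eighth note, and only the intervals in which more than one note begins remain after filtering.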
In accordance with a preferred embodiment of the invention the series of time intervals for the recorded realization is determined by comparing the intensity of the realization with a threshold. When the intensity drops below the threshold the corresponding time interval is selected for the series of time intervals.
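A possible form of this intensity analysis is sketched below. It assumes that intensity is measured as the root mean square of the samples in fixed-length frames, which is one choice among several, and it selects the frames whose intensity lies below the threshold as stated above. The parameter `below` is a hypothetical switch added only so that the opposite comparison can be tried; all names are illustrative.

```python
import math


def frame_intensity(samples, frame_len):
    """Root-mean-square intensity of consecutive, non-overlapping frames."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
            for i in starts]


def select_intervals(intensity, threshold, below=True):
    """Indices of frames whose intensity is below (or above) the threshold."""
    if below:
        return [i for i, v in enumerate(intensity) if v < threshold]
    return [i for i, v in enumerate(intensity) if v > threshold]


# Made-up signal: silence, a loud passage, then a quiet passage.
samples = [0.0] * 4 + [1.0] * 4 + [0.1] * 4
intensity = frame_intensity(samples, frame_len=4)
selected = select_intervals(intensity, threshold=0.5)
```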
In accordance with a further preferred embodiment of the invention the series of time intervals of the representation and the series of time intervals of the realization are mapped onto each other by minimizing a Hausdorff distance between the two series.
Felix Hausdorff (1868-1942) devised a metric function between subsets of a metric space. By definition, two sets are within Hausdorff distance d from each other if any point of one set is within distance d from some point of the other set.
Given two sets of points A = {a1, . . . , am} and B = {b1, . . . , bn}, the Hausdorff distance is defined as
$$H(A, B) = \max\bigl(h(A, B),\, h(B, A)\bigr) \qquad (1)$$
where
$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert. \qquad (2)$$
The function h(A, B) is called the directed Hausdorff 'distance' from A to B (this function is not symmetric and thus is not a true distance). It identifies the point a ∈ A that is farthest from any point of B, and measures the distance from a to its nearest neighbor in B. Thus the Hausdorff distance, H(A, B), measures the degree of mismatch between two sets, as it reflects the distance of the point of A that is farthest from any point of B and vice versa. Intuitively, if the Hausdorff distance is d, then every point of A must be within a distance d of some point of B and vice versa.
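For points on a time axis the norm reduces to an absolute difference, and the two functions can be sketched in a few lines. The function names and the example sets are illustrative only.

```python
def directed_hausdorff(a, b):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """H(A, B): the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


A = [0, 4, 10]
B = [1, 4]
h_ab = directed_hausdorff(A, B)   # the point 10 of A is 6 away from B
h_ba = directed_hausdorff(B, A)   # the point 1 of B is 1 away from A
H = hausdorff(A, B)
```

The example shows the asymmetry of the directed function: h(A, B) is 6 while h(B, A) is 1, so that H(A, B) is 6.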
The two series of time intervals provided by the analysis of the score and the analysis of the realization are shifted with respect to each other until the Hausdorff distance between the two sets of time intervals reaches a minimum. This way pairs of time intervals of the two series are determined. Hence, for each pair a note belonging to a specific time interval is mapped onto a point of time of a realization and a link is formed between the note and the corresponding location of the recording of the realization.
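The shifting and pairing may be sketched as an exhaustive search over candidate shifts. The sketch assumes that both series have already been brought to a common time scale in seconds and that a finite list of candidate shifts is supplied; both are assumptions made for illustration, and the function names are hypothetical.

```python
def hausdorff(a, b):
    """Hausdorff distance between two sets of points on the time axis."""
    def h(p, q):
        return max(min(abs(x - y) for y in q) for x in p)
    return max(h(a, b), h(b, a))


def best_shift(score_times, audio_times, candidate_shifts):
    """Shift of the score series that minimizes the Hausdorff distance."""
    return min(candidate_shifts,
               key=lambda s: hausdorff([t + s for t in score_times], audio_times))


def build_links(score_times, audio_times, shift):
    """Pair each shifted score time with the nearest time of the realization."""
    return {t: min(audio_times, key=lambda u: abs(u - (t + shift)))
            for t in score_times}


score_times = [0.0, 1.0, 2.5]                # from the filtered onset curve
audio_times = [3.0, 4.0, 5.5]                # from the recorded realization
candidates = [k * 0.5 for k in range(13)]    # 0.0, 0.5, ..., 6.0 seconds
shift = best_shift(score_times, audio_times, candidates)
links = build_links(score_times, audio_times, shift)
```

In this example the recording starts three seconds later than the score, so the search returns a shift of 3.0 and every score time is linked to its counterpart in the realization.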
An alternative way to perform the mapping operation is to shift the two series of time intervals with respect to each other until a cross correlation function reaches a maximum value. Other mathematical methods for finding a best matching position between the two series can be utilized.
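The cross correlation alternative may be sketched in the same manner. The sketch assumes that each series is represented as a sequence of ones and zeros, one value per time interval, with a one marking a selected interval; the function names and the bound `max_lag` are hypothetical.

```python
def cross_correlation(x, y, lag):
    """Sum of products of x[i] and y[i + lag] over the overlapping range."""
    return sum(x[i] * y[i + lag]
               for i in range(len(x)) if 0 <= i + lag < len(y))


def best_lag(x, y, max_lag):
    """Lag within [-max_lag, max_lag] at which the cross correlation is largest."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x, y, lag))


score_series = [1, 0, 1, 1, 0, 0, 0, 0]   # selected intervals of the score
audio_series = [0, 0, 0, 1, 0, 1, 1, 0]   # the same pattern, three intervals later
lag = best_lag(score_series, audio_series, max_lag=5)
```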