Time and pitch are fundamental components of music. Rhythm is concerned with the relative duration of pitched and silent events in time. In fact, the quality of a musical performance is largely judged by how well a performer, or a group of performers, keeps time. In a musical composition, time is divided into intervals that the musician follows when playing notes. The closer the onset of each note is to the beginning of a time interval, or to a subdivision thereof, the more agreeable the music sounds to the human ear. To learn to keep time, musicians practice with a timekeeping device, such as a metronome. With practice, skilled performers are able to play notes in time with each metronome tick. In other cases, however, a performer may keep an average tempo over the length of a performance while individual notes deviate from their expected ideal ticks; this is known as rubato. The human ear is sensitive to even small timing deviations, and listeners judge the quality of a performance partly by them.
Modern digital data processing applications offer tools to correct or enhance audio data. These applications are capable of reducing background noise, enhancing stereo effects, adding or removing echo effects or performing other such enhancements to the audio data. However, these existing applications do not provide a mechanism for correcting inaccurate rhythm events in the audio data. Because of this and other limitations inherent in the prior art, there is a need for a process that can reduce rhythmic deviations in audio data.
Embodiments of the invention provide a mechanism for enhancing the rhythm of an audio data stream (or audio stream, for short). For instance, systems adapted to implement the invention are capable of enhancing rhythm in audio data by obtaining the underlying rhythm information, determining an ideal time for each audio event, and correcting significant deviations from that ideal time.
Audio data waveforms generally show periods of relatively low amplitude and periods of relatively high amplitude. Transient events occur at the transitions from low-amplitude to high-amplitude portions of the audio waveform and generally correspond to beats in the music that are expected to occur at regular intervals. The timing relationship between these events has a significant impact on the quality of the performance. Embodiments of the invention detect each transient's deviation from its ideal time and alter the timing of the transient to achieve that ideal time.
Embodiments of the invention may use a conversion function to represent the energy in the audio signal. From an energy viewpoint, transients are regions where the energy increases abruptly. By detecting local increases in energy, an embodiment of the invention is able to detect each transient and determine a number of timing parameters for it. For example, the system may determine the time at which a transient reaches a given threshold level, the time at which it reaches a local peak, the time of its onset, and any other timing information that can be derived from the audio signal.
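The conversion function is not specified here. As a minimal sketch, assuming short-time energy (the sum of squared samples over overlapping frames) as the conversion function and a simple frame-to-frame energy ratio as the detection rule, the three timing parameters named above could be extracted as follows. The names `energy_envelope`, `detect_transients` and `Transient`, the frame and hop sizes, the `rise_ratio` of 4, and the use of half the local peak as the threshold level are all illustrative assumptions.

```python
from typing import List, NamedTuple


class Transient(NamedTuple):
    onset: int      # sample index of the first frame whose energy jumps
    threshold: int  # sample index of the first frame reaching half the local peak
    peak: int       # sample index of the frame with the local energy maximum


def energy_envelope(samples: List[float], frame_size: int = 256, hop: int = 128) -> List[float]:
    """Assumed conversion function: sum of squared samples per overlapping frame."""
    return [
        sum(x * x for x in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def detect_transients(env: List[float], hop: int,
                      rise_ratio: float = 4.0, floor: float = 1e-6) -> List[Transient]:
    """Report a transient wherever frame energy rises by at least `rise_ratio`.

    Times are frame start positions in samples, so resolution is one hop.
    """
    found: List[Transient] = []
    i = 1
    while i < len(env):
        previous = max(env[i - 1], floor)  # floor avoids division by zero in silence
        if env[i] / previous >= rise_ratio:
            onset = i
            peak = i
            # Follow the rising edge until the energy stops increasing (local peak).
            while peak + 1 < len(env) and env[peak + 1] >= env[peak]:
                peak += 1
            half = 0.5 * env[peak]
            # First frame between onset and peak whose energy reaches half the peak.
            threshold = next(k for k in range(onset, peak + 1) if env[k] >= half)
            found.append(Transient(onset * hop, threshold * hop, peak * hop))
            i = peak + 1  # resume the search after this transient's peak
        else:
            i += 1
    return found
```

A fixed ratio test like this is easy to follow but sensitive to noise; a practical detector would usually smooth the envelope or compare against a running average before thresholding.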
Embodiments of the invention compare one or more time references for each transient with the time of an ideal time event (which may, for example, correspond to a metronome tick) and compute the deviation between the occurrence of the transient and its expected ideal time. Whether to correct the deviation may then be decided based on one or more correction criteria.
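One way to frame this comparison is sketched below, assuming a constant tempo, an ideal grid that starts at time zero, and two hypothetical correction criteria: ignore deviations below a tolerance, and leave alone deviations so large that the intended tick is ambiguous. `plan_corrections`, `Deviation`, `tolerance_s` and `max_fraction` are illustrative names, not part of the described system.

```python
from typing import List, NamedTuple


class Deviation(NamedTuple):
    time_s: float       # measured transient time (e.g. its onset)
    ideal_s: float      # nearest ideal tick
    deviation_s: float  # positive = late, negative = early
    correct: bool       # whether the hypothetical criteria call for correction


def plan_corrections(times_s: List[float], tempo_bpm: float, subdivision: int = 1,
                     tolerance_s: float = 0.010, max_fraction: float = 0.25) -> List[Deviation]:
    """Compare each transient time with the nearest tick of a constant-tempo grid."""
    period = 60.0 / tempo_bpm / subdivision  # seconds between ideal ticks
    result: List[Deviation] = []
    for t in times_s:
        ideal = round(t / period) * period   # nearest grid tick
        dev = t - ideal
        # Criterion 1: ignore deviations too small to matter.
        # Criterion 2: skip deviations so large the intended tick is ambiguous.
        correct = tolerance_s < abs(dev) <= max_fraction * period
        result.append(Deviation(t, ideal, dev, correct))
    return result
```

For example, at 120 BPM (ticks every 0.5 s), a transient at 0.52 s is 20 ms late and is flagged for correction, whereas one at 0.003 s falls within the 10 ms tolerance and one at 1.26 s is too far from any tick to be assigned confidently.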
The system may apply one or more techniques for correcting time deviations. In one embodiment of the invention, when a transient is to be moved to an earlier point in time, the system may compress one or more portions of the audio data preceding the transient. When a transient is to be delayed, the system may expand the audio data preceding the transient in question.
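The amount of compression or expansion can be derived from the measured and target positions. The sketch below assumes that each transient is mapped to a target sample position and that the whole stretch of audio between consecutive transients is rescaled; a ratio below 1 compresses the preceding audio (moving the transient earlier) and a ratio above 1 expands it (delaying the transient). `segment_plan` and `Segment` are hypothetical, and the actual time-scaling method is left open.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int       # first sample of the segment in the original audio
    end: int         # transient position in the original audio (exclusive end)
    new_length: int  # length the segment must have after correction
    ratio: float     # < 1.0 compress (transient moves earlier), > 1.0 expand (delay)


def segment_plan(positions: List[int], targets: List[int]) -> List[Segment]:
    """Rescale the audio preceding each transient so it lands on its target sample.

    `positions` are measured transient positions and `targets` their corrected
    positions (use the original position for a transient left uncorrected).
    Both lists must be strictly increasing, positive, and of equal length.
    """
    if len(positions) != len(targets):
        raise ValueError("positions and targets must have the same length")
    plan: List[Segment] = []
    prev_src, prev_dst = 0, 0
    for src, dst in zip(positions, targets):
        src_len, dst_len = src - prev_src, dst - prev_dst
        if src_len <= 0 or dst_len <= 0:
            raise ValueError("positions and targets must be strictly increasing")
        plan.append(Segment(prev_src, src, dst_len, dst_len / src_len))
        prev_src, prev_dst = src, dst
    return plan
```

Because each segment is anchored at the previous transient's corrected position, a correction does not accumulate into the transients that follow it.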
Expanding and compressing audio by inserting and deleting data may lead to unpleasant sound effects known as artifacts. Embodiments of the invention manipulate the audio data in ways that either introduce no artifacts or apply further methods to remove them. To this end, embodiments of the invention may use cross-fading to smooth the transition between segments after a portion of the audio data has been removed, since the removal may have created discontinuities in the signal. Where a portion of the audio data is to be expanded, an embodiment of the invention may cross-fade among a number of successive segments to achieve the expansion without introducing a repetitive pattern that the human ear may detect and find unpleasant.
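The deletion case can be sketched with a linear cross-fade: rather than splicing the signal abruptly, the samples on either side of the cut are blended over `fade` samples. The expansion function is a simplified variant that replays one segment with a single cross-faded seam; it does not reproduce the multi-segment cross-fading described above, which is meant to avoid the audible repetition a single replay can cause. The function names, the linear fade shape, and the parameters are assumptions.

```python
from typing import List, Sequence


def crossfade_delete(x: Sequence[float], start: int, length: int, fade: int) -> List[float]:
    """Remove `length` samples at `start`, blending across the cut over `fade` samples."""
    if start < 0 or length < 0 or fade < 0 or start + length + fade > len(x):
        raise ValueError("cut plus fade must lie inside the signal")
    out = list(x[:start])
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # weight of the material after the cut, rising 0 -> 1
        out.append((1.0 - w) * x[start + i] + w * x[start + length + i])
    out.extend(x[start + length + fade:])
    return out  # len(out) == len(x) - length


def crossfade_expand(x: Sequence[float], start: int, length: int, fade: int) -> List[float]:
    """Insert `length` samples by replaying x[start:start+length] once, with a cross-faded seam.

    Simplified single-replay variant; it does not spread the expansion over
    several successive segments.
    """
    if start < 0 or length < 0 or fade < 0 or start + length + fade > len(x):
        raise ValueError("segment plus fade must lie inside the signal")
    out = list(x[:start + length])
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # weight of the replayed material, rising 0 -> 1
        out.append((1.0 - w) * x[start + length + i] + w * x[start + i])
    out.extend(x[start + fade:])
    return out  # len(out) == len(x) + length
```

A linear fade is the simplest choice; equal-power (sine/cosine) fades are often preferred when the two blended segments are uncorrelated, because a linear fade can produce an audible dip in loudness at the seam.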
By obtaining the preferred rhythm of a performance, determining an ideal time for each transient, and correcting significant deviations from that ideal time, embodiments of the invention provide a powerful tool for enhancing music quality as perceived by the human ear.