The present invention relates generally to digital signal processing techniques for processing an audio signal and more particularly to digital signal processing techniques for identifying the onset of a sonic event within an audio signal.
A common task in the production of a multimedia program involves the editing of the audio signal for the program. Typically, the audio signal is edited to enhance or augment the originally recorded audio. This involves either mixing other audio with the original audio or totally replacing a portion of the audio with new audio. In either case it is necessary to precisely identify the start of an audio segment that is to be edited so that the modified audio will seamlessly fit in with the rest of the audio. Frequently, the point of editing is associated with a particular sonic event such as a percussive hit or other distinctive, loud sound, and thus it becomes necessary to identify these events.
Because of the precision required to locate the onset of a sonic event such as a percussive hit, digital signal processing methods have been implemented on computer systems to detect these events in an automated fashion. Conventionally, an analog audio signal, representing the volume of the audio, is sampled by an Analog-to-Digital (A/D) converter to produce a digital representation of the signal. The sonic event is then identified by comparing the resulting digital values against a threshold value that corresponds to the particular sonic event of interest. If the digital value of the audio exceeds the predetermined threshold value, the sonic event is said to have occurred. While this approach is useful in deciding when the volume of the audio rises above a predetermined level, it has the disadvantage that a sonic event will be triggered for as long as the volume exceeds the threshold value. In other words, if the volume remains above the threshold level for a significant period of time, multiple sonic events are triggered. To avoid this consequence, the detection analysis is typically “turned off” for a fixed interval of time after the initial detection of the sonic event. While disabling the detector for a set time interval may eliminate multiple triggering, it also has the disadvantage that a legitimate sonic event cannot be detected during this interval. Thus, information about the audio signal may be lost during the time the detection process is “turned off”, and the editing of the audio is necessarily restricted due to the failure of the system to detect the event. Furthermore, false triggers may be generated if the volume continues above the threshold value when detection is resumed after the fixed time interval has expired.
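The shortcomings of the conventional level detector described above can be illustrated with a minimal Python sketch. The function name `threshold_detect`, the `holdoff` parameter and the sample envelope are hypothetical, chosen only for illustration; they are not taken from any particular prior system.

```python
def threshold_detect(samples, threshold, holdoff=0):
    """Return the indices at which a conventional level detector fires.

    holdoff is the number of samples for which detection is disabled
    after each trigger (0 means the detector is never disabled).
    """
    triggers = []
    disabled_until = 0
    for i, value in enumerate(samples):
        if i < disabled_until:
            continue  # detector is "turned off"
        if abs(value) > threshold:
            triggers.append(i)
            disabled_until = i + 1 + holdoff
    return triggers


# Hypothetical volume envelope: one sustained loud sound (samples 2 to 5)
# followed by a second, separate hit at sample 7.
envelope = [0.0, 0.1, 0.9, 0.8, 0.9, 0.7, 0.1, 0.95, 0.2, 0.0]

always_on = threshold_detect(envelope, 0.5)                # [2, 3, 4, 5, 7]
short_holdoff = threshold_detect(envelope, 0.5, holdoff=2)  # [2, 5]
long_holdoff = threshold_detect(envelope, 0.5, holdoff=5)   # [2]
```

With no hold-off the sustained sound triggers four times. With a hold-off of two samples the detector fires falsely at sample 5, because the volume is still above the threshold when detection resumes, and it then misses the hit at sample 7. With a hold-off of five samples the hit at sample 7 is also missed.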
Thus it is desirable to provide for an automated system and method for recognizing the onset of a sonic event that is characterized by a rapid increase in volume without requiring that the detection process be disabled to avoid false triggering of sonic events.
The present invention provides for a method and system for identifying a sonic event of interest within a received audio signal. A sonic event is characterized by a predetermined rate of change in the perceived audio volume, and is associated with the loudness of the audio.
In one aspect of the invention, examples of a sonic event include percussive hits such as those emanating from drums, cymbals or a piano.
In a further aspect of the invention, a first digital signal corresponding to a filtered digital representation of the audio signal is generated, and a second digital signal representative of the rate of change of the first digital signal is derived from the filtered representation. A sonic event is said to occur when the second digital signal exceeds a predetermined level.
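As a rough illustration of this aspect, the following Python sketch derives a second signal as the sample-to-sample difference of a filtered signal and compares it with a level. The first-difference approximation of the rate of change, the function name `rate_of_change_detect` and the sample values are assumptions made for illustration only.

```python
def rate_of_change_detect(filtered, level):
    """Return indices n where filtered[n] - filtered[n - 1] exceeds level."""
    # Second signal: first difference of the filtered (first) signal.
    rate = [filtered[n] - filtered[n - 1] for n in range(1, len(filtered))]
    # rate[k] describes the change arriving at sample k + 1.
    return [k + 1 for k, r in enumerate(rate) if r > level]


# Hypothetical filtered signal: a sustained loud sound starting at
# sample 2 and a second hit at sample 7.
filtered = [0.0, 0.1, 0.9, 0.8, 0.9, 0.7, 0.1, 0.95, 0.2, 0.0]

events = rate_of_change_detect(filtered, 0.5)  # [2, 7]
```

In this example only the two rapid increases are reported. The sustained portion of the first sound produces small or negative differences and therefore no further triggers, and no part of the signal is skipped.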
In another aspect of the invention, the digital representation of the audio is high-pass filtered to remove inaudible low frequencies. In one practice of the invention the high-pass filter has a pass band above 20 Hz.
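One possible form of such a filter is sketched below in Python. A first-order recursive high-pass filter with a 20 Hz corner frequency is assumed here; the filter order and structure, the 44.1 kHz sample rate and the function name `high_pass` are illustrative assumptions and are not specified in this section.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=20.0):
    """First-order high-pass filter (assumed structure, for illustration)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


sample_rate = 44100  # assumed sample rate in Hz

# A constant (0 Hz) input is rejected once the initial transient decays.
dc = high_pass([1.0] * sample_rate, sample_rate)

# A 1 kHz tone lies well inside the pass band and is left nearly unchanged.
tone_in = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate)
           for n in range(sample_rate)]
tone = high_pass(tone_in, sample_rate)
```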
In a yet further aspect of the invention, the energy of the high-pass filtered digital signal is derived and then filtered with a low-pass filter to remove audible frequencies. The low-pass filter advantageously has only real poles to avoid oscillatory transients resulting from the filter “ringing”.
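A minimal Python sketch of this step follows. It assumes that the energy is taken as the square of each sample and that the low-pass filter is a cascade of identical one-pole sections, each of which contributes a single real pole. The 10 Hz cutoff, the number of stages and the function name `smoothed_energy` are hypothetical values chosen for illustration.

```python
import math


def smoothed_energy(samples, sample_rate, cutoff_hz=10.0, stages=2):
    """Square each sample, then low-pass with cascaded one-pole sections."""
    # Each section has a single real pole at z = 1 - b.
    b = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = [0.0] * stages
    out = []
    for x in samples:
        value = x * x  # instantaneous energy
        for k in range(stages):
            state[k] += b * (value - state[k])
            value = state[k]
        out.append(value)
    return out


# Step response: the output rises smoothly toward 1.0 without overshoot.
step = smoothed_energy([1.0] * 20000, sample_rate=8000)
```

Because every pole is real, the step response rises monotonically and never exceeds its final value, so the filter itself does not introduce ripples that could be mistaken for changes in volume.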
In a still further aspect of the invention, a digital signal representative of the perceived volume of the original audio is generated from the low-pass filtered energy signal, differentiated and scaled appropriately to derive a digital signal indicative of a change in the volume that can be compared with a predetermined threshold value for determining the onset of the sonic event of interest.
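The final stage can be sketched in Python as follows. The sketch assumes that perceived volume is approximated by the energy expressed in decibels, that differentiation is a first difference scaled by the sample rate to give decibels per second, and that an onset is reported only at the first sample of each run above the threshold. These choices, the function name `detect_onsets` and the sample values are illustrative assumptions rather than details stated in this section.

```python
import math


def detect_onsets(energy, sample_rate, threshold_db_per_s, floor=1e-10):
    """Return indices where the volume starts rising faster than the threshold."""
    # Assumed volume measure: energy in decibels (floor avoids log of zero).
    volume_db = [10.0 * math.log10(e + floor) for e in energy]
    onsets = []
    above = False
    for n in range(1, len(volume_db)):
        # First difference scaled to decibels per second.
        rate = (volume_db[n] - volume_db[n - 1]) * sample_rate
        if rate > threshold_db_per_s and not above:
            onsets.append(n)
        above = rate > threshold_db_per_s
    return onsets


# Hypothetical smoothed energy sampled at 100 Hz: a quiet passage, a hit at
# sample 5 that decays slowly, and a second hit at sample 10 while the
# first sound is still loud.
energy = [1e-4] * 5 + [1.0, 0.9, 0.8, 0.7, 0.6] + [1.5, 1.3, 1.1]

onsets = detect_onsets(energy, sample_rate=100, threshold_db_per_s=200.0)  # [5, 10]
```

Both hits are reported even though the second one occurs while the first sound is still loud, and the slowly decaying portion between them produces no triggers.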
The method and system according to the present invention advantageously provide for detection of a sonic event such as a percussive hit without requiring that the detector be disabled for a fixed time to avoid false triggering. Furthermore, because the detector is not disabled during the detection process, sonic events occurring in close proximity are easily recognized and not ignored as in some conventional systems.