Almost all audio signals consist of a combination of an original dry signal and reverberation. The reverberation results from the dry signal being passed through a reverberant system. For example, consider a singer performing in a concert hall. In this example the singer's voice is the dry signal and the concert hall is the reverberant system. If we place a microphone at some location in the concert hall to record the resulting sound, we will have the dry voice signal with the reverberant characteristics of the concert hall superimposed upon it. That is, the microphone captures a mixture of the direct sound component due to the singer, and the reverberant component due to the sound passing through the concert hall.
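The relationship described above can be illustrated with a minimal sketch. It assumes that the reverberant system is linear and time-invariant, so that it can be represented by an impulse response; this is an assumption made here for illustration only, since the text does not commit to a particular model. The function names, the synthetic impulse response, and its decay parameter are hypothetical.

```python
import numpy as np

def synth_impulse_response(length, decay=0.01, seed=0):
    """Hypothetical room response: a unit direct path plus a decaying noise tail."""
    rng = np.random.default_rng(seed)
    h = 0.3 * rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    h[0] = 1.0  # direct path from source to microphone
    return h

def reverberate(dry, h):
    """Return (direct, reverberant, mixture) for a dry signal and impulse response h."""
    n = len(dry) + len(h) - 1
    direct = np.zeros(n)
    direct[:len(dry)] = h[0] * dry        # direct sound component
    tail = h.copy()
    tail[0] = 0.0                         # everything after the direct path
    reverberant = np.convolve(dry, tail)  # reverberation is created from the dry signal
    return direct, reverberant, direct + reverberant

# Stand-in for the singer's voice (white noise, for illustration only).
rng = np.random.default_rng(1)
dry = rng.standard_normal(1000)
h = synth_impulse_response(400)
direct, reverberant, mixture = reverberate(dry, h)
```

In this sketch the microphone signal is `mixture`. Only the mixture is observed in practice; neither `dry` nor `h` is available separately.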
Once the original dry signal has the reverberant characteristics of an acoustic space superimposed upon it, it is extremely difficult to recover the original dry signal (or the direct signal component). Similarly, it is extremely difficult to alter the characteristics or level of the reverberant component. The difficulty is due in part to the fact that the reverberation is dependent on the original dry signal. That is, the reverberation is created from the original dry signal.
Moreover, we do not typically have access to any relevant information regarding the reverberant system. Using the example of the singer in a concert hall, the microphone does not record the acoustic details of the concert hall directly. Rather, it records the sound of the singer's voice with the acoustic characteristics of the concert hall superimposed upon it.
In some applications, such as musical recordings, a certain amount of reverberation is highly desirable since it can provide a subjectively pleasing extension of each note as well as a sense of depth and envelopment. Of course, some acoustic spaces (e.g. concert halls) are more subjectively pleasing than others. However, one does not typically have access to the most subjectively pleasing acoustic spaces, and so the reverberant component of the recording may not be as good as one would like. That is, the reverberation may not be entirely appropriate for that recording. At present, there is not much that can be done to alter the reverberant component of the recording in this case. If the recording lacks reverberant energy, then one can add more reverberant energy by processing the recording through an artificial reverberation device. However, the reverberation produced by these devices does not tend to sound natural and is unlikely to complement the reverberation that is already present in the recording. Conversely, if the recording has too much reverberation, then there is not much that can be done presently to reduce the level of the reverberant component. If the recording has the right amount of reverberation, but not the right characteristics, then there is not much that can be done presently to alter the characteristics of the reverberation. In each of these cases it would be highly beneficial to be able to modify the direct sound component as well as the level and characteristics of the reverberant energy in order to obtain the appropriate reverberant characteristics.
In other applications even a modest amount of reverberation is not appropriate since it degrades the clarity and intelligibility of the signal. For example, in applications such as teleconferencing where a hands-free telephone is often used, the reverberation of the office or conference room can have the undesirable effect of making the speech signal sound “hollow”. This is often referred to as the rain barrel effect. In other related applications such as security, surveillance and forensics, the reverberation is highly undesirable since it can reduce the intelligibility of speech signals. However, in such situations it is typically impossible to have any control over the reverberant characteristics of the acoustic space. In speech recognition systems the reverberation reduces the system's ability to correctly identify words and may thus reduce the recognition rate. If the recognition rate becomes too low then the speech recognition system may be rendered unusable. Reverberation can cause particular difficulties for hearing-impaired people, since its negative effects on speech intelligibility are often compounded by their hearing impairment. When a hearing aid device amplifies an acoustic signal to make it more audible, it amplifies both the direct sound component and the reverberant component. Therefore, amplifying the signal does not help to overcome the negative effects of the reverberation. In each of these applications it would be highly beneficial to be able to reduce the level of the reverberant component so that it is at an appropriate level with respect to the direct sound component.
One common approach to try to reduce the amount of reverberation in an audio signal is to use a directional microphone or a microphone array. The directional microphone and microphone array accept sounds arriving from certain directions and reject sounds coming from other directions. Therefore, if the microphone is placed appropriately then it will accept the desired dry signal while rejecting some portion of the reverberation.
Successful use of a directional microphone or microphone array requires that one knows where the desired signal is located. If the location is not known, or if it is changing over time, then this approach may not work satisfactorily since the desired signal may be rejected. Also, this approach may not be appropriate in certain applications due to the physical size of the microphone array, the increase in the amount of hardware resources required (e.g. microphones, amplifiers, etc.), and the resultant increase in cost. Instead, it would be highly beneficial to be able to blindly reduce the level of the reverberant component to an appropriate level using a single non-directional microphone, without any knowledge of the acoustic space, and without any knowledge of the location of the source.
In film and television productions it is important for the sounds that we hear (e.g. dialog and sound effects) to have reverberant characteristics that are appropriate for the image that we see on the screen. For example if the image indicates that the scene is taking place in a small room, then the sound should have the reverberant characteristics of a small room even though it may actually have been recorded on a large sound stage. The term “room tone” is often used in film and television productions to describe the acoustic characteristics of the acoustic space. In general the sounds in film and television productions are often recorded in very different locations. For example parts of the dialog may be recorded at the time of filming, whereas other parts of the dialog may be recorded later in a recording or “dubbing” studio. Here the actors recite their lines while they watch a video of their performance. This process is known as automatic dialog replacement (ADR) and is an extremely common practice. In order for the various parts of the dialog to sound natural and realistic, it is necessary to match the room tone (reverberant characteristics) of the different recordings so that they sound as though they were all recorded in the same acoustic space. Moreover, one usually wants to make the recordings sound like they were recorded in a very specific acoustic space, having a very specific room tone.
In the ADR example the recordings are often very dry since the recording or dubbing studio is usually a carefully controlled acoustic space. That is, there is typically very little reverberation in the recordings. In this case one may wish to impose the reverberant characteristics of a specific room onto the recordings. This may be quite difficult if the acoustic characteristics of the room are not directly available. However, other recordings that were made in that room may be available. In this case it would be highly useful to be able to extract the acoustic characteristics of an acoustic space from a recording. It would further be useful to be able to impose the reverberant characteristics of the appropriate acoustic space onto a recording.
In situations where different parts of the dialog have been recorded in different acoustic spaces that each have a significant amount of reverberation, the task is to somehow match the reverberant characteristics of the different recordings. To do this, one must first remove the reverberant characteristics of the room in which each recording was made before applying the reverberant characteristics of the appropriate acoustic space. As indicated above, this is a difficult task that has not been satisfactorily resolved to date. In this situation it would be very useful to be able to remove the acoustic characteristics of a recording and then apply the acoustic characteristics of an appropriate acoustic space.
In one class of situations the reverberation found in an audio signal is inappropriate in that it limits one's ability to process the signal in some way. For example, in an audio data reduction system the goal is to compress the signal so that a smaller amount of data is used to store or transmit a signal. Such systems use an encoder to compress the signal as well as a decoder to later recover the signal. These audio data reduction systems can be “lossless”, in which case no information is lost as a result of the compression process, and so the original signal is perfectly recovered at the decoder. Other versions are “lossy”, and so the signal recovered at the decoder is not identical to the original input signal. Audio data reduction systems rely on there being a high degree of redundancy in the audio signal. That is, they operate best on audio signals that are “predictable”. However, reverberation in an audio signal reduces its predictability. There are currently no means of overcoming the effects of reverberation in order to improve the performance of an audio data reduction system. It would be highly desirable to be able to decompose a signal into its direct sound component and reverberant component prior to compressing it at the encoder, and then retrieve the reverberant signal after decoding the compressed signal.
Another example where reverberation limits one's ability to process a signal is audio watermarking. In audio watermarking the goal is to hide information inside an audio signal. This hidden information may be used for such things as copyright protection of a song. Audio watermarking systems operate by making small modifications to the audio signal. These modifications must be inaudible if the watermark is to be successful. Here, one would like to make a modification at a very specific point in time in the song. However, this modification may become audible if the direct sound component and the reverberant component no longer match each other as a result of the modification. It would be highly desirable to be able to remove the reverberant component of an audio signal, insert an audio watermark, and then add the reverberant component back to the signal.
In another class of situations the reverberation found in a signal becomes inappropriate as a result of some processing. For example, it is common to process a signal in order to remove background noise or to alter its dynamic range. This processing often alters the relation between the direct sound component and the reverberant component in the recording such that it is no longer appropriate. There are currently no means of correcting the reverberant component after this processing.
It is often inconvenient or impossible to measure the acoustic characteristics of an acoustic space. Using our earlier example, while we can have easy access to a recording of a singer in a concert hall, we very rarely have access to the concert hall itself. And, even if we did have access to the concert hall, we would be unlikely to be able to reproduce the acoustic conditions at the time of the recording (e.g. location of the singer and the microphone, presence of an audience, etc.). Therefore, we would like to be able to extract a description of the reverberant system from a recording (or real-time signal) that was made within that reverberant system. Most importantly, we would like to be able to extract a description of the perceptually relevant aspects of the reverberant system. To date, there is no method that adequately satisfies this need. This description of the reverberant system may be used to analyze the reverberant system, as part of a system for modifying or reducing the reverberant characteristics in a recording, or as part of a system for imposing reverberant characteristics onto a recording.
The earliest audio recordings (film, music, television, etc.) were monophonic. That is, they were recorded onto only one channel. Stereo audio recordings are typically more pleasing since they are better at reproducing the spatial aspects of the reverberant characteristics of the acoustic space. Numerous processes have been developed to try to convert monophonic recordings to a stereophonic format. These techniques are limited by the fact that they process the direct sound component and the reverberant component together. These techniques could be improved dramatically if they could process the direct sound component and reverberant component separately. At present, there is no satisfactory way to decompose the signal into a direct sound component and reverberant component so that they may be processed separately.
Multichannel surround systems are becoming increasingly popular. Whereas a stereo system has two channels (and thus two loudspeakers) a multichannel surround system has multiple channels. Typical multichannel surround systems use five channels and hence five loudspeakers. At present the number of multichannel audio recordings available is quite limited. Conversely, there are a very large number of mono and stereo recordings available. It would be highly desirable to be able to take a mono or stereo audio signal and produce a multichannel audio signal from it. Current methods for doing this use an approach called “matrix decoding”. These methods will take a stereo recording and place different parts of the recording in each of the channels of the multichannel system. In the case of music recordings, some of the instruments will appear to be located behind the listener. This is not a desirable result in some situations. For example when playing an orchestral recording one does not typically want some of the instruments to appear to be located behind the listener. Rather, one typically wants the instruments to appear to be located in front of the listener, with the concert hall reverberation appearing to arrive from all around the listener.
One way to approach this problem is to send the original stereo signal to the front loudspeakers while also processing the stereo signal through an artificial reverberation device. The outputs of the artificial reverberation device are intended to provide a simulation of the concert hall reverberation, and they would be sent to the rear (surround) loudspeakers. This approach is not satisfactory for several reasons. First, the approach adds additional reverberation on top of the reverberation already present in the stereo signal. Therefore, this approach can make the overall amount of reverberation inappropriate for that particular recording. Moreover, the reverberation added by the artificial reverberation device is not likely to match the characteristics of the reverberation in the stereo recording. This will make the resultant multichannel signal sound unnatural. A better approach would be to decompose the stereo signal into its direct sound component and its reverberant component.
With the original signal decomposed into direct and reverberant components, one could choose to create multichannel audio signals by processing the direct sound component through a multichannel artificial reverberation device. This method would avoid the problem of adding additional reverberation since the reverberant component of the signal has been removed. This method would also avoid the problem of a mismatch between the artificial reverberation and the reverberation in the original recording.
Alternatively, with the original signal decomposed into direct and reverberant components, one could choose to create multichannel audio signals by sending the direct component to the front loudspeakers. This would preserve the frontal placement of the instruments in the reproduced sound field. The reverberant component of the original signal could either be sent to the rear loudspeakers, or it could be decomposed into sub-components and distributed across all of the loudspeakers in an appropriate manner. This approach would have the significant advantage of creating a multichannel signal entirely from the components of the original recording, thus creating a more natural sounding result. There are no methods currently available that allow a signal to be decomposed into direct and reverberant components so that multichannel signals can be generated in this manner.
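The routing described above can be sketched as follows. This is only an illustration of the channel assignment, under the assumption that the direct and reverberant components of a stereo recording have somehow already been obtained; how to perform that decomposition is exactly the unsolved problem noted above. The function name, the channel labels, and the choice to leave the center channel silent are hypothetical.

```python
import numpy as np

def upmix_to_five_channels(direct_lr, reverb_lr):
    """Route stereo direct sound to the front and stereo reverberation to the rear.

    direct_lr, reverb_lr: arrays of shape (2, n) holding left and right channels.
    Returns a dict of five loudspeaker feeds (hypothetical layout).
    """
    direct_lr = np.asarray(direct_lr, dtype=float)
    reverb_lr = np.asarray(reverb_lr, dtype=float)
    if direct_lr.ndim != 2 or direct_lr.shape[0] != 2 or direct_lr.shape != reverb_lr.shape:
        raise ValueError("expected two arrays of identical shape (2, n)")
    n = direct_lr.shape[1]
    return {
        "front_left": direct_lr[0],   # instruments stay in front of the listener
        "front_right": direct_lr[1],
        "center": np.zeros(n),        # left silent in this sketch
        "rear_left": reverb_lr[0],    # reverberation taken from the recording itself
        "rear_right": reverb_lr[1],
    }

# Placeholder components (random data, for illustration only).
rng = np.random.default_rng(0)
direct_lr = rng.standard_normal((2, 8))
reverb_lr = 0.1 * rng.standard_normal((2, 8))
feeds = upmix_to_five_channels(direct_lr, reverb_lr)
```

Because every feed is taken from the original recording, summing the front and rear feeds on each side reproduces the original stereo signal.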
In general, if one had a recording of a sound in a reverberant system and one could somehow directly measure the acoustic characteristics of that reverberant system, then it would be possible to mathematically invert the reverberant system and completely recover the original dry sound. This process is known as inverse filtering. However, inverse filtering cannot be done without precise measurements of the exact acoustic characteristics of the reverberant system. Moreover, the resulting inverse filter is specific to that one set of acoustic characteristics. It is not possible to use inverse filtering to recover the original dry signal from a recording in a given reverberant system using the acoustic characteristics measured from a different reverberant system. For example, an inverse filter derived for one location in a room is not valid for any other location in the same room. Other problems with inverse filters are that they can be computationally demanding and they can impose a significant delay onto the resulting signal. This delay may not be acceptable in many real-time applications. Therefore, we would like to have a means of achieving the benefits of inverse filtering while overcoming the limitations that make it impractical in most real-world applications. There are presently no means available to adequately perform this task.
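The sensitivity of inverse filtering to the measured characteristics can be illustrated with a minimal sketch. It assumes that the reverberant system is linear and time-invariant with a known impulse response, and it performs the inversion by division in the frequency domain. The impulse responses here are synthetic and scaled so that the division is well conditioned, which a real room response would not guarantee. All function names and parameters are hypothetical.

```python
import numpy as np

def make_ir(length, seed):
    """Hypothetical impulse response: unit direct path plus a decaying noise tail.

    The tail is scaled so its absolute sum is 0.8, which keeps the frequency
    response away from zero and makes the inversion below well conditioned.
    """
    rng = np.random.default_rng(seed)
    tail = rng.standard_normal(length - 1) * np.exp(-0.05 * np.arange(1, length))
    tail *= 0.8 / np.sum(np.abs(tail))
    return np.concatenate(([1.0], tail))

def inverse_filter(wet, h, n_dry):
    """Estimate the dry signal by dividing out the response h in the frequency domain."""
    n = len(wet)
    estimate = np.fft.irfft(np.fft.rfft(wet, n) / np.fft.rfft(h, n), n)
    return estimate[:n_dry]

def rel_error(estimate, reference):
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)

rng = np.random.default_rng(1)
dry = rng.standard_normal(256)
h_a = make_ir(64, seed=2)   # response at the actual recording position
h_b = make_ir(64, seed=3)   # response standing in for a different position
wet = np.convolve(dry, h_a)

err_matched = rel_error(inverse_filter(wet, h_a, len(dry)), dry)
err_mismatched = rel_error(inverse_filter(wet, h_b, len(dry)), dry)
```

With the matching response the dry signal is recovered to within numerical precision, whereas the inverse derived from the other response leaves a clearly larger error. This is consistent with the statement that an inverse filter is specific to one set of acoustic characteristics.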
As described above, there are numerous situations where the reverberation found in an audio signal is not appropriate for its intended final application. Therefore, there is a need to be able to modify the direct sound component and/or the reverberant sound component of the audio signal. Furthermore, we would like to be able to modify this reverberation without having to directly measure the acoustic space in which it was recorded. These problems have not been satisfactorily solved to date.