The invention relates to audio signal processing in general and to improving clarity of dialog and narrative in surround entertainment audio in particular.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Modern entertainment audio with multiple, simultaneous channels of audio (surround sound) provides audiences with immersive, realistic sound environments of immense entertainment value. In such environments many sound elements such as dialog, music, and effects are presented simultaneously and compete for the listener's attention. For some members of the audience—especially those with diminished auditory sensory abilities or slowed cognitive processing—dialog and narrative may be hard to understand during parts of the program where loud competing sound elements are present. During those passages these listeners would benefit if the level of the competing sounds were lowered.
The recognition that music and effects can overpower dialog is not new, and several methods to remedy the situation have been suggested. However, as will be outlined next, the suggested methods are either incompatible with current broadcast practice, exact an unnecessarily high toll on the overall entertainment experience, or do both.
It is a commonly adhered-to convention in the production of surround audio for film and television to place the majority of dialog and narrative into only one channel (the center channel, also referred to as the speech channel). Music, ambiance sounds, and sound effects are typically mixed into both the speech channel and all remaining channels (e.g., Left [L], Right [R], Left Surround [ls] and Right Surround [rs], also referred to as the non-speech channels). As a result, the speech channel carries the majority of speech and a significant amount of the non-speech audio contained in the audio program, whereas the non-speech channels carry predominantly non-speech audio, but may also carry a small amount of speech. One simple approach to aiding the perception of dialog and narrative in these conventional mixes is to permanently reduce the level of all non-speech channels relative to the level of the speech channel, for example by 6 dB. This approach is simple and effective and is practiced today (e.g., SRS [Sound Retrieval System] Dialog Clarity or modified downmix equations in surround decoders). However, it suffers from at least one drawback: the constant attenuation of the non-speech channels may lower the level of quiet ambiance sounds that do not interfere with speech reception to the point where they can no longer be heard. By attenuating non-interfering ambiance sounds the aesthetic balance of the program is altered without any attendant benefit for speech understanding.
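By way of illustration only, the fixed-attenuation approach described above can be sketched as follows. The sketch assumes a 5-channel program held as lists of floating-point samples keyed by channel name; the function names, the channel key "C" for the speech channel, and the data layout are hypothetical and are not taken from any of the products mentioned.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def attenuate_non_speech(channels, attenuation_db=6.0, speech_channel="C"):
    """Return a copy of `channels` with every non-speech channel scaled down.

    `channels` maps a channel name to a list of samples (assumed layout).
    The speech channel is passed through unchanged; all other channels are
    reduced by the same fixed amount, regardless of their content.
    """
    gain = db_to_gain(-attenuation_db)
    result = {}
    for name, samples in channels.items():
        if name == speech_channel:
            result[name] = list(samples)
        else:
            result[name] = [s * gain for s in samples]
    return result


# Example program: one speech channel and four non-speech channels.
program = {
    "C": [0.5, -0.5],
    "L": [0.8, -0.8],     # loud effect that competes with speech
    "R": [0.8, -0.8],
    "ls": [0.01, -0.01],  # quiet ambiance that does not compete
    "rs": [0.01, -0.01],
}
processed = attenuate_non_speech(program)
```

Because the same gain is applied to every non-speech channel at all times, the quiet ambiance in the surround channels of this example is reduced by the same 6 dB as the loud effect in the front channels, which is the drawback noted above.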
An alternative solution is described in a series of patents (U.S. Pat. No. 7,266,501, U.S. Pat. No. 6,772,127, U.S. Pat. No. 6,912,501, and U.S. Pat. No. 6,650,755) by Vaudrey and Saunders. As understood, their approach involves modifying content production and distribution. According to that arrangement, the consumer receives two separate audio signals. The first of these signals comprises the “Primary Content” audio. In many cases this signal will be dominated by speech but, if the content producer desires, may contain other signal types as well. The second signal comprises the “Secondary Content” audio, which is composed of all the remaining sound elements. The user is given control over the relative levels of these two signals, either by manually adjusting the level of each signal or by automatically maintaining a user-selected power ratio. Although this arrangement can limit the unnecessary attenuation of non-interfering ambiance sounds, its widespread deployment is hindered by its incompatibility with established production and distribution methods.
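As a rough illustration of what automatically maintaining a user-selected power ratio could involve, the following sketch computes the gain to apply to the secondary signal so that the primary-to-secondary power ratio equals a chosen target. It is one possible reading only and is not taken from the cited patents: the function names are hypothetical, the power is measured over a single block of samples, and a practical system would presumably measure short-term power and smooth the gain over time.

```python
import math


def mean_power(samples):
    """Average power of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def secondary_gain_for_ratio(primary, secondary, target_ratio_db):
    """Linear gain for `secondary` that yields the target power ratio.

    The ratio is primary power over secondary power, expressed in dB.
    If either block is silent the ratio is undefined, so the gain is
    left at unity (an assumption made for this sketch).
    """
    p = mean_power(primary)
    q = mean_power(secondary)
    if p == 0.0 or q == 0.0:
        return 1.0
    current_ratio_db = 10.0 * math.log10(p / q)
    # Positive gain_db raises the secondary signal, negative lowers it.
    gain_db = current_ratio_db - target_ratio_db
    return 10.0 ** (gain_db / 20.0)


# Example: primary is about 6 dB above secondary; the user asks for 12 dB.
primary = [1.0, -1.0, 1.0, -1.0]
secondary = [0.5, -0.5, 0.5, -0.5]
gain = secondary_gain_for_ratio(primary, secondary, 12.0)
adjusted_secondary = [s * gain for s in secondary]
```

In this sketch the secondary signal is raised as well as lowered to hold the ratio; whether the secondary signal should ever be boosted is a design choice that the description above does not settle.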
Another example of a method to manage the relative levels of speech and non-speech audio has been proposed by Bennett in U.S. Application Publication No. 20070027682.
All the examples of the background art share the limitation of not providing any means for minimizing the effect the dialog enhancement has on the listening experience intended by the content creator, among other deficiencies. It is therefore the object of the present invention to provide a means of limiting the level of non-speech audio channels in a conventionally mixed multi-channel entertainment program so that speech remains comprehensible while also maintaining the audibility of the non-speech audio components.
Thus, there is a need for improved ways of maintaining speech audibility. The present invention solves these and other problems by providing an apparatus and method of improving speech audibility in a multi-channel audio signal.