In movies or on television, dialog and narrative are often presented together with other, non-speech, sounds such as music, jingles, effects, and ambiance. In many cases the speech sounds and the non-speech sounds are recorded separately and mixed under the control of a sound engineer. When speech and non-speech sounds are mixed, the non-speech sounds may partially mask the speech, thereby rendering a fraction of the speech inaudible. As a result, listeners must comprehend the speech based on the remaining, partial information. A small amount of masking is easily tolerated by young listeners with healthy ears. However, as masking increases, comprehension becomes progressively more difficult until the speech eventually becomes unintelligible (see, e.g., ANSI S3.5-1997, “Methods for Calculation of the Speech Intelligibility Index”). The sound engineer is intuitively aware of this relationship and mixes speech and background at relative levels that usually provide adequate intelligibility for the majority of viewers.
While background sounds hinder intelligibility for all viewers, their detrimental effect is larger for seniors and persons with hearing impairment (cf. Killion, M. 2002. “New thinking on hearing in noise: A generalized Articulation Index” in Seminars in Hearing, Volume 23, Number 1, pages 57 to 75, Thieme Medical Publishers, New York, N.Y.). The sound engineer, who typically has normal hearing and is younger than at least part of his audience, selects the ratio of speech to non-speech audio based on his own internal standards. As a result, a significant portion of the audience is sometimes left straining to follow the dialog or narrative.
One solution known in the prior art exploits the fact that speech and non-speech audio exist separately at some point in the production chain in order to provide the viewer with two separate audio streams. One stream carries primary content audio (mainly speech) and the other carries secondary content audio (the remaining audio program, which excludes speech). The user is given control over the mixing process. Unfortunately, this scheme is impractical because it does not build on the current practice of transmitting a fully mixed audio program. Rather, it replaces the main audio program with two audio streams that are not in use today. A further disadvantage of the approach is that it requires approximately twice the bandwidth of current broadcast practice because two independent audio streams, each of broadcast quality, must be delivered to the user.
The successful audio coding standard AC-3 allows simultaneous delivery of a main audio program and other, associated audio streams. All streams are of broadcast quality. One of these associated audio streams is intended for the hearing impaired. According to the “Dolby Digital Professional Encoding Guidelines,” section 5.4.4, available at http://www.dolby.com/assets/pdf/tech_library/46_DDEncodingGuidelines.pdf, this audio stream typically contains only dialog and is added, at a fixed ratio, to the center channel of the main audio program (or to the left and right channels if the main audio is two-channel stereo), which already contains a copy of that dialog. See also ATSC Standard: Digital Television Standard (A/53), revision D, Including Amendment No. 1, Section 6.5 Hearing Impaired (HI). Further details of AC-3 may be found in the AC-3 citations below under the heading “Incorporation by Reference.”
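The fixed-ratio addition described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the dialog-only stream is sample-aligned with the copy of the dialog already present in the center channel, so that the two add coherently. The function names, the toy sample values, and the gain of 1.0 are hypothetical and are not taken from AC-3 or from the cited guidelines.

```python
import math

def mix_hearing_impaired(center, dialog, gain):
    """Add a dialog-only stream to the center channel at a fixed gain.

    Hypothetical helper: `center` already contains dialog plus background.
    """
    if len(center) != len(dialog):
        raise ValueError("streams must have the same number of samples")
    return [c + gain * d for c, d in zip(center, dialog)]

def speech_gain_db(gain):
    """Level change (dB) of the speech component when a time-aligned copy
    of the same dialog is added at linear gain `gain`.

    The background is unchanged, so this is also the change in the
    ratio of speech to non-speech audio.
    """
    return 20.0 * math.log10(1.0 + gain)

# Toy signals (assumed values): center channel = dialog + background.
dialog = [0.5, -0.5, 0.25, 0.0]
background = [0.1, 0.1, -0.1, 0.2]
center = [d + b for d, b in zip(dialog, background)]

# Adding the dialog stream at unity gain doubles the speech amplitude
# while leaving the background untouched.
mixed = mix_hearing_impaired(center, dialog, gain=1.0)
improvement_db = speech_gain_db(1.0)
```

Under these assumptions, adding the dialog at unity gain raises the speech level by about 6 dB relative to the background. The associated stream must nevertheless be delivered at broadcast quality alongside the main audio program.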
It is clear from the preceding discussion that there is at present a need for, but no way of, increasing the ratio of speech to non-speech audio in a manner that exploits the fact that speech and non-speech audio are recorded separately, builds on the current practice of transmitting a fully mixed audio program, and requires minimal additional bandwidth. Therefore, it is the object of the present invention to provide a method for optionally increasing the ratio of speech to non-speech audio in a television broadcast that requires only a small amount of additional bandwidth, exploits the fact that speech and non-speech audio are recorded separately, and is an extension rather than a replacement of existing broadcast practice.