Personal video recorders (PVRs), also known as digital video recorders (DVRs), such as TiVo and ReplayTV devices, are popular nowadays for their enhanced capacities in recording television programming. They may offer such functions as “one-touch programming” for automatically recording every episode of a show for an entire season, “commercial advance” for automatically skipping through commercials while watching a recorded broadcast, an “on-screen guide” for looking up recorded programs to view, etc. The PVRs may also suggest programs for recording based on a user's viewing habits. These devices also enable the “pausing”, “rewinding” and “fast-forwarding” of a live television (“TV”) broadcast while it is being recorded. However, PVRs typically use electronic program guides (EPGs) to facilitate the selection of programming content for recording. In instances where the actual broadcast start or end time of a program is different from the EPG start or end time, programming content is often recorded that the user did not want, or not all of the programming content that the user intended to record is actually recorded. The program guide data stream is typically provided by a third party that aggregates program scheduling information from a plurality of sources of programming.
The actual start and end times for a given broadcast program may be different than the EPG start and end times for various reasons. For example, overtime in a sports event may cause the event to go beyond the scheduled end time. Presidential news conferences, special news bulletins and awards ceremonies often have indeterminate endings, as well. Technical difficulties causing the content provider to broadcast a program at a time other than that which is scheduled may also cause a variance in the start and/or end time of a program. In addition, when the time of one program provided on a specific channel is off schedule, subsequent programs provided by the channel may also be unexpectedly affected, interfering with the ability to record the subsequent program. To avoid offsetting the start and end times of subsequent programs, scheduled programming content may be manipulated (for example, a certain program or commercial segment may be skipped and therefore not broadcast), which may prevent programming of the skipped program. This makes recording programs for later viewing difficult.
Video on demand (“VOD”), movie on demand (“MOD”) and network PVR services, which may be subscription services, address at least some of these disadvantages by storing broadcasted programs for later retrieval by customers. Movies and TV programs (referred to collectively as “programs”) may be acquired and stored in real time, from multiple origination points. Typically, entire program streams for each broadcast channel are stored each day. When a customer requests a particular program that has already been broadcast and stored, the system may fetch the content of the requested program from storage in a network based on the program channel and time in an EPG, and transmit the program to the customer. An example of a network PVR system is described in copending, commonly assigned application Ser. No. 10/263,015, filed on Oct. 2, 2002.
With the advent of digital communications technology, many TV broadcast streams are transmitted in digital formats. For example, Digital Satellite System (DSS), Digital Broadcast Services (DBS), and Advanced Television Systems Committee (ATSC) broadcast streams are digitally formatted pursuant to the well known Moving Picture Experts Group 2 (MPEG-2) standard. The MPEG-2 standard specifies, among other things, the methodologies for video and audio data compression which allow multiple programs, with different video and audio feeds, to be multiplexed in a transport stream traversing a single broadcast channel. Newer systems typically use the Dolby Digital AC-3 standard to encode the audio part of the transport stream, instead of MPEG-2. The Dolby Digital AC-3 standard was developed by Dolby Laboratories, Inc., San Francisco, Calif. (“Dolby”). A digital TV receiver may be used to decode the encoded transport stream and extract the desired program therefrom. The prior art PVRs take advantage of compression of video and audio data to maximize the use of their limited storage capacity, while decreasing costs.
In accordance with the MPEG-2 standard, video data is compressed based on a sequence of groups of pictures (“GOPs”), in which each GOP typically begins with an intra-coded picture frame (also known as an “I-frame”), which is obtained by spatially compressing a complete picture using discrete cosine transform (DCT). As a result, if an error or a channel switch occurs, it is possible to resume correct decoding at the next I-frame.
The GOP may represent up to 15 additional frames by providing a much smaller block of digital data that indicates how small portions of the I-frame, referred to as macroblocks, move over time. Thus, MPEG-2 achieves its compression by assuming that only small portions of an image change over time, making the representation of these additional frames extremely compact. Although GOPs have no relationship between themselves, the frames within a GOP have a specific relationship which builds off the initial I-frame.
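The resynchronization property described above can be illustrated with a minimal sketch. It assumes a stream is represented simply as a sequence of frame-type letters; the function name, the parameter name and the GOP layout are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def next_resume_index(frame_types: Sequence[str], join_index: int) -> Optional[int]:
    """Return the index of the first I-frame at or after join_index.

    join_index is the first frame received cleanly after an error or a
    channel switch. Frames before the returned index depend on an I-frame
    that was missed, so they cannot be decoded correctly.
    """
    for i in range(join_index, len(frame_types)):
        if frame_types[i] == "I":
            return i
    return None  # no further I-frame in this excerpt


# Two GOPs, each an I-frame followed by dependent frames (hypothetical layout).
stream = list("IBBPBBPBB" + "IBBPBBPBB")
resume = next_resume_index(stream, 4)  # decoder joins in the middle of the first GOP
```

In this example the decoder discards the remaining frames of the first GOP and resumes at the I-frame that opens the second one.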
The compressed video and audio data are carried by respective continuous elementary streams. The video and audio streams are multiplexed. Each stream is broken into packets, resulting in packetized elementary streams (PESs). These packets are identified by headers that contain time stamps for synchronization, and are used to form MPEG-2 transport streams. For digital broadcasting, multiple programs and their associated PESs are multiplexed into a single transport stream. A transport stream has PES packets further subdivided into short fixed-size data packets, in which multiple programs encoded with different clocks can be carried. A transport stream not only comprises a multiplex of audio and video PESs, but also other data such as MPEG-2 program specific information (“PSI”) describing the transport stream. The MPEG-2 PSI includes a program association table (“PAT”) that lists every program in the transport stream. Each entry in the PAT points to a program map table (“PMT”) that lists the elementary streams making up each program. Some programs are open, but some programs may be subject to conditional access (encryption) and this information is also carried in the MPEG-2 PSI.
The aforementioned fixed-size data packets in a transport stream each carry a packet identifier (“PID”) code. Packets in the same elementary streams all have the same PID, so that a decoder can select the elementary stream(s) it needs and reject the remainder. Packet-continuity counts are implemented to ensure that every packet that is needed to decode a stream is received.
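PID-based selection and continuity checking of this kind might be sketched as follows. The 188-byte packet size, the 0x47 sync byte and the bit positions of the PID and continuity counter are taken from the MPEG-2 systems layer rather than from the text above. The sketch assumes every packet carries a payload and that no packet is duplicated. The helper make_packet is hypothetical and exists only to build test data.

```python
from typing import Iterable, List, Tuple

TS_PACKET_SIZE = 188  # fixed transport packet size in bytes
SYNC_BYTE = 0x47      # first byte of every transport packet


def parse_header(packet: bytes) -> Tuple[int, int]:
    """Return (PID, continuity counter) from a transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a transport stream packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit packet identifier
    continuity = packet[3] & 0x0F                # 4-bit continuity counter
    return pid, continuity


def select_pid(packets: Iterable[bytes], wanted_pid: int) -> Tuple[List[bytes], int]:
    """Keep the packets of one elementary stream and count continuity gaps."""
    kept: List[bytes] = []
    gaps = 0
    last = None
    for packet in packets:
        pid, counter = parse_header(packet)
        if pid != wanted_pid:
            continue  # reject packets belonging to other streams
        if last is not None and counter != (last + 1) % 16:
            gaps += 1  # at least one packet of this stream was lost
        last = counter
        kept.append(packet)
    return kept, gaps


def make_packet(pid: int, counter: int) -> bytes:
    """Hypothetical helper: build a payload-only packet filled with zeros."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (counter & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))
```

A decoder would typically learn which PIDs to request from the PAT and PMT before filtering packets in this way.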
The Dolby Digital AC-3 format, mentioned above, is described in the Digital Audio Compression Standard (AC-3), issued by the United States Advanced Television Systems Committee (“ATSC”) (Dec. 20, 1995), for example, which is incorporated by reference herein. The AC-3 digital compression algorithm encodes pulse code modulation (“PCM”) samples of 1 to 5.1 channels of source audio into a serial bit stream at data rates from 32 kbps to 640 kbps. (The 0.1 channel refers to a fractional bandwidth channel for conveying only low frequency (subwoofer) sounds.) An AC-3 encoder at a source of programming produces an encoded bit stream, which is decoded by an AC-3 decoder at a receiver. The receiver may be at a distributor of programming, such as a cable system, or at a set-top box at the consumer's location, for example. The encoded bit stream is a serial stream comprising a sequence of synchronization frames. Each frame contains 6 coded audio blocks, each representing 256 new audio samples. A synchronization frame header is provided containing information required to synchronize and decode the signal stream. A bit stream header follows the synchronization frame header, describing the coded audio service. An auxiliary data field may be provided after the audio blocks. An error check field may be provided, as well. The AC-3 encoded audio stream is typically multiplexed with the MPEG-2 program stream.
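The frame structure described above implies a fixed number of samples per synchronization frame, from which the frame duration and size follow by simple arithmetic. The sketch below works this out under the assumption of a 48 kHz sampling rate, which is not stated above. The function names are illustrative only.

```python
BLOCKS_PER_FRAME = 6     # coded audio blocks in one synchronization frame
SAMPLES_PER_BLOCK = 256  # new audio samples represented by each block


def samples_per_frame() -> int:
    """New audio samples represented by one synchronization frame."""
    return BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame() / sample_rate_hz


def frame_size_bytes(bit_rate_bps: int, sample_rate_hz: int) -> float:
    """Average encoded frame size for a given serial bit rate."""
    return bit_rate_bps * samples_per_frame() / sample_rate_hz / 8


# Assumed 48 kHz sampling rate; 32 kbps and 640 kbps are the limits given above.
duration = frame_duration_ms(48_000)
smallest = frame_size_bytes(32_000, 48_000)
largest = frame_size_bytes(640_000, 48_000)
```

Under that assumption each frame carries 1536 samples and covers 32 ms of audio, so the frame size ranges from 128 bytes at 32 kbps to 2560 bytes at 640 kbps.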
Program audio is provided to a distributor, such as a cable system, by a source with a set level of loudness. The audio is typically broadcast by the distributor at the set loudness. Viewers adjust the loudness level to meet their own subjective, desired level by adjusting the volume control on their TV. Viewers typically watch programming provided by different sources and there is no currently accepted standard for setting loudness of audio provided with programs. Each source typically sets a loudness level in accordance with its own practices. For example, a cable system broadcasts a program comprising content by one source and advertising provided by one or more other sources. As viewers change channels, they may also view programs from different sources. In VOD, MOD and network PVR systems, programs viewed on the same channel may also have been provided to the systems by different sources. Ideally, once a viewer sets the volume control of their TV to a desired volume, it would not be necessary to adjust the volume control. Often, however, there are sudden loudness changes as a program transitions to and from advertising with different loudness settings or from one program to another program with a different loudness setting, requiring the viewer to adjust the volume. This can be annoying.
Loudness is a subjective perception, making it difficult to measure and quantify. The most commonly used devices to measure loudness are Volume Unit (“VU”) meters and Peak Program Meters (“PPM”), which measure voltages of audio signals. These devices do not take into consideration the sensitivities and hearing patterns of the human ear, however, and listeners may still complain about loudness of audio at apparently acceptable voltages.
In an attempt to quantify loudness as it is perceived by a listener, CBS Laboratories developed a loudness meter in the 1960's that divided audio signals into seven (7) bands, weighted the gain of each band to match the equal loudness curve of the human ear, averaged each band with a given time constant, summed the averages, and averaged the total again with a time constant about 13 times longer than the first time constant. A few broadcast audio processor manufacturers currently use an algorithm based on the CBS Loudness Meter to detect audio that could sound too loud to a listener. Gain reduction is applied to reduce the loudness. (Audio Notes: Tim Carroll, Exploring the AC-3 Audio Standard for ATSC (retrieved from TV Technology dot com, www dot tvtechnology dot com, Jun. 26, 2002)).
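The sequence of steps attributed to the CBS meter (weight, average, sum, average again) might be sketched as follows. This is only a rough illustration, not the CBS algorithm itself. It assumes the signal has already been split into seven band levels. The weights are hypothetical placeholders rather than values from an equal loudness curve, and each averaging stage is modeled as a simple one-pole running average. Dividing the smoothing coefficient by 13 only approximates a time constant 13 times longer.

```python
from typing import List, Sequence


def smooth(values: Sequence[float], alpha: float) -> List[float]:
    """One-pole running average; a smaller alpha gives a longer time constant."""
    if not values:
        return []
    state = values[0]  # start at the first value to avoid a start-up transient
    out = []
    for v in values:
        state += alpha * (v - state)
        out.append(state)
    return out


def loudness_estimate(band_levels: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      alpha_fast: float) -> List[float]:
    """Weight and average each band, sum the bands, then average the sum again."""
    if len(band_levels) != 7 or len(weights) != 7:
        raise ValueError("expected seven bands and seven weights")
    averaged = [smooth([w * v for v in band], alpha_fast)
                for band, w in zip(band_levels, weights)]
    summed = [sum(column) for column in zip(*averaged)]
    # Second stage: roughly 13 times slower than the per-band averaging.
    return smooth(summed, alpha_fast / 13.0)


# Hypothetical input: seven bands at a constant level, with equal weights.
bands = [[1.0] * 50 for _ in range(7)]
estimate = loudness_estimate(bands, [1.0] * 7, alpha_fast=0.2)
```

A processor of the kind described above would compare such an estimate against a threshold and apply gain reduction when the threshold is exceeded.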
Equivalent Loudness (“Leq (A)”) has also been used to quantify and control the loudness of normal spoken dialog. Leq (A) is the level of constant sound in decibels, which, in a given time period, has the same energy as a time-varying sound. The measurement is A-weighted, that is, it is made through a weighting network that approximates the sensitivity of the human ear at low levels.
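The equal-energy definition of Leq can be illustrated with a short sketch. It assumes digital samples normalized so that full scale equals 1.0, and it deliberately omits the A-weighting filter, so it computes an unweighted equivalent level rather than a true Leq (A). The function name and reference value are illustrative.

```python
import math
from typing import Sequence


def leq_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """Level in dB of a constant signal with the same mean energy as samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (reference * reference))


# A steady signal and a time-varying one carrying the same energy.
steady = [0.1] * 1000
bursty = [0.0, 0.1 * math.sqrt(2.0)] * 500
```

Both signals yield approximately the same level, about 20 dB below the reference, even though the second alternates between silence and a louder value.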
In analog programs, audio levels have been set with respect to reference levels dependent on the content of the program. Automatic gain control (“AGC”) and level matching algorithms have been implemented in hardware to adjust audio levels of analog signals as necessary. AGC cannot be used with compressed digital signals, and does not take dialog levels into account.
In the Dolby AC-3 standard, a Dialog Level (dialog normalization or “DIALNORM”) parameter is used to provide an optimum base or reference level of loudness upon which a viewer may adjust the loudness of the broadcast program with the volume control of their TV. DIALNORM is an indication of the subjective loudness of normal spoken dialog as compared to a maximum loudness (100% digital, full scale). It represents the normalized average loudness of the dialog in the program, as measured by Leq (A). DIALNORM may range from −31 decibels (“dBs”) to 0 dBs which is 100% digital. The DIALNORM value may be stored in the synchronization frame of the encoded audio stream, for example. The DIALNORM value is used by the system volume control, in conjunction with the volume set by the viewer, to establish a desired loudness (sound pressure level) of the program.
For example, the loudness of a program with a DIALNORM of −27 dBs and a TV volume control setting of 2 will sound the same to the viewer as a program, advertising or chapter with a DIALNORM of −31 dBs and a TV volume control setting of 2, even though the respective DIALNORMS are different, as long as the DIALNORM for each respective program is properly set. The user will not have to change the volume control as the programming changes from program to program or program to advertising. If programs on different channels are broadcast/transmitted at the proper DIALNORM, the volume setting would not need to be changed when the channels are changed, either.
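The reason two programs with different DIALNORM values can sound equally loud at the same volume setting can be sketched numerically. The sketch assumes the common decoder behavior of attenuating the audio by the difference between the DIALNORM value and −31 dB, so that dialog is always reproduced at the same level before the viewer's volume control is applied. The function names are illustrative.

```python
TARGET_DIALOG_LEVEL_DB = -31  # assumed level to which a decoder normalizes dialog


def decoder_attenuation_db(dialnorm_db: int) -> int:
    """Attenuation (in dB) applied so that dialog lands at the target level."""
    if not -31 <= dialnorm_db <= 0:
        raise ValueError("DIALNORM is expected to lie between -31 dB and 0 dB")
    return dialnorm_db - TARGET_DIALOG_LEVEL_DB


def playback_dialog_level_db(dialnorm_db: int) -> int:
    """Dialog level after normalization, before the viewer's volume control."""
    return dialnorm_db - decoder_attenuation_db(dialnorm_db)


def linear_gain(dialnorm_db: int) -> float:
    """The same attenuation expressed as a multiplier on sample amplitude."""
    return 10.0 ** (-decoder_attenuation_db(dialnorm_db) / 20.0)


# The two programs from the example above.
program_a = playback_dialog_level_db(-27)  # attenuated by 4 dB
program_b = playback_dialog_level_db(-31)  # left unchanged
```

Under this assumption both programs reach the volume control with dialog at the same level, which is why a single volume setting serves for both. A wrongly set DIALNORM shifts the reproduced level by the amount of the error.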
An LM100 Level Meter, available from Dolby, may be used by sources of programming to determine the proper DIALNORM. As currently understood, the audio from a program is provided to the LM100, which is said to analyze only the dialog portion of the audio to measure program loudness based on Leq (A). The audio provided to the LM100 is not compressed. The DIALNORM value of the audio is displayed. While available, it is believed that the LM100 level meter is not being used by sources of programming to set the loudness of programs they provide.
Since there is currently no industry standard for dealing with loudness, sources of programming are free to set DIALNORM in their own way. DIALNORM is often not set or is set to a default value of −27 dBs in the Dolby Encoder, which might not be the optimum DIALNORM for a particular program. A value of −31 dBs is also often used, which is very low. Different encoders also have different settings. Because the DIALNORMs are not properly set, as channels are changed and as a program shifts to an advertisement, the volume may be too loud or too soft and require adjustment by the viewer.
As mentioned above, distributors typically do not adjust the loudness of audio received from sources of programming. For example, cable systems may only manually adjust the loudness level set by an encoder, daily or weekly, if at all. This is not sufficient to provide consistent loudness between a program and the advertising included in the program, which may be provided by a different source. Audio levels have not been adjusted on a per program basis. One reason for this may be that cable systems typically broadcast programming upon receipt from a source, in real time. Audio adjustments must therefore be made in real time, as well. An LM100 could not, therefore, be used to efficiently and automatically adjust known, commercially available encoders.
Mismatched dynamic ranges can also cause loudness problems. Programs with large variation between the softest and loudest sounds (large dynamic range) are difficult to match to programs that have smaller dynamic ranges. Commercials typically have little dynamic range, to keep the dialog clear.
Dolby provides a Digital Dynamic Range Control (“DRC”) system in encoders to calculate DRC metadata based on a pre-selected DRC Profile. Profiles are provided for different types of programs. Profiles include Film Light, Film Standard, Film Heavy, Music Light, Music Standard, Speech and None. The station or content producer selects the appropriate profile. The system provides the metadata along with the audio signal in the synchronization frame, for example. The Dolby Digital Decoder can use the metadata to adjust the dynamic range of the audio signal based on the profile. Incorrect setting of the DRC can cause large loudness variations that can interfere with a viewer's listening experience. The DRC can be reduced or disabled by listeners.