1. Field of the Invention
The present invention is directed in general to receivers for processing multiple input signals, and in particular to a receiver for processing incoming multimedia programming.
2. Related Art
The present invention relates to transport, storage and/or display of compressed video and audio streams. Illustratively, the invention may be used in connection with the group of audio and video coding standards developed by MPEG (“Motion Pictures Coding Experts Group”) that was published by the International Standards Organization (“ISO”) as ISO/IEC 13818, and is referred to as the “MPEG-2” standard. See ISO/IEC IS 13818-2: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video; ISO/IEC DIS 13818-1: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Systems. For compressed audio there are numerous standards including ISO/IEC IS 11172-3: 1993 Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/sec-Part 3 Audio (MPEG-1 Audio); Dolby AC-3; ISO/IEC DIS 13818-3: 1994 Information Technology-Coding of Moving Pictures and Associated Audio Information: Audio (MPEG-2 Audio). However, the present invention may also be used with other coding standards currently in existence or under development, such as the MPEG-4, MPEG-7, H.263, JPEG 2000 and/or MPEG-21 standards. The contents of the ISO documents are incorporated herein by reference.
FIG. 1 depicts an illustrative communications network for encoding, transmitting and decoding an audio-video signal, such as an analog NTSC, PAL or HDTV signal, which includes audio and video component signals. The audio and video streams for a program are separately encoded by encoder 15 to produce compressed audio and video streams. Such compressed video, compressed audio and private data signals are referred to as elementary streams. As illustrated, the encoder 15 includes a video processor 10 which receives the composite video. The digitized video signal input is processed by video processor 10 and provided as an output to an audio/video processor 30. An audio processor 20 receives PCM and BTSC audio signals. The audio signal is encoded by an audio processor 20. After audio processing, audio processor 20 provides an output to the audio/video processor 30. The encoder 15 also includes a CLK generator 40, a local bus interface 50, and a test control portion 60.
With coding standards, such as the MPEG-1 and MPEG-2 standards, data streams are hierarchically organized. The compressed audio and video streams from encoder 15 are illustratively placed in a higher layer stream such as an MPEG-2 compliant program or transport stream. Each program may include multiple video streams (e.g., multiple camera views) and multiple audio streams (e.g., different language audio). The higher layer transport or program stream provides a manner for associating all related encoded video, audio and private data streams so that they can be extracted, decoded and presented together in a coherent fashion. Furthermore, the higher layer stream may include compressed audio, video and private data for multiple programs. The transport stream encoding functionality may be implemented in audio/video processor 30, which acts as a transport stream encoder/multiplexer that receives elementary streams for a number of other programs and multiplexes the elementary streams of one or more programs into one or more transport streams. The higher layer stream (i.e., program or transport stream) may be stored in a storage device such as a digital video disc (DVD), video tape, magnetic disk drive, etc. Alternatively, the higher layer stream (i.e., transport stream) is transmitted via a transmission channel 65.
Before transmitting or storing the higher layer transport or program stream, the program or transport stream may be encapsulated in an even higher layer storage format or channel layer stream with channel encoder 35. Channel encoder 35 encapsulates the one or more transport streams into one or more channel layer streams. The channel layer streams outputted by the channel encoder 35 are then transmitted via a transmission channel 65. The transmission channel 65 may be a telephone network, a cable television network, a computer data network, a terrestrial broadcast system, or some combination thereof. As such, the transmission channel may include RF transmitters, satellite transponders, optical fibers, coaxial cables, unshielded twisted pairs of wires, switches, in-line amplifiers, etc.
The transmitted channel streams are received at a decoding receiver 70. At the decoding receiver, channel decoder 75 recovers the one or more transport streams from the received channel streams. The recovered transport streams are then inputted to a transport stream decode processor 80. The transport stream decode processor 80 extracts particular elementary streams from the inputted transport streams corresponding to one or more user selected programs. In addition, control data (which may include PCR (Program Clock Reference) data, enable data, and start up values) is extracted from the bit stream and is used to control demultiplexing the interleaved compression layers, and in doing so defines the functions necessary for combining the compressed data streams. An extracted video signal elementary stream is inputted to a video decoder 85 and an extracted audio signal elementary stream is inputted to an audio decoder 90. The video decoder 85 decodes the video signal elementary stream and outputs a decompressed video signal. The audio decoder 90 decodes the audio signal elementary stream and outputs a decompressed audio signal. Illustratively, the decompressed video signal, the decompressed audio signal and private data signal may be combined to produce an NTSC, PAL or HDTV signal. The decompressed audio and video signals are converted to analog signals by Video DAC 92 and Audio DAC 94.
A. Elementary Streams
To better understand the considerations associated with decoding hierarchical encoded streams of audio and video, the elementary and transport stream layers are now discussed in greater detail. This discussion below focuses largely on video elementary streams. MPEG-2 provides for compressing video by reducing both spatial and temporal redundancy. A good tutorial for MPEG-2 video compression is contained in D. Le Gall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications Of The ACM, April 1991, the contents of which are incorporated herein by reference. The encoder 80 shown in FIG. 2 includes a discrete cosine transform circuit (DCT) 83, a quantizer (Q) 85 and a variable length encoder circuit (VLC) 87. To spatially encode a picture, the picture is divided into blocks of pixels, e.g., 8×8 blocks of pixels. Each block of pixels is discrete cosine transformed to produce a number of transform coefficients. The coefficients are read out of the DCT 83 in a zig-zag fashion in relative increasing spatial frequency, from the DC coefficient to the highest vertical and horizontal spatial frequency AC coefficient. This tends to produce a sequence of coefficients containing long runs of near zero magnitude coefficients. The coefficients are quantized in the quantizer 85 which, amongst other things, converts the near-zero coefficients to zero. This produces coefficients with non-zero amplitude levels and runs (or subsequences) of zero amplitude level coefficients. The coefficients are then run-level encoded (or run length encoded) and entropy coded in the variable length encoder 87.
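By way of illustration only, the spatial encoding steps described above (transform, zig-zag readout, quantization and run-level encoding) may be sketched as follows. The sketch is not the circuitry of FIG. 2: the function names, the naive floating point DCT, the single uniform quantizer step of 16 and the "EOB" end marker are all assumptions made for the example, and the final variable length (entropy) coding stage is omitted.

```python
import math

N = 8  # block size used in the text (8x8 pixels)


def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order():
    """(row, col) positions from the DC coefficient toward the highest frequencies."""
    # Walk the anti-diagonals in order; the direction alternates on each one.
    return sorted(((r, c) for r in range(N) for c in range(N)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def run_level(levels):
    """Encode a sequence as (zero_run, level) pairs followed by an end marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end marker
    return pairs


def encode_block(block, step=16):
    """Transform, zig-zag scan, quantize (assumed uniform step) and run-level encode."""
    coeffs = dct_2d(block)
    scanned = [round(coeffs[r][c] / step) for r, c in zigzag_order()]
    return run_level(scanned)
```

For a flat block in which every pixel equals 128, only the DC coefficient survives quantization, so `encode_block` returns a single pair followed by the end marker.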
Blocks which are spatially encoded as described above are referred to as intrablocks because they are encoded based only on information self-contained in the block. An intra-picture (or “I picture”) is a picture which contains only intrablocks. (Herein, “picture” means field or frame in accordance with MPEG-2 terminology).
In addition to a spatial encoder, encoder 80 includes a temporal encoder 90 for reducing temporal redundancy. Temporal coding takes advantage of the high correlation between groups of pixels in one picture and groups of pixels in another picture of a sequence of pictures. Thus, a group of pixels can be thought of as moving from one relative position in one picture, called an anchor picture, to another relative position of another picture, with only small changes in the luminosity and chrominance of its pixels. In MPEG-2, the group of pixels is referred to as a block of pixels. In temporal coding, a block of pixels, in a picture to be encoded, is compared to different possible blocks of pixels, in a search window of a potential anchor frame, to determine the best matching block of pixels in the potential anchor frame. This is illustrated in FIG. 3a. A motion vector MV is determined which indicates the relative shift of the best matching block in the anchor frame to the block of the picture to be encoded. Furthermore, a difference between the best matching block and the block in the picture to be encoded is formed. The difference is then spatially encoded as described above.
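The block matching search described above may be sketched as follows, purely as an example. The sketch assumes an exhaustive search that minimizes the sum of absolute differences (SAD), a cost measure the text does not specify, and it expresses the motion vector as the displacement from the position of the block being encoded to the position of the best match in the anchor picture. The names `sad` and `best_match`, the block size and the search range are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` at (rx, ry); pictures are lists of pixel rows."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_match(cur, ref, bx, by, size=4, search=2):
    """Exhaustively search a window of +/- `search` pixels in the anchor
    picture `ref` for the block of `cur` located at (bx, by).

    Returns ((dx, dy), residual): the motion vector and the difference block
    that would then be spatially encoded.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the anchor picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, bx, by, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    _, dx, dy = best
    residual = [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
                 for i in range(size)]
                for j in range(size)]
    return (dx, dy), residual
```

When the block in the picture to be encoded is an exact shifted copy of a region of the anchor picture, the residual is all zeros and only the motion vector carries information.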
Blocks which are temporally encoded are referred to as interblocks. Interblocks are not permitted in I pictures but are permitted in predictive pictures (“P pictures”) or bi-directionally predictive pictures (“B pictures”). P pictures are pictures which each only have a single anchor picture, which single anchor picture is presented in time before the P picture encoded therewith. Each B picture has an anchor picture that is presented in time before the B picture and an anchor picture which is presented in time after the B picture. This dependence is illustrated in FIGS. 3a and 3b by arrows.
With such coding, pictures may be placed in the elementary stream in a different order than they are presented. For instance, it is advantageous to place both anchor pictures for the B pictures in the stream before the B pictures which depend thereon (so that they are available to decode the B pictures), even though half of those anchor pictures will be presented after the B pictures. While P and B pictures can have interblocks, some blocks of P and B pictures may be encoded as intrablocks if an adequate matching block cannot be found.
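The reordering described above may be illustrated with a short sketch. It assumes that each run of B pictures depends on the anchor pictures immediately before and after it in presentation order; the function name `coded_order` and the list representation are hypothetical.

```python
def coded_order(display_types):
    """Reorder pictures from presentation order to stream order.

    `display_types` lists picture types ("I", "P" or "B") in presentation
    order. Each B picture is held back until the anchor picture that follows
    it in presentation order has been placed in the stream.
    Returns (presentation_index, type) tuples in stream order.
    """
    stream, pending_b = [], []
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        else:
            stream.append((index, ptype))  # anchor goes first
            stream.extend(pending_b)       # then the B pictures that needed it
            pending_b = []
    stream.extend(pending_b)  # trailing B pictures with no later anchor
    return stream
```

For the presentation order I B B P B B P, this yields the stream order I0, P3, B1, B2, P6, B4, B5, so that both anchors of every B picture precede it in the stream.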
As seen from the foregoing, temporal encoding requires that the blocks of the anchor pictures be available for use in encoding. Thus, blocks which have been discrete cosine transformed and quantized are dequantized in inverse quantizer 91 and inverse discrete cosine transformed in inverse discrete cosine transform circuit 92. The reproduced blocks of pixels of the anchor pictures are stored in picture memory 94. If necessary to reconstruct the reproduced block of an anchor picture (e.g., a P picture), a previous block of pixels from a previous picture is added to the decoded block of pixels outputted by the IDCT 92 using adder 93.
During motion estimation coding, picture memory 94 outputs one or more search windows of pixels of the anchor pictures stored therein to motion estimator 95 which also receives an inputted macroblock of a picture to be temporally encoded. The motion estimator determines the best matching macroblock(s) in the search window(s) to the inputted macroblock and the motion vector(s) for translating the inputted macroblock to the best matching macroblock(s). The best matching macroblock(s) is subtracted from the inputted macroblock in subtractor 96 and the difference is spatially encoded by the spatial encoder 80. The motion vector is variable length encoded and multiplexed with the spatially encoded difference macroblock.
The amount of image compression during encoding varies by picture type. For example, I pictures often require significantly more bits than P and B pictures. In addition, the sequence of encoding inputted video pictures as I, P or B pictures can be arbitrary, or can follow a predetermined pattern, such as the MPEG-2 standard. As a consequence, while the encoded video elementary stream has a nominal average bit rate, the instantaneous bit rate can fluctuate. In contrast, the audio bit stream has a relatively constant bit rate. Differences in these bit rates require that the transport stream provide a mechanism for ensuring that both audio and video are timely presented so as to synchronize the video and audio of a particular program.
An additional timing issue arises by virtue of the fact that only I pictures can be independently decompressed, but that for decoding of P and B pictures, the anchor frames, on which they depend, must also be decompressed. This requires that the anchor pictures be decompressed in a timely fashion. As discussed below, the transport stream provides a mechanism for ensuring that anchor pictures are timely decoded.
B. Transport Streams
MPEG-2 provides two higher layer streams called the program stream and the transport stream. This invention is explained in the context of the transport stream because most storage and transmission uses of MPEG-2 compressed video and audio use the transport stream, though the scope of the discussion should be sufficiently general for application to program streams and to recording and reproduction of program or transport streams using a storage device.
According to the MPEG-2 standard, the data of each digital elementary stream is first placed into packetized elementary stream (PES) packets, which may have an arbitrary length. The PES packet data, and other data, relating to one or more programs may be combined into one or more transport streams. The transport stream is organized into fixed length packets, each of which includes a four byte header and a 184 byte payload.
Each transport packet can carry PES packet data, e.g., private data or video or audio data (e.g., which may be compressed and formed into streams according to MPEG-2 syntax), or program specific information (PSI) data (described below). Private data may be placed in optional adaptation fields in the transport packet. Transport packets may not contain both PES packet data and PSI data. Furthermore, transport packets may only contain PES packet data for a single elementary stream.
Each transport packet is assigned a packet identification code or PID, which acts as a label for the transport packet so that all packets with a particular PID have related contents, e.g., all have particular PSI data, all have PES packet data for a particular elementary stream, etc.
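As a minimal sketch of how a receiver might read the four byte header and select packets by PID, consider the following. The bit positions used (a sync byte of 0x47, a 13-bit PID spanning the second and third bytes, and the adaptation field control and continuity counter in the fourth byte) are taken from the MPEG-2 Systems specification rather than from the description above, and the names `TsHeader`, `parse_ts_header` and `filter_pid` are hypothetical.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188  # four byte header plus 184 byte payload
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    payload_unit_start: bool
    pid: int
    has_adaptation_field: bool
    has_payload: bool
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed four byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]     # 13-bit packet identifier
    adaptation_control = (packet[3] >> 4) & 0x3     # 2-bit field
    return TsHeader(
        payload_unit_start=bool(packet[1] & 0x40),
        pid=pid,
        has_adaptation_field=bool(adaptation_control & 0x2),
        has_payload=bool(adaptation_control & 0x1),
        continuity_counter=packet[3] & 0x0F,
    )


def filter_pid(packets, wanted_pid):
    """Keep only the packets labeled with `wanted_pid`."""
    return [p for p in packets if parse_ts_header(p).pid == wanted_pid]
```

Because all packets of one elementary stream share a PID, `filter_pid` is the basic step by which a single stream can be pulled out of a multiplex.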
In addition, PES bearing packets may also contain program clock reference (PCR) values, presentation time stamps (PTS) and decoding time stamps (DTS). The PCR is a snapshot of the encoder clock at the encoders which produced the elementary streams of a particular program. Since elementary streams for multiple programs produced at different times may be multiplexed into the same transport stream, it is not unusual to have divergent PCR values for the elementary streams associated with different programs.
The PTS, typically included in the PES header, indicates the time when a video picture or audio frame should be presented (i.e., displayed on a monitor or converted to sound on a loudspeaker) relative to the encoder clock (PCR) of the encoders which produced the video and audio. PTS's enable the synchronization of video and audio of a particular program despite the lack of instantaneous correlation between the video and audio bit rates.
The DTS indicates the time when a video picture should be decoded relative to the encoder clock. DTS's enable the timely submission of compressed anchor video pictures to the decoder for use in decoding intercoded pictures which depend thereon.
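The interplay of decoding and presentation time stamps may be illustrated with the following sketch, which simply orders decode and present events by their time stamps. It assumes time stamps expressed in 90 kHz ticks and a 25 pictures per second sequence (3600 ticks per picture); the picture names, the time stamp values and the function `schedule` are hypothetical.

```python
def schedule(units):
    """Order decode and present events by time stamp.

    Each unit is a dict with "name", "dts" and "pts" (in 90 kHz ticks).
    When a decode and a present event share a time stamp, the decode
    event is listed first.
    """
    events = []
    for unit in units:
        events.append((unit["dts"], 0, "decode", unit["name"]))
        events.append((unit["pts"], 1, "present", unit["name"]))
    events.sort()
    return [(time, action, name) for time, _, action, name in events]


# Stream order I0, P3, B1, B2: the anchor P3 is decoded early but presented last.
EXAMPLE_UNITS = [
    {"name": "I0", "dts": 0, "pts": 3600},
    {"name": "P3", "dts": 3600, "pts": 14400},
    {"name": "B1", "dts": 7200, "pts": 7200},
    {"name": "B2", "dts": 10800, "pts": 10800},
]
```

In this example the anchor picture P3 is decoded at tick 3600, before B1 and B2, yet is not presented until tick 14400, after both B pictures have been shown.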
Program specific information (PSI) data includes information other than elementary stream data which is necessary to decode the PES packet data, such as information for identifying which of plural transport streams contains the information for a specific program, information for locating elementary streams associated with specific programs, and conditional access information. For example, in networks wherein multiple transport streams are received, a Network Information Table (NIT) may be transmitted in each transport stream to indicate which programs are carried in each transport stream. Where transport streams are modulated onto different “frequency channels” or carriers, the NIT also indicates on which carrier each transport stream is modulated. Thus, to identify the transport stream containing a program of interest, one need only access the NIT on any one of the transport streams.
Because each transport packet can only carry PES packet data for one elementary stream, the PSI is provided with a program association table (PAT) and program mapping table (PMT) to reconstitute elementary streams of a single program. The PMT, in turn, actually correlates all of the related elementary streams of each program for which the PMT contains an entry. In addition to program number and PID information, each PMT entry includes other information such as the PID of the packets containing the PCR's for this program and the type of each stream (i.e., audio, video, etc.).
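One way to picture the table lookup is the sketch below. It assumes, following the MPEG-2 Systems specification rather than the description above, that the PAT associates each program number with the PID of the packets carrying that program's PMT. The dictionary layout, the PID values and the function name `pids_for_program` are hypothetical.

```python
def pids_for_program(pat, pmts, program_number):
    """Resolve a program number to its PCR PID and elementary stream PIDs.

    `pat` maps program number -> PID carrying that program's PMT (assumed).
    `pmts` maps PMT PID -> {"pcr_pid": int, "streams": [{"pid", "type"}, ...]}.
    Returns (pcr_pid, {stream_pid: stream_type}).
    """
    pmt_pid = pat[program_number]
    pmt = pmts[pmt_pid]
    streams = {entry["pid"]: entry["type"] for entry in pmt["streams"]}
    return pmt["pcr_pid"], streams


# Hypothetical tables for a single program with one video and two audio streams.
EXAMPLE_PAT = {1: 0x0100}
EXAMPLE_PMTS = {
    0x0100: {
        "pcr_pid": 0x0101,
        "streams": [
            {"pid": 0x0101, "type": "video"},
            {"pid": 0x0102, "type": "audio"},
            {"pid": 0x0103, "type": "audio"},
        ],
    },
}
```

With the PIDs resolved in this way, a receiver would know which transport packets to collect in order to reconstitute the video and audio of the selected program and where to find its PCR values.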
C. Decode Timing
As seen from the foregoing, clock recovery and synchronization can pose significant challenges to the decompression of audio-video bit streams. For example, in the MPEG-2 standard, the program clock reference (PCR) and the presentation and decoding time stamps (PTS/DTS) are used to re-assemble the elementary streams into a program. While the encoder and decoder will typically use system clocks having the same frequency (27 MHz in the case of MPEG-2 coders), a variety of transmission events can cause the decoder clock to lose synchronization with the encoder clock. As a result, the decoder clock cannot be allowed to free run.
In the MPEG-2 standard, the encoder and decoder clocks are synchronized by the Program Clock Reference data field in the packet adaptation field of the PCR PID for the program, which is used to correct the decoder clock. The PCR is a 42 bit field that includes a PCR Base having a 33-bit value in units of 90 kHz, and a PCR extension having a 9-bit extension in units of 27 MHz, where 27 MHz is the system clock frequency. In operation, the first 42 bits of the first PCR received by the decoder are used to initialize the counter in a clock generator, and subsequent PCR values are compared to clock values for fine adjustment. In conventional systems, the difference between the PCR and the local clock can be used to drive a voltage controlled oscillator to speed up or slow down the local clock.
The presentation time stamp (PTS) and decoding time stamp (DTS) are used to synchronize the audio and video by indicating the time that the presentation unit should be presented to the user. The PCR and PTS/DTS timing information arrives at the decoder at predetermined intervals, where the PCR arrives about every 10-100 milliseconds and the PTS/DTS arrives about every 700 milliseconds. In conventional decoders, the PCR and the local clock values are used to drive a voltage controlled oscillator which speeds up or slows down the local clock.
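The PCR arithmetic and the comparison against the local counter may be sketched as follows. The combination base × 300 + extension follows from the ratio of the 27 MHz and 90 kHz units; the restriction of the extension to the range 0 to 299 is taken from the MPEG-2 Systems specification. The class `LocalClock` and its methods are hypothetical, counter wraparound is ignored, and the oscillator itself is not modeled: the sketch only produces the error value that would steer it.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # system clock frequency stated in the text


def pcr_from_fields(base, extension):
    """Combine the 33-bit base (90 kHz units) and the extension (27 MHz units)
    into a single count of 27 MHz ticks."""
    if not 0 <= base < (1 << 33):
        raise ValueError("PCR base must fit in 33 bits")
    if not 0 <= extension < 300:
        raise ValueError("PCR extension is assumed to count 0..299")
    return base * 300 + extension


def pcr_seconds(pcr):
    """Convert a PCR value in 27 MHz ticks to seconds."""
    return pcr / SYSTEM_CLOCK_HZ


class LocalClock:
    """Hypothetical decoder counter that is loaded from the first PCR."""

    def __init__(self):
        self.count = None

    def tick(self, ticks):
        """Advance the local counter by `ticks` periods of the local oscillator."""
        self.count += ticks

    def on_pcr(self, pcr):
        """Return the error (received PCR minus local count) in 27 MHz ticks.

        The first PCR initializes the counter, so its error is zero. A positive
        error means the local clock is running slow relative to the encoder.
        """
        if self.count is None:
            self.count = pcr
            return 0
        return pcr - self.count
```

A positive error returned by `on_pcr` would correspond to speeding up the local oscillator, and a negative error to slowing it down.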
D. Technology Implementation
In addition to the complexity of the computational requirements for compressing and decompressing audio-visual information, such as described above, the ever-increasing need for higher speed communications systems imposes additional performance requirements and resulting costs for video processing systems. In order to reduce costs, communications systems are increasingly implemented using Very Large Scale Integration (VLSI) techniques. The level of integration of communications systems is constantly increasing to take advantage of advances in integrated circuit manufacturing technology and the resulting cost reductions. This means that communications systems of higher and higher complexity are being implemented in a smaller and smaller number of integrated circuits. For reasons of cost and density of integration, the preferred technology is CMOS.
Digital Signal Processing (“DSP”) techniques generally allow higher levels of complexity and easier scaling to finer geometry technologies than analog techniques, as well as superior testability and manufacturability. However, DSP based communications systems require, for their implementation, an analog-to-digital converter (“ADC”). In many applications, the ADC is challenging to design, especially where critical clock signals must be generated at ever-increasing frequencies.
Conventional communications systems have derived chip clocks from external voltage controlled crystal oscillators (“VCXOs”) controlled by pulse width modulated (“PWM”) waveforms. Such systems are expensive to manufacture and assemble, and also have limited achievable accuracy, especially where automatic frequency control (AFC) loop techniques are used.
Conventional decoder systems are able to decode and display a single encoded program at a time. As a result, a decoder system, such as a set-top box, is required for each display device. Alternatively, multiple decoder systems are required to simultaneously (1) receive and decode transmitted programs and (2) play back a locally stored program. There is a need to provide a decoding system for processing multiple signals for simultaneous display on one or more display devices. Conventionally known systems do not allow a user to perform such a multitude of functions without increasing circuit area and power requirements. In fact, most conventional systems require separate set-top box decoders for each display device. Therefore, there is a need for a better system that is capable of performing the above functions without increasing circuit area and operational power.
Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and detailed description which follow.