The present invention relates to decoding video and audio data which are multiplexed after compression coding, and more particularly to a receiver/decoder for receiving video and audio data compression encoded by high efficiency coding means and decoding the received encoded data.
In order to reduce the cost of transmitting or recording a large amount of video data, the video data is compression encoded by high efficiency coding means which removes redundancy, and thereafter the data is transmitted or recorded. An example of high efficiency coding means is the well known MPEG method standardized by ISO/IEC JTC1/SC29/WG11. In the MPEG method, an encoded data multiplexing method as well as the video and audio data encoding methods are standardized as MPEG/Systems.
In the fields of broadcasting and communications, in recent years, redundancy of moving picture video data is removed to compress the video data, which is then transmitted in the form of digital signals. For video data compression, discrete cosine transformation (DCT) and motion compensation prediction coding such as those stipulated in the MPEG specifications are generally performed. With a high compression ratio of video data, a plurality of broadcast programs can be transmitted in a multiplex manner over a single transmission channel. A program means a set of video data and its associated audio and/or text data.
Multiplexing of a plurality of programs according to the MPEG specifications is stipulated in ITU-T Rec. H.222.0, ISO/IEC13818-1, 1994 (E), "Information Technology--Generic Coding of Moving Pictures and Associated Audio--Part 1: Systems", pp. 9-21, which describes that a transport stream (hereinafter abbreviated as TS) packet having a fixed length of 188 bytes is used for such multiplexing. The structure of an apparatus called a set top box is shown in the block diagram of FIG. 2. This apparatus separates TS packets formed in accordance with the MPEG specifications and supplied from a broadcast station or the like, and supplies them to a video decoder and an audio decoder to output decoded video and audio signals. Conventional techniques will be described with reference to FIG. 2.
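The fixed 188-byte packet length is what lets a receiver cut a continuous bit stream into TS packets. The following is a minimal sketch of that step, not the apparatus of FIG. 2. It assumes the sync byte value 0x47 defined in ISO/IEC 13818-1 (the value is not stated in the text above), and the function name split_ts_packets is hypothetical.

```python
TS_PACKET_SIZE = 188  # fixed TS packet length in bytes, as stated in the text
SYNC_BYTE = 0x47      # assumption: sync byte value taken from ISO/IEC 13818-1


def split_ts_packets(stream: bytes) -> list:
    """Return the complete 188-byte packets found in `stream`.

    Bytes that do not start with the sync byte are skipped one at a time,
    which is a deliberately simple resynchronization strategy.
    """
    packets = []
    pos = 0
    while pos + TS_PACKET_SIZE <= len(stream):
        if stream[pos] != SYNC_BYTE:
            pos += 1  # not aligned on a packet boundary: slide forward
            continue
        packets.append(stream[pos:pos + TS_PACKET_SIZE])
        pos += TS_PACKET_SIZE
    return packets
```

A practical receiver would normally confirm the sync byte over several consecutive packets before declaring lock, because the value 0x47 can also occur inside a payload.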
A tuner 1 selects data of one channel transmitted over a communications medium such as a CATV network or a communications satellite, and supplies the selected channel data to a demodulator 2. The demodulator 2 demodulates the channel data, which was channel encoded by QAM, QPSK, or the like, performs an error correction process by using redundancy codes, and supplies the processed data to a demultiplexer 3. The processed data is bit stream data in the TS packet format, which is classified into the two types shown in FIGS. 3A and 3B according to the contents of the TS packet.
The format shown in FIG. 3A is used for transmitting program elements, including video data, audio data, text data such as teletext, and other data. Each TS packet of 188 bytes is constituted by a transport stream header (abbreviated as TS header) and a payload containing the elements described above. The TS header always contains a packet ID (abbreviated as PID) representative of the attribute of the TS packet, and sometimes contains a program clock reference (abbreviated as PCR), which is time information used by a decoder to recover the system clock used as a time reference when each element was encoded. The payload is part of a packetized elementary stream (PES) packet. The PES packet is a variable length packet constituting an element unit determined by each element, the type of data storage media, and the like. The PES packet is constituted by data of each element and a PES header. The PES header includes a stream ID describing the element contents, a PES packet length, time stamp information (PTS) describing the time when the element is displayed, and other pieces of information. The element unit designated by PTS is called an access unit. For example, one picture is the unit for video data, and one frame is the unit for audio data.
The format shown in FIG. 3B is used for program specific information (abbreviated as PSI), which is additional information used for system control. The payload of the TS packet is part of PSI described in units of sections. Each section is constituted by a section header, PSI, and a cyclic redundancy check (CRC) used for error detection. The section header indicates the attribute of the succeeding PSI and a section length. PSI is structured hierarchically and contains information necessary for system control, such as a program association table (PAT) describing program information (specifically, the PID of the PMT to be described later) contained in the bit stream data transmitted as TS, and a program map table (PMT) describing the correspondence between each element and its PID in each program.
The demultiplexer 3 shown in FIG. 2 receives a TS packet, supplies PSI data via a data bus to a system decode buffer allocated in a RAM 7, and supplies the video and audio data, which are the elements constituting a program selected by a user, to a video decoder 8 and an audio decoder 10, respectively. The demultiplexer 3 samples time information from the header of the TS packet containing PCR, and supplies control signals to a clock generator 4 to recover the system clock. A microprocessor 12 decodes the contents of the PSI data sent to the system decode buffer in RAM 7 and stores the results in a system decode buffer 73 in RAM 7 in a data format usable by system control software programs. Software programs are generally stored in the same RAM or in a dedicated ROM. The work area of software programs is generally allocated in RAM 72. In response to an instruction entered by a user via a user interface unit 13, the microprocessor 12 uses the decoded PSI data to supply the PID used for extracting the TS packets of a program to the demultiplexer 3, and also supplies a control signal used for selecting the program to the tuner 1.
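The routing performed by the demultiplexer 3 can be illustrated by a short sketch that sorts TS packets by PID into video, audio, and PSI outputs. This is an illustration only and not the circuit of FIG. 2. The 13-bit PID position in the header is an assumption taken from ISO/IEC 13818-1, and the names pid_of and demultiplex, as well as the dictionary keys, are hypothetical.

```python
def pid_of(packet: bytes) -> int:
    """Extract the PID from a TS packet.

    Assumption (ISO/IEC 13818-1): the PID is 13 bits wide and occupies the
    low 5 bits of byte 1 followed by all of byte 2.
    """
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demultiplex(packets, video_pid, audio_pid, psi_pids):
    """Route packets by PID, mirroring the role described for demultiplexer 3.

    `video_pid` and `audio_pid` stand for the PIDs supplied by the
    microprocessor; `psi_pids` is the set of PIDs carrying PSI sections.
    Packets of programs that were not selected are dropped.
    """
    outputs = {"video": [], "audio": [], "psi": []}
    for packet in packets:
        pid = pid_of(packet)
        if pid == video_pid:
            outputs["video"].append(packet)   # toward the video decoder
        elif pid == audio_pid:
            outputs["audio"].append(packet)   # toward the audio decoder
        elif pid in psi_pids:
            outputs["psi"].append(packet)     # toward the system decode buffer
    return outputs
```

In this sketch the caller passes whole TS packets through; a real demultiplexer would additionally strip the TS header and reassemble PES packets and PSI sections before handing data on.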
The video decoder 8 and audio decoder 10 decode the video and audio data in cooperation with a video decode buffer 9 and an audio decode buffer 11. Because of the multiplexing of programs, the data transmission speed on a transport channel differs from the bit rate used when each element was encoded. Therefore, if data is supplied at the data transmission speed directly to the video decoder 8 and audio decoder 10, the video decode buffer 9 and audio decode buffer 11 may locally overflow or underflow, and the video and audio outputs are disturbed. It is therefore necessary to insert packet transport buffers 5 and 6 between the demultiplexer and the decoders to convert the bit rates in accordance with the capacities of the decoder buffers, and thereafter to supply element data to the video decoder 8 and audio decoder 10. Multiplexing according to the MPEG specifications assumes that a packet transport buffer having a capacity of 512 bytes is provided for each element.
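The rate conversion role of the packet transport buffers can be illustrated with a simplified simulation in which data arrives in bursts at the channel rate and leaves at a constant rate toward the decoder. This is a rough sketch and not the buffer model of the MPEG specifications. The tick-based timing, the burst pattern, and the drain rates below are illustrative assumptions, and the function name is hypothetical; only the 512-byte capacity comes from the text.

```python
def simulate_transport_buffer(arrivals, drain_per_tick, capacity=512):
    """Simulate a transport buffer over discrete time ticks.

    arrivals       -- bytes arriving from the demultiplexer in each tick
    drain_per_tick -- bytes passed on to the decoder in each tick
    capacity       -- buffer size in bytes (512 per element in the text)

    Returns (peak occupancy in bytes, whether the buffer overflowed).
    """
    occupancy = 0
    peak = 0
    overflowed = False
    for arriving in arrivals:
        occupancy += arriving
        if occupancy > capacity:
            overflowed = True      # excess bytes would be lost
            occupancy = capacity
        peak = max(peak, occupancy)
        occupancy = max(0, occupancy - drain_per_tick)
    return peak, overflowed


# One 188-byte burst every four ticks, drained at 47 bytes per tick:
# the buffer empties exactly before the next burst arrives.
balanced = simulate_transport_buffer([188, 0, 0, 0] * 3, drain_per_tick=47)

# The same bursts drained at only 20 bytes per tick accumulate until
# the 512-byte capacity is exceeded.
too_slow = simulate_transport_buffer([188, 0, 0, 0] * 5, drain_per_tick=20)
```

When the average drain rate matches the average arrival rate, the buffer only has to absorb one burst; when it does not, occupancy grows from burst to burst until data is lost.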
The present inventors have found that providing the packet transport buffers 5 and 6 independently of the system memory increases both the number of system components and the cost. Even if the packet transport buffers are implemented in the demultiplexer 3, an increase in its circuit scale and cost is inevitable.
The MPEG specifications standardize not only a video and audio encoding method but also a coded data multiplexing method, as the MPEG/Systems.
FIG. 24 shows the structure of a transport packet (hereinafter called a TS packet) adopted by the transport system which represents one multiplexing method of the MPEG/Systems.
Each TS packet is a fixed length packet of 188 bytes including a 4-byte header and a 184-byte payload. The 4-byte header is constituted by a sync byte, an error flag, a unit start flag, a scramble control flag, a priority flag, a set of PID data, an adaptation field control flag, and a cyclic counter. The contents of PID data identify the payload of the TS packet.
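The 4-byte header layout can be made concrete with a small parser. This is a sketch under stated assumptions: the field names follow the wording of the text, while the bit widths and positions (1-bit flags, 13-bit PID, 2-bit scramble control, 2-bit adaptation field control, 4-bit cyclic counter) are taken from ISO/IEC 13818-1 and are not given in the text. The names TsHeader and parse_ts_header are hypothetical.

```python
from typing import NamedTuple


class TsHeader(NamedTuple):
    sync_byte: int
    error_flag: int
    unit_start_flag: int
    priority_flag: int
    pid: int
    scramble_control: int
    adaptation_field_control: int
    cyclic_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the first four bytes of a TS packet into named fields."""
    if len(packet) < 4:
        raise ValueError("a TS header needs at least 4 bytes")
    b0, b1, b2, b3 = packet[0], packet[1], packet[2], packet[3]
    return TsHeader(
        sync_byte=b0,
        error_flag=(b1 >> 7) & 0x1,
        unit_start_flag=(b1 >> 6) & 0x1,
        priority_flag=(b1 >> 5) & 0x1,
        pid=((b1 & 0x1F) << 8) | b2,            # 13-bit packet ID
        scramble_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        cyclic_counter=b3 & 0x0F,               # 4-bit counter
    )


# Example header bytes: unit start flag set, PID 0x100, counter 10.
example = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x1A]))
```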
The 184-byte payload area is constituted by an adaptation field (hereinafter abbreviated as AF), whose presence is designated by the adaptation field control flag, and a payload carrying encoded video and audio data and multiplexing information data. AF is constituted by AF length data, a discontinuity flag, a random access flag, a priority flag, flags indicating the presence or absence of AF optional data (a PCR flag, an OPCR flag, a splice point flag, and a private data flag), and the optional data designated by these flags. Of the optional data, PCR data is used for clock synchronization between the transmission and reception sides; the reception side generates a reference clock of 27 MHz by using a phase locked loop.
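Reading PCR data out of the adaptation field can be sketched as follows. The byte positions, the PCR flag mask 0x10, and the split of the 48-bit PCR field into a 33-bit base, 6 reserved bits, and a 9-bit extension are assumptions taken from ISO/IEC 13818-1 and are not stated in the text. The function name extract_pcr is hypothetical, and the sketch only reads the value; it does not model the phase locked loop.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # 27 MHz reference clock named in the text


def extract_pcr(packet: bytes):
    """Return the PCR in 27 MHz ticks, or None if the packet carries none."""
    if len(packet) != 188:
        raise ValueError("expected a 188-byte TS packet")
    adaptation_field_control = (packet[3] >> 4) & 0x3
    if adaptation_field_control not in (2, 3):
        return None                      # no adaptation field present
    af_length = packet[4]
    if af_length < 7:
        return None                      # too short for flags + 6 PCR bytes
    flags = packet[5]
    if not flags & 0x10:
        return None                      # PCR flag not set
    raw = int.from_bytes(packet[6:12], "big")   # 48-bit PCR field
    base = raw >> 15                     # upper 33 bits, counted at 90 kHz
    extension = raw & 0x1FF              # lower 9 bits, counted at 27 MHz
    return base * 300 + extension


def pcr_seconds(pcr_ticks: int) -> float:
    """Convert a PCR value in 27 MHz ticks to seconds."""
    return pcr_ticks / SYSTEM_CLOCK_HZ
```

The reception side would compare successive PCR values with its own 27 MHz counter and use the difference as the error signal of the phase locked loop.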
FIG. 25 illustrates selection of a particular channel of encoded video and audio data from a bit stream (TS packets) in which a plurality of programs (hereinafter called channels) are multiplexed. A program association table (hereinafter abbreviated as PAT) is always transmitted by a TS packet having PID="0". PAT contains multiplexing information of a plurality of channels and indicates, for each channel, the PID value of the TS packet that transmits the program map table (hereinafter abbreviated as PMT) of that channel.
FIG. 25 shows an example of multiplexing of three channels. In order to select a j-th channel and obtain its PMT, the TS packet with PID=Mj is captured. This PMT has information that the video data is transmitted by the TS packet with PID=Vj and the audio data is transmitted by the TS packet with PID=Aj. Therefore, encoded video data can be received by capturing the TS packet with PID=Vj and encoded audio data can be received by capturing the TS packet with PID=Aj.
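The two-step lookup from PAT to PMT to element PIDs can be summarized in a short sketch. The tables are modelled here as already decoded dictionaries rather than as PSI sections, and all numeric PID values other than the PAT PID of 0 are made-up illustrative values, not values from FIG. 25. The names select_channel and example_tables are hypothetical.

```python
PAT_PID = 0  # PAT is always carried with PID 0, as stated in the text


def select_channel(tables, channel):
    """Return (video PID Vj, audio PID Aj) for channel j.

    `tables` maps a PID to the decoded table carried by packets with that
    PID: the PAT maps channel numbers to PMT PIDs (Mj), and each PMT maps
    "video" and "audio" to the element PIDs.
    """
    pat = tables[PAT_PID]
    pmt_pid = pat[channel]        # Mj: which PID carries this channel's PMT
    pmt = tables[pmt_pid]
    return pmt["video"], pmt["audio"]


# Three multiplexed channels with illustrative (made-up) PID values.
example_tables = {
    PAT_PID: {1: 0x100, 2: 0x200, 3: 0x300},
    0x100: {"video": 0x101, "audio": 0x102},
    0x200: {"video": 0x201, "audio": 0x202},
    0x300: {"video": 0x301, "audio": 0x302},
}

video_pid, audio_pid = select_channel(example_tables, channel=2)
```

The returned pair corresponds to the PIDs that a receiver would then capture to obtain the encoded video and audio data of the selected channel.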
The MPEG/Systems described above, however, defines only the syntax of TS packets and the like and does not determine how a reception apparatus is actually configured. In addition, it does not provide a particular definition of the multiplexing of value added services such as program guidance; that definition is left to the discretion of broadcasting companies or program vendors.
In the conventional technique disclosed in JP-A-4-229464 (corresponding to EP-0460751A2), each packet has time information which is used on the decode side as a reference for various timings including the decode timing.