1. Field of the Invention
The present invention relates to a media data processing apparatus and a media data processing method, and more particularly to synchronizing control for a plurality of information data sets.
2. Related Background Art
In recent years, a variety of methods for distributing multimedia data including texts, moving picture (video), voice (audio), and the like via a network in real time have come in practice.
FIG. 1 shows a structure of a conventional multimedia processing section on a receiving side. Referring to FIG. 1, a receiver section 101 receives video and audio bitstreams through a transmission path. The bitstreams will be described below in greater detail. A demultiplexer 102 separates the received bitstreams into video bitstreams and audio bitstreams. A video decoder 103 decodes the separated video bitstreams. An audio decoder 105 decodes the separated audio bitstreams.
A synchronizing control section 104 receives video and audio synchronizing control data (time stamps) from the demultiplexer 102, and performs a synchronizing control for reproducing both of the data. The time stamps will be described below in greater detail.
A video reproducing control section 106 performs a video reproduction to provide users with images on a video reproducing device (i.e., a display) 108. An audio reproducing control section 107 performs an audio reproduction to provide users with sounds to be produced on an audio reproducing device (i.e., speakers) 109.
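The receive-side flow described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the stream format (tuples of stream type, time stamp, and payload) and the function names are assumptions made for the example, and decoding is omitted.

```python
# Hypothetical sketch of FIG. 1: demultiplexing (cf. demultiplexer 102) and
# time-stamp-ordered reproduction (cf. synchronizing control section 104).
# Each input item is a (stream_type, time_stamp, payload) tuple.

def demultiplex(bitstream):
    """Separate a multiplexed stream into video and audio unit lists."""
    video, audio = [], []
    for stream_type, ts, payload in bitstream:
        (video if stream_type == "video" else audio).append((ts, payload))
    return video, audio

def reproduce_synchronized(video, audio):
    """Merge decoded units of both media in time-stamp order for output."""
    schedule = sorted(video + audio, key=lambda unit: unit[0])
    return [payload for _, payload in schedule]
```

In this sketch the time stamps alone drive synchronization: units from either stream are presented strictly in time-stamp order, which is the role the synchronizing control section 104 plays between the two reproducing control sections.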
FIG. 2 shows a structure of a conventional multimedia processing section on a transmitting side. Referring to FIG. 2, a video processing section 201 receives inputs of video sources, and performs pre-processing such as appropriate conversion (for example, A-to-D conversion) on the inputted video sources. An audio processing section 202 receives inputs of audio sources, and performs similar pre-processing on the inputted audio sources. A video encoder 203 encodes the video signals. An audio encoder 205 encodes the audio signals.
A synchronizing control section 204 controls the video encoder 203 and the audio encoder 205 so that they operate in synchronization. A multiplexer 206 inserts header information, time stamps and the like, and multiplexes the media data.
The multiplexed bitstreams are stored in a storage 207, which may be a hard disk (HD) that stores multimedia data. In response to requests from a client, a distribution server 208 distributes bitstreams to the receiver section 101 at the client.
FIG. 3 shows a structure of multiplexed bitstreams. Normally, data is compressed (encoded) by an optimum method, and the compressed data format is called a “bitstream”. Also, when video and audio multimedia data (which may be simply referred to below as “multimedia data”) are simultaneously transmitted, each of the data is divided into segments each having a specified size (each segment is called a “packet”), and the packets of the two media are interleaved with one another, as indicated in FIG. 3.
In FIG. 3, the multimedia data includes header information 307 which is added to each packet, video bitstreams 308, 310 and 312, and audio bitstreams 309, 311 and 313.
The multimedia data also includes time stamps 301-306, which are time information. The time stamps 301, 303 and 305 are time stamps for the respective video bitstreams that immediately follow, and the time stamps 302, 304 and 306 are time stamps for the respective audio bitstreams that immediately follow. Configuration data for the time stamps may also be included in the header information of each packet.
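The packet layout of FIG. 3 can be illustrated with a small round trip. The dictionary field names here are invented for the example and do not correspond to any standardized syntax; the point is only that each packet carries header information, a time stamp, and one media payload, with video and audio packets interleaved.

```python
# Illustrative rendering of FIG. 3 (hypothetical field names): each packet
# carries a header, the time stamp of the payload that follows, and the
# payload itself; video and audio packets alternate in the multiplex.

def multiplex(video_units, audio_units):
    """Interleave (time_stamp, data) units of both media into packets."""
    packets = []
    for v, a in zip(video_units, audio_units):
        packets.append({"header": {"media": "video"}, "time_stamp": v[0], "payload": v[1]})
        packets.append({"header": {"media": "audio"}, "time_stamp": a[0], "payload": a[1]})
    return packets

def demultiplex(packets):
    """Recover per-media (time_stamp, data) unit lists from the packets."""
    streams = {"video": [], "audio": []}
    for p in packets:
        streams[p["header"]["media"]].append((p["time_stamp"], p["payload"]))
    return streams
```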
FIG. 4 shows a structure of a multimedia processing section at the receiving side under the MPEG-4 standard. In the ISO/IEC 14496-1 (MPEG-4 Systems) standard, formats of bitstreams and methods for synchronizing video data and audio data are standardized. The MPEG-4 standard also specifies intellectual property management methods. In MPEG-4, Binary Format for Scenes (BIFS) is used as a method for scene description, and the scene description data is transmitted as a media stream of its own.
In FIG. 4, bitstreams are separated by a demultiplexer 402 and stored in buffers 403-406 for BIFS, each media, and Intellectual Property Management and Protection (IPMP) information, respectively; decoding processing for the respective bitstreams is then performed by decoders 407-409 in the succeeding stage. Here, scene information is constructed by a scene tree generation section 411 from the BIFS stream, and information required for IPMP processing (for example, an encryption key) is decoded from the IPMP stream.
An IPMP control section 410 controls operations of the decoders and a compositor 412. More specifically, based on the decoded IPMP information, the IPMP control section 410 turns decoding and composition on or off for each of the media, or performs restricted reproduction control. A renderer 413 outputs the media to a display 414 and speakers 415 for reproducing the video and audio signals.
FIG. 5 shows an example of time stamps in conventional video processing in MPEG-4 Systems. In MPEG-4 Systems, two kinds of time stamps, called DTS (Decode Time Stamp) and CTS (Composition Time Stamp), are defined.
Each DTS shown in FIG. 5 indicates a processing start time by each of the decoders, and each CTS indicates a composition processing start time. Video AU1, Video AU2, Video AU3, . . . indicate consecutive video access units (units of decoding), and a DTS and a CTS corresponding to each of the video access units Video AU1, Video AU2, Video AU3, . . . are transmitted. However, one of the DTS and the CTS may be omitted; in this case, the time stamps are treated as DTS=CTS.
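The DTS/CTS rule described above, including the fallback when one of the two time stamps is omitted, can be sketched as follows. Representing an access unit as a dictionary is an assumption made for this illustration.

```python
# Sketch of the DTS/CTS handling described above: each access unit may carry
# a DTS (decode start time), a CTS (composition start time), or both; when
# only one is present, the missing one is treated as equal (DTS = CTS).

def resolve_time_stamps(access_unit):
    """Return the (dts, cts) pair for one access unit, applying DTS=CTS."""
    dts = access_unit.get("dts")
    cts = access_unit.get("cts")
    if dts is None and cts is None:
        raise ValueError("access unit carries neither DTS nor CTS")
    if dts is None:
        dts = cts  # no DTS transmitted: decode at the composition time
    if cts is None:
        cts = dts  # no CTS transmitted: compose immediately after decoding
    return dts, cts
```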
FIG. 5 also shows, in its lower section, IPMP access units IPMP AU1, IPMP AU2, IPMP AU3, . . . , which carry the IPMP data with which the video bitstreams are processed, together with their time stamps. The IPMP access units IPMP AU1, IPMP AU2, IPMP AU3, . . . correspond to the video access units Video AU1, Video AU2, Video AU3, . . . , respectively.
In this manner, the IPMP data is also treated as one kind of bitstream, like other media streams such as video and audio bitstreams, and is subject to reproduction control at each stage on the receiving side. For example, information concerning an encryption key may be embedded in the IPMP data, and the encryption key may be updated (for each of the access units) according to the time stamps.
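The per-access-unit key update mentioned above can be illustrated as follows. This is only a sketch under stated assumptions: the IPMP units are modeled as (time stamp, key) pairs, and a byte-wise XOR stands in for whatever real cipher the content uses.

```python
# Illustrative sketch of time-stamp-driven key update: each IPMP access unit
# is modeled as a (time_stamp, key) pair whose key is valid from that time
# stamp onward, and each media access unit is decrypted with the key in
# force at its own time stamp. XOR is a stand-in for a real cipher.

def current_key(ipmp_units, ts):
    """Return the key of the latest IPMP unit with time stamp <= ts."""
    key = None
    for unit_ts, unit_key in sorted(ipmp_units):
        if unit_ts <= ts:
            key = unit_key
    return key

def decrypt(payload, key):
    """Toy decryption: XOR every byte of the payload with the key."""
    return bytes(b ^ key for b in payload)
```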
However, the above example of conventional art has the following problems.
1. Time stamps added to IPMP streams are treated in the same manner as those of other media streams, and the handling of composition time stamps (CTS) for IPMP streams, and the operations they would imply, are not defined.
Here, composition is an operation performed on media streams; IPMP streams, by their nature, are not themselves subject to composition.
2. Synchronization relations between time stamps added to IPMP streams and time stamps added to other media streams are not defined, such that media data and IPMP data cannot be synchronized in units of access units.
3. In order to hide the information concerning an encryption key, that information may be subjected to encryption processing, or may be embedded in the IPMP data by using an electronic watermark technique. However, in these cases, the IPMP data needs to be handled by different processing methods, so that the processing of the IPMP data cannot be flexibly expressed.
The methods for processing the IPMP data differ from each other, as described above, for the following reasons. When the encryption technique is used, the IPMP data is first decrypted, and the decrypted data is then decoded to extract the information concerning the encryption key. On the other hand, when the electronic watermark technique is used, the IPMP data is first decoded, and the information concerning the encryption key is then extracted from the decoded data.
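The two processing orders described above can be contrasted in a sketch. All operations here are stand-ins invented for the illustration: XOR plays the role of decryption, the "decoder" is an identity function, and "watermark extraction" simply takes the low bit of each byte; only the ordering of the steps reflects the text.

```python
# Sketch of the two IPMP processing orders described above, with stand-in
# operations. Encryption route: decrypt the IPMP data first, then decode it.
# Watermark route: decode the IPMP data first, then extract the embedded key.

def decode_ipmp(data):
    """Stand-in IPMP decoder: interpret the bytes as the key record as-is."""
    return data

def extract_watermark(decoded):
    """Stand-in watermark extraction: take the low bit of each byte."""
    return bytes(b & 1 for b in decoded)

def extract_key_encrypted(ipmp_bytes, transport_key):
    plaintext = bytes(b ^ transport_key for b in ipmp_bytes)  # 1. decrypt
    return decode_ipmp(plaintext)                             # 2. decode

def extract_key_watermarked(ipmp_bytes):
    decoded = decode_ipmp(ipmp_bytes)                         # 1. decode
    return extract_watermark(decoded)                         # 2. extract
```

The point of the contrast is that a single fixed pipeline cannot serve both cases: the decryption step precedes decoding in one route and is absent in the other, which is why the text states that the processing of the IPMP data cannot be flexibly expressed.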