As a method for integrating contents including text, still image, video and speech and describing their spatial and temporal arrangement, a technology called "SMIL (Synchronized Multimedia Integration Language)," which is being standardized by the W3C (World Wide Web Consortium), is currently available.
SMIL is a description language similar to HTML, the hypertext description language currently in widespread use on the Internet, and is a description language suitable for the distribution of multi-media data including video.
A description example of an SMIL file will be explained using FIG. 1.
Information from <layout> on the third line to </layout> on the eighth line of the description shown in FIG. 1 corresponds to information on a spatial layout of contents.
Information from <par> on the 11th line to </par> on the 16th line corresponds to time information on representation of the contents.
Regions v, t and i in which video, text and still image are arranged are defined from the fifth to seventh lines.
The 12th to 15th lines define time information on representation of the video, speech, text and still image, respectively. "src=" included in the 12th to 15th lines specifies a URL for acquiring each medium; in this example, it specifies that video and speech are acquired using the RTSP protocol (Real Time Streaming Protocol, RFC 2326), while text and still image are acquired using the HTTP protocol.
Furthermore, "region=" included in the 12th, 14th and 15th lines specifies the position at which each medium is displayed and corresponds to the regions specified on the fifth to seventh lines.
For example, since the text data specified on the 14th line has region id=“t”, the text data is displayed in the region specified on the sixth line.
The line numbers are given for convenience of explanation and are not described in an actual SMIL file.
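The file walked through above can be sketched as follows. This is a hypothetical reconstruction consistent with the description of FIG. 1, not the actual figure: the region geometry, server names and media URLs are assumptions, but the elements are arranged so that `<layout>` falls on the third line, the regions v, t and i on the fifth to seventh lines, `<par>` on the 11th line and the four media elements on the 12th to 15th lines, as cited in the text.

```xml
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="v" left="0" top="0" width="176" height="144"/>
      <region id="t" left="0" top="144" width="320" height="96"/>
      <region id="i" left="176" top="0" width="144" height="144"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="rtsp://server2.example.com/movie" region="v"/>
      <audio src="rtsp://server2.example.com/sound"/>
      <text src="http://server3.example.com/caption.txt" region="t"/>
      <img src="http://server3.example.com/picture.jpg" region="i"/>
    </par>
  </body>
</smil>
```

Note that the audio element carries no "region=" attribute, since speech occupies no display area; this is why only the 12th, 14th and 15th lines specify regions.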
Next, the method whereby a client represents contents described in SMIL saved on a server over a network will be explained using FIG. 2.
A client 704, a terminal which receives contents, uses a protocol such as HTTP to acquire an SMIL file describing contents from a server 1 (701) over a network such as the Internet. After acquiring the SMIL file, the client 704 interprets it and acquires the various media described therein, that is, text, still image, video, speech, etc., from the servers.
More specifically, the client 704 acquires video data and speech data from the server 2 (702) and acquires text data and still image data from the server 3 (703).
Then, based on space information and time information described in the acquired SMIL file, the client 704 represents the respective described media at appropriate positions and appropriate times.
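The client-side step of reading the SMIL file and deciding which media to fetch over which protocol can be sketched as below. This is an illustrative sketch, not the client's actual implementation; the embedded SMIL string, server names and URLs are assumptions mirroring the FIG. 1 description (real SMIL files may also use an XML namespace, omitted here for brevity).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical SMIL content mirroring the structure described for FIG. 1
# (URLs and region geometry are illustrative assumptions).
SMIL = """\
<smil>
  <head>
    <layout>
      <region id="v" left="0" top="0" width="176" height="144"/>
      <region id="t" left="0" top="144" width="320" height="96"/>
      <region id="i" left="176" top="0" width="144" height="144"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="rtsp://server2.example.com/movie" region="v"/>
      <audio src="rtsp://server2.example.com/sound"/>
      <text src="http://server3.example.com/caption.txt" region="t"/>
      <img src="http://server3.example.com/picture.jpg" region="i"/>
    </par>
  </body>
</smil>
"""

def media_by_protocol(smil_text):
    """Group the src URLs of all media elements by their URL scheme."""
    root = ET.fromstring(smil_text)
    result = {}
    for elem in root.iter():
        src = elem.get("src")
        if src:
            scheme = urlparse(src).scheme
            result.setdefault(scheme, []).append(src)
    return result

# rtsp media (video, speech) would be fetched by streaming from server 2,
# http media (text, still image) by ordinary HTTP GET from server 3.
print(media_by_protocol(SMIL))
```

In this sketch, grouping by URL scheme reproduces the behavior described above: the client fetches the streamed media via RTSP and the static media via HTTP before laying them out according to the region definitions.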
However, when contents are described using SMIL, the client 704 cannot know beforehand the types of multi-media data described in the SMIL file. Furthermore, depending on the client 704's capability of representing multi-media data, it may be unable to decode all types of the multi-media data described in the SMIL file.
In order to solve this problem, a method has been proposed whereby the client 704 acquires a decoder corresponding to its capability of representing the multi-media data (e.g., the method described in the Unexamined Japanese Patent Publication No. 2002-297538).
According to this method, even when the reception terminal does not have the capability of representing the multi-media data specified by scenario data such as SMIL, the reception terminal is allowed to acquire an appropriate decoder. This allows the reception terminal to decode all expected types of multi-media data.
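The decoder-acquisition idea can be sketched as a simple set difference between the media types named in the scenario data and the types the terminal can already decode. This is only an illustration of the idea, assuming hypothetical function and type names; it is not taken from the cited publication.

```python
def decoders_to_acquire(media_types_in_smil, decodable_types):
    """Return, sorted, the media types for which the reception terminal
    still lacks a decoder and would need to acquire one."""
    return sorted(set(media_types_in_smil) - set(decodable_types))

# A terminal that can already handle text and still images, receiving
# scenario data that also specifies video and speech:
needed = decoders_to_acquire({"video", "audio", "text", "img"}, {"text", "img"})
print(needed)  # ['audio', 'video']
```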
However, the above described media distribution method involves the following problems.
The above described media distribution method does not take any transmission state of the multi-media data into consideration at all. That is, no consideration is given to a case where the reception terminal cannot receive the multi-media data.
Especially, when media are distributed through a radio transmission path, it may or may not be possible to transmit the multi-media data specified by SMIL to the reception terminal, depending on the varying bandwidth and error rate of the communication path.
For example, in a third-generation cellular phone system, a reception terminal located far from a base station may be able to receive media data at a low bit rate, while a reception terminal located close to a base station may be able to receive media data at a high bit rate. In such a transmission path, there may be a case where the reception terminal located far from the base station has difficulty in receiving large-size data such as video data and can only receive media such as text and still image.
In this way, when the multi-media data specified by SMIL includes data that cannot be received, the reception apparatus cannot display the contents specified by SMIL appropriately, even if it can decode all types of multi-media data specified by the transmitting side.