1. Field of the Invention
The present invention relates to transmission apparatuses and transmission methods which are suitable for delivering, via a network or a recording medium, scene description data for forming a scene using multimedia data including still image signals, moving image signals, audio signals, text data, and graphic data, the scene description data being in turn received, decoded, read, and displayed by a receiving terminal.
2. Description of the Related Art
FIG. 15 shows the configuration of a conventional data delivery system for transmitting moving image signals and audio signals through a transmission medium, the signals being in turn received, decoded, and displayed by a receiving terminal. In the following description, a moving image signal or an audio signal which is coded in conformity with the ISO/IEC 13818 standard (so-called MPEG 2) or the like is referred to as an elementary stream (ES).
Referring to FIG. 15, an ES processor 103 of a server 100 selects an ES which is stored beforehand in a storage device 104, or receives a baseband image or audio signal (not shown) and encodes the received signal. A plurality of ES's may be selected. If necessary, a transmission controller 105 of the server 100 multiplexes a plurality of ES's and subjects them to transmission coding in accordance with a transmission protocol for transmitting signals over a transmission medium 107. The coded signals are transmitted to a receiving terminal 108.
A reception controller 109 of the receiving terminal 108 decodes the signals transmitted through the transmission medium 107 in accordance with the transmission protocol. If necessary, the reception controller 109 separates the multiplexed ES's and passes each ES to a corresponding ES decoder 112. The ES decoder 112 decodes the ES, reconstructs the moving image signal or the audio signal, and transmits the reconstructed signal to a display/speaker 113 which includes a television monitor and a speaker. Accordingly, the television monitor displays images, and the speaker outputs sound.
The server 100 is, for example, a transmission system of a broadcasting station in the case of broadcasting, or an Internet server or a home server in the case of the Internet. The receiving terminal 108 is, for example, a receiving apparatus for receiving broadcast signals or a personal computer.
When a transmission bandwidth of a transmission path (transmission medium 107) for transmitting an ES changes or when the state of traffic congestion changes, data to be transmitted may be delayed or lost.
In order to solve the above problems, the data delivery system shown in FIG. 15 performs the following processing.
The server 100 (for example, the transmission controller 105) assigns a serial number (coded serial number) to each packet of data to be transmitted over the transmission path. At the same time, the reception controller 109 of the receiving terminal 108 performs a completeness check to see whether or not there is a missing serial number (coded serial number) assigned to each packet received from the transmission path, thereby detecting data loss (data loss ratio). Alternatively, the server 100 (for example, the transmission controller 105) adds time information (coded time information) to data to be transmitted over the transmission path. At the same time, the reception controller 109 of the receiving terminal 108 monitors the time information (coded time information) added to the data received from the transmission path, thereby detecting transmission delay. The reception controller 109 of the receiving terminal 108 detects the data loss ratio of the transmission path or transmission delay and transmits (reports) the detected information to a transmission state detector 106 of the server 100.
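The detection performed by the reception controller 109 can be illustrated by the following minimal sketch. The function names, the representation of packets as plain integers and time pairs, and the assumption that the clocks of the server and the terminal are synchronized are all hypothetical and are not taken from the system of FIG. 15; wrap-around of the serial number is ignored.

```python
def loss_ratio(received_serials):
    """Fraction of serial numbers missing between the lowest and highest seen.

    Assumes serial numbers increase by one per packet and do not wrap around.
    """
    if not received_serials:
        return 0.0
    seen = set(received_serials)
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected


def mean_delay(packets):
    """Average transmission delay in seconds.

    `packets` is a list of (send_time, arrival_time) pairs; the send time is
    the time information added by the server (clocks assumed synchronized).
    """
    if not packets:
        return 0.0
    return sum(arrival - sent for sent, arrival in packets) / len(packets)
```

For instance, if packets with serial numbers 1, 2, 4, and 5 arrive, one of five expected packets is missing, and the sketch reports a data loss ratio of 0.2.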
The transmission state detector 106 of the server 100 detects the transmission bandwidth of the transmission path or the traffic congestion state from the data loss ratio of the transmission path or the information indicating the transmission delay, which is transmitted from the reception controller 109 of the receiving terminal 108. Specifically, the transmission state detector 106 determines that the transmission path is congested if the data loss ratio is high or if the transmission delay has increased. If a reserved-band-type transmission path is used, the transmission state detector 106 can directly detect the free bandwidth (transmission bandwidth) available for the server 100. When a transmission medium that is greatly influenced by climate conditions, such as radio waves, is used, the transmission bandwidth may be preset by a user in accordance with the climate conditions and the like. The information about the transmission state, which is detected by the transmission state detector 106, is transmitted to a conversion controller 101.
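The congestion determination described above may be sketched as follows. The threshold values and parameter names are hypothetical; the description does not specify what loss ratio is regarded as high or how large an increase in delay is regarded as significant.

```python
def is_congested(loss_ratio, delay_s, baseline_delay_s,
                 loss_threshold=0.05, delay_margin_s=0.1):
    """Return True if the reported state suggests a congested path.

    loss_threshold and delay_margin_s are illustrative assumptions only.
    """
    high_loss = loss_ratio > loss_threshold
    delay_increased = delay_s > baseline_delay_s + delay_margin_s
    return high_loss or delay_increased
```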
Based on the detected information such as the transmission bandwidth of the transmission path or the traffic congestion state, the conversion controller 101 enables the ES processor 103 to select an ES having a different bit rate. When the ES processor 103 performs encoding in compliance with the ISO/IEC 13818 standard (so-called MPEG 2) or the like, the conversion controller 101 adjusts the coding bit rate. In other words, when it is detected that the transmission path is congested, the conversion controller 101 enables the ES processor 103 to output an ES having a low bit rate. Thus, data delay can be avoided.
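One simple way of picturing this control is sketched below. The selection policy (lowest available bit rate when congested, highest otherwise) and the function name are assumptions made purely for illustration; the conventional system itself does not define a criterion for choosing among the ES's.

```python
def choose_bitrate(available_bitrates_bps, congested):
    """Pick a bit rate from the ES's held by the server.

    Hypothetical policy: lowest bit rate while the path is congested,
    highest bit rate otherwise.
    """
    if not available_bitrates_bps:
        raise ValueError("no elementary stream is available")
    if congested:
        return min(available_bitrates_bps)
    return max(available_bitrates_bps)
```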
For example, the system configuration may include an unspecified number of receiving terminals 108 connected to the server 100. When the receiving terminals 108 have different specifications, the server 100 must transmit an ES to the receiving terminals 108 which have various processing capacities. In such a case, the receiving terminals 108 each include a transmission request processor 110. The transmission request processor 110 generates a transmission request signal for requesting an ES which complies with the processing capacity thereof, and the transmission request signal is transmitted from the reception controller 109 to the server 100. The transmission request signal includes a signal that indicates the capacity of the receiving terminal 108 itself. For example, signals which are transmitted from the transmission request processor 110 to the server 100 and which indicate the capacity of the receiving terminal 108 include memory size, resolution of a display, computing capacity, buffer size, coding format of each decodable ES, the number of decodable ES's, bit rate of each decodable ES, and the like. The conversion controller 101 of the server 100 that has received the transmission request signal controls the ES processor 103 so that an ES that complies with the performance of the receiving terminal 108 is transmitted. Concerning an image signal converting process performed by the ES processor 103 to convert the ES so that the ES complies with the performance of the receiving terminal 108, an image signal converting method is proposed by the assignee of the present invention.
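The capacity information carried by the transmission request signal, and its use on the server side, may be sketched as follows. The class names, field names, and filtering rule are hypothetical; only the kinds of information (memory size, display resolution, buffer size, decodable coding formats, number of decodable ES's, and bit rate) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalCapability:
    """Capacity reported by a receiving terminal (illustrative fields)."""
    memory_bytes: int
    display_width: int
    display_height: int
    buffer_bytes: int
    decodable_formats: tuple
    max_streams: int
    max_bitrate_bps: int


@dataclass(frozen=True)
class ElementaryStream:
    name: str
    coding_format: str
    bitrate_bps: int


def streams_for_terminal(streams, cap):
    """Keep only ES's the terminal can decode, up to its stream limit."""
    decodable = [
        s for s in streams
        if s.coding_format in cap.decodable_formats
        and s.bitrate_bps <= cap.max_bitrate_bps
    ]
    return decodable[:cap.max_streams]
```

In this sketch an ES whose coding format or bit rate exceeds what the terminal reports is simply dropped; converting such an ES to a form that the terminal can decode, as the ES processor 103 does, is outside the scope of the example.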
In conventional television broadcasting, one scene basically consists only of an image (a still image or a moving image) and sound. A display screen of a conventional receiving apparatus (television receiver) displays only images (still images or moving images), and a speaker outputs only sound.
Recently, one scene has been formed using multimedia data including various signals such as still image signals, moving image signals, audio signals, text data, and graphic data. Methods for describing the structure of a scene using such multimedia data include HTML (HyperText Markup Language) used in home pages on the Internet, MPEG-4 BIFS (Binary Format for Scenes) which is a scene description system defined by the ISO/IEC 14496-1 standard, and Java (trademark). In the following description, data that describes the structure of a scene is referred to as a scene description. As in text data in HTML, an ES may be included in a scene description. HTML is defined by the W3C (World Wide Web Consortium) Recommendation.
The conventional data delivery system shown in FIG. 15 can form and display a scene in accordance with the scene description.
However, the conventional data delivery system is designed to decode and display the scene structure based on the same scene description even when the bit rate of the ES is adjusted in accordance with a change in the transmission bandwidth of the transmission path, a change in the traffic congestion state, or the performance of the receiving terminal. In other words, the conventional data delivery system performs decoding and display using the same scene structure even though the ES is converted by the ES processor 103.
As described above, according to the conventional data delivery system, when the state of the transmission path for transmitting the ES (transmission bandwidth or traffic congestion state) or the processing capacity of the receiving terminal 108 is not sufficient, the bit rate of the ES is adjusted in accordance with the state of the transmission path or the request from the receiving terminal 108 in order to avoid transmission data delay or data loss. Specifically, for example, the ES processor 103 selects an ES having a specific bit rate from among a plurality of ES's having different bit rates. When the ES processor 103 performs coding in compliance with the ISO/IEC 13818 standard (so-called MPEG 2), the coding bit rate is adjusted. Since the conventional data delivery system has neither judgment criteria for selecting a specific ES from among a plurality of ES's nor judgment criteria for adjusting the coding bit rate, an optimal ES cannot be obtained in accordance with the state of the transmission path or the processing capacity of the receiving terminal 108.
When scene description data is to be delivered over a transmission path such as the Internet in which the transmission capacity is variable and the transmission bandwidth varies in accordance with time or the path, or when an unspecified number of receiving terminals are connected to a server and scene description data is to be delivered to receiving terminals which have different specifications and various processing capacities, it is difficult for the conventional data delivery system to detect in advance the optimal scene structure for the transmission path and the receiving terminal. When a decoder of the receiving terminal is formed by software, or when the decoder software and other processing software share a CPU or memory, the processing capacity of the decoder may vary dynamically. In such a case, the conventional data delivery system cannot detect in advance the optimal scene description. In the conventional data delivery system, when an ES is converted, or is selected from among a plurality of ES's and transmitted, in accordance with the state of the transmission path or the request from the receiving terminal 108, the receiving terminal 108 cannot perform display using the optimal scene structure with respect to the ES transmitted from the server 100. Although not shown in FIG. 15, the same applies when, instead of decoding and displaying data delivered through the transmission medium 107, a decoder/display terminal reads, decodes, and displays an ES recorded in a recording medium or a recording device: display cannot be performed using the scene structure that is optimal for the processing capacity of the decoder/display terminal.