In multimedia broadcasting, when a terminal device first accesses a service or changes channels, it cannot begin reproduction until it has obtained the first reproducible data. The terminal device must therefore wait until it receives the head of the reproducible data or at least one complete reproducible frame. Conventional broadcasting methods utilize Moving Picture Experts Group 2 Transport Stream (MPEG2-TS) technology and transmit control header information via packets corresponding to different digital television technology standards, such as Program Specific Information (PSI) packets associated with MPEG2-TS technology, Service Information (SI) packets associated with Digital Video Broadcasting (DVB) standards, and Program and System Information Protocol (PSIP) packets associated with Advanced Television Systems Committee (ATSC) standards. The increasingly popular Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (DASH) technology instead transmits an MP4 packet header as a separate fragment (a first fragment), which serves as a decoding reference for subsequent data.
Both of the above methods have limitations. In particular, the terminal device is able to decode and reproduce audio and video data only after receiving the control information. That is, when the terminal device first accesses a service or changes channels, the wait time for reproduction of the program (i.e., the amount of time before the terminal device begins reproduction of the program) is determined by when the terminal device receives the control information. Generally, the wait time before the video is reproduced is long. When standard definition video is processed with the MPEG2-TS technology, the theoretical value of the wait time is 1.4 seconds (implemented with hardware). When high definition video or ultra-high definition video is processed, the wait time may be several times longer. With the DASH technology, the current wait time is 4.5 seconds. From the user's perspective, these wait times are too long. The user experience is particularly poor when the user changes channels repeatedly (i.e., switches back and forth between channels).
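The dependence of the wait time on the arrival of the control information can be illustrated with a minimal sketch. The model below is an assumption made for illustration only and is not taken from any of the standards named above: it supposes that the control information and the complete reproducible frames are each transmitted periodically, that transmission and decoding take no time, and that a frame can be used only if it arrives at or after the control information. The function names and the period values are hypothetical.

```python
import math


def wait_until_next(t, period, offset=0.0):
    """Time from instant t until the next periodic transmission.

    Transmissions are assumed (hypothetically) to occur at
    offset, offset + period, offset + 2 * period, and so on.
    A transmission occurring exactly at t counts as received at once.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    k = math.ceil((t - offset) / period)
    return offset + k * period - t


def reproduction_wait(tune_in, control_period, frame_period):
    """Wait from tune-in until reproduction can begin under the toy model.

    The terminal first waits for the control information, then for the
    next complete reproducible frame that arrives at or after it.
    """
    control_wait = wait_until_next(tune_in, control_period)
    control_time = tune_in + control_wait
    frame_wait = wait_until_next(control_time, frame_period)
    return control_wait + frame_wait


# Illustrative values only: control information every 0.5 s,
# a complete reproducible frame every 1.0 s, tune-in at t = 0.25 s.
print(reproduction_wait(0.25, 0.5, 1.0))
```

Under these assumed periods, the terminal waits 0.25 seconds for the control information and a further 0.5 seconds for the next complete frame. The sketch does not reproduce the 1.4 second or 4.5 second figures cited above; it only shows that, in such a model, no reproducible frame is usable until the control information has arrived, so the control information sets a lower bound on the wait time.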