This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
The file format is an important element in the chain of multimedia content production, manipulation, transmission and consumption. There is a difference between the coding format and the file format. The coding format relates to the action of a specific coding algorithm that codes the content information into a bitstream. In contrast, the file format comprises a mechanism for organizing the generated bitstream in such a way that it can be accessed for local decoding and playback, transferred as a file, or streamed, all utilizing a variety of storage and transport architectures. Additionally, the file format can be used to facilitate the interchange and editing of the media. For example, many streaming applications require a pre-encoded bitstream on a server to be accompanied by metadata (stored in “hint tracks”) that assists the server in streaming the video to the client. A hint track does not contain media data, but instead contains instructions for packaging one or more tracks into a streaming channel.
Available media file format standards include the International Organization for Standardization (ISO) base media file format (ISO/International Electrotechnical Commission (IEC) 14496-12) (also referred to as the ISO file format for short), the Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14), the Advanced Video Coding (AVC) file format (ISO/IEC 14496-15) and the 3rd Generation Partnership Project (3GPP) file format (3GPP TS 26.244). Efforts are also underway in MPEG for the development of the Scalable Video Coding (SVC) file format and the Multiview Video Coding (MVC) file format, which are expected to become two amendments to the AVC file format.
The ISO file format is the basis for derivation of all the above-identified file formats (excluding the ISO file format itself). These file formats (including the ISO file format itself) are referred to as the ISO family of file formats. According to the ISO family of file formats, each file contains exactly one movie box corresponding to one presentation. The movie box may contain one or more tracks, and each track resides in one track box. For the presentation of one media type (e.g., audio or video), one track is typically selected, though it is possible for there to be more than one track storing information of a certain media type. A subset of these tracks may form an alternate track group, where each track is independently decodable and can be selected for playback.
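By way of illustration only, the box structure described above can be sketched in Python. The sketch relies on the general box layout of the ISO base media file format (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit size field and a size of 0 signalling a box that extends to the end of its container). The function names iter_boxes, count_tracks and make_box, as well as the synthetic sample data, are hypothetical and are not taken from any standard or from this description; the sample boxes carry empty or dummy payloads and do not form a playable file.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of its container.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box; stop rather than guess
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def count_tracks(data):
    """Return the number of 'trak' boxes inside the first 'moov' box, or None."""
    for box_type, payload_start, box_end in iter_boxes(data):
        if box_type == "moov":
            children = iter_boxes(data, payload_start, box_end)
            return sum(1 for child_type, _, _ in children if child_type == "trak")
    return None


def make_box(box_type, payload=b""):
    """Build a toy box with a 32-bit size header (for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic data: one movie box holding a header box and two track boxes.
sample = make_box(b"ftyp", b"isom\x00\x00\x00\x00") + make_box(
    b"moov", make_box(b"mvhd") + make_box(b"trak") + make_box(b"trak")
)

print(count_tracks(sample))
```

Such a traversal reveals only how many track boxes the movie box holds. It does not, by itself, indicate which of several tracks of the same media type are to be decoded together or how they are to be arranged for display.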
In multiparty conferencing, receivers typically display videos from a selected subset of participants in split-screen windows; one such arrangement is illustrated in FIG. 6. A multipoint control unit (MCU) may transcode the incoming video streams of the selected subset of participants into one video stream, which contains all the video contents from the selected subset of participants. Alternatively, the MCU can simply forward the incoming video streams of the selected subset of participants to the receivers, after which each video stream is decoded individually.
Receivers may want to store multiparty conferencing presentations for future use. However, the current standard file format designs do not support the storage of presentations of multiparty video conferences when the MCU forwards streams to participants. A receiver may store the video streams to be displayed in separate video tracks according to existing file format designs, e.g., the ISO base media file format. However, in that case, a player that takes the file as input has no way of knowing which video tracks should be decoded and how to display the respective video tracks.
In a variety of other application scenarios, other types of multi-source multimedia presentations that render more than one media stream for at least one type of media are possible. Examples of such application scenarios include: recorded video telephony, where there are two participants, the caller and the answerer; video surveillance, where there may be a large number of cameras (possibly equipped with audio sensors) that send audio-visual signals to a control center; and recorded training-like presentations, where presentation slides and one or more talkers may be recorded in separate media streams and later displayed together.
Additionally, it would be useful to be able to determine easily from a file what application scenario the multi-source presentation was or is for, thus providing context for when the file is used in the future. Furthermore, it would be useful to obtain further context information from a file, e.g., participant names, telephone numbers, and who made a recording for video telephony/conferencing, or camera identifiers and/or position descriptions for video surveillance. Further still, and with regard to video surveillance, it is possible for multiple audio sources to exist, each of which is associated with one video source. However, a mechanism that maps an audio source (stored in one audio track) to a video source (stored in a video track) has not been provided in conventional systems and methods.
Moreover, in application scenarios such as video telephony, conferencing, and surveillance, it is typically useful to display the active party or source in a more conspicuous manner than other parties/sources. Therefore, if information regarding an active party or source (whether active auditorily, visually, or both) were available in a file, future players of the file could automatically display the active party or source according to such information. Lastly, and as to any auditory and/or visual presentation, silent periods where neither the auditory nor the visual scene is active are the least informative. Therefore, particularly for browsing-like purposes, it would be useful to suppress the playing time of such silent periods. Thus, having information indicative of silent periods would be helpful.