Existing solutions for composing media and streaming it to devices fall into two groups. Some do not allow media tracks to be combined independently at all, because they require single files containing a permanently fixed set of tracks. Others rely on non-standardized, proprietary methods to combine independent tracks available on a server, and the closed nature of such systems limits widespread industry implementation and adoption.
For instance, digital versatile disc (DVD) and Blu-ray formats provide all tracks in a single file containing many tracks. This limits flexibility and usability, because decoding any portion of the data on the DVD or Blu-ray disc typically requires the entire file to be present. A single monolithic file for content, while acceptable for delivery on physical discs, is inefficient for streaming and therefore severely limits content streaming solutions.
For another example, media presentation description (MPD) and other adaptive streaming solutions that switch between entire files cannot switch tracks independently, so in practice they are limited to switching video bitrates or a few other attributes. The reason is a “combinatorial complexity problem.” A content provider who wants to make a feature-length film available via adaptive streaming generally must encode, in advance, a separate file for every combination that the set of clients will use. However, a typical movie may require multiple video resolutions, camera angles, video bitrates, audio channels, supported languages, descriptive audio tracks, and closed-captioning languages. Every combination is a separate muxed version of the movie, so the number of versions is the product of the number of options in each category, which leads to the combinatorial complexity problem.
For example, a movie with eight audio tracks covering different languages and codecs, two caption streams, and two video angles would result in 8×2×2=32 separate multiplexed, or “muxed,” versions of the movie (i.e., 32 different representations of the same content), each of which must be stored on content servers to allow subsequent streaming. The problem grows further when the content is delivered by HTTP adaptive streaming with six quality levels broken up into ten-second segments. Each of the 32 versions is then encoded at six quality levels (32×6=192 representations), and a two-hour movie contains 720 segments per representation (120 minutes × 6 segments per minute), so the movie becomes 192×720=138,240 files. Moreover, this example offers a client only a few options: a choice among eight audio tracks, two caption streams, and two video angles. To give the client more options, or to adapt to a wider range of client devices, network conditions, and user preferences, the number of fixed muxed representations must grow multiplicatively with every added option, and therefore exponentially with the number of independent option categories.
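The file count in the example above can be reproduced with a short calculation. The following is a minimal Python sketch, assuming only the factors stated in the example (eight audio tracks, two caption streams, two video angles, six quality levels, ten-second segments, and a two-hour movie). The function names `muxed_versions` and `segment_files` are hypothetical and used for illustration only.

```python
from math import prod


def muxed_versions(option_counts):
    """Each combination of independent options needs its own muxed file,
    so the version count is the product of the per-category option counts."""
    return prod(option_counts.values())


def segment_files(option_counts, quality_levels, duration_min, segment_sec):
    """Total stored files when every muxed version is encoded at each quality
    level and split into fixed-length segments."""
    segments = (duration_min * 60) // segment_sec  # segments per representation
    return muxed_versions(option_counts) * quality_levels * segments


# Hypothetical labels for the option categories in the example.
options = {"audio_tracks": 8, "caption_streams": 2, "video_angles": 2}

print(muxed_versions(options))  # 32
print(segment_files(options, quality_levels=6, duration_min=120, segment_sec=10))  # 138240
```

Adding any further category, such as a second descriptive-audio choice, multiplies both results by that category's option count, which is the combinatorial growth described above.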
The above-described deficiencies of today's techniques are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with conventional systems and corresponding benefits of the various non-limiting embodiments described herein may become further apparent upon review of the following description.