The number of devices capable of playing media is growing at a staggering rate. Virtually all modern personal computers and many modern cell phones, personal digital assistants, personal media players, set-top boxes, game consoles, and even refrigerators are capable of media playback. Such disparate devices can differ widely in their memory and processing capabilities, screen sizes, power consumption constraints, and available communications bandwidth. Such devices may receive media for playback via any number of communications technologies, including cable and DSL, fiber to the home, Wi-Fi, Bluetooth, 2.5G and 3G mobile phone networks, and the like.
Now that consumers have so many different connected media playback devices, many wish to be able to access all of their content at any time, from anywhere. But at the same time, few consumers wish to educate themselves about the technical details of their communications interfaces or device constraints.
Similarly, few content providers wish to or are able to encode, store, and select from multiple versions of each piece of media in order to provide a version appropriate to a particular client device. This approach is burdensome in part because it is often difficult for a content provider to ascertain the playback capabilities of any particular playback device, yet in most cases, the consumer is also unwilling or unable to ascertain and provide such information.
Another approach to the problem has been to encode each piece of media into multiple independent streams at varying bitrates, then switch between those streams to address varying bandwidth capacities. Technologies such as SureStream, developed by RealNetworks, Inc. of Seattle, Wash., take such an approach, monitoring delivery rates and attempting to predict which bitrate stream to deliver as network capacity varies over time. Still, this approach is complex to implement and addresses only the bandwidth dimension of the differences between playback clients.
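By way of illustration only, the stream-switching approach described above may be sketched as follows. The sketch assumes a hypothetical function pick_stream, a hypothetical headroom factor, and throughput measured in kilobits per second; it is not a description of how SureStream or any other product actually selects a stream.

```python
def pick_stream(bitrates_kbps, measured_kbps, headroom=0.8):
    """Pick the highest-bitrate stream that fits the measured throughput.

    `headroom` is an assumed safety margin: only that fraction of the
    measured throughput is treated as usable.
    """
    budget = measured_kbps * headroom
    candidates = [b for b in sorted(bitrates_kbps) if b <= budget]
    # Fall back to the lowest-bitrate stream when nothing fits the budget.
    return candidates[-1] if candidates else min(bitrates_kbps)


if __name__ == "__main__":
    streams = [64, 128, 256, 512]      # independently encoded versions
    for measured in (50, 400, 1000):   # throughput samples over time
        print(measured, "->", pick_stream(streams, measured))
```

Note that each entry in the list of streams is a complete, independent encoding of the same media, so the provider must encode and store every version, and the only client characteristic taken into account is bandwidth.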
A better solution may be to utilize variable-fidelity media, encoding each piece of media a single time into a base layer and a set of additive layers that enhance the quality, size, or other attributes of the base layer.
The concept of variable fidelity, scalable, or layered media is well known in the art. According to this concept, a piece of media or a presentation comprising multiple pieces of media is split up into a set of layers, each layer containing information that builds on top of one or more of the layers below it.
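For purposes of illustration, the dependency relationship between layers may be modeled as in the following sketch. The Layer type, its field names, and the required_layers function are hypothetical names chosen for this example, and the sketch assumes that the dependencies between layers contain no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    depends_on: tuple = ()  # names of the layers this layer builds on


def required_layers(target, layers):
    """Return the names of all layers needed to use `target`, lowest first."""
    by_name = {layer.name: layer for layer in layers}
    ordered = []

    def visit(name):
        if name in ordered:
            return
        # A layer is usable only after every layer below it is available.
        for dep in by_name[name].depends_on:
            visit(dep)
        ordered.append(name)

    visit(target)
    return ordered


if __name__ == "__main__":
    layers = [
        Layer("base"),
        Layer("enh1", ("base",)),
        Layer("enh2", ("enh1",)),
    ]
    print(required_layers("enh2", layers))
```

In this model, a client that wants only the base layer obtains a single layer, while a client that wants the highest enhancement layer obtains that layer together with every layer on which it depends.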
Layered media or layered presentations have become commonplace in certain contexts, while remaining obscure in others. One simple example of a commonly encountered form of layering is a web page that may comprise a base layer (e.g., basic text and HTML layout information) and one or more enhancement layers, for example a CSS style sheet layer, a scripting layer, and/or one or more media layers (e.g., individual image files). A client device may choose to display some or all of these layers, depending on the capabilities of the client and/or network conditions. For example, a mobile phone browser may obtain and display only the base text layer, whereas a desktop computer web browser may obtain and display all layers. For another example, a client device may disable bandwidth-heavy media layers when using a slow network connection.
Many audio and video compression/decompression (“codec”) specifications include support for scalable or layered modes, although few scalable modes are in common usage. For example, the MPEG-2 standard defines several profiles that include support for signal-to-noise ratio (“SNR”) and/or spatial scalable modes. For another example, the H.264 standard with the Scalable Video Coding extension defines profiles that provide for temporal, spatial, and SNR scalability. These three types of scalability have the following general characteristics:

Temporal scalability: media is coded at multiple frame rates (video) or sampling rates (audio). For example, a base layer may provide video encoded at 7.5 frames per second (FPS), while enhancement layers can be added to improve the frame rate to 15 FPS and 30 FPS.

Spatial scalability: video is coded at multiple spatial resolutions. For example, a base layer may provide video encoded at a resolution of 320×240, while multiple enhancement layers may increase the resolution to 640×480 and 800×600.

SNR scalability: media is coded at multiple degrees of fidelity or clarity. For example, a base layer may provide audio encoded at 8 bits per sample, while enhancement layers increase the bit depth to 16 and 24 bits per sample.
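As a non-limiting sketch, a client might choose how many enhancement layers to use along each of the three dimensions described above by comparing the example values given there against its own limits. The function names, parameter names, and the notion of a per-dimension "ladder" are assumptions made for this example; they are not taken from the MPEG-2 or H.264 specifications.

```python
# Example values from the text; index 0 in each ladder is the base layer.
TEMPORAL_FPS = [7.5, 15, 30]
SPATIAL_RES = [(320, 240), (640, 480), (800, 600)]
SNR_BITS = [8, 16, 24]


def highest_supported(ladder, fits):
    """Return (enhancement_layer_count, value) for the top rung that fits.

    The base layer is always returned if no higher rung fits, because
    nothing can be played without it.
    """
    index = 0
    for i, value in enumerate(ladder):
        if fits(value):
            index = i
        else:
            break  # higher layers build on this one, so stop here
    return index, ladder[index]


def select_layers(max_fps, screen_w, screen_h, max_bits):
    """Choose a layer count per dimension from hypothetical device limits."""
    return {
        "temporal": highest_supported(TEMPORAL_FPS, lambda f: f <= max_fps),
        "spatial": highest_supported(
            SPATIAL_RES, lambda r: r[0] <= screen_w and r[1] <= screen_h
        ),
        "snr": highest_supported(SNR_BITS, lambda b: b <= max_bits),
    }


if __name__ == "__main__":
    # A hypothetical handheld device with a 640x480 screen.
    print(select_layers(max_fps=15, screen_w=640, screen_h=480, max_bits=16))
```

Because each enhancement layer adds to the layers below it, the media need be encoded only once, and each client simply stops at the highest layer it can use in each dimension.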
In the audio/video context, the promise of layered media codecs has remained largely unrealized. Disclosed are methods and systems which use layered media to improve the distribution of media using client-server, peer-to-peer (“P2P”) and/or hybrid (mixed client-server and P2P) models.