Media delivery has historically followed a broadcast-type model, where users/consumers all receive the same programming. Thus, any effects, cross-fades or other blending between subsequent clips or program elements are performed upstream of the consuming device, prior to being sent over the broadcast channel(s). As is generally appreciated, the addition of these effects produces a high quality experience for the user, and also provides natural and enhanced transitions between program elements. These enhancements can significantly improve and enrich the listening experience, and can be changed or modified depending upon the “mood” of the channel, the sequence of songs or clips being played, as well as the audience type, time of day, and channel genre. Typically, cross-fading, blending or other signal processing of two or more elements requires precise synchronization and simultaneous playback of the elements to be processed. Thus, although in the 1960s and 1970s DJs would try to mix songs in real time, by “cueing up” the next song and starting its turntable a bit before the song currently being played ended, with the advent of digital media it has become the norm to perform such processing on a playlist of multiple songs or clips prior to broadcasting it, store the result at the media provider's or broadcaster's servers, and then send it over the broadcast channel.
With the introduction of media compression and file based delivery, various types of media are commonly downloaded directly to a user's device, such as, for example, an iPod, digital media player, MP3 player, PC, tablet, cellular phone, smart phone, etc., and various hybrid devices or devices with equivalent functionalities, without the benefit of upstream processing between media elements. This leads to a less satisfactory user experience upon user consumption or playback. A user simply hears one song stop, then hears a brief pause, then hears the next song begin. There is no “awareness” by the media playing device as to what the sequence is, no optimization as to which song most naturally follows another in the playlist, and no sense of the “feel,” “mood” or tempo of the playlist or any segment of it. Moreover, each sequence of media clips is, in general, unique to each user and to how they organize their respective playlists.
Additionally, many consumer-type devices, such as cell phones, smart phones, tablets, etc., do not have the capability to perform simultaneous decode and presentation of multiple media elements so that they can be cross-faded or processed as played back in real time. Such devices, for example cell phones, typically have a single hardware decoder per media type, so that any type of cross-fade in real time would also require additional software based decoding for the other elements, which (i) has a negative impact on battery life, and (ii) would also require the precise synchronization of two or more decoders.
What is needed in the art are systems and methods to implement and facilitate cross-fading, blends, interstitials and other effects/processing of two or more media elements on a downstream device for various purposes so as to enhance the listening experience, and, for example, replicate to the extent possible the sound and feel of broadcast programming.
What is further needed in the art are methods to perform such processing involving two or more elements on a downstream device, where only a single hardware decoder is available or where other system constraints are operative.