Media delivery has historically followed a broadcast-type model, in which all users/consumers receive the same programming. Thus, any effects, cross-fades or other blending are performed upstream of the consuming device, prior to being sent over the broadcast channel(s). As is generally appreciated, the addition of these effects produces a high-quality experience for the user, and also provides natural and enhanced transitions between program elements. These enhancements improve and enrich the listening experience, and can be changed or modified depending upon the “mood” of the sequence of songs or clips being played, as well as upon the audience type, time of day, and channel genre. Typically, cross-fading or other signal processing of two or more elements requires precise synchronization and simultaneous playback of the elements to be processed. Thus, although in the 1960s and 1970s DJs would try to mix songs in real time, by “cueing up” the next song and starting its turntable a bit before the song then playing ended, with the advent of digital media it has been the norm to perform such processing on a playlist of multiple songs or clips prior to broadcast, to store the processed playlist at the media broadcaster's servers, and then to send it over the broadcast signal.
With the introduction of media compression and file-based delivery, media is commonly downloaded directly to a user's device, such as, for example, an iPod, digital media player, MP3 player, PC, tablet, cellular phone, etc., without the benefit of upstream processing between elements. This leads to a less satisfactory user experience upon consumption or playback. A user simply hears one song stop, then hears a brief pause, then hears the next song begin. There is no “awareness” by the media playing device as to what the sequence is, and no optimization as to which song most naturally follows another, and each sequence of media clips is, in general, unique to each user and to how that user organizes his or her playlists.
Additionally, many consumer-type devices, such as cell phones, do not have the capability to perform simultaneous decode and presentation of multiple media elements so that they can be cross-faded or processed in real time. Such devices typically have a single hardware decoder per media type, so that any type of cross-fade in real time would also require additional software-based decoding for the other elements, which (i) would have a negative impact on battery life, and (ii) would require the precise synchronization of two or more decoders.
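By way of illustration only, the following minimal sketch shows a conventional linear cross-fade performed on decoded (uncompressed) samples. It is not the compressed-domain processing called for herein; the function name `crossfade`, its parameters, and the linear gain ramp are assumptions introduced solely for this example. It is intended to show why such processing needs decoded samples from both elements to be available and aligned at the same time.

```python
def crossfade(outgoing, incoming, overlap):
    """Linearly cross-fade two lists of decoded audio samples.

    The last `overlap` samples of `outgoing` are blended with the first
    `overlap` samples of `incoming`. Both sample streams must already be
    decoded and aligned, which is the burden described above for devices
    having a single hardware decoder.
    """
    if overlap < 0 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit within both clips")

    split = len(outgoing) - overlap
    mixed = []
    for i in range(overlap):
        # Hypothetical linear ramp: the incoming gain rises while the
        # outgoing gain falls, and the two gains always sum to 1.
        gain_in = (i + 1) / (overlap + 1)
        gain_out = 1 - gain_in
        mixed.append(outgoing[split + i] * gain_out + incoming[i] * gain_in)

    # Unblended head of the outgoing clip, the blended region, then the
    # unblended remainder of the incoming clip.
    return outgoing[:split] + mixed + incoming[overlap:]
```

In this sketch, each output sample in the overlap region depends on one sample from each clip, so both clips must be decoded concurrently for the duration of the overlap.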
What is needed in the art are systems and methods to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements on a downstream device directly in the compressed bitstream domain in a manner that solves the problems of the prior art.
What is further needed in the art are methods to perform such processing of compressed bitstreams which may be in differing compression formats.