Co-owned U.S. patent application Ser. No. 09/545,015 (which is incorporated herein by reference) describes a system and method for creating personalized messages (such as personalized advertisements and personalized news). An example of a personalized message structure 20 is shown in FIG. 1. It starts with a common opening 22, followed by three possible options for the middle part 24 and a common closing 26. One instance of this message is given by the sequence of the opening, option 1, and the closing; another instance is given by the opening, option 2, and the closing.
A personalized audio message structure, as depicted in FIG. 1, is typically created by an audio designer using dedicated tools. The audio fragments in the message structure are typically generated by the audio designer using editing tools such as, but not limited to, AVID MediaComposer, ProTools, etc.
Having the personalized message structure as well as the associated audio fragments available, a switching device can create an instance of the personalized message by playing the proper fragments in sequence.
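By way of illustration only, the message structure of FIG. 1 and the selection of one instance by a switching device can be sketched as follows. The class name `MessageStructure`, its fields, and the fragment labels are hypothetical and are not taken from the referenced application; the sketch shows only the ordering of fragments, not their playback.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MessageStructure:
    """Hypothetical model of FIG. 1: common opening, one of several options, common closing."""
    opening: str
    options: Tuple[str, ...]
    closing: str

    def instance(self, choice: int) -> List[str]:
        """Return the fragments of one message instance in playback order."""
        if not 0 <= choice < len(self.options):
            raise ValueError("no such option: %d" % choice)
        return [self.opening, self.options[choice], self.closing]


# Fragment labels are placeholders for the actual audio fragments.
structure = MessageStructure(
    opening="opening",
    options=("option 1", "option 2", "option 3"),
    closing="closing",
)

# The instance "opening, option 2, closing" (options are indexed from zero).
playlist = structure.instance(1)
```

A switching device would then play the fragments named in `playlist` one after another; the remainder of this section concerns why doing so with compressed fragments is not seamless.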
The personalized message structure and the associated audio fragments can be made available to the switching device in a variety of ways.
In one specific scenario, the audio fragments that are part of the personalized message will be broadcast in compressed form in different digital television channels and assembled by a switching device, such as a digital set-top box, at the listener's location to form one specific instance of the message. One way in which the instance can be assembled is by switching channels on-the-fly at the moment a transition from one fragment to another must be made.
In another specific scenario, the media fragments will be made available to a switching device with storage (e.g., a DVD player, a PC) using a storage medium, such as a CD-ROM or a DVD disk. The fragments will be stored on this storage medium in compressed form. The switching device will select and load the proper fragments from the storage medium, and play them in sequence.
However, current compression technology applied in digital radio, digital TV, Internet and storage applications, including MPEG and AC-3 encoding and compression, does not readily allow for seamless concatenation or switching of compressed audio fragments, which poses a major problem.
One reason for this problem is that most audio codecs used in the domains of digital television, DVD, Internet streaming, and others operate on frames (fixed-size groups) of samples, instead of individual samples. One frame, which is a number of consecutive audio samples, is encoded and decoded as a unit and cannot be broken into smaller subunits. Consequently, once the material is encoded, a transition or switch between options can occur only on frame boundaries. As typically used in the digital television domain, a codec for MPEG Layer II has a frame length of 1152 samples. A codec for Dolby AC-3 has a frame length of 1536 samples. If the length of a fragment (in samples) to be compressed is not an exact multiple of the frame size (in samples), the remainder of the fragment will either be thrown away during encoding, leading to loss of data and severe glitches, or it will be padded with zeroes, leading to pauses in the presentation. Both outcomes are disadvantageous, as they lead to a non-seamless presentation when the audio options are concatenated and played after decoding.
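The frame-alignment arithmetic described above can be sketched as follows. The frame lengths are those stated above; the fragment length of 48000 samples (one second at an assumed 48 kHz sampling rate) and the function name `frame_fit` are assumptions chosen only for illustration.

```python
# Frame lengths in samples, as stated in the text.
FRAME_SIZES = {"MPEG Layer II": 1152, "AC-3": 1536}


def frame_fit(num_samples, frame_size):
    """Report how a fragment of num_samples fits into whole frames."""
    full_frames, remainder = divmod(num_samples, frame_size)
    # Zeroes needed to fill the last, partial frame (0 if the fit is exact).
    padding = (frame_size - remainder) % frame_size
    return {
        "full_frames": full_frames,
        "dropped_if_truncated": remainder,
        "zeros_if_padded": padding,
    }


# Assumed example: a one-second fragment sampled at 48 kHz.
fragment_length = 48000
report = {name: frame_fit(fragment_length, size)
          for name, size in FRAME_SIZES.items()}
```

For this assumed fragment, neither codec gives an exact fit: with 1152-sample frames, 768 samples would be dropped or 384 zeroes appended, and with 1536-sample frames, 384 samples would be dropped or 1152 zeroes appended.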
Another reason for the problem is that most audio codecs used in the domains of digital television, DVD, Internet streaming, and others, encode audio frames based on the contents of previous frames.
In a filter-bank-based codec, such as MPEG Layer II, the outcome of the encoding process of a current audio frame depends on the filter bank states produced by the past frames. The filter bank acts like a memory. More specifically, MPEG Layer II uses a 32-band filter bank to decompose the incoming signal into subband samples, which are then quantized. Alias cancellation affects neighboring subbands, but not successive frames, so it does not pose a problem for the switching. However, the states of the filter bank in the encoder and in the decoder depend on the previously encoded frame. To achieve perfect reconstruction after the decoder filter bank, the filter states must be the same as in the encoding process.
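The memory effect of a filter can be illustrated with a deliberately simplified sketch. The three-tap FIR filter below is a hypothetical stand-in and is not the 32-band filter bank of MPEG Layer II; it shows only that filtering two fragments in isolation (with the state reset at the boundary) differs from filtering the same samples as one continuous signal, whereas carrying the state across the boundary does not.

```python
class FirFilter:
    """Toy FIR filter whose delay line plays the role of the filter bank state."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Past input samples, most recent first; starts as silence.
        self.history = [0.0] * (len(self.taps) - 1)

    def process(self, samples):
        out = []
        for x in samples:
            window = [x] + self.history
            out.append(sum(c * v for c, v in zip(self.taps, window)))
            self.history = window[:-1]
        return out


taps = [0.25, 0.5, 0.25]          # arbitrary illustrative coefficients
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
first, second = signal[:3], signal[3:]

# Reference: the whole signal filtered in one pass.
continuous = FirFilter(taps).process(signal)

# Fragments filtered in isolation: the state is reset at the boundary.
isolated = FirFilter(taps).process(first) + FirFilter(taps).process(second)

# Fragments filtered with the state carried across the boundary.
shared = FirFilter(taps)
carried = shared.process(first) + shared.process(second)

# Indices at which the isolated result deviates from the reference.
mismatch = [i for i, (a, b) in enumerate(zip(continuous, isolated))
            if abs(a - b) > 1e-12]
```

In this sketch the deviation is confined to the first samples after the boundary, as many as the filter has state elements, which is the kind of mismatch that arises when the decoder filter states differ from those of the encoding process.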
In a transform-based codec, such as AC-3, the window and overlap-add mechanism introduces a dependency between successive frames. Here the overlap-add requires consecutive frames to be encoded and decoded in the right context to ensure that alias components cancel out in time. More specifically, AC-3 applies a window to the input data and a modified DCT (MDCT) in the encoder, followed by the inverse transform and overlap-add in the decoder. Successive windows overlap. Alias cancellation takes place in the time domain and requires the proper history to work. If arbitrary AC-3 streams are concatenated, the alias cancellation does not work at the splice point. This leads to audible artifacts, which are theoretically much worse than in the MPEG case. At the start of an encoding process of several frames, a start window is used which effectively mutes the first 256 samples of the first frame. This creates a clearly audible gap, which is not acceptable for concatenation. The last frame of a decoded sequence ends with a fade-out of the signal over the final 256 samples, due to the missing overlap-add of the next frame.
The fact that most audio codecs use a history means that fragments that are intended to be played back in sequence cannot be encoded in isolation, even if their lengths are exact multiples of the frame size defined by the compression scheme. If no additional measures are taken, the transition from one fragment to another will not be seamless and will produce audible artifacts.
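The dependence of time-domain alias cancellation on the neighboring block can be demonstrated with a toy transform codec. The sketch below is a simplified illustration under stated assumptions, not the AC-3 algorithm: it uses a windowed MDCT with a sine window and a hop of 8 samples, no quantization, and no start window. It compares a signal encoded as one stream with the same signal split into two fragments that are encoded and decoded in isolation.

```python
import math

N = 8  # hop size in samples (toy value, far shorter than real codec blocks)

# Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1.
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
# MDCT basis: N coefficients from each block of 2N samples.
BASIS = [[math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
          for n in range(2 * N)] for k in range(N)]


def encode(samples):
    """Windowed MDCT of 50%-overlapping blocks; length must be a multiple of N."""
    frames = []
    for start in range(0, len(samples) - N, N):
        block = [WINDOW[n] * samples[start + n] for n in range(2 * N)]
        frames.append([sum(BASIS[k][n] * block[n] for n in range(2 * N))
                       for k in range(N)])
    return frames


def decode(frames):
    """Inverse MDCT, synthesis window, and overlap-add of successive blocks."""
    out = [0.0] * ((len(frames) + 1) * N)
    for i, coeffs in enumerate(frames):
        for n in range(2 * N):
            y = (2.0 / N) * sum(BASIS[k][n] * coeffs[k] for k in range(N))
            out[i * N + n] += WINDOW[n] * y
    return out


# Arbitrary test signal; both fragments are exact multiples of the hop size.
signal = [math.sin(0.3 * t) + 0.5 * math.cos(0.11 * t) for t in range(8 * N)]
splice = 4 * N
first, second = signal[:splice], signal[splice:]

joint = decode(encode(signal))                              # one stream
isolated = decode(encode(first)) + decode(encode(second))   # encoded apart


def max_error(decoded, lo, hi):
    """Largest deviation from the original signal over samples lo..hi-1."""
    return max(abs(decoded[t] - signal[t]) for t in range(lo, hi))


# Compare the N samples on either side of the splice point.
joint_err = max_error(joint, splice - N, splice + N)
isolated_err = max_error(isolated, splice - N, splice + N)
```

In this sketch `joint_err` is at the level of floating-point rounding, because every block around the splice point is overlap-added with its proper neighbor, whereas `isolated_err` is of the same order as the signal itself: the end of the first fragment fades out and the start of the second fades in, each with uncancelled alias components.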
Accordingly, what is required is a method and system for manipulating and encoding/compressing audio fragments such that a switching device can decode and play such compressed fragments in sequence without audible gaps or artifacts. The present invention discloses such a method and system.