Digital video cameras are becoming increasingly widespread in the marketplace. Besides dedicated digital video cameras, a growing number of portable consumer electronics (CE) devices with image capturing capability can be used to capture video sequences. The most popular of these portable CE devices are digital still cameras enhanced with the ability to record image sequences as digital video clips, and cellular phones equipped with image sensors that enable users to shoot still images and make digital video clips.
Typically, digital video sequences are very large in file size. Even a short video sequence is composed of tens of images. Digital video cameras are conventionally designed to record such large data volumes but are limited in their video processing capability. Portable CE devices with image capturing capability, such as digital cameras and cellular phones, have limited storage for digital images and video clips. As a result, video is always saved and/or transferred in compressed form. Several video-encoding techniques can be used for that purpose. MPEG-4 and H.263 are the most widely used standard compression formats, and both are also suitable for wireless cellular environments.
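To give a sense of the data volumes involved, the following minimal sketch estimates the size of an uncompressed clip. The resolution (QCIF, 176x144), the frame rate of 15 frames per second, the 10-minute duration and the YUV 4:2:0 sampling at 8 bits per sample are illustrative assumptions, not values taken from the text; the function name is hypothetical.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=1.5):
    """Size in bytes of uncompressed video.

    bytes_per_pixel=1.5 corresponds to YUV 4:2:0 sampling at 8 bits per sample
    (an assumption made for this example).
    """
    frame_bytes = int(width * height * bytes_per_pixel)
    return frame_bytes * fps * seconds


# Assumed example: QCIF (176x144) at 15 frames/s for 10 minutes.
size = raw_video_bytes(176, 144, 15, 10 * 60)
print(f"{size} bytes, about {size / 2**20:.1f} MiB")
```

Under these assumptions even a low-resolution clip occupies several hundred megabytes uncompressed, which is why storage-limited portable devices rely on compressed formats.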
To allow users to generate quality video at their terminals, it is imperative that devices having a video camera, such as the aforementioned class of cellular phones, provide video editing capabilities. Video editing is the process of transforming and/or organizing available video sequences into a new video sequence. Splicing, i.e. merging, video clips is one of the most widely used editing operations, as users often wish to combine video clips. Merging video clips of different formats (e.g. MPEG-4 and H.263), or even of different coding modes within one format (e.g. different coding schemes of MPEG-4), requires bringing the clips to a common form.
When the MPEG-4 standard was developed and its profiles and levels were defined, the use case of merging video clips with different coding modes was not considered. As a result, MPEG-4 coded video clips with different coding modes cannot simply be concatenated. The state-of-the-art solution to this problem requires fully decoding the sequences, splicing them in the spatial domain and re-encoding the result. More specifically, the video clips are first decompressed, the unused frames are discarded, the remaining data is concatenated, and the resulting uncompressed data is then re-encoded. The major disadvantage of this approach is that it is computationally very costly, especially the encoding part, and requires a large storage capacity. This is a heavy price to pay for clips that, after all, share the same format, i.e. MPEG-4. Decoding video clips in portable CE devices can be performed in real time; encoding them, however, cannot. To decode and re-encode clips of 10 minutes, the user would have to wait for more than 15 minutes on most portable CE devices, which is not acceptable in practice.
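The four steps of the conventional approach described above (decompress, discard unused frames, concatenate, re-encode) can be sketched as follows. This is only an illustration under stated assumptions: zlib stands in for a real video codec, and the Clip type, the mode labels and all function names are hypothetical and do not correspond to any MPEG-4 API.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    mode: str            # hypothetical label for the coding mode
    frames: List[bytes]  # one compressed payload per frame


def encode(raw_frames: List[bytes], mode: str) -> Clip:
    # Stand-in encoder: zlib per frame instead of a real video codec.
    return Clip(mode, [zlib.compress(f) for f in raw_frames])


def decode(clip: Clip) -> List[bytes]:
    # Stand-in decoder: recovers the raw frames.
    return [zlib.decompress(f) for f in clip.frames]


def splice_by_reencoding(clip_a: Clip, keep_a: slice,
                         clip_b: Clip, keep_b: slice,
                         out_mode: str) -> Clip:
    raw_a = decode(clip_a)                 # 1. decompress both clips
    raw_b = decode(clip_b)
    kept = raw_a[keep_a] + raw_b[keep_b]   # 2. discard unused frames, 3. concatenate
    return encode(kept, out_mode)          # 4. re-encode everything in one mode


# Usage: keep the first two frames of clip A and the last frame of clip B.
a = encode([b"a0", b"a1", b"a2"], "mode-1")
b = encode([b"b0", b"b1"], "mode-2")
merged = splice_by_reencoding(a, slice(0, 2), b, slice(1, None), "mode-1")
```

Note that every retained frame passes through both the decoder and the encoder, and all raw frames are held in memory at once. In a real codec these two properties correspond to the computational cost and the storage requirement criticized above.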