New data compression methods allow efficient compression of large data sets. Such data sets generally occur in connection with movies, which in turn are available, for example, on DVD disks. In practice, transferring and storing these huge data sets effectively requires compression.
According to the current state of the art, video signals are recorded and reproduced as a rapid sequence of individual images. In the PAL television standard there are 25 images per second, or 50 half-images (fields). In the NTSC standard there are 30 images per second. Each image can be divided into lines and transferred sequentially.
Previous compression methods are based essentially on reducing the resolution, the color depth and the number of images per second. With digital compression, e.g. MPEG methods, only differential images are transferred instead of complete images, i.e. only the differences of individual image points (pixels) relative to previous images are transferred.
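The idea of transferring only pixel differences can be illustrated with a minimal sketch. It is not the MPEG algorithm itself, which additionally uses motion compensation and transform coding; the function names `frame_delta` and `apply_delta` and the tiny grayscale frames are assumptions made for illustration only.

```python
def frame_delta(prev, curr):
    """Return {(row, col): new_value} for every pixel that differs from prev."""
    return {
        (r, c): curr[r][c]
        for r in range(len(curr))
        for c in range(len(curr[r]))
        if curr[r][c] != prev[r][c]
    }


def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame and the differences."""
    frame = [row[:] for row in prev]  # copy so the previous frame stays intact
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame


# Two hypothetical 2x3 grayscale frames that differ in a single pixel.
prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 10, 10], [10, 99, 10]]

delta = frame_delta(prev, curr)       # only one entry has to be transferred
rebuilt = apply_delta(prev, delta)    # the receiver reconstructs the full frame
```

In this sketch only the single changed pixel is transferred, and the receiver recovers the complete image from the previous one.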
MPEG (“Moving Picture Experts Group”) established an open and timely international standard addressing the needs of emerging audiovisual applications that demanded interworking. For example, the explosion of the World Wide Web (Internet) and the acceptance of its interactive mode of operation have clearly revealed that the traditional television paradigm no longer suffices for audiovisual services. Users want to have access to audio and video as they now have access to text and graphics. This requires moving pictures and audio of acceptable quality at low bit rates.
The MPEG standard has meanwhile been divided into MPEG-1, MPEG-2, and MPEG-4.
MPEG-1 was designed for smooth video playback. MPEG-1 compression and decompression was originally a hardware-dependent method. The essential difference between MPEG-1 and MPEG-2 is that MPEG-2 handles interlaced scanning, a method used in television, much better. MPEG-2 provides compression at the highest levels of quality, so that movie material can be processed and edited almost one-to-one in studio quality. Consequently, MPEG-2 has established itself as a common standard and is nowadays used as the standard format for DVD-video disks.
The MPEG-4 format is a further development of the MPEG-2 format. Although MPEG-4 was originally intended as a coding standard for audiovisual data at very low bit rates, its development has served far more purposes than merely the streaming of linear media data in Internet and wireless applications. Additionally, the compression rate of MPEG-4 is higher than that of MPEG-2. Further, it includes H.264 (i.e. MPEG-4 Part 10).
Normally, movies are stored as MPEG-2 files on DVDs. Movie DVDs are comparable to common DVD-ROMs having a predetermined directory and file structure (architecture).
Players for DVD-video files recognize DVD-video data by a directory named VIDEO_TS. This directory contains all files relevant for playback. Files in the VIDEO_TS directory have one of the following three file extensions. Each index file has the extension ‘.IFO’ and is accompanied by a corresponding backup with the extension ‘.BUP’; the actual video data, including all menus and still frames, is contained in files with the extension ‘.VOB’ (Video Object).
VOB files can also contain audio data, subpictures and navigation instructions.
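The three file types can be told apart by their extension alone, as the following minimal sketch shows. The helper name `classify` and the example file names in `listing` are illustrative assumptions; the sketch works on a plain list of names rather than reading an actual disk.

```python
from pathlib import PurePosixPath

# Roles of the three extensions described in the text.
ROLES = {
    ".IFO": "index",
    ".BUP": "index backup",
    ".VOB": "video object",
}


def classify(names):
    """Map each file name to its role, based only on its extension."""
    result = {}
    for name in names:
        suffix = PurePosixPath(name).suffix.upper()
        result[name] = ROLES.get(suffix, "unknown")
    return result


# Hypothetical content of a VIDEO_TS directory.
listing = ["VIDEO_TS.IFO", "VIDEO_TS.BUP", "VTS_01_1.VOB", "notes.txt"]
roles = classify(listing)
```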
For example, subpictures are bitmaps having two bits of color depth, i.e. four possible values per pixel, which may represent simple graphics or subtitles. A typical example of subpictures is the semi-transparent selection markings in DVD menus. These bitmaps are overlaid on the video data.
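A minimal sketch of such an overlay follows. It assumes an already decompressed subpicture in a hypothetical packed layout (four 2-bit pixels per byte, most significant pair first), a four-entry palette of (gray value, opacity) pairs, and grayscale video pixels; none of these details are taken from the DVD specification, and the function names are illustrative.

```python
def unpack_2bit(data, width):
    """Unpack bytes holding four 2-bit pixels each into rows of palette indices."""
    pixels = []
    for byte in data:
        for shift in (6, 4, 2, 0):          # most significant pixel first
            pixels.append((byte >> shift) & 0b11)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def overlay(video, subpic, palette):
    """Blend the subpicture over the video, pixel by pixel.

    palette[index] is (gray, alpha) with alpha between 0.0 (transparent)
    and 1.0 (opaque).
    """
    out = []
    for video_row, sub_row in zip(video, subpic):
        row = []
        for v, index in zip(video_row, sub_row):
            gray, alpha = palette[index]
            row.append(round(alpha * gray + (1 - alpha) * v))
        out.append(row)
    return out


# Hypothetical palette: transparent, opaque white, opaque black, half-transparent.
palette = [(0, 0.0), (255, 1.0), (0, 1.0), (200, 0.5)]
subpic = unpack_2bit(bytes([0b00011011]), width=4)   # indices 0, 1, 2, 3
video = [[100, 100, 100, 100]]
blended = overlay(video, subpic, palette)
```

With only four palette entries, one entry can be reserved for full transparency while another provides the half-transparent highlight.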
The VOB file stores all these data in parallel: besides the video, it holds up to eight audio tracks, up to 32 subpictures, and navigation information. DVD videos store all video and audio streams in video object files with the extension ‘.VOB’. A VOB file can be divided into interleaved video units (ILVU), which in turn consist of video object units (VOBU). A group of pictures (GOP) contained therein combines the actual video and audio stream packet by packet.
The interleaved video units in turn consist of video object units, which are made up in part of a navigation packet (NV_PCK) and a GOP, i.e. the actual raw data. The NV_PCK contains positioning data and informs the player, among other things, about possible jump labels, and further contains various timing information.
GOPs are divided into packets of 2 KByte each. A demultiplexer in the player joins these packets into continuous data streams: video packs (V_PCK), audio packs (A_PCK), and subpicture packs (SP_PCK).
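The demultiplexing step can be sketched as follows. The representation of a packet as a `(kind, payload)` tuple is a simplifying assumption: on a real disk the stream type is identified by header fields inside each 2 KByte packet, which are not parsed here.

```python
from collections import defaultdict

PACK_SIZE = 2048  # 2 KByte per packet, as stated in the text


def demultiplex(packs):
    """Join interleaved (kind, payload) packets into one byte stream per kind."""
    streams = defaultdict(bytearray)
    for kind, payload in packs:
        if len(payload) > PACK_SIZE:
            raise ValueError("payload exceeds the 2 KByte packet size")
        streams[kind] += payload
    return {kind: bytes(data) for kind, data in streams.items()}


# Hypothetical interleaved packet sequence with very short payloads.
packs = [
    ("V_PCK", b"v0"),
    ("A_PCK", b"a0"),
    ("V_PCK", b"v1"),
    ("SP_PCK", b"s0"),
]
streams = demultiplex(packs)
```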
The GOP organization results from the structure of the video data. The MPEG-2 compression method confines the formation of differences between frames (stored in predictive and/or bidirectional frames) to a single GOP. Thus each GOP begins with an I-frame (Intra Frame) and usually ends just before the next I-frame.
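The GOP boundaries described here can be sketched by splitting a sequence of frame types at every I-frame. This is a simplified view that ignores the reordering of frames between coding order and display order; the function name `split_into_gops` and the example sequence are assumptions for illustration.

```python
def split_into_gops(frame_types):
    """Split a sequence of frame types ('I', 'P', 'B') into GOPs.

    A new GOP starts at every I-frame, so each GOP ends just before
    the next I-frame.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append([])
        gops[-1].append(frame_type)
    return gops


# Hypothetical frame sequence containing two GOPs.
sequence = "IBBPBBPIBBPBBP"
gops = split_into_gops(sequence)
```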
The DVD format is one of the formats that define their own subtitle format compatible with their own audio/video container format. Other common audio/video container formats such as Audio-Video Interleave (AVI) or ISO/IEC 14496-1 (MPEG-4) do not provide specific media tracks for holding subpicture information.
To a certain extent, MPEG-4 does provide a way to include subpicture information by using the so-called BIFS (Binary Format for Scenes) script language. However, this script language is too complex to be supported in its entirety. Further, it holds subtitle information in the form of text (letters), which causes disadvantages that will be discussed below.
As already stated above, subpicture information may be stored on DVD disks as compressed bitmaps along with timing and control information. Subtitles, if present, are therefore represented as images and not in the form of text. This makes it easy to represent any type of character set, such as Chinese characters.
Current solutions for preserving subtitle/overlay information from a DVD video disk during generation of a backup are based on coupling an audio/video container file to a second subtitle overlay information file.
FIG. 1 shows a known video decoder 10 during conventional DVD playback.
The decoder 10 comprises an MPEG-2 decoder 12, a DVD-subtitle (subpicture) decoder 14 and a compositor 16. The compositor 16 takes care of timing and blends the subpicture (here subtitle) information with a video stream 18, which is decoded by the MPEG-2 decoder 12. A subtitle stream 20 from a DVD disk, which is provided in bitmap format, is decoded by the DVD-subtitle decoder 14. The compositor 16 joins the decoded MPEG-2 video stream 18 and the decoded subtitle stream 20 with correct timing and overlays the streams to produce a video output stream 22 to be displayed to a user.
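The timing part of the compositor 16 can be sketched as a lookup of the subtitle that is active at each frame's time stamp. The data layout is an assumption for illustration: subtitles are given as `(start, end, bitmap)` tuples sorted by start time and not overlapping, frames as `(timestamp, frame)` tuples, and strings stand in for decoded images. The actual blending of pixels is left out.

```python
from bisect import bisect_right


def active_subtitle(subtitles, t):
    """Return the bitmap shown at time t, or None if no subtitle is active.

    subtitles: list of (start, end, bitmap), sorted by start, non-overlapping.
    """
    starts = [start for start, _end, _bitmap in subtitles]
    i = bisect_right(starts, t) - 1     # last subtitle starting at or before t
    if i >= 0 and t < subtitles[i][1]:
        return subtitles[i][2]
    return None


def compose(frames, subtitles):
    """Pair every decoded frame with the subtitle bitmap to overlay, if any."""
    return [(frame, active_subtitle(subtitles, ts)) for ts, frame in frames]


# Hypothetical decoded streams; times are in seconds.
subtitles = [(1.0, 2.0, "sub A"), (3.0, 4.0, "sub B")]
frames = [(0.5, "f0"), (1.5, "f1"), (2.5, "f2"), (3.0, "f3")]
output = compose(frames, subtitles)
```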
In prior art, parallel to the conversion of DVD-video data from MPEG-2 format to MPEG-4 format, generation of the subtitle file is based on one of the following approaches.
According to a first approach, hereinafter called “timed bitmap sequence approach”, the original subpicture data from the DVD is decoded, and the decoded results, in the form of reconstructed bitmaps, are saved along with time stamps in a file that is separate from the MPEG-4 file containing the video data. The time stamps are used for temporally correlating corresponding subpictures and video frames. However, it is a drawback of this approach that huge amounts of data storage are required for the subtitle file. Further, it is disadvantageous that the video decoder 10 must be able to support (interpret) one of the numerous existing subtitle file formats, of which up to 20 currently exist. This makes it very difficult for video decoders to provide complete subtitle support.
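The storage drawback can be made concrete with a rough estimate. The figures are assumptions chosen for illustration, not values from any particular disk: full-frame bitmaps at the PAL DVD resolution of 720 × 576 pixels, stored uncompressed at two bits per pixel, and a hypothetical count of 1,000 subtitles per movie.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed bitmap."""
    return width * height * bits_per_pixel // 8


per_bitmap = bitmap_bytes(720, 576, 2)   # one full-frame 2-bit bitmap
subtitle_count = 1000                    # hypothetical number of subtitles
total = per_bitmap * subtitle_count      # bytes for the whole subtitle file
```

Under these assumptions a single bitmap takes 103,680 bytes and the whole sequence roughly 100 MB, whereas the same subtitles stored as text would need only a few dozen bytes per entry.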
According to a second approach, hereinafter called “first timed text data approach”, the original subpicture data, i.e. the subtitle, is decoded, bitmaps are reconstructed, OCR (Optical Character Recognition) is applied to the reconstructed bitmaps for generating text data, and the text data resulting from the OCR is saved along with time stamps.
According to a third approach, hereinafter called “second timed text data approach”, the original subpicture data is decoded, bitmaps are reconstructed, and a person manually transcribes the text information by looking at all the bitmaps and writing down the textual information. Again, the resulting text data is saved along with time stamps.
One disadvantage of the second and third approaches is that the text-based subtitle formats do not allow non-Roman alphabets to be preserved. This renders the second approach, for example, useless, since non-Roman alphabets such as the Greek and Cyrillic alphabets are used in many regions of the world, including Europe.
Another disadvantage is that the subtitle information cannot be saved as part of the audio/video container file, i.e. the entire presentation information (audio, video, subpicture and timing) is not self-contained but spread across two files.
Therefore, it is an object of the present invention to provide a method by which data including audio/video and subpicture or subtitle data may be converted to another data format of higher compression efficiency wherein the resulting data can be decoded, and therefore displayed, in a much easier way.
Further, it is required to provide an easy integration into existing DVD-capable video decoders as well as a seamless integration into existing audio/video containers, preserving the container format compatibility.