1. Field of the Invention
The present invention relates to reproduction of data on a storage medium, and, more particularly, to a storage medium containing text-based caption information compatible with the subpicture method of a digital versatile disc (DVD) and the presentation method of a Blu-ray disc, and a reproducing apparatus and reproducing method thereof.
2. Description of the Related Art
Among conventional caption technologies, there exist text-based caption technologies, which are mainly used in a personal computer (PC), and a subpicture-graphic-based caption technology, which is used in a DVD.
First, as examples of the conventional text-based caption technologies mainly used in a PC, there are Synchronized Accessible Media Interchange (SAMI) of Microsoft, and Real-text technology of RealNetworks. The conventional text-based caption technologies have a structure in which a caption is output on the basis of synchronization information in relation to a file in which video stream data is recorded, or video stream data provided on a network.
FIG. 1 is a diagram illustrating the structure of a caption file used in a text-based caption technology mainly used in the conventional PC.
Referring to FIG. 1, there is a text-based caption file for video stream data, and a caption for video stream data is output on the basis of synchronization time information, for example, <sync time 00:00>, contained in the caption file. An example of a caption file constructed assuming continuous reproduction of the video stream data is shown.
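As a rough illustration of how a reproducing program might use such synchronization time information, the following Python sketch parses markers modelled on the <sync time 00:00> example and selects the caption for a given playback time. The marker syntax, the reading of 00:00 as minutes and seconds, and the function names parse_captions and caption_at are assumptions made for illustration; actual SAMI or RealText files use their own markup.

```python
import re
from bisect import bisect_right

# Hypothetical marker syntax modelled on "<sync time 00:00>";
# the value is assumed here to mean minutes:seconds.
SYNC_RE = re.compile(r"<sync time[ =](\d+):(\d+)>")

def parse_captions(text):
    """Return a list of (start_seconds, caption_text) sorted by start time."""
    pieces = SYNC_RE.split(text)
    # With two capture groups, split() yields:
    # [preamble, mm, ss, body, mm, ss, body, ...]
    entries = []
    for i in range(1, len(pieces), 3):
        start = int(pieces[i]) * 60 + int(pieces[i + 1])
        entries.append((start, pieces[i + 2].strip()))
    return sorted(entries)

def caption_at(entries, seconds):
    """Return the caption whose sync time was most recently reached."""
    starts = [start for start, _ in entries]
    idx = bisect_right(starts, seconds) - 1
    return entries[idx][1] if idx >= 0 else ""
```

Because each entry carries only a start time, a caption in this sketch stays selected until the next sync time is reached, which matches the assumption of continuous reproduction.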
FIG. 2 is a diagram illustrating the structure of an apparatus reproducing the conventional text-based captions.
Referring to FIG. 2, a text caption file is read from a storage medium 200, stored in a text caption data and font data buffer 220, and then converted into bitmap image graphic data by a text caption decoder 222. Meanwhile, audio/video (AV) data read out from an AV data buffer 210 is separated by a demultiplexer 211 into video data and audio data. The video data is decoded in a video decoder 212 and stored as video frame data in a video frame buffer 214. Under the control of a graphic controller/graphic data buffer 224, the converted graphic data is overlapped with the video frame data by a blender 226 and output on the screen 232. A speaker 230 reproduces the audio data after it has been decoded by an audio decoder 213.
However, as shown in FIG. 2, the conventional text-based caption file structure considers only the synchronization time (<sync time=00:00>) at which a caption is displayed on the screen, and the type, size, and color of the font used when the caption is output, but does not consider how long a bitmap image is kept in a buffer after the bitmap is generated by decoding text caption data. Accordingly, a reproducing apparatus using a low-speed processor cannot output a caption on the screen in real time in the way that the conventional DVD reproducing apparatus reproduces data.
Meanwhile, the subpicture-graphic-based caption technology used in the conventional DVD will now be explained.
A DVD uses a bitmap image for a subtitle. Subtitle data of a bitmap image is losslessly encoded and recorded on a DVD. A maximum of 32 losslessly encoded bitmap images are recorded on a DVD.
FIG. 3 is a diagram illustrating the data structure of the conventional DVD explaining the structure of a caption file used in a subpicture-graphic-based caption technology used in the conventional DVD.
Referring to FIG. 3, in a DVD, the disc area is divided into a video manager (VMG) area and a plurality of video title set (VTS) areas. Title information and information on title menus are stored in the VMG area, and information on the title is stored in the plurality of VTS areas. The VMG area is formed with 2 to 3 files, and each of the VTS areas is formed with 3 to 12 files. The VMG area includes a VMGI area storing additional information on the VMG, a video object set (VOBS) area storing moving picture information (video objects) on a menu, and a backup area (BUP) of the VMGI. Each of these areas is stored as one file, and among them the presence of the VOBS area is optional.
In a VTS area, information on a title that is a reproduction unit, and a VOBS having moving picture data, are stored. In one VTS, at least one title is recorded. The VTS area includes video title set information (VTSI), a VOBS having moving picture data for a menu screen, a VOBS having moving picture data of a video title set, and backup data of the VTSI. The presence of the VOBS to display the menu screen is optional. Each VOBS is further divided into VOBs and Cells, which are recording units. One VOB is formed with a plurality of Cells. The smallest recording unit mentioned in the present invention is the Cell.
FIG. 4 is a diagram illustrating a detailed structure of the VOBS having moving picture data in the data structure of the conventional DVD shown in FIG. 3.
Referring to FIG. 4, one VOBS is formed with a plurality of VOBs, and one VOB is formed with a plurality of Cells. A Cell is in turn formed with a plurality of video object units (VOBUs). A VOBU is data encoded by a moving picture experts group (MPEG) method, which is the moving picture coding method used in a DVD. According to MPEG, since images are coded through spatiotemporal compression, a previous or succeeding image is required to decode a given image. Accordingly, in order to support a random access function by which reproduction starts from an arbitrary position, intra coding that does not require a previous or succeeding image is performed at each predetermined interval. In MPEG, a picture coded in this way is referred to as an intra picture or I picture, and the pictures from one I picture up to the next I picture are referred to as a group of pictures (GOP). Usually, a GOP is formed with 12 to 15 images.
Meanwhile, the MPEG defines system coding (ISO/IEC13818-1) to combine video data and audio data into one bitstream. The system coding defines two multiplexing methods: a program stream (PS) multiplexing method optimized for generating one program and storing it in an information storage medium, and a transport stream (TS) multiplexing method suited to generating a plurality of programs for transmission. The conventional DVD employs the PS coding method.
According to the PS coding method, video data or audio data is divided into units referred to as packs (PCKs) and multiplexed through a time division method. Data other than the video data and audio data defined by the MPEG is referred to as a private stream and is also contained in PCKs such that the private stream can be multiplexed together with the video data and audio data.
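The role of the I picture in random access can be illustrated with a short Python sketch. It assumes a simplified model in which each picture is labelled by a single character ('I' for an intra picture, any other letter for a picture that depends on its neighbours); the function name random_access_start is hypothetical, and details such as decoding order versus display order are ignored.

```python
def random_access_start(picture_types, target):
    """Return the index of the closest I picture at or before 'target'.

    Decoding must begin at this picture, because pictures after it may
    depend on a previous or succeeding image.
    """
    for index in range(target, -1, -1):
        if picture_types[index] == "I":
            return index
    raise ValueError("no I picture at or before the requested position")

# Two GOPs: the first of 12 pictures, followed by the start of the next.
pictures = "IBBPBBPBBPBBIBBPBB"
start = random_access_start(pictures, 14)  # reproduction requested at picture 14
```

In this example decoding would start at picture 12, the I picture that opens the GOP containing picture 14.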
A VOBU is formed with a plurality of packs (PCK). The first pack (PCK) among the plurality of packs (PCK) is a navigation pack (NV_PCK), and the remaining packs include video packs (V_PCKs), audio packs (A_PCKs), and subpicture packs (SP_PCKs). Video data contained in a video pack is formed with a plurality of GOPs.
The subpicture pack (SP_PCK) is used for 2-dimensional graphic data and caption data. That is, in a DVD, caption data displayed overlapping a video image is encoded by the same method as for 2-dimensional graphic data. In the case of DVD, a separate encoding method to support multiple languages is not employed and each caption data is converted into graphic data and then processed and recorded by one encoding method. The graphic data for a caption is referred to as a subpicture. The subpicture is formed with subpicture units (SPUs). A subpicture unit corresponds to one sheet of graphic data.
FIG. 5 is a diagram illustrating the correlation of a subpicture pack (SP_PCK) and a subpicture unit (SPU) in the structure of the VOBS having moving picture data shown in FIG. 4.
Referring to FIG. 5, one subpicture unit (SPU) includes a subpicture unit header (SPUH), pixel data (PXD), and a subpicture display control sequence table (SP_DCSQT). These are sequentially divided and recorded in subpicture packs (SP_PCKs), each with a size of 2048 bytes. At this time, if the last data of the subpicture unit (SPU) cannot fully fill one subpicture pack (SP_PCK), the remainder of the last subpicture pack (SP_PCK) is filled with padding data. As a result, one subpicture unit (SPU) is formed with a plurality of subpicture packs (SP_PCKs).
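The division of one SPU into fixed-size packs with padding can be sketched in Python as follows. This is a simplification under stated assumptions: pack and packet headers are ignored so that the whole 2048 bytes are treated as payload, the padding byte value 0xFF is chosen only for illustration, and the function name split_spu is hypothetical.

```python
PACK_SIZE = 2048  # size of one SP_PCK in bytes, as stated in the text

def split_spu(spu, payload_size=PACK_SIZE, pad=b"\xff"):
    """Divide SPU bytes sequentially into equally sized packs.

    The last pack is filled with padding data when the SPU does not
    fill it completely. Header overhead is ignored in this sketch.
    """
    packs = []
    for offset in range(0, len(spu), payload_size):
        chunk = spu[offset:offset + payload_size]
        if len(chunk) < payload_size:
            chunk += pad * (payload_size - len(chunk))
        packs.append(chunk)
    return packs

# A 5000-byte SPU needs three packs; the third holds 904 data bytes.
packs = split_spu(bytes(5000))
```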
Recorded in the subpicture unit header (SPUH) are the size of the entire subpicture unit (SPU) and the location from which the subpicture display control sequence table (SP_DCSQT) having display control information in the subpicture unit (SPU) starts. The pixel data (PXD) is coded data obtained by compression coding a subpicture. The pixel data (PXD) forming a subpicture can have four types of values, including background, pattern pixel, emphasis pixel-1, and emphasis pixel-2. The values can be expressed by two bits, and have binary values, 00, 01, 10, and 11, respectively. Accordingly, the subpicture can be regarded as a set of data formed with a plurality of lines and having four types of pixel values. Encoding is performed for each line.
FIG. 6 is a diagram illustrating a run-length coding method among methods of encoding the subpicture unit shown in FIG. 5.
Referring to FIG. 6, in the run-length coding method, when one to three instances of an identical pixel data value continue, the number of continued pixels (No_P) is expressed by 2 bits, after which a 2-bit pixel data value (PD) is recorded. When 4 to 15 instances of an identical pixel data value continue, the first 2 bits are recorded as 0s, 4 bits are used to record the No_P, and 2 bits are used to record the PD. When 16 to 63 instances of an identical pixel data value continue, the first 4 bits are recorded as 0s, 6 bits are used to record the No_P, and 2 bits are used to record the PD. When 64 to 255 instances of an identical pixel data value continue, the first 6 bits are recorded as 0s, 8 bits are used to record the No_P, and 2 bits are used to record the PD. When a run of identical pixel data values continues to the end of a line, the first 14 bits are recorded as 0s, and 2 bits are used to record the PD. When encoding of one line is thus finished, if byte-unit alignment is not achieved, 4 bits of 0s are recorded. The number of encoded data bits in one line cannot exceed 1440 bits.
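The coding rules of FIG. 6 can be sketched in Python as below, producing the code for one line as a string of '0' and '1' characters. The function names encode_run and encode_line are hypothetical. Two choices in this sketch are assumptions rather than rules taken from the description: the end-of-line code is used only when the final run exceeds 255 pixels, and a longer run in the middle of a line is split into runs of at most 255 pixels.

```python
from itertools import groupby

MAX_LINE_BITS = 1440  # upper limit on encoded bits per line

def encode_run(count, pd):
    """Encode a run of 1..255 identical 2-bit pixel values (PD)."""
    if not 1 <= count <= 255 or not 0 <= pd <= 3:
        raise ValueError("run length or pixel value out of range")
    if count <= 3:
        return f"{count:02b}{pd:02b}"        # 2-bit No_P + PD
    if count <= 15:
        return f"00{count:04b}{pd:02b}"      # 2 zeros + 4-bit No_P + PD
    if count <= 63:
        return f"0000{count:06b}{pd:02b}"    # 4 zeros + 6-bit No_P + PD
    return f"000000{count:08b}{pd:02b}"      # 6 zeros + 8-bit No_P + PD

def encode_line(pixels):
    """Run-length encode one line of 2-bit pixel values."""
    runs = [(len(list(group)), pd) for pd, group in groupby(pixels)]
    codes = []
    for i, (count, pd) in enumerate(runs):
        is_last = i == len(runs) - 1
        if is_last and count > 255:
            codes.append("0" * 14 + f"{pd:02b}")  # run continues to end of line
            break
        while count > 255:                        # assumption: split long runs
            codes.append(encode_run(255, pd))
            count -= 255
        codes.append(encode_run(count, pd))
    bits = "".join(codes)
    if len(bits) % 8:       # every code is a multiple of 4 bits long,
        bits += "0000"      # so 4 zero bits restore byte alignment
    if len(bits) > MAX_LINE_BITS:
        raise ValueError("encoded line exceeds 1440 bits")
    return bits
```

For instance, a line consisting of two pattern pixels (PD 01) yields the 4-bit code 1001 followed by 4 alignment bits.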
FIG. 7 is a diagram illustrating the data structure of the SP_DCSQT having output control information of pixel data (PXD) shown in FIG. 5.
Referring to FIG. 7, the SP_DCSQT contains output control information for outputting the pixel data (PXD) described above. The SP_DCSQT is formed with a plurality of subpicture display control sequences (SP_DCSQ). One SP_DCSQ is a set of output control commands (SP_DCCMDs) performed at one time, and is formed with an SP_DCSQ_STM indicating a starting time, an SP_NXT_DCSQ_SA containing position information of the next SP_DCSQ, and a plurality of SP_DCCMDs.
The SP_DCCMD includes control information on how the pixel data (PXD) described above is combined with a video image and output, and includes color information of the pixel data, transparency information (or contrast information) of the video data, information on an output starting time, and an output finishing time.
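The relationship between the SP_DCSQT, its SP_DCSQs, and their SP_DCCMDs can be modelled with a few Python data classes. This is only a structural sketch: the field types, the time unit, the command labels such as "set_color", and the helper sequences_due are all hypothetical and do not reflect the binary layout recorded on a disc.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SpDccmd:
    """One output control command; 'name' and 'params' are illustrative."""
    name: str
    params: Dict[str, int] = field(default_factory=dict)

@dataclass
class SpDcsq:
    """One display control sequence: commands performed at one time."""
    sp_dcsq_stm: int           # starting time (unit assumed for illustration)
    sp_nxt_dcsq_sa: int        # position information of the next SP_DCSQ
    sp_dccmds: List[SpDccmd]   # output control commands

def sequences_due(sp_dcsqt, now):
    """Return every SP_DCSQ in the table whose starting time has been reached."""
    return [seq for seq in sp_dcsqt if seq.sp_dcsq_stm <= now]

# A table with one sequence that starts output and one that finishes it.
table = [
    SpDcsq(0, 16, [SpDccmd("set_color", {"pattern": 3}), SpDccmd("start_display")]),
    SpDcsq(90, 16, [SpDccmd("stop_display")]),
]
```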
FIG. 8 is a diagram illustrating the output result of a subpicture together with moving picture data according to the data structure described above.
Referring to FIG. 8, the pixel data itself is losslessly encoded, while information on the subpicture display area, that is, the area within the video display area in which a subpicture is output, and information on an output starting time and finishing time are contained in the SP_DCSQT as output control information.
In a DVD, subpicture data for caption data of a maximum of 32 different languages can be multiplexed together with moving picture data and recorded. These languages are distinguished by a stream id provided by the MPEG system coding method, and a sub stream id defined by the DVD. Accordingly, if a user selects one language, the subpicture unit (SPU) is extracted by taking only subpicture packs (SP_PCK) having the stream id and sub stream id corresponding to the language, and then, by decoding the subpicture unit (SPU), caption data is extracted and, according to output control information, the output is controlled.
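The selection of one language by stream id and sub stream id can be sketched in Python as follows. The Pack class and the function extract_spu_bytes are hypothetical, the pack is reduced to its two identifiers and a payload, and the identifier values in the example are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Pack:
    """Simplified pack: only the identifiers and the payload are modelled."""
    stream_id: int
    sub_stream_id: int
    payload: bytes

def extract_spu_bytes(packs, stream_id, sub_stream_id):
    """Concatenate, in recording order, the payloads of the packs that
    carry the selected language's subpicture stream."""
    return b"".join(
        pack.payload
        for pack in packs
        if pack.stream_id == stream_id and pack.sub_stream_id == sub_stream_id
    )

# Two subpicture streams interleaved with a video pack (illustrative ids).
multiplexed = [
    Pack(0xBD, 0x20, b"lang-A-part1 "),
    Pack(0xE0, 0x00, b"video"),
    Pack(0xBD, 0x21, b"lang-B-part1 "),
    Pack(0xBD, 0x20, b"lang-A-part2"),
]
selected = extract_spu_bytes(multiplexed, 0xBD, 0x20)
```

The bytes collected in this way would then be decoded as a subpicture unit and output according to its output control information.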
This caption technology based on the subpicture graphic formed with bitmap images used in the conventional DVD has the following problems.
First, if bitmap-based caption data is multiplexed with moving picture data and recorded, the bit generation amount occupied by subpicture data should be considered in advance when the moving picture data is encoded. That is, because the caption data is converted into graphic data, the amount of data generated differs for each language and the total amount is huge. Usually, moving picture data is encoded only once, and subpicture data for each language is then multiplexed with the encoded output so that a DVD appropriate to each region is manufactured. However, depending on the language, the amount of subpicture data can be so large that, when the subpicture data is multiplexed with the moving picture data, the total generated bit amount exceeds the maximum limit. Also, since the subpicture data is multiplexed between moving picture data units, the starting position of each VOBU differs from region to region. In a DVD, since the starting position of a VOBU is separately managed, this information should also be updated whenever a multiplexing process is performed.
Secondly, since the contents of each subpicture cannot be known, it cannot be used for a separate purpose such as outputting two languages at the same time, or outputting only caption data without moving picture data to use for language learning.
As described above, since the text-based caption technology used in a PC and the caption technology using subpicture graphics as in a DVD are designed differently, if text-based caption data information is applied to the DVD reproducing apparatus without change, such problems as difficulties in guaranteeing real time reproduction or managing a subpicture data buffer occur.