The present invention relates to subtitles and, more particularly, to colorwiping and positioning the subtitles.
Subtitles are superimposed on a video image to convey information to a viewer that supplements the video image. In Karaoke, for example, lyrics of songs are displayed on the video image as subtitles while a viewer sings along to an audio track of an accompanying video image. The subtitles also convey information to the viewer in the manner in which they are displayed. Highlighting the lyrics of songs in Karaoke, for example, cues the singer to sing, while moving the lyrics off the video screen indicates to the viewer to stop singing.
Television broadcasting or video reproduction (such as from a video disk) provides subtitles for display with the video image. However, the subtitles are permanently combined with the underlying video image and can be manipulated only at the transmitting (or recording) end and not at the receiving (or reproducing) end. That is, subtitles displayed in television broadcasting or video reproduction are "fixed" and cannot be highlighted or moved at the receiving (or reproduction) end. The subtitles also cannot be turned off, which is particularly important in Karaoke where a singer wants to test his/her singing abilities or enjoy the music video without the interruption of the subtitles.
The television broadcasting and reproduction systems cannot adequately manipulate the subtitles at the transmitting (or recording) end. These systems require painstaking trial-and-error creation and manipulation of subtitles. In Karaoke, for example, where sing-along music videos are mass produced, it is desirable that each music video be produced quickly and efficiently. This is not possible with the television broadcasting and reproduction systems, which require slow and tedious work to custom tailor each music video. Notably, dynamic positioning in a fixed-type television broadcast or recording is not possible because the subtitles are an integral part of the video picture. Moving the subtitles, therefore, would leave a blank space where the subtitles were once superimposed.
Compact Disc Graphics (CD-G) provide more flexibility in displaying subtitles because this technique records graphics on a compact disc (CD) in the form of subcodes. However, CD-G has a serious disadvantage because this technique is limited to CD applications, which are slow by television standards. That is, the CD-G technique does not lend itself to creation and manipulation of subtitles in real-time television broadcasts or video reproductions.
CD-G is successful for computer applications because the graphics are programmed in advance and the large processing time required to create the graphics is largely unseen by the end user. As will be shown with reference to FIGS. 16a-16c and 17, however, the lead time required to generate a full CD-G screen is 10.24 seconds, which is grossly inadequate for normal television or video broadcasts.
FIG. 16a depicts the CD-G data format in which one frame includes 1 byte of a subcode and 32 bytes of audio channel data. Of the 32 bytes, 24 bytes are allocated for L and R audio channel data (each channel having 6 samples with 2 bytes per sample) and 8 bytes are allocated to an error correction code. The frames are grouped as a block of 98 frames (Frame 0, Frame 1, . . . , Frame 96 and Frame 97) as shown in FIG. 16b. Eight blocks P,Q,R,S,T,U,V and W are transmitted as shown in FIG. 16c. The subcodes for Frames 0 and 1 in each block are defined as sync patterns S0, S1, whereas the remaining 96 frames store various subcode data. Among a group of 8 blocks, the first 2 blocks P, Q are allocated to search data employed for searching through record tracks; and graphic data can be allocated to the subcodes in the remaining 6 blocks R,S,T,U,V and W.
Since each block of 98 frames is transmitted at a repeating frequency of 75 Hz, frames are transmitted at a rate of 7.35 kHz (75 × 98). Because each frame carries 1 byte of subcode, the resulting subcode data rate is 7.35 Kbytes/s. The transmission format for transmitting the information present in blocks R, S, T, U, V and W is shown in FIG. 17. Each of the 96 frames (2, 3, . . . 97) of the 6 blocks (R, S, T, U, V and W) is arranged as a packet including 6 channels (R to W) of 96 symbols per channel. The packet is further subdivided into 4 packs of 24 symbols each (symbol 0 to symbol 23), with each symbol representing a frame.
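The frame, block and packet arithmetic described above can be checked with a short sketch. The constant and function names below are illustrative only and do not come from the CD-G format itself; the numeric values are those given in the description of FIGS. 16a-16c and 17.

```python
# Values taken from the description of FIGS. 16a-16c and 17.
# All identifiers are hypothetical names chosen for this sketch.
SUBCODE_BYTES_PER_FRAME = 1
AUDIO_BYTES_PER_FRAME = 2 * 6 * 2   # L and R channels, 6 samples each, 2 bytes per sample
ECC_BYTES_PER_FRAME = 8
FRAMES_PER_BLOCK = 98
SYNC_FRAMES_PER_BLOCK = 2           # Frames 0 and 1 hold sync patterns S0, S1
BLOCKS_PER_SECOND = 75
PACKS_PER_PACKET = 4
SYMBOLS_PER_PACK = 24


def frame_size_bytes() -> int:
    """Total bytes in one frame: subcode + audio + error correction."""
    return SUBCODE_BYTES_PER_FRAME + AUDIO_BYTES_PER_FRAME + ECC_BYTES_PER_FRAME


def frames_per_second() -> int:
    """Frame rate implied by 98-frame blocks repeating at 75 Hz."""
    return BLOCKS_PER_SECOND * FRAMES_PER_BLOCK


def subcode_bytes_per_second() -> int:
    """Subcode data rate: one subcode byte per frame."""
    return frames_per_second() * SUBCODE_BYTES_PER_FRAME


def data_symbols_per_packet() -> int:
    """Frames 2..97 of a block, i.e. the symbols that form one packet."""
    return FRAMES_PER_BLOCK - SYNC_FRAMES_PER_BLOCK


if __name__ == "__main__":
    print(frame_size_bytes(), frames_per_second(), data_symbols_per_packet())
```

The 96 data symbols per packet divide evenly into the 4 packs of 24 symbols described above.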
A CD-G character is made up of 6 × 12 pixels. Since each pack is 6 × 24 (6 channels by 24 symbols), a 6 × 12 character is easily accommodated in each pack. The CD-G format allocates the six channels (R, S, T, U, V and W) of the 12 symbols 8 to 19 to a character. The remainder of the symbols in each of the packs store information about the character.
Mode information is stored in the first 3 channels (R, S, T) of symbol 0 in each pack, and item information is stored in the last 3 channels (U, V, W) of symbol 0. A combination of the mode information and the item information defines the mode for the characters stored in the corresponding pack as follows:
TABLE 1
______________________________________
Mode    Item
______________________________________
000     000     mode
001     000     graphics mode
001     001     TV-graphics mode
111     000     user's mode
______________________________________
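The mode and item lookup of Table 1 can be sketched as follows. The sketch assumes that symbol 0 is available as a 6-bit integer with channel R as the most significant bit and channel W as the least significant bit; that bit ordering, and the names `MODE_TABLE` and `decode_symbol0`, are assumptions made for illustration. The mode labels are copied from Table 1 as printed.

```python
from typing import Optional, Tuple

# (mode bits R,S,T ; item bits U,V,W) -> label, copied from Table 1 as printed.
MODE_TABLE = {
    (0b000, 0b000): "mode",
    (0b001, 0b000): "graphics mode",
    (0b001, 0b001): "TV-graphics mode",
    (0b111, 0b000): "user's mode",
}


def decode_symbol0(symbol: int) -> Tuple[int, int, Optional[str]]:
    """Split symbol 0 into its mode and item fields and look up the label.

    Assumes R is bit 5 (most significant) and W is bit 0; this ordering is
    an assumption of the sketch, not something stated in the text.
    Returns (mode, item, label); label is None for combinations not in Table 1.
    """
    if not 0 <= symbol <= 0b111111:
        raise ValueError("symbol 0 must fit in 6 bits (channels R to W)")
    mode = (symbol >> 3) & 0b111   # channels R, S, T
    item = symbol & 0b111          # channels U, V, W
    return mode, item, MODE_TABLE.get((mode, item))


if __name__ == "__main__":
    print(decode_symbol0(0b001001))
```

Combinations that Table 1 does not list are returned with a label of `None` rather than guessed.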
An instruction is stored in all of the channels of symbol 1. Corresponding mode, item, parity or additional information for the instruction is stored in all of the channels of symbols 2 to 7. Parity for all of the data in the channels of symbols 0 to 19 is stored in all of the channels of the last 4 symbols (symbols 20 to 23) of each pack.
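The pack layout described above (symbol 0 for mode and item, symbol 1 for the instruction, symbols 2 to 7 for additional information, symbols 8 to 19 for the character, and symbols 20 to 23 for parity) can be illustrated with a minimal sketch. The `PackFields` type and `split_pack` function are hypothetical names introduced here, and a pack is assumed to be supplied as a sequence of 24 integer symbols.

```python
from typing import NamedTuple, Sequence, Tuple


class PackFields(NamedTuple):
    """Fields of one 24-symbol pack, following the layout in the text."""
    mode_item: int               # symbol 0
    instruction: int             # symbol 1
    info: Tuple[int, ...]        # symbols 2 to 7
    character: Tuple[int, ...]   # symbols 8 to 19 (12 rows of a 6 x 12 character)
    parity: Tuple[int, ...]      # symbols 20 to 23


def split_pack(symbols: Sequence[int]) -> PackFields:
    """Slice a 24-symbol pack into its fields without interpreting them."""
    if len(symbols) != 24:
        raise ValueError("a pack holds exactly 24 symbols (symbol 0 to symbol 23)")
    return PackFields(
        mode_item=symbols[0],
        instruction=symbols[1],
        info=tuple(symbols[2:8]),
        character=tuple(symbols[8:20]),
        parity=tuple(symbols[20:24]),
    )


if __name__ == "__main__":
    # Use the symbol index as a stand-in value so each field shows its range.
    print(split_pack(list(range(24))))
```

The sketch only partitions the pack; it does not verify parity or interpret the instruction.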
As discussed, the data is transmitted at a repeating frequency of 75 Hz. Therefore, a packet which contains 4 packs is transmitted at a rate of 300 packs per second (75 Hz × 4 packs). That is, with 1 character allocated to the range of 6 × 12 pixels, 300 characters can be transmitted in 1 second.
However, a CD-G screen requires more than 300 characters. A CD-G screen is defined as 288 horizontal picture elements × 192 vertical picture elements and requires more than twice the 300 characters transmitted in 1 second. The total transmission time for a 288 × 192 screen is, therefore, 2.56 seconds as shown by the following equation:
(288/6) × (192/12) ÷ 300 = 2.56 seconds
This is an extremely long time to regenerate each screen when it is considered that screens are usually refreshed every 0.6 seconds. This problem is compounded when hexadecimal codes are used for the characters because each hexadecimal expression requires 4 bits to represent 1 pixel. As a result, 4 times the data described above is transmitted, increasing the transmission time to 10.24 seconds (4 × 2.56 seconds). Since each screen requires a sluggish 10.24 seconds for transmission, a continual transmission of screens means that a lag time of 10.24 seconds is experienced when transmitting screens using the CD-G technique.
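The screen transmission times quoted above follow directly from the pack rate and the character size, as the following sketch shows. The function names and the `bits_per_pixel` parameter are illustrative; the baseline of 1 bit per pixel and the hexadecimal case of 4 bits per pixel are the two cases discussed in the text.

```python
from fractions import Fraction

# Figures from the text; identifiers are hypothetical names for this sketch.
PACKS_PER_SECOND = 75 * 4        # 75 Hz x 4 packs, one character per pack
CHAR_WIDTH, CHAR_HEIGHT = 6, 12  # pixels per CD-G character
SCREEN_WIDTH, SCREEN_HEIGHT = 288, 192


def characters_per_screen() -> int:
    """Number of 6 x 12 characters needed to tile a 288 x 192 screen."""
    return (SCREEN_WIDTH // CHAR_WIDTH) * (SCREEN_HEIGHT // CHAR_HEIGHT)


def screen_time_seconds(bits_per_pixel: int = 1) -> Fraction:
    """Exact time to send one full screen at 300 packs per second.

    bits_per_pixel=1 is the baseline case; bits_per_pixel=4 is the
    hexadecimal case, which multiplies the data by 4.
    """
    return Fraction(characters_per_screen() * bits_per_pixel, PACKS_PER_SECOND)


if __name__ == "__main__":
    print(characters_per_screen())
    print(float(screen_time_seconds(1)))
    print(float(screen_time_seconds(4)))
```

Exact fractions are used so that the 2.56-second and 10.24-second figures can be compared without floating-point rounding concerns.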
Thus, the CD-G technique is not performed in real time and is unacceptably slow for use in a real time broadcast. In generating Karaoke music videos, for example, it would be nearly impossible to synchronize the subtitles with the precise moment the lyrics are to be sung because the subtitles would have to be generated 10.24 seconds in advance of the music video.
The CD-G system also suffers from defects in reproducing the subtitles. The CD-G system displays subtitles only upon normal reproduction and not during special reproduction such as a fast forward or fast reverse reproduction. CD-G pictures are also subject to aliasing phenomena (in which oblique portions of a character are ragged) or flickering because this system allocates only one bit of data for each picture element. The lag time of the CD-G picture also prevents switching the subtitle display on or off at a high speed.
In one type of system (known as the CAPTAIN system), dot patterns, as well as character codes, represent the subtitles. This system, however, does not appear to be any better than the CD-G system and suffers from some of the same disadvantages. In both systems, for example, the subtitles lack refinement because these systems do not provide sufficient resolution power in displaying the subtitles. The CAPTAIN system, for example, is developed for a 248 (horizontal picture elements) by 192 (vertical picture elements) display and not for high resolution video pictures of 720 × 480.