1. Field of Invention
The invention relates to the representation, presentation and navigation of multimedia.
2. Background of the Invention
Sign languages such as American Sign Language are the preferred means of communication for many deaf people. For them, a written language, e.g. written English, is therefore a second language and must be learned as such.
It is difficult, if not impossible, to effectively transcribe sign language monologs or dissertations onto paper. However, sign language can easily be recorded onto video. For the purpose of instructing deaf people in the use of written language, and instructing hearing people in the use of sign language, it is desirable to associate signs in sign language with written language. The most basic association between text and sign language is a one-to-one association of a word and a sign. However, it is desirable to use more complex associations, such as multiple text segments to single or multiple video sequences, and to allow multiple levels of association, e.g. at the word level, phrase level, sentence level, paragraph level, etc.
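By way of illustration only, the kinds of association described above can be sketched as a small data structure. The sketch below is not taken from any existing system; the names TextSpan, VideoSpan, Association and associations_at, the sample sentence, and the video timings are all hypothetical. It shows one association that joins two non-adjacent words to a single video sequence, and a second association at the sentence level.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextSpan:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class VideoSpan:
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


@dataclass
class Association:
    level: str                    # e.g. "word", "phrase", "sentence", "paragraph"
    text_spans: List[TextSpan]    # one or more text segments, possibly non-adjacent
    video_spans: List[VideoSpan]  # one or more video sequences


def associations_at(assocs: List[Association], offset: int, level: str) -> List[Association]:
    """Return the associations at the given level that cover a character offset."""
    return [
        a for a in assocs
        if a.level == level and any(s.start <= offset < s.end for s in a.text_spans)
    ]


def covered_text(text: str, assoc: Association) -> List[str]:
    """Return the text segments that belong to one association."""
    return [text[s.start:s.end] for s in assoc.text_spans]


# Hypothetical example data: the timings are invented for illustration.
TEXT = "He picked the book up."
ASSOCS = [
    # Two non-adjacent words ("picked" ... "up") tied to a single video sequence.
    Association("word", [TextSpan(3, 9), TextSpan(19, 21)], [VideoSpan(1.2, 2.0)]),
    # The whole sentence tied to the whole recording.
    Association("sentence", [TextSpan(0, len(TEXT))], [VideoSpan(0.0, 3.5)]),
]
```

Looking up the character offset of "up" at the word level returns the first association, whose text segments are "picked" and "up", while the same lookup at the sentence level returns the second.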
Such instructional material can be provided by means of video tapes with associated books. The use of such material is severely limited by the difficulty of synchronizing and associating words and phrases in the text with video sequences and vice versa. Another disadvantage is the difficulty of navigating the material, e.g. finding particular paragraphs or words in the video.
With the advent of computers capable of displaying multimedia content, it is possible to show video on a computer terminal as well as text. Multimedia-computer technologies such as the SMIL (Synchronized Multimedia Integration Language) specification developed by the W3C and Flash™ from Macromedia Corporation can be used to create material where the text is displayed synchronously with the video recording of the sign language dissertation.
Although these technologies provide some support for navigation of the video and association of signs to text, they do not support even slightly more complex relationships between written language and sign language, such as when multiple non-adjacent words relate to a single video sequence and vice versa. Nor do they represent the underlying groupings of words and video segments into meanings or concepts.
These multimedia technologies do not provide an efficient way of associating multiple texts and multiple videos. An example of associating multiple texts and multiple videos is to associate a video dissertation in Spanish Sign Language to a video dissertation in American Sign Language, and the semantically equivalent English and Spanish texts. Such associations are invaluable for instruction in foreign languages.
Thus, the currently available multimedia technologies do not provide a method for describing the required relationships, for editing these descriptions or for presenting the material.
The specification of segments of text, or text targets, is relatively simple because text provides segment boundary indicators such as spaces that delimit words, punctuation that delimits sentences and paragraph markers that delimit paragraphs. In audio/video media, such segment boundary indicators are not available from the media itself. This makes the specification of segments a slow and labor-intensive process.
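The relative ease of specifying text targets can be illustrated with a minimal sketch that derives word, sentence and paragraph segments directly from the boundary indicators named above. The function names and the sample text are hypothetical, and the rules are deliberately naive: a sentence is taken to end at ".", "!" or "?" (so abbreviations such as "e.g." would be split incorrectly), and a paragraph is taken to end at a blank line. No comparable procedure exists for audio/video media, where segment boundaries must be supplied by other means.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start offset inclusive, end offset exclusive)


def word_spans(text: str) -> List[Span]:
    """Words: runs of word characters, optionally joined by ' or -."""
    return [m.span() for m in re.finditer(r"\w+(?:['-]\w+)*", text)]


def sentence_spans(text: str) -> List[Span]:
    """Sentences: text up to and including terminal punctuation (naive rule)."""
    return [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]*", text)]


def paragraph_spans(text: str) -> List[Span]:
    """Paragraphs: stretches of text separated by one or more blank lines."""
    spans, start = [], 0
    for sep in re.finditer(r"\n\s*\n", text):
        if sep.start() > start:
            spans.append((start, sep.start()))
        start = sep.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def slices(text: str, spans: List[Span]) -> List[str]:
    """Return the substrings addressed by a list of spans."""
    return [text[a:b] for a, b in spans]


# Hypothetical sample text with two paragraphs and three sentences.
SAMPLE = "Signs map to words. Text is easy to split!\n\nVideo is not."
```

Each function returns character offsets rather than strings, so the resulting segments can serve as text targets that refer back to positions in the original text.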