Speech recognition (SR) is the automatic translation of spoken words into text. Many applications now include an SR feature, such as word processing and e-mail applications, file management applications, and systems designed especially for people with disabilities. Some programs are intended for specific business settings, such as medical or legal transcription.
Speech recognition is also used for creating captions for a video clip or movie. The prevailing method compares a recorded word to a database of pre-recorded words.
U.S. Pat. No. 5,649,060 to IBM provides a method of automatically aligning a written transcript with speech in video and audio clips. An automatic speech recognizer decodes speech (recorded on a tape) and produces a file containing the decoded text. This decoded text is then matched with the original written transcript by identifying similar words or clusters of words. The patent does not disclose using video captions as the text source, which involves both separating the text from the image in the video signal and detecting when a caption has changed.
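The matching of decoded text against a written transcript by identifying similar words or clusters of words can be illustrated with a minimal sketch. This is not the method of the cited patent; it assumes a simple longest-matching-block comparison from Python's standard `difflib` module, and the function name `align_words` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_words(transcript, decoded):
    """Return (transcript_index, decoded_index) pairs for words that agree.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; a real aligner would also carry timing information.
    """
    ref = [w.lower() for w in transcript.split()]
    hyp = [w.lower() for w in decoded.split()]
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run (cluster) of identical consecutive words.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


# Hypothetical example: the recognizer misheard "fox" as "box".
pairs = align_words("The quick brown fox jumps", "the quick brown box jumps")
```

Words that fall between matched clusters (here, the fourth word) remain unaligned and could be assigned a position by interpolating between their matched neighbors.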
U.S. Published Application No. 2007/0055695 to IBM provides a method of partitioning a video into a series of semantic units, wherein each semantic unit relates to a thematic topic. The method extracts a plurality of keywords from the speech content of each of a plurality of homogeneous segments of the video and merges semantically related segments.
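The keyword extraction and merging steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the cited application: keywords are taken to be the most frequent non-stopwords in a segment's speech text, and two adjacent segments are treated as semantically related when the Jaccard overlap of their keyword sets meets a threshold. The names `keywords`, `jaccard`, `merge_segments`, the stopword list, and the threshold value are all hypothetical.

```python
from collections import Counter

# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for"}


def keywords(text, top_n=3):
    """Return up to top_n most frequent non-stopwords of a segment's text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return {w for w, _ in counts.most_common(top_n)}


def jaccard(a, b):
    """Overlap of two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_segments(segments, threshold=0.2, top_n=3):
    """Merge each segment into the previous unit when their keywords overlap."""
    units = []
    for text in segments:
        kw = keywords(text, top_n)
        if units and jaccard(units[-1]["keywords"], kw) >= threshold:
            units[-1]["segments"].append(text)
            units[-1]["keywords"] |= kw
        else:
            units.append({"segments": [text], "keywords": kw})
    return units


# Hypothetical speech text for three homogeneous segments.
units = merge_segments([
    "budget budget revenue",
    "revenue budget forecast",
    "weather rain rain",
])
```

In this sketch only adjacent segments are compared, so the resulting units stay contiguous in time; a keyword-overlap score is a crude stand-in for whatever semantic relatedness measure an actual system would use.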